Analytica Chimica Acta 443 (2001) 107–115
Aspects of the successive projections algorithm
for variable selection in multivariate calibration
applied to plasma emission spectrometry
Roberto Kawakami Harrop Galvão a, Maria Fernanda Pimentel b, Mario Cesar Ugulino Araujo c,∗, Takashi Yoneyama a, Valeria Visani c
a Divisão de Engenharia Eletrônica, Instituto Tecnológico de Aeronáutica, 12228-900 São José dos Campos, SP, Brazil
b Departamento de Engenharia Química, CTG, Universidade Federal de Pernambuco, CEP 50740-521 Recife, PE, Brazil
c Departamento de Química, Universidade Federal da Paraíba, CCEN, Caixa Postal 5093, CEP 58051-970 João Pessoa, PB, Brazil
Received 16 October 2000; received in revised form 8 May 2001; accepted 6 June 2001
Abstract
The successive projections algorithm (SPA) was recently proposed as a variable selection strategy to minimize collinearity
problems in multivariate calibration. Although SPA has been successfully applied to UV–VIS spectrophotometric multicomponent
analysis, no evidence of its ability to deal with variable sets with both high and low signal-to-noise ratios has
been presented. This issue is addressed by the present work, which applies SPA to the simultaneous determination of Mn,
Mo, Cr, Ni and Fe using a low-resolution plasma spectrometer/diode array detection system. This problem is of particular
interest since strong interanalyte spectral interferences arise and regions with high and low signal intensity alternate in the
spectra. Results show that multiple linear regression (MLR) on the wavelengths selected by SPA yields models with better
prediction capabilities than principal component regression (PCR) and partial least squares (PLS) models. A standard genetic
algorithm (GA) used for comparison yielded results similar to SPA for Mn, Cr and Fe, and better predictions for Mo and Ni.
However, in all cases, the GA resulted in models less parsimonious than those obtained with SPA. The average of the root mean square relative
error of prediction (RMSREP) obtained for the five analytes was 1.4% for MLR–SPA, 1.0% for MLR–GA, 2.2% for PCR,
and 2.1% for PLS. Since the computational time demanded by SPA grows with the square of the number of spectral variables,
a pre-selection procedure based on the identification of emission peaks is proposed. This procedure decreased selection time
by a factor of 20, without significantly degrading the results. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Variable selection; Successive projections algorithm; Plasma emission spectrometry; Multivariate calibration
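The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of NumPy, the intercept term in the MLR model and the exact form of the relative error are assumptions made here for illustration. The selection routine follows the projection idea usually associated with SPA, in which each new variable is the one with the largest norm after the remaining columns are projected onto the subspace orthogonal to the previously selected one.

```python
import numpy as np

def spa_select(X, start, n_select):
    """Pick n_select column indices of X, beginning with column `start`.

    Illustrative sketch: assumes n_select does not exceed the rank of X.
    """
    P = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        ref = P[:, selected[-1]]
        # Project all columns onto the subspace orthogonal to the last pick.
        P = P - np.outer(ref, ref @ P) / (ref @ ref)
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -1.0  # never pick the same column twice
        selected.append(int(np.argmax(norms)))
    return selected

def mlr_predict(X_cal, y_cal, X_new, cols):
    """Least-squares MLR on the chosen columns (intercept is an assumption)."""
    A = np.column_stack([np.ones(len(X_cal)), X_cal[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y_cal, rcond=None)
    B = np.column_stack([np.ones(len(X_new)), X_new[:, cols]])
    return B @ coef

def rmsrep(y_true, y_pred):
    """Root mean square relative error of prediction, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.sqrt(np.mean(((y_pred - y_true) / y_true) ** 2))

# Toy data: column 1 is nearly collinear with column 0, column 2 is not.
X_toy = np.array([[1.0, 0.9, 0.0],
                  [0.0, 0.1, 0.0],
                  [0.0, 0.0, 0.5]])
chosen = spa_select(X_toy, start=0, n_select=2)  # skips the collinear column
```

In the toy matrix, column 1 has a larger raw norm than column 2, yet it is passed over because almost nothing of it remains after the component along column 0 is removed. If such a selection is repeated for every candidate starting column, each run also scans all remaining columns, which is one way to picture the quadratic growth in computing time mentioned in the abstract.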
∗ Corresponding author. Tel.: +55-83-216-7438; fax: +55-83-216-7437.
E-mail addresses: kawakami@ele.ita.br (R. Kawakami Harrop Galvão), mfp@npd.ufpe.br (M. Fernanda Pimentel),
laqa@quimica.ufpb.br (M. Cesar Ugulino Araujo).
1. Introduction
The application of multivariate calibration methods, such as multiple linear regression (MLR), principal
component regression (PCR) and partial least squares (PLS) [1], to spectrometric multicomponent simultaneous
analysis may require spectral variable selection for building well-fitted models [2]. Several approaches
[3–20] have been proposed to select optimal sets of variables for multivariate calibration.
MLR yields models which are simpler and easier
to interpret than those of PCR and PLS, since the latter two
techniques perform regression on latent variables,
which do not have direct physical meaning. On the other
0003-2670/01/$ – see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII:S0003-2670(01)01182-5