Analytica Chimica Acta 443 (2001) 107–115

Aspects of the successive projections algorithm for variable selection in multivariate calibration applied to plasma emission spectrometry

Roberto Kawakami Harrop Galvão a, Maria Fernanda Pimentel b, Mario Cesar Ugulino Araujo c,*, Takashi Yoneyama a, Valeria Visani c

a Divisão de Engenharia Eletrônica, Instituto Tecnológico de Aeronáutica, 12228-900 São José dos Campos, SP, Brazil
b Departamento de Engenharia Química - CTG, Universidade Federal de Pernambuco, CEP 50740-521 Recife, PE, Brazil
c Departamento de Química, Universidade Federal da Paraíba, CCEN, Caixa Postal 5093, CEP 58051-970 João Pessoa, PB, Brazil

Received 16 October 2000; received in revised form 8 May 2001; accepted 6 June 2001

Abstract

The successive projections algorithm (SPA) was recently proposed as a variable selection strategy to minimize collinearity problems in multivariate calibration. Although SPA has been successfully applied to UV–VIS spectrophotometric multicomponent analysis, no evidence of its ability to deal with variable sets with both high and low signal-to-noise ratios has been presented. This issue is addressed by the present work, which applies SPA to the simultaneous determination of Mn, Mo, Cr, Ni and Fe using a low-resolution plasma spectrometer/diode array detection system. This problem is of particular interest since strong interanalyte spectral interferences arise and regions with high and low signal intensity alternate in the spectra. Results show that multiple linear regression (MLR) on the wavelengths selected by SPA yields models with better prediction capabilities than principal component regression (PCR) and partial least squares (PLS) models. A standard genetic algorithm (GA) used for comparison yielded results similar to SPA for Mn, Cr and Fe, and better predictions for Mo and Ni. However, in all cases, the GA resulted in models less parsimonious than SPA.
The average of the root mean square relative error of prediction (RMSREP) obtained for the five analytes was 1.4% for MLR–SPA, 1.0% for MLR–GA, 2.2% for PCR, and 2.1% for PLS. Since the computational time demanded by SPA grows with the square of the number of spectral variables, a pre-selection procedure based on the identification of emission peaks is proposed. This procedure decreased selection time by a factor of 20, without significantly degrading the results. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Variable selection; Successive projections algorithm; Plasma emission spectrometry; Multivariate calibration

* Corresponding author. Tel.: +55-83-216-7438; fax: +55-83-216-7437.
E-mail addresses: kawakami@ele.ita.br (R. Kawakami Harrop Galvão), mfp@npd.ufpe.br (M. Fernanda Pimentel), laqa@quimica.ufpb.br (M. Cesar Ugulino Araujo).
0003-2670/01/$ – see front matter. PII: S0003-2670(01)01182-5

1. Introduction

The application of multivariate calibration methods, such as multiple linear regression (MLR), principal component regression (PCR) and partial least squares (PLS) [1], to spectrometric multicomponent simultaneous analysis may require spectral variable selection for building well-fitted models [2]. Several approaches [3–20] have been proposed to select optimal sets of variables for multivariate calibration. MLR yields models which are simpler and easier to interpret than PCR and PLS, since the latter two techniques perform regression on latent variables, which do not have physical meaning. On the other