Computational Statistics & Data Analysis 47 (2004) 225–236
www.elsevier.com/locate/csda

Computational aspects of algorithms for variable selection in the context of principal components

Jorge Cadima a, J. Orestes Cerdeira a,*, Manuel Minhoto b

a Departamento de Matemática, Instituto Superior de Agronomia, Tapada da Ajuda, Lisboa 1349-017, Portugal
b Departamento de Matemática, Universidade de Évora, Colégio Luis António Vernay, Rua Romão Ramalho 59, Évora 7000, Portugal

* Corresponding author. Tel.: +351-2136-53467; fax: +351-2136-30723.
E-mail addresses: jcadima@isa.utl.pt (J. Cadima), orestes@isa.utl.pt (J.O. Cerdeira), minhoto@uevora.pt (M. Minhoto).

Received 4 November 2003; received in revised form 4 November 2003; accepted 8 November 2003

Abstract

Variable selection consists in identifying a k-subset of a set of original variables that is optimal for a given criterion of adequate approximation to the whole data set. Several algorithms for the optimization problems resulting from three different criteria in the context of principal components analysis are considered, and computational results are presented.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Combinatorial optimization; Heuristics; Principal components; Principal variables; Variable selection

1. Introduction

In the analysis of data sets with large numbers of variables a frequent aim is to reduce the dimensionality of the data set. A typical way of reducing the dimension of a data set is through a principal component analysis (PCA). Standard results guarantee that retaining the k principal components (PCs) with the largest associated variance produces the k-subset of linear combinations of the p original variables which, under various criteria, best approximates the original variables (see, for example, Jolliffe, 2002, Section 6.3). PCA is an appropriate tool for deriving low-dimension subspaces which capture most of the information of the whole data set.
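As an illustration of the dimension reduction described in the Introduction, the following is a minimal sketch of retaining the k PCs with the largest variance, computed from the eigendecomposition of the sample covariance matrix. It is not taken from the paper: the function name `top_k_principal_components`, the simulated data, and the use of NumPy are assumptions made here for illustration only.

```python
import numpy as np


def top_k_principal_components(X, k):
    """Hypothetical helper: scores, loadings and retained-variance fraction
    for the k principal components of X with the largest variance."""
    Xc = X - X.mean(axis=0)                 # centre each of the p variables
    S = np.cov(Xc, rowvar=False)            # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :k]               # coefficients of the k leading PCs
    scores = Xc @ loadings                  # the k PCs as linear combinations
    retained = eigvals[:k].sum() / eigvals.sum()
    return scores, loadings, retained


# Simulated data (an assumption): 6 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 6))

scores, loadings, retained = top_k_principal_components(X, k=2)
print(f"fraction of total variance retained by 2 PCs: {retained:.3f}")
```

Note that each retained PC is a linear combination of all p original variables; it is this feature that motivates selecting a k-subset of the original variables instead.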
0167-9473/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2003.11.001