rs-Sparse principal component analysis: A mixed integer nonlinear programming approach with VNS

Emilio Carrizosa, Vanesa Guerrero*
Instituto de Matemáticas de la Universidad de Sevilla, Fac. de Matemáticas, Avda Reina Mercedes s/n, 41012 Sevilla, Spain

Keywords: Sparse principal component analysis; Variable neighborhood search; Nonlinear mixed integer programming

Abstract: Principal component analysis is a popular dimensionality reduction technique for data analysis, aiming to project a given dataset with minimum error into a subspace of a smaller number of dimensions. In order to improve interpretability, different variants of the method have been proposed in the literature in which, besides error minimization, sparsity is sought. In this paper we formulate as a mixed integer nonlinear program the problem of finding a subspace with a sparse basis minimizing the sum of squared distances between the points and their projections. Contrary to other attempts in the literature, with our model the user can fix the level of sparseness of the resulting basis vectors. Variable neighborhood search is proposed to solve the problem obtained this way. Our numerical experience on test sets shows that our procedure outperforms benchmark methods in the literature, both in terms of sparsity and errors.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction and literature review

Principal component analysis (PCA) was first introduced by [20] as a method for projecting a set of points $u_1, \dots, u_p \in \mathbb{R}^n$ onto a lower-dimensional space in such a way that the distances between the points and their projections are minimized; see, e.g., [14,9].
Given $k$ vectors $c_1, \dots, c_k \in \mathbb{R}^n$, let $\pi_{\{c_1,\dots,c_k\}}$ denote the projection onto the linear space $\mathcal{L}(\{c_1,\dots,c_k\})$ spanned by the vectors $c_1,\dots,c_k$:

$$\pi_{\{c_1,\dots,c_k\}}(u) = \arg\min \left\{ \|u-z\| : z \in \mathcal{L}(\{c_1,\dots,c_k\}) \right\}.$$

The aim of PCA is to find a set of $k \le n$ orthonormal vectors $c_1,\dots,c_k$ such that the mean squared distance between the points in the dataset $\{u_1,\dots,u_p\}$ and their projections onto the vector space $\mathcal{L}(\{c_1,\dots,c_k\})$ generated by $\{c_1,\dots,c_k\}$ is minimized. In other words, the following optimization problem is to be solved:

$$\min_{c_1,\dots,c_k \;\text{orthonormal}} \; \frac{1}{p} \sum_{h=1}^{p} \left\| u_h - \pi_{\{c_1,\dots,c_k\}}(u_h) \right\|^2.$$

The optimal solutions, $c^\ast = (c^\ast_1, \dots, c^\ast_k)$, are called principal components (PCs). The main drawback of this dimensionality reduction technique is interpretability: interpreting the projections is usually quite difficult, due to the fact that most of the original variables are present in each $c^\ast_i$, $i = 1, \dots, k$; i.e., the PCs are not sparse. Interpretability is improved if some loadings (the coefficients of the PCs) are zero, and this has been pursued in different papers. A first attempt at achieving this is rounding, by considering all loadings with absolute value smaller than some threshold as zero. However, this procedure has been shown to be misleading; see [4]. Varimax rotation, see [15], has also been proposed, but it hardly ever achieves the aim of an easier interpretation, despite losing orthogonality of the loadings and uncorrelation of the components. Some authors relate the notion of simplicity to the fact that the loadings belong to the set of integers. Such an idea was developed in [26] and later in [23], who called their method simple component analysis (SCA). SCA allows the user, under his or her own criterion, to modify a simple system of components in order to improve interpretability. However, SCA does not yield either orthogonal or uncorrelated components. An exploratory approach to SCA was presented in [1] for achieving orthogonality.
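The PCA problem above can be illustrated numerically. The following sketch (not part of the original paper; dataset and dimensions are arbitrary) uses the standard fact that the optimal orthonormal vectors $c_1,\dots,c_k$ for centered data are the top-$k$ right singular vectors, and checks that their mean squared projection error beats that of a random subspace of the same dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(50, 5))        # p = 50 points in R^n, with n = 5
Uc = U - U.mean(axis=0)             # center the data (standard preprocessing)

k = 2
# The orthonormal c_1, ..., c_k minimizing the mean squared projection
# error are the top-k right singular vectors of the centered data matrix.
_, _, Vt = np.linalg.svd(Uc, full_matrices=False)
C = Vt[:k].T                        # n x k matrix with columns c_1, ..., c_k

# Projection onto L({c_1, ..., c_k}): pi(u) = C C^T u
proj = Uc @ C @ C.T
mse = np.mean(np.sum((Uc - proj) ** 2, axis=1))

# Any other k-dimensional subspace gives at least this error:
C_rand, _ = np.linalg.qr(rng.normal(size=(5, k)))   # random orthonormal basis
proj_rand = Uc @ C_rand @ C_rand.T
mse_rand = np.mean(np.sum((Uc - proj_rand) ** 2, axis=1))
assert mse <= mse_rand + 1e-12
```

Note that each principal component here generically has all $n$ loadings non-zero, which is precisely the interpretability issue the sparse variants address.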
Also in [24], SCA is modified by using genetic algorithms. Another way of obtaining sparsity is to constrain the number of non-zero loadings in each PC. In this line, Ref. [5] proposes a convex relaxation method based on semidefinite programming, which does not preserve orthogonality or uncorrelation, while [6] presents a branch-and-bound approach that lets the user choose between keeping orthogonality or uncorrelation. A related approach is presented in [11], in which a bound on the sum of the absolute values of the loadings is added, combining in this way the lasso paradigm, [25], with PCA. In [28], PCA is formulated as a regression-type optimization problem. Sparse loadings are achieved by imposing an elastic-net constraint, [27], a generalization of the lasso, on the regression coefficients. The sparse PCs obtained from sparse principal component analysis (SPCA) are neither orthogonal nor uncorrelated. See another lasso-based model in [21].

This research is supported by Grants MTM2012-36136 (Ministerio de Economía y Competitividad, Spain) and FQM329 (Junta de Andalucía), both co-funded by EU ERD funds.

* Corresponding author. Tel.: +34 687153536. E-mail addresses: ecarrizosa@us.es (E. Carrizosa), vguerrero@us.es (V. Guerrero).