IP-LSSVM: A two-step sparse classifier

B.P.R. Carvalho*, A.P. Braga
Depto. Engenharia Eletrônica, Campus da UFMG, Pampulha, 31.270-901 Belo Horizonte, MG, Brazil

Article history: Received 25 February 2008; received in revised form 5 February 2009; available online 7 August 2009. Communicated by P. Sarkar.

Keywords: Sparse classifier; Least squares support vector machine; Support vector automatic detection

Abstract

We present in this work a two-step sparse classifier, called IP-LSSVM, which is based on the Least Squares Support Vector Machine (LS-SVM). The LS-SVM formulation solves the learning problem with a system of linear equations. Although this solution is simpler, it comes at the cost of sparseness of the Lagrange multipliers vector. Many works on LS-SVM focus on improving the support vector representation in the least squares approach, since the support vectors are the only vectors that must be stored for further use of the machine, and can also be used directly as a reduced subset that represents the initial one. The proposed classifier incorporates the advantages of both SVM and LS-SVM: automatic detection of support vectors and a solution obtained simply by solving systems of linear equations. IP-LSSVM was compared with other sparse LS-SVM classifiers from the literature: LS²-SVM, Pruning, Ada-Pinv and RRS+LS-SVM. The experiments were performed on four important benchmark databases in machine learning, and on two artificial databases created to show the detected support vectors visually. The results show that IP-LSSVM represents a viable alternative to SVMs, since both have similar features, as supported by literature results, and yet IP-LSSVM has a simpler and more understandable formulation.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

The success of the Support Vector Machine (SVM) (Vapnik, 1995) is mainly due to its solid formal basis and its elegant approach to margin maximization and support vector selection.
The maximum margin hyperplane can be obtained thanks to the quadratic programming (QP) approach to the learning problem, while support vectors are singled out by the corresponding Lagrange multipliers (Vapnik, 1995), which are non-zero only for the patterns on the separation margin. Nevertheless, alternatives to the quadratic programming approach, such as the Least Squares Support Vector Machine (LS-SVM) (Suykens and Vandewalle, 1999), are found in the literature. LS-SVM gains simplicity by solving the learning problem as a system of linear equations. The least squares (LS) solution is less computationally intensive than the quadratic programming one, but it also results in the loss of sparseness of the Lagrange multipliers vector. Selecting LS-SVM support vectors by the non-zero criterion therefore usually results in all training patterns being considered support vectors, which is sometimes regarded as a drawback of the LS approach. The importance of an optimal number of support vectors cannot be neglected in a classification problem, since they represent the most relevant samples for outlining the separation boundary. Support vectors are useful for representing large static and dynamic data sets for classification purposes, and can also help in problem analysis by pointing out the most relevant cases (Tax and Duin, 1999; Ganapathiraju and Picone, 2000). As a consequence of this trade-off between sparseness and complexity, many works on LS-SVM focus on improving the support vector representation of the LS approach (Suykens et al., 2000; Valyon and Horváth, 2004; Carvalho and Braga, 2005; Carvalho et al., 2007). The motivation behind these works is that LS-SVM may still provide a reduced set of support vectors, obtained simply by observing the proper features of the LS solution.
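To make the loss of sparseness concrete, the LS-SVM classifier of Suykens and Vandewalle (1999) can be sketched as a single linear solve. The toy data, the RBF kernel, and the parameter values (gamma, sigma) below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Gram matrix of the Gaussian (RBF) kernel
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM dual as one linear system A x = B:

        [[0,  y^T          ],   [b    ]   [0]
         [y,  Omega + I/g ]] @  [alpha] = [1 ... 1]^T

    with Omega_ij = y_i y_j K(x_i, x_j).
    """
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    B = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, B)
    return sol[1:], sol[0]  # (alpha, bias b)

# Toy problem: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (20, 2)), rng.normal(1, 0.3, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
alpha, b = lssvm_train(X, y)
# Unlike the QP-trained SVM, virtually every multiplier is non-zero,
# so the non-zero criterion flags (almost) every pattern as a support vector.
print((np.abs(alpha) > 1e-8).sum(), "of", len(alpha), "multipliers non-zero")
```

Since the multipliers satisfy alpha_i = gamma * e_i, with e_i the residual slack of pattern i, they vanish only when a pattern is fitted exactly, which generically never happens; hence the dense solution.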
SVM's constrained optimization problem is formalized in the LS-SVM approach as a least squares problem of the form AX = B, where A contains mainly kernel mapping information, X contains the optimization parameters (Lagrange multipliers α and bias b) and B is a vector of equality constraints. The problem of support vector identification in this approach can be regarded as that of solving the optimization problem with the smallest possible vector X, which would result in a maximum margin with a minimum number of support vectors. The problem is therefore one of selecting rows of X without changing the separating hyperplane, while still maintaining the original LS-SVM formulation. In order to avoid loss of kernel mapping information due to the dimensionality reduction of A that follows from eliminating rows of X, the IP-LSSVM approach presented in this paper maintains labeling information in A for all patterns in the data set, including those that had their corresponding rows eliminated.

* Corresponding author. Fax: +55 3132416175. E-mail addresses: bpenna@gmail.com (B.P.R. Carvalho), apbraga@cpdee.ufmg.br (A.P. Braga).
Pattern Recognition Letters 30 (2009) 1507–1515. doi:10.1016/j.patrec.2009.07.022
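The row-elimination idea can be sketched as follows: dropping entries of X removes the corresponding columns of A, while every row of A (and thus the labeling information of every pattern) is kept, leaving a rectangular system solved in the least squares sense. The choice of the retained subset S is left abstract here; this is an illustrative sketch under assumed details, not the paper's exact IP-LSSVM selection criterion:

```python
import numpy as np

def reduced_lssvm(X, y, S, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM system keeping multipliers only for subset S.

    Columns of A for non-selected multipliers are removed, but all
    rows (one equality constraint per training pattern, plus the bias
    row) are retained, so no labeling information is lost.
    """
    n = len(y)
    K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
               / (2 * sigma ** 2))
    Omega = np.outer(y, y) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    B = np.concatenate(([0.0], np.ones(n)))
    cols = np.concatenate(([0], 1 + np.asarray(S)))  # bias column + kept multipliers
    sol, *_ = np.linalg.lstsq(A[:, cols], B, rcond=None)
    return sol[1:], sol[0]  # (alpha restricted to S, bias b)

# Toy demo: arbitrarily keep every other pattern as a candidate support vector
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.3, (20, 2)), rng.normal(1, 0.3, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
S = np.arange(0, 40, 2)
alpha_S, b = reduced_lssvm(X, y, S)
```

Only the patterns in S must be stored for later use of the machine, since the decision function becomes f(x) = sum over i in S of alpha_i y_i K(x_i, x) + b.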