Simultaneous feature selection and classification using kernel-penalized support vector machines

Sebastián Maldonado, Richard Weber, Jayanta Basak 1
Department of Industrial Engineering, University of Chile, República 701, Santiago de Chile, Chile
IBM India Research Lab, New Delhi, India

Article history: Received 17 November 2009; Received in revised form 14 July 2010; Accepted 31 August 2010

Keywords: Feature selection; Embedded methods; Support vector machines; Mathematical programming

Abstract

We introduce an embedded method that simultaneously selects relevant features during classifier construction by penalizing each feature's use in the dual formulation of support vector machines (SVM). This approach, called kernel-penalized SVM (KP-SVM), optimizes the shape of an anisotropic RBF kernel, eliminating features that have low relevance for the classifier. Additionally, KP-SVM employs an explicit stopping condition, avoiding the elimination of features that would negatively affect the classifier's performance. We performed experiments on four real-world benchmark problems, comparing our approach with well-known feature selection techniques. KP-SVM outperformed the alternative approaches and consistently determined fewer relevant features.

© 2010 Elsevier Inc. All rights reserved.

1. Introduction

Classification is one of the most important data mining tasks. The performance of the resulting models depends on, among other elements, an appropriate selection of the most relevant features, which is a combinatorial problem in the number of original features and offers the following advantages [1]:

- A low-dimensional representation reduces the risk of overfitting [5,10].
- Using fewer features decreases the model's complexity, which improves its generalization ability.
- A low-dimensional representation requires less computational effort.
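The anisotropic RBF kernel mentioned in the abstract assigns one width parameter per feature, so shrinking a feature's width toward zero removes that feature's influence on the kernel. The following is a minimal illustrative sketch of this idea, not the paper's optimization procedure; the function and parameter names (`anisotropic_rbf`, `sigma`) are our own.

```python
import numpy as np

def anisotropic_rbf(x, y, sigma):
    """Anisotropic RBF kernel: K(x, y) = exp(-sum_j (sigma_j * (x_j - y_j))^2).

    sigma holds one width parameter per feature; a feature j with
    sigma_j = 0 contributes nothing to the kernel, i.e. it is eliminated.
    """
    d = np.asarray(x) - np.asarray(y)
    return float(np.exp(-np.sum((np.asarray(sigma) * d) ** 2)))

# Two points that differ only in the second feature:
x = np.array([1.0, 2.0])
y = np.array([1.0, 5.0])

k_full = anisotropic_rbf(x, y, sigma=[1.0, 1.0])  # second feature matters
k_elim = anisotropic_rbf(x, y, sigma=[1.0, 0.0])  # second feature eliminated
```

With the second width set to zero, the kernel treats `x` and `y` as identical (`k_elim` equals 1), which is the mechanism KP-SVM exploits to discard low-relevance features.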
Among existing classification methods, support vector machines (SVMs) provide several advantages, such as adequate generalization to new objects, absence of local minima, and a representation that depends on only a few parameters [21]. However, in its standard formulation this method does not determine the importance of the features used [10] and is therefore not suitable for feature selection. This fact has motivated the development of several approaches for feature selection using SVMs (see e.g. [7]). Those methods generally work as filters, selecting features from a high-dimensional feature space prior to designing the subsequent classifier. They provide a feature ranking, but without considering the combination of variables that optimizes classification performance. In this paper a novel embedded method for feature selection using SVM for classification problems is introduced. This method, called kernel-penalized SVM (KP-SVM), simultaneously determines a classifier with high classification

doi:10.1016/j.ins.2010.08.047
Information Sciences 181 (2011) 115–128

Corresponding author at: Department of Industrial Engineering, University of Chile, República 701, Santiago de Chile, Chile. Tel.: +56 2 9784072; fax: +56 2 678 7895.
E-mail addresses: semaldon@ing.uchile.cl (S. Maldonado), rweber@dii.uchile.cl (R. Weber), basakjayanta@yahoo.com (J. Basak).
1 The author is presently affiliated with NetApp Bangalore India, Advanced Technology Group.