Parameter determination of support vector machine and feature selection using simulated annealing approach

Shih-Wei Lin a,b,*, Zne-Jung Lee b, Shih-Chieh Chen c, Tsung-Yuan Tseng b

a Department of Information Management, Chang Gung University, No. 259 Wen-Hwa 1st Road, Kwei-Shan Tao-Yuan 333, Taiwan, ROC
b Department of Information Management, Huafan University, No. 1 Huafan Road, Taipei, Taiwan, ROC
c Department of Industrial Management, National Taiwan University of Science and Technology, No. 43 Keelung Road, Sec. 4, Taipei, Taiwan, ROC

Received 31 January 2007; received in revised form 6 October 2007; accepted 21 October 2007
Available online 26 October 2007

Abstract

Support vector machine (SVM) is a novel pattern classification method that is valuable in many applications. Kernel parameter setting in the SVM training process, along with feature selection, significantly affects classification accuracy. The objective of this study is to obtain better parameter values while also finding a subset of features that does not degrade the SVM classification accuracy. This study develops a simulated annealing (SA) approach for parameter determination and feature selection in the SVM, termed SA-SVM. To evaluate the proposed SA-SVM approach, several datasets from the UCI machine learning repository are adopted to calculate the classification accuracy rate. The proposed approach was compared with grid search, a conventional method of parameter setting, and with various other methods. Experimental results indicate that the classification accuracy rates of the proposed approach exceed those of grid search and the other approaches. The SA-SVM is thus useful for parameter determination and feature selection in the SVM.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Support vector machines; Simulated annealing; Parameter determination; Feature selection

1. Introduction

Classification problems have been extensively studied.
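The grid-search baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' code; the dataset (iris as a stand-in for the UCI datasets) and the exponentially spaced parameter ranges are assumptions.

```python
# Sketch of the conventional grid-search baseline for setting the
# RBF-kernel SVM parameters C and gamma (illustrative, assumed setup).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Exponentially spaced grid over C and gamma, a common convention.
grid = {"C": [2.0**k for k in range(-5, 16, 2)],
        "gamma": [2.0**k for k in range(-15, 4, 2)]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)
search.fit(X_tr, y_tr)

print("best params:", search.best_params_)
print("test accuracy:", search.score(X_te, y_te))
```

The grid is evaluated exhaustively by cross-validation, which is why its cost grows multiplicatively with each parameter added — the motivation for heuristic alternatives such as the SA approach of this paper.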
Numerous factors, from incomplete data to the choice of parameter values for a given model, may influence classification outcomes. Classification problems have traditionally been tackled by statistical methods, such as logistic regression or discriminant analysis. Advances in technology have led to new techniques for solving classification problems, including decision trees, back-propagation neural networks, rough set theory and support vector machines (SVM). SVM is a data classification technique first developed by Vapnik [1], and has recently been widely adopted in various fields of classification [2–9]. In the SVM, the model for classification is generated in the training stage from the sampled data; classification is then performed based on the trained model. The greatest difficulties in setting up the SVM model are choosing the kernel function and its parameter values. If the parameter values are not set properly, then the classification outcomes will be less than optimal [10].

The bearing conditions are classified from the statistical features of both the original data and the data with some preprocessing, using differentiation and integration, low- and high-pass filtering, and spectral data of the database.

In complex classification domains, some features may contain false correlations, which impede data processing. Moreover, some features may be redundant, since the information they add is contained in other features. Redundant features can lengthen the computational time and degrade the classification accuracy. Hence, the classification process should be fast and accurate while using the minimum number of features, a goal attainable through feature selection. Feature selection has been applied to enhance classification performance and to reduce data noise [11–13].
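The idea of jointly searching the SVM parameters and a feature subset, as the SA-SVM does, can be sketched in a few lines. This is an illustrative skeleton, not the authors' implementation; the cooling schedule, neighbourhood moves, dataset and run length are all assumptions.

```python
# Sketch of SA-based joint parameter determination and feature selection
# for an RBF-kernel SVM (assumed, simplified setup).
import math
import random

from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)      # scale features for the RBF kernel
n_features = X.shape[1]
rng = random.Random(0)

def evaluate(state):
    """Cross-validated accuracy of an SVM under (log2 C, log2 gamma, mask)."""
    log_c, log_g, mask = state
    cols = [i for i in range(n_features) if mask[i]]
    if not cols:
        return 0.0                          # empty subsets are worthless
    clf = SVC(kernel="rbf", C=2.0**log_c, gamma=2.0**log_g)
    return cross_val_score(clf, X[:, cols], y, cv=3).mean()

def neighbour(state):
    """Perturb the parameters slightly or flip one feature bit."""
    log_c, log_g, mask = state
    mask = list(mask)
    if rng.random() < 0.5:
        mask[rng.randrange(n_features)] ^= 1
    else:
        log_c += rng.uniform(-1.0, 1.0)
        log_g += rng.uniform(-1.0, 1.0)
    return (log_c, log_g, mask)

state = (0.0, -3.0, [1] * n_features)       # start with all features
best, best_acc = state, evaluate(state)
acc, temp = best_acc, 1.0
for step in range(60):                      # short run for illustration
    cand = neighbour(state)
    cand_acc = evaluate(cand)
    # Accept better states always; worse ones with Boltzmann probability.
    if cand_acc > acc or rng.random() < math.exp((cand_acc - acc) / temp):
        state, acc = cand, cand_acc
        if acc > best_acc:
            best, best_acc = state, acc
    temp *= 0.95                            # geometric cooling

print("best CV accuracy:", round(best_acc, 3))
print("features kept:", sum(best[2]))
```

The single annealing loop handles both decisions because a state bundles the continuous parameters with the binary feature mask; occasionally accepting worse states lets the search escape local optima that a greedy search would get stuck in.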
If the SVM is adopted without feature selection, then the dimension of the input space is large and non-clean, lowering

Applied Soft Computing 8 (2008) 1505–1512. Available online at www.sciencedirect.com (www.elsevier.com/locate/asoc).
* Corresponding author. Tel.: +886 3 2118800; fax: +886 3 2118020. E-mail addresses: swlin@mail2000.com.tw, swlin@cc.hfu.edu.tw (S.-W. Lin).
1568-4946/$ – see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2007.10.012