Parameter determination of support vector machine and feature selection using simulated annealing approach

Shih-Wei Lin a,b,*, Zne-Jung Lee b, Shih-Chieh Chen c, Tsung-Yuan Tseng b

a Department of Information Management, Chang Gung University, No. 259 Wen-Hwa 1st Road, Kwei-Shan Tao-Yuan 333, Taiwan, ROC
b Department of Information Management, Huafan University, No. 1 Huafan Road, Taipei, Taiwan, ROC
c Department of Industrial Management, National Taiwan University of Science and Technology, No. 43 Keelung Road, Sec. 4, Taipei, Taiwan, ROC

Received 31 January 2007; received in revised form 6 October 2007; accepted 21 October 2007
Available online 26 October 2007

Abstract

Support vector machine (SVM) is a novel pattern classification method that is valuable in many applications. Kernel parameter setting in the SVM training process, along with feature selection, significantly affects classification accuracy. The objective of this study is to obtain better parameter values while also finding a subset of features that does not degrade the SVM classification accuracy. This study develops a simulated annealing (SA) approach for parameter determination and feature selection in the SVM, termed SA-SVM. To evaluate the proposed SA-SVM approach, several datasets from the UCI machine learning repository are adopted to calculate the classification accuracy rate. The proposed approach was compared with grid search, a conventional method of parameter setting, and with various other methods. Experimental results indicate that the classification accuracy rates of the proposed approach exceed those of grid search and the other approaches. The SA-SVM is thus useful for parameter determination and feature selection in the SVM.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Support vector machines; Simulated annealing; Parameter determination; Feature selection

1. Introduction

Classification problems have been extensively studied.
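The grid-search baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' code; the dataset (iris as a stand-in for the UCI datasets) and the exponentially spaced parameter ranges are assumptions.

```python
# Sketch of the conventional grid-search baseline for setting the
# RBF-kernel SVM parameters C and gamma (illustrative, assumed setup).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Exponentially spaced grid over C and gamma, a common convention.
grid = {"C": [2.0**k for k in range(-5, 16, 2)],
        "gamma": [2.0**k for k in range(-15, 4, 2)]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)
search.fit(X_tr, y_tr)

print("best params:", search.best_params_)
print("test accuracy:", search.score(X_te, y_te))
```

The grid is evaluated exhaustively by cross-validation, which is why its cost grows multiplicatively with each parameter added — the motivation for heuristic alternatives such as the SA approach of this paper.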
Numerous factors, from incomplete data to the choice of parameter values for a given model, may influence classification outcomes. Classification problems have traditionally been tackled by statistical methods, such as logistic regression or discriminant analysis. Advances in technology have led to new techniques for solving classification problems, including decision trees, back-propagation neural networks, rough set theory and support vector machines (SVM). SVM is a data classification technique first developed by Vapnik [1], and has recently been widely adopted in various fields of classification [2–9]. In the SVM, the model for classification is generated in the training stage from the sampled data; classification is then performed based on the trained model. The greatest difficulties in setting up the SVM model are choosing the kernel function and its parameter values. If the parameter values are not set properly, then the classification outcomes will be less than optimal [10].

The bearing conditions are classified from the statistical features of both the original data and the data with some preprocessing, using differentiation and integration, low- and high-pass filtering, and spectral data of the database.

In complex classification domains, some features may contain false correlations, which impede data processing. Moreover, some features may be redundant, since the information they add is contained in other features. Redundant features can lengthen the computational time and degrade the classification accuracy. Hence, the classification process should be fast and accurate while using the minimum number of features, a goal attainable through feature selection. Feature selection has been applied to enhance classification performance and to reduce data noise [11–13].
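The idea of jointly searching the SVM parameters and a feature subset, as the SA-SVM does, can be sketched in a few lines. This is an illustrative skeleton, not the authors' implementation; the cooling schedule, neighbourhood moves, dataset and run length are all assumptions.

```python
# Sketch of SA-based joint parameter determination and feature selection
# for an RBF-kernel SVM (assumed, simplified setup).
import math
import random

from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)      # scale features for the RBF kernel
n_features = X.shape[1]
rng = random.Random(0)

def evaluate(state):
    """Cross-validated accuracy of an SVM under (log2 C, log2 gamma, mask)."""
    log_c, log_g, mask = state
    cols = [i for i in range(n_features) if mask[i]]
    if not cols:
        return 0.0                          # empty subsets are worthless
    clf = SVC(kernel="rbf", C=2.0**log_c, gamma=2.0**log_g)
    return cross_val_score(clf, X[:, cols], y, cv=3).mean()

def neighbour(state):
    """Perturb the parameters slightly or flip one feature bit."""
    log_c, log_g, mask = state
    mask = list(mask)
    if rng.random() < 0.5:
        mask[rng.randrange(n_features)] ^= 1
    else:
        log_c += rng.uniform(-1.0, 1.0)
        log_g += rng.uniform(-1.0, 1.0)
    return (log_c, log_g, mask)

state = (0.0, -3.0, [1] * n_features)       # start with all features
best, best_acc = state, evaluate(state)
acc, temp = best_acc, 1.0
for step in range(60):                      # short run for illustration
    cand = neighbour(state)
    cand_acc = evaluate(cand)
    # Accept better states always; worse ones with Boltzmann probability.
    if cand_acc > acc or rng.random() < math.exp((cand_acc - acc) / temp):
        state, acc = cand, cand_acc
        if acc > best_acc:
            best, best_acc = state, acc
    temp *= 0.95                            # geometric cooling

print("best CV accuracy:", round(best_acc, 3))
print("features kept:", sum(best[2]))
```

The single annealing loop handles both decisions because a state bundles the continuous parameters with the binary feature mask; occasionally accepting worse states lets the search escape local optima that a greedy search would get stuck in.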
If the SVM is adopted without feature selection, then the dimension of the input space is large and non-clean, lowering

Applied Soft Computing 8 (2008) 1505–1512. Available online at www.sciencedirect.com (www.elsevier.com/locate/asoc).
* Corresponding author. Tel.: +886 3 2118800; fax: +886 3 2118020. E-mail addresses: swlin@mail2000.com.tw, swlin@cc.hfu.edu.tw (S.-W. Lin).
1568-4946/$ – see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2007.10.012