Multi-class support vector machine optimized by inter-cluster distance and self-adaptive differential evolution

Xiaoyuan Zhang, Jianzhong Zhou*, Changqin Wang, Chaoshun Li, Lixiang Song
College of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, PR China

Keywords: Support vector machine; Parameter optimization; Inter-cluster distance; Differential evolution; Fault diagnosis; Rolling element bearings

Abstract

Support vector machine (SVM) is a popular tool for machine learning tasks. It has been successfully applied in many fields, but parameter optimization for SVM remains an ongoing research issue. In this paper, to tune the parameters of SVM, one form of inter-cluster distance in the feature space is calculated for all the SVM classifiers of a multi-class problem. The inter-cluster distance in the feature space indicates the degree to which the classes are separated: a larger inter-cluster distance implies a pair of more separated classes. For each classifier, the optimal kernel parameter, the one yielding the largest inter-cluster distance, is found. Then a new continuous search interval of the kernel parameter, which covers the optimal kernel parameter of every class pair, is determined. A self-adaptive differential evolution algorithm is used to search for the optimal parameter combination in the continuous intervals of the kernel parameter and the penalty parameter. Finally, the proposed method is applied to several real-world datasets as well as to fault diagnosis for rolling element bearings. The results show that it is both effective and computationally efficient for parameter optimization of multi-class SVM.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

Support vector machine (SVM) is based on the structural risk minimization (SRM) principle [1], which makes it less prone to over-fitting.
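As an illustration of the idea described in the abstract (not the authors' exact formulation), one common form of inter-cluster distance between two classes in an RBF feature space is the squared distance between the class centroids, which can be computed purely from kernel evaluations. The toy data, the gamma grid, and the function names below are placeholders for this sketch:

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Pairwise RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - y_j||^2)
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def inter_cluster_distance(X1, X2, gamma):
    # Squared distance between the two class centroids in the feature space,
    # obtained via the kernel trick without mapping points explicitly:
    # ||m1 - m2||^2 = mean(K11) + mean(K22) - 2 * mean(K12)
    k11 = rbf_kernel(X1, X1, gamma).mean()
    k22 = rbf_kernel(X2, X2, gamma).mean()
    k12 = rbf_kernel(X1, X2, gamma).mean()
    return k11 + k22 - 2.0 * k12

# Two toy classes; scan a gamma grid and keep the value giving the
# largest separation, mirroring the per-classifier search in the abstract.
rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(30, 2))
X2 = rng.normal(3.0, 1.0, size=(30, 2))
gammas = np.logspace(-3, 3, 20)
best_gamma = max(gammas, key=lambda g: inter_cluster_distance(X1, X2, g))
print(best_gamma)
```

For a multi-class problem decomposed into pairwise classifiers, repeating this scan for every class pair yields one optimal kernel parameter per pair; the continuous interval spanned by those optima is then the search range handed to the differential evolution step.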
By maximizing the margin between two opposite classes, SVM can find the optimal separating hyper-plane that minimizes the upper bound of the generalization error, which gives SVM a strong capability for fitting and generalization. By introducing the kernel trick, SVM can deal with infinite-dimensional or nonlinear features in a high-dimensional feature space. With these attractive features, SVM is regarded as a state-of-the-art classifier. It is generally acknowledged that SVM performs well on nonlinear and high-dimensional pattern recognition problems with good generalization ability. Although SVM has many advantages and has been successfully applied in many fields, such as biomedicine [2,3], text categorization [4,5], fault diagnosis [6,7] and so on, in practice its parameters, namely the kernel parameters (for instance, the width parameter g of the RBF kernel function) and the penalty parameter C, must be selected judiciously so that the performance of SVM can be brought into full play. Changing the kernel parameters is equivalent to selecting the feature space, and tuning C corresponds to weighting the slack variables, i.e., the error terms. Consequently, the performance of SVM depends largely on its parameters. However, there is no systematic methodology or a priori knowledge for determining the parameters of SVM.

A wide range of studies have been carried out on this topic. A simple and straightforward way is grid search (GS) [8]. This procedure requires a grid search over the parameter space. It trains SVMs with all desired combinations of parameters and

0096-3003/$ - see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.amc.2011.10.063
* Corresponding author. E-mail address: jz.zhou@mail.hust.edu.cn (J. Zhou).
Applied Mathematics and Computation 218 (2012) 4973–4987