Candidate working set strategy based SMO algorithm in support vector machine

Xiao-feng Song a,*, Wei-min Chen a, Yi-Ping Phoebe Chen b,*, Bin Jiang a

a College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, PR China
b School of Information Technology, Deakin University, Australia

Article history: Received 11 February 2009; Received in revised form 5 May 2009; Accepted 5 May 2009; Available online 2 June 2009

Keywords: Support vector machine; SMO; Candidate working set strategy; Kernel cache

Abstract

Sequential minimal optimization (SMO) is quite an efficient algorithm for training the support vector machine. The most important step of this algorithm is the selection of the working set, which greatly affects the training speed. The feasible direction strategy for working set selection can decrease the objective function, but it may increase the total computation spent on selecting the working set in each iteration. In this paper, a new candidate working set (CWS) strategy is presented that takes into account both the cost of working set selection and cache performance. The new strategy selects several of the greatest violating samples from the cache as the iterative working sets for the next several optimization steps, which improves the efficiency of kernel cache usage and reduces the computational cost of working set selection. The results of the theoretical analysis and experiments demonstrate that the proposed method can reduce the training time, especially on large-scale datasets.

© 2009 Elsevier Ltd. All rights reserved.
1. Introduction

The support vector machine (SVM), which was developed from statistical learning theory (Cortes & Vapnik, 1995; Vapnik, 1995), has become one of the most promising learning algorithms in management fields, including text classification, information retrieval, business model construction, customer analysis and so on (Ko & Seo, 2009; Tseng, Lin, & Lin, 2007). How to effectively improve SVM performance is still an ongoing research issue. Many approaches have been proposed, such as chunking, decomposition (Osuna, Freund, & Girosi, 1997), sequential minimal optimization (SMO) (Platt, 1999) and Kernel-AdaTron (Campbell & Cristianini, 1998; Principe, Euliano, & Lefebvre, 1999; Ramy, Saman, & Li, 2008). These approaches decompose a large optimization problem into many small-scale ones. SVMlight (Joachims, 1998a, 1998b) and Libsvm (Chen, Fan, & Lin, 2006; Hsu & Lin, 2002; Lin, 2001; Chang & Lin, 2001), the most popular training packages, which adopt decomposition or SMO, utilize the steepest feasible descent approach for working set selection and introduce kernel caching and shrinking strategies to accelerate SVM training. However, working set selection is costly for both decomposition and SMO: in each optimization iteration, the two approaches must update the gradient over the whole training set and then select the current working set, the set that contributes most to minimizing the objective function in the current step.

In the iterative process of training an SVM, the samples that violate the Karush-Kuhn-Tucker (KKT) conditions most contribute most to decreasing the objective function. For example, the maximal violating pair is selected as the working set for SMO in Libsvm. In general, there are several such samples, not only one or two. We call the samples that violate the KKT conditions greatly "great violating samples". So we can select all these samples before optimization to prepare the working sets.
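As a concrete illustration, the maximal-violating-pair rule used by SMO in Libsvm can be sketched as follows. This is a minimal NumPy sketch under the standard dual formulation, not the paper's or Libsvm's implementation; the function and variable names are our own assumptions.

```python
import numpy as np

def maximal_violating_pair(alpha, y, grad, C, tol=1e-3):
    """Select the maximal violating pair (i, j) for the SVM dual
        min_a 0.5 a^T Q a - e^T a   s.t.  y^T a = 0,  0 <= a <= C,
    where grad = Q a - e is the current gradient.
    Returns (i, j), or None when the maximal KKT violation is below tol."""
    # I_up: indices whose alpha can move "up" along the feasible direction
    up = ((y == +1) & (alpha < C)) | ((y == -1) & (alpha > 0))
    # I_low: indices whose alpha can move "down"
    low = ((y == +1) & (alpha > 0)) | ((y == -1) & (alpha < C))
    f = -y * grad                      # per-sample violation score
    # Most violating pair: largest score in I_up, smallest score in I_low
    i = np.flatnonzero(up)[np.argmax(f[up])]
    j = np.flatnonzero(low)[np.argmin(f[low])]
    if f[i] - f[j] < tol:              # KKT conditions met within tolerance
        return None
    return i, j
```

At the initial point (all alpha zero), every sample is a KKT violator and the rule simply picks one index from each class; as training proceeds, the gap f[i] - f[j] shrinks and serves as the stopping criterion.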
doi:10.1016/j.ipm.2009.05.002

* Corresponding authors.
E-mail addresses: xfsong@nuaa.edu.cn (X.-f. Song), phoebe@deakin.edu.au (Y.-P. P. Chen).

Information Processing and Management 45 (2009) 584–592