Selection of Candidate Support Vectors in incremental SVM for network intrusion detection*

Roshan Chitrakar*, Chuanhe Huang

School of Computer, Wuhan University, Wuhan, Hubei, China

Article info

Article history: Received 24 September 2013; received in revised form 25 April 2014; accepted 10 June 2014; available online 19 June 2014

Keywords: Incremental support vector machine; Karush-Kuhn-Tucker condition; Candidate Support Vector; Half-partition strategy; Network intrusion detection

Abstract

In incremental Support Vector Machine (SVM) classification, the data objects labelled as non-support vectors by the previous classification are re-used as training data in the next classification, along with new data samples verified by the Karush-Kuhn-Tucker (KKT) condition. This paper proposes a Half-partition strategy for selecting and retaining those non-support vectors of the current increment of classification, named Candidate Support Vectors (CSV), which are likely to become support vectors in the next increment of classification. It also designs an algorithm, the Candidate Support Vector based Incremental SVM (CSV-ISVM) algorithm, that implements the proposed strategy and realizes the whole process of incremental SVM classification. The work further suggests modifications to the previously proposed concentric-ring method and reserved set strategy. The performance of the proposed method is evaluated experimentally and by comparison with other ISVM techniques. Experimental results and performance analyses show that the proposed CSV-ISVM algorithm performs better than general ISVM classification for real-time network intrusion detection.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Network intrusion detection can be considered a pattern recognition problem: classifying network traffic patterns into two classes, normal and abnormal, according to the similarity between them.
In the field of intrusion detection, the Support Vector Machine (SVM) is becoming a popular classification tool based on statistical machine learning (Mohammad et al., 2011). Machine learning faces two issues here: training on large-scale data sets, and the availability of a complete data set (Le and Nguyen, 2011; Du et al., 2009a,b). First, if the training data set is very large, the computer's memory will not suffice and training will take too long. Second, when data packets are captured from a network stream, the complete network information cannot be obtained at the outset, so continuous online learning is required to achieve high learning precision as the number of samples increases.

* This work is supported by the National Science Foundation of China (No. 61373040, No. 61173137), the Ph.D. Programs Foundation of Ministry of Education of China (20120141110073), and the Key Project of Natural Science Foundation of Hubei Province (No. 2010CDA004).
* Corresponding author. E-mail addresses: roshanchi@gmail.com, roshanchi@whu.edu.cn (R. Chitrakar), huangch@whu.edu.cn (C. Huang).

Available online at www.sciencedirect.com (ScienceDirect). Journal homepage: www.elsevier.com/locate/cose
computers & security 45 (2014) 231-241
http://dx.doi.org/10.1016/j.cose.2014.06.006
0167-4048/© 2014 Elsevier Ltd. All rights reserved.

The challenge of incremental learning is to decide what and how much information from the previous learning should be selected for training in the