Selection of Candidate Support Vectors in incremental SVM for network intrusion detection*

Roshan Chitrakar*, Chuanhe Huang
School of Computer, Wuhan University, Wuhan, Hubei, China

Article info

Article history:
Received 24 September 2013
Received in revised form 25 April 2014
Accepted 10 June 2014
Available online 19 June 2014

Keywords:
Incremental support vector machine
Karush-Kuhn-Tucker condition
Candidate Support Vector
Half-partition strategy
Network intrusion detection

Abstract

In Incremental Support Vector Machine (ISVM) classification, the data objects labelled as non-support vectors by the previous classification are re-used as training data in the next classification, along with new data samples verified by the Karush-Kuhn-Tucker (KKT) condition. This paper proposes a Half-partition strategy for selecting and retaining those non-support vectors of the current increment of classification, named Candidate Support Vectors (CSV), that are likely to become support vectors in the next increment of classification. This work also designs an algorithm, the Candidate Support Vector based Incremental SVM (CSV-ISVM) algorithm, that implements the proposed strategy and realizes the whole process of incremental SVM classification. In addition, it suggests modifications to the previously proposed concentric-ring method and reserved set strategy. The performance of the proposed method is evaluated experimentally and by comparison with other ISVM techniques. Experimental results and performance analyses show that the proposed CSV-ISVM algorithm performs better than general ISVM classification for real-time network intrusion detection.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Network intrusion detection can be regarded as a pattern recognition problem: classifying network traffic patterns into two classes, normal and abnormal, according to the similarity between them.
In the field of intrusion detection, the Support Vector Machine (SVM), a classification tool based on statistical machine learning, is becoming popular (Mohammad et al., 2011). Two issues arise in machine learning: training on large-scale data sets and the availability of a complete data set (Le and Nguyen, 2011; Du et al., 2009a,b). First, if the training data set is very large, the computer's memory will be insufficient and training will take too long. Second, when data packets are captured from a network stream, the complete network information cannot be obtained at the outset, so continuous online learning is required to achieve high learning precision as the number of samples increases. The challenge of incremental learning is to decide what and how much information from the previous learning should be selected for training in the

* This work is supported by the National Science Foundation of China (No. 61373040, No. 61173137), the Ph.D. Programs Foundation of the Ministry of Education of China (20120141110073), and the Key Project of the Natural Science Foundation of Hubei Province (No. 2010CDA004).
* Corresponding author. E-mail addresses: roshanchi@gmail.com, roshanchi@whu.edu.cn (R. Chitrakar), huangch@whu.edu.cn (C. Huang).

Computers & Security 45 (2014) 231-241. Journal homepage: www.elsevier.com/locate/cose
http://dx.doi.org/10.1016/j.cose.2014.06.006
ISSN 0167-4048