Building lightweight intrusion detection system using wrapper-based feature selection mechanisms

Yang Li a,d,*, Jun-Li Wang b, Zhi-Hong Tian d, Tian-Bo Lu c, Chen Young c

a China Mobile Research Institute, Beijing 100053, China
b Peking University Founder Technology College, Beijing 065001, China
c National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
d Chinese Academy of Sciences, Beijing 100190, China

Article info
Article history: Received 8 January 2008; received in revised form 11 November 2008; accepted 7 January 2009
Keywords: Network security; Intrusion detection system; Feature selection; Modified RMHC; Modified linear SVMs

Abstract
An intrusion detection system (IDS) is an important and necessary component for ensuring network security and protecting network resources and network infrastructure. Building a lightweight IDS is an active research topic in network security. Feature selection, in turn, is a classic research topic in data mining that has attracted interest from researchers in many fields, including network security, pattern recognition and data mining. In this paper, we introduce feature selection methods into the intrusion detection domain. We propose a wrapper-based feature selection algorithm for building a lightweight IDS: a modified random mutation hill climbing (RMHC) search strategy specifies candidate feature subsets for evaluation, and a modified linear support vector machine (SVM) iterative procedure serves as the wrapper that identifies the optimum feature subset. We verify the effectiveness and feasibility of the algorithm through several experiments on the KDD Cup 1999 intrusion detection dataset. The experimental results show that our approach not only speeds up the selection of important features but also yields high detection rates.
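The wrapper loop described in the abstract, in which a search strategy proposes candidate feature subsets and a linear SVM scores them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the paper's modifications to RMHC and to the linear SVM procedure are not described in this section, so the sketch uses plain textbook versions (standard RMHC that flips one feature per step, and a bias-free linear SVM trained by Pegasos-style subgradient descent). The toy dataset, the fitness function (validation accuracy minus a per-feature penalty of 0.01), and all function names are hypothetical; KDD Cup 1999 data is not used.

```python
"""Wrapper feature selection sketch: plain RMHC search scored by a linear SVM.

Hypothetical illustration only; this is not the paper's modified RMHC or
modified linear-SVM procedure, and it does not use KDD Cup 1999 data.
"""
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label in {-1, +1})


def make_toy_data(n: int, n_features: int, seed: int = 0) -> List[Sample]:
    """Hypothetical toy data: only features 0 and 1 determine the label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        data.append((x, 1 if x[0] + x[1] > 0 else -1))
    return data


def train_linear_svm(data: Sequence[Sample], features: List[int],
                     lam: float = 0.01, epochs: int = 10,
                     seed: int = 0) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    No bias term is learned (the toy decision boundary passes through the origin).
    """
    rng = random.Random(seed)
    w = [0.0] * len(features)
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            x, y = data[i]
            xs = [x[j] for j in features]              # keep only selected features
            margin = y * sum(wk * xk for wk, xk in zip(w, xs))
            w = [(1.0 - eta * lam) * wk for wk in w]   # regularisation shrink
            if margin < 1.0:                           # hinge loss is active
                w = [wk + eta * y * xk for wk, xk in zip(w, xs)]
    return w


def accuracy(w: List[float], features: List[int], data: Sequence[Sample]) -> float:
    """Fraction of samples whose predicted sign matches the label."""
    correct = 0
    for x, y in data:
        score = sum(wk * x[j] for wk, j in zip(w, features))
        if (1 if score > 0 else -1) == y:
            correct += 1
    return correct / len(data)


def fitness(mask: List[bool], train: Sequence[Sample], valid: Sequence[Sample],
            penalty: float = 0.01) -> float:
    """Wrapper score: validation accuracy minus a (hypothetical) size penalty."""
    features = [j for j, on in enumerate(mask) if on]
    if not features:
        return 0.0
    w = train_linear_svm(train, features)
    return accuracy(w, features, valid) - penalty * len(features)


def rmhc_select(train: Sequence[Sample], valid: Sequence[Sample], n_features: int,
                iters: int = 60, seed: int = 1) -> Tuple[List[int], float]:
    """Random mutation hill climbing over feature-subset bit masks."""
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_features)]  # random start
    best = fitness(mask, train, valid)
    for _ in range(iters):
        cand = mask[:]
        k = rng.randrange(n_features)
        cand[k] = not cand[k]                  # mutate: flip one feature in or out
        score = fitness(cand, train, valid)
        if score >= best:                      # accept equal-or-better candidates
            mask, best = cand, score
    return [j for j, on in enumerate(mask) if on], best


if __name__ == "__main__":
    data = make_toy_data(400, n_features=8)
    train, valid = data[:300], data[300:]
    selected, score = rmhc_select(train, valid, n_features=8)
    print("selected features:", selected, "fitness:", round(score, 3))
```

On this toy data one would expect the search to keep features 0 and 1, because dropping either costs far more validation accuracy than the size penalty saves. Each fitness call retrains the classifier from scratch, which is what makes wrapper methods accurate but expensive; the cost of that inner loop is the motivation for speeding up the search.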
Furthermore, our experimental results indicate that an IDS with the feature selection algorithm outperforms one without it in both detection performance and computational cost.

© 2009 Elsevier Ltd. All rights reserved.

Computers & Security 28 (2009) 466–475. ISSN 0167-4048. doi:10.1016/j.cose.2009.01.001. Available at www.sciencedirect.com; journal homepage: www.elsevier.com/locate/cose

* Corresponding author. China Mobile Research Institute, Unit 2, 28 Xuanwumenxi Ave., Xuanwu District, Beijing 100053, China. Tel./fax: +86 10 66006688. E-mail address: samsunglinux@163.com (Y. Li).

1. Introduction

An intrusion detection system (IDS) plays a vital role in detecting various kinds of attacks and is a valuable tool for the defense-in-depth of computer networks. A network-based IDS looks for known or potential malicious activity in network traffic and raises an alarm whenever suspicious activity is detected. In general, an IDS must process huge amounts of data containing irrelevant and redundant features, which slow down training and testing, increase resource consumption and degrade the detection rate. Feature selection is therefore one of the key topics in IDS research. In many pattern classification tasks, for example, we are confronted with a very high-dimensional feature space. Some of these features may be irrelevant or redundant. Removing these irrelevant or