International Journal of Computer Applications Technology and Research, Volume 9, Issue 03, 115-124, 2019, ISSN: 2319-8656, www.ijcat.com

Model for Intrusion Detection Based on Hybrid Feature Selection Techniques

Joseph Mbugua Chahira
Department of Information and Computer Science, Garissa University, Kenya

Abstract

In order to safeguard their critical systems against network intrusions, organisations deploy multiple Network Intrusion Detection Systems (NIDS) to detect malicious packets embedded in network traffic, based on anomaly and misuse detection approaches. Existing NIDS deal with a huge amount of data that contains null values, incomplete information, and irrelevant features, which reduce the detection rate of the IDS, consume a high amount of system resources, and slow down the training and testing process of the IDS. In this paper, a new feature selection model is proposed based on hybrid feature selection techniques (information gain, correlation, chi-square, and gain ratio) and Principal Component Analysis (PCA) for feature reduction. This study employed data mining and machine learning techniques on the NSL-KDD dataset in order to explore significant features for detecting network intrusions. The experimental results showed that the proposed model improves the detection rate and also speeds up the detection process.

Key words: cyber attacks, intrusion detection, feature selection, data mining.

Introduction

A Network Intrusion Detection System (NIDS) [1] monitors the use of computers and the networks over which they communicate, searching for unauthorised use, anomalous behaviour, and attempts to deny users, machines, or portions of networks access to services. Although intrusion detection systems are increasingly deployed in computer networks, they deal with a huge amount of data that contains null values, incomplete information, and irrelevant features. The analysis of such large quantities of data can be tedious, time-consuming, and error-prone.
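The hybrid pipeline outlined in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it substitutes a synthetic dataset for NSL-KDD, and uses only two of the four filter rankers (chi-square, and mutual information as a stand-in for information gain); the subset size k and the number of principal components are assumed values chosen for demonstration.

```python
# Sketch: rank features with multiple filter measures, keep the union
# of their top-k selections, then reduce the selected features with PCA.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for a preprocessed NSL-KDD feature matrix.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=6, random_state=0)
X = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative inputs

k = 8  # assumed per-ranker subset size
chi_idx = set(SelectKBest(chi2, k=k).fit(X, y).get_support(indices=True))
mi_idx = set(SelectKBest(mutual_info_classif, k=k)
             .fit(X, y).get_support(indices=True))
selected = sorted(chi_idx | mi_idx)  # union of the two rankings

# PCA compresses the selected features into a smaller component space.
X_reduced = PCA(n_components=5).fit_transform(X[:, selected])
print(X_reduced.shape)  # (500, 5)
```

The reduced matrix would then be fed to the detection classifier; in practice each ranker's k and the PCA dimensionality are tuning parameters.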
Data mining and machine learning [2] provide tools to select the most relevant feature subset, which improves detection accuracy and removes distractions. The feature selection problem can be characterised in the context of machine learning [3], [4], [5]. Assume that T = D(F, C) is a training dataset with m instances and n features, where D = {o1, o2, . . . , om} and F = {f1, f2, . . . , fn} are the sets of instances and features, and C = {c1, c2, . . . , ck} is the set of class labels. Each instance oj ∈ D can be denoted as a vector of feature values, i.e., oj = (vj1, vj2, . . . , vjn), where vji is the value of oj for feature fi. Feature selection therefore plays an important role in alert correlation: it reduces the amount of data needed for learning, improves predictive accuracy, yields learned knowledge that is more compact and easily understood, and reduces execution time.

The existing feature selection techniques in machine learning can be broadly classified into two categories: wrappers and filters. Wrapper techniques evaluate the worth of features using the learning algorithm applied to the data, while filters evaluate the worth of features using heuristics based on general characteristics of the data. Feature selection algorithms can be further differentiated by the exact nature of their evaluation function and by how the space of feature subsets is explored. Wrappers often give better results than filters in terms of the final predictive accuracy of a learning algorithm, because feature selection is optimised for the particular learning algorithm used. However, since a learning algorithm is employed to evaluate each and every set of features considered, wrappers are prohibitively expensive to run and can be intractable for large databases containing many features.
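The contrast between the two families can be illustrated in code. This is a hedged sketch on synthetic data: the filter ranks features from data statistics alone (mutual information, here standing in for information gain), while the wrapper repeatedly trains the target classifier on candidate subsets; the decision tree, subset size, and dataset are assumptions for demonstration only.

```python
# Filter vs. wrapper feature selection, side by side.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=1)

# Filter: score every feature from the data alone -- no classifier involved.
scores = mutual_info_classif(X, y, random_state=1)
filter_top = np.argsort(scores)[::-1][:4]

# Wrapper: greedily grow a subset, scoring each candidate by
# cross-validating the actual learning algorithm. Far more expensive.
sfs = SequentialFeatureSelector(DecisionTreeClassifier(random_state=1),
                                n_features_to_select=4, cv=3).fit(X, y)
wrapper_top = np.flatnonzero(sfs.get_support())

print(sorted(filter_top), sorted(wrapper_top))
```

The filter runs in a single pass over the data, whereas the wrapper fits the classifier once per candidate feature at every step, which is why wrappers scale poorly to high-dimensional traffic data.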
Furthermore, since the feature selection process is tightly coupled with a learning algorithm, wrappers are less general than filters and must be re-run when switching from one learning algorithm to another. The advantages of filter approaches to feature selection outweigh their disadvantages. Filters execute many times faster than wrappers and are therefore applicable to databases with a large number of features [6]. They do not require re-execution for different learning algorithms and can provide an intelligent starting feature subset for a wrapper in case improved accuracy for a particular learning algorithm is required [7]. Filter algorithms also exhibit a number of drawbacks. Some algorithms do not handle noise in data, and others require that the level of noise be roughly specified by the user a priori [3], [7]. In some cases, a subset of features is not selected explicitly; instead, features are ranked with the