International Journal of Computer Applications (0975-8887), Volume 78, No. 4, September 2013

Feature Ranking in Intrusion Detection Dataset using Combination of Filtering Methods

Zahra Karimi, Islamic Azad University, Tehran North Branch, Dept. of Computer Engineering, Tehran, Iran
Mohammad Mansour Riahi Kashani, Islamic Azad University, Tehran North Branch, Dept. of Computer Engineering, Tehran, Iran
Ali Harounabadi, Islamic Azad University, Central Tehran Branch, Dept. of Computer Engineering, Tehran, Iran

ABSTRACT
Intrusion detection is a crucial part of the security of information systems. Most intrusion detection systems use all features in their databases, although some of these features may be irrelevant or redundant and do not contribute to the detection process. Therefore, various feature ranking and feature selection techniques have been proposed. In this paper, a hybrid feature selection method is used to select and rank reliable features and eliminate irrelevant and useless ones, yielding a more accurate and reliable intrusion detection process. Since filtering methods are cheap but of limited accuracy, combining them can improve accuracy at reasonable cost and strike a balance between the two concerns. In the first phase, two subsets of reliable features are created by applying the information gain and symmetrical uncertainty filtering methods. In the second phase, the two subsets are merged, weighted and ranked to extract the most important features. This feature ranking, performed by combining two filtering methods, leads to higher intrusion detection accuracy. The standard KDD99 intrusion detection dataset is used for the experiments. The improved detection rate of the proposed method is demonstrated by comparison with other feature selection methods applied to the same dataset.

Keywords
Intrusion Detection, Feature Selection, Filtering, KDD99 Dataset

1.
INTRODUCTION
Feature selection is a pre-processing technique that finds a minimum subset of features capturing the relevant properties of a dataset well enough to enable adequate classification [1]. Provided that no relevant information is lost when the original feature space is reduced, feature selection has been widely used. It has been considered in many classification problems [2] and applied in various domains [3], [4]. Feature selection techniques are very useful for improving the performance of learning algorithms [5]. For this reason, the strengths and weaknesses of feature selection techniques are traditionally assessed in terms of the classification performance of models built with a subset of the original features. A hybrid feature selection method is therefore used in this paper to select and rank reliable features and eliminate irrelevant and useless ones, making the intrusion detection process more accurate and reliable, and to avoid the biased results that a single feature selection approach can produce; combining approaches is thus a reasonable choice. In the first phase, two subsets of reliable features are created by applying the information gain and symmetrical uncertainty filtering methods. In the second phase, the two subsets are merged, weighted and ranked to extract the most important features. This feature ranking, performed by combining two filtering methods, leads to higher intrusion detection accuracy. Feature selection, filtering methods, intrusion detection, and the Naïve Bayes classifier used in the proposed method are defined in sections 2, 3, 4 and 5. Section 6 describes the proposed method and the phases of the feature selection process. In section 7, the performance of the proposed method is tested on the KDD99 dataset. Conclusions are given in section 8.

2.
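The two-phase idea above can be sketched in code. This is a minimal illustration, not the paper's exact procedure: the merge step here is a simple average of the two filter scores, since the paper's precise weighting scheme is not given in this section, and the toy dataset and function names are invented.

```python
# Minimal sketch of a two-phase hybrid filter ranking (illustrative only).
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H(X) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(xs, ys):
    """H(X | Y): entropy of xs within each group defined by ys."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(feature, labels):
    """IG(class; feature) = H(class) - H(class | feature)."""
    return entropy(labels) - conditional_entropy(labels, feature)

def symmetrical_uncertainty(feature, labels):
    """SU = 2 * IG / (H(feature) + H(class)), normalised to [0, 1]."""
    denom = entropy(feature) + entropy(labels)
    return 2 * information_gain(feature, labels) / denom if denom else 0.0

def hybrid_rank(features, labels):
    """Phase 1: score each feature with both filters.
    Phase 2: merge the scores (simple average here) and rank."""
    merged = {
        name: 0.5 * information_gain(col, labels)
              + 0.5 * symmetrical_uncertainty(col, labels)
        for name, col in features.items()
    }
    return sorted(merged, key=merged.get, reverse=True)

# Toy example: 'f1' perfectly predicts the label, 'f2' carries no signal.
labels = ['attack', 'normal', 'attack', 'normal']
features = {'f1': [1, 0, 1, 0], 'f2': [1, 1, 0, 0]}
print(hybrid_rank(features, labels))  # 'f1' should rank first
```

On this toy data, 'f1' gets IG = 1 and SU = 1 while 'f2' gets zero on both, so the merged ranking places 'f1' first, matching the intuition that only informative features should survive the ranking.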
FEATURE SELECTION
To make an IDS more efficient, reducing the dimensionality and complexity of the data has been used to simplify the feature space. Feature selection can reduce both the data and the computational complexity, and it can identify useful feature subsets more effectively. It is the process of choosing a subset of the original features so that the feature space is optimally reduced according to an evaluation criterion. Because the raw data collected is usually large, it is desirable to select a subset of it by creating feature vectors. Feature subset selection is the process of identifying and removing as much of the redundant and irrelevant information as possible. This reduces the dimensionality of the data and thereby makes the learning algorithms run faster and more efficiently. Feature selection techniques are generally divided into two categories, filter and wrapper [6]. A filter method operates without engaging any information from an induction algorithm; using prior knowledge, such as that features should be strongly correlated with the target class and uncorrelated with each other, it selects the best subset of features. Alternatively, a wrapper method employs a predetermined induction algorithm to find the subset of features with the highest evaluation, searching through the space of feature subsets and evaluating the quality of the selected features. The feature selection process is thus "wrapped around" an induction algorithm; because the wrapper approach optimizes feature selection for a specific induction algorithm, it often yields better classification accuracy than the filter approach. However, the wrapper method is more time-consuming than the filter method because it is strongly coupled with the induction algorithm and repeatedly calls it to evaluate the performance of each subset of features.
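To make the filter/wrapper contrast concrete, the following is a sketch of a wrapper-style forward selection, assuming a leave-one-out 1-NN learner as the wrapped induction algorithm; this is my own illustration (the learner, stopping rule, and toy data are not from the paper) and it shows why wrappers are costly: the learner is re-run for every candidate feature at every step.

```python
# Illustrative wrapper method: forward selection wrapped around a
# leave-one-out 1-NN classifier (a sketch, not the paper's method).
def loo_1nn_accuracy(rows, labels, subset):
    """Leave-one-out accuracy of 1-NN restricted to the chosen features."""
    def dist(a, b):
        return sum(a[j] != b[j] for j in subset)  # Hamming distance
    correct = 0
    for i in range(len(rows)):
        nearest = min((k for k in range(len(rows)) if k != i),
                      key=lambda k: dist(rows[i], rows[k]))
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

def forward_select(rows, labels, n_features):
    chosen, remaining, best_acc = [], set(range(n_features)), 0.0
    while remaining:
        # Wrapper step: re-train/evaluate the learner per candidate feature.
        scored = [(loo_1nn_accuracy(rows, labels, chosen + [f]), f)
                  for f in remaining]
        acc, f = max(scored)
        if acc <= best_acc:  # stop when no candidate improves accuracy
            break
        best_acc = acc
        chosen.append(f)
        remaining.remove(f)
    return chosen

rows = [(0, 1), (0, 0), (1, 1), (1, 0)]  # feature 0 predicts the label
labels = ['normal', 'normal', 'attack', 'attack']
print(forward_select(rows, labels, 2))
```

On this toy data the search keeps only feature 0, since adding the noise feature does not improve leave-one-out accuracy. Each iteration re-evaluates the learner for every remaining feature, which is exactly the cost the text attributes to wrapper methods and the reason they scale poorly to large datasets.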
It thus becomes impractical to apply a wrapper method to select features from a large data set that contains