International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-8, Issue-9, July 2019
Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Retrieval Number: I8982078919/19©BEIESP, DOI: 10.35940/ijitee.I8982.078919

Abstract: As new technologies emerge, data is generated in larger volumes and higher dimensions. The high dimensionality of data poses a great challenge for classification: the presence of redundant features and noisy data degrades the performance of the model, so it is necessary to extract the relevant features from a given data set. Feature extraction is an important step in many machine learning algorithms, and many researchers have attempted it. Among the different feature extraction methods, mutual information is a widely used feature selection method because of its good quality of quantifying dependency among features in classification problems. To cope with this issue, in this paper we propose a simplified mutual information based feature selection with less computational overhead. The selected feature subset is evaluated with a multilayer perceptron on the KDD CUP 99 data set for 2-class, 4-class and 5-class classification. The accuracy of these models is almost the same with fewer features.

Keywords: IDS, Perceptron, Mutual Information, Entropy, Conditional Entropy, Feature Selection.

1. INTRODUCTION

An intrusion detection system [3, 19, 20] dynamically monitors the activities that occur in the network and analyzes malicious activity which violates the security policy and user security. Intrusion detection is categorized into misuse detection and anomaly detection. In misuse detection, the signatures of incoming and outgoing packets are compared against a database of known attack signatures.
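To make the misuse-detection idea concrete, the matching step can be sketched as a lookup of packet attributes against a store of known attack signatures. This is a minimal illustrative sketch, not a real IDS: the signature format (protocol, destination port, payload substring) and the sample signatures are hypothetical.

```python
# Minimal sketch of signature-based misuse detection.
# Here a "signature" is just a (protocol, destination port, payload substring)
# triple; the entries below are hypothetical examples, not real rules.

KNOWN_ATTACK_SIGNATURES = {
    ("tcp", 80, "/etc/passwd"),   # path-traversal attempt over HTTP
    ("tcp", 23, "login: root"),   # telnet root-login probe
}

def is_misuse(protocol, dst_port, payload):
    """Return True if the packet matches any known attack signature."""
    return any(
        protocol == p and dst_port == port and pattern in payload
        for p, port, pattern in KNOWN_ATTACK_SIGNATURES
    )
```

A packet that matches no stored signature passes as normal, which is exactly why misuse detection cannot flag previously unseen attacks, while anomaly detection (below) can.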
Anomaly detection creates a profile of normal behavior, and any activity that deviates from the profile is considered an attack. Attacks on networks have grown continuously over the last three decades and have had a large impact on user security. It is difficult to handle these attacks with traditional methods, so a lot of research [1-10] has been carried out on intrusion detection systems that handle them automatically using machine learning. A machine learning algorithm requires past data to train the model. IDSs using machine learning are built on standard data sets such as KDD CUP 99, NSL-KDD, Kyoto-2006+, ISCX, etc. The KDD CUP 99 data set is the most popular and standard data set used in the literature. The data was collected and distributed by the MIT Lincoln Laboratory, sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL). The KDD CUP 98 and KDD CUP 99 data sets are subsets of the DARPA-sponsored project. The KDD CUP 99 data set [23] contains 41 features and a class label. The class label is multi-class with five classes, namely Normal, DOS, Probe, R2L and U2R.

Feature selection [1, 2, 6, 7, 12, 13] is an important technique for selecting a subset of important features from high dimensional data. It extracts relevant features and removes redundant ones. Feature selection approaches are categorized into filter based and wrapper based techniques. A wrapper based technique depends on a classification algorithm, whereas a filter based technique extracts the subset of features independently of any classification algorithm.

Revised Manuscript Received on July 10, 2019.
V Maheshwar Reddy, Assistant Professor, ACE Engineering College, Telangana, India.
I Ravi Prakash Reddy, Professor, G. Narayanamma Institute of Technology and Science, Telangana, India.
K Adi Narayana Reddy, Professor, ACE Engineering College, Telangana, India.
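The five KDD CUP 99 classes are commonly regrouped for the 2-class (normal vs. attack) experiments mentioned in the abstract. A minimal sketch of that label mapping follows, assuming the raw label strings of the public KDD CUP 99 files (which end with a period, e.g. "normal.", "smurf."); the attack-name-to-category map shown covers only a small illustrative subset of the data set's attack types.

```python
# Sketch: map raw KDD CUP 99 labels to the five classes (Normal, DOS,
# Probe, R2L, U2R), then collapse them to 2 classes (normal vs. attack).
# Only a few example attack names are listed; the full data set has more.

FIVE_CLASS_MAP = {
    "normal.": "Normal",
    "smurf.": "DOS", "neptune.": "DOS", "back.": "DOS",
    "ipsweep.": "Probe", "portsweep.": "Probe", "nmap.": "Probe",
    "guess_passwd.": "R2L", "warezclient.": "R2L",
    "buffer_overflow.": "U2R", "rootkit.": "U2R",
}

def to_five_class(raw_label):
    """Map a raw KDD CUP 99 label string to one of the five classes."""
    return FIVE_CLASS_MAP[raw_label]

def to_two_class(raw_label):
    """Collapse the five classes to a binary normal/attack label."""
    return "normal" if to_five_class(raw_label) == "Normal" else "attack"
```

The 4-class setting mentioned in the abstract can be obtained the same way by keeping only the four attack categories.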
Most researchers have developed IDS models using machine learning algorithms combined with different feature selection techniques. A ranking methodology and SVM are used in [21] as the feature selection and classification algorithms. Similarly, GA and a decision tree in [22], PCA and SVM in [4], GA and SVM in [2], and rough set theory and SVM with different kernel functions in [14] are used as feature selection and classification algorithms. Feature selection techniques such as correlation based feature selection, consistency based filtering and INTERACT are introduced in [3], where naïve Bayes, tree augmented naïve Bayes and NBTree are trained on the selected subset of features. The relevant features are selected using the BIRCH hierarchical clustering algorithm in [6], and in [5] bagging with REPTree is trained on these selected features. These feature selection techniques are wrapper based and work only with their specific classification algorithms. We propose a filter based feature selection technique built on mutual information; the next section covers the literature on mutual information based filtering. The paper is organized as follows: Section 2 deals with the concepts of entropy, joint entropy, conditional entropy and mutual information, along with a literature survey on mutual information. Section 3 covers the proposed simplified mutual information based feature selection. Section 4 deals with the experimental setup and results, and the final section concludes the paper.

2. MUTUAL INFORMATION

Mutual information was originally proposed by Claude E. Shannon [15, 16] in 1948 in his paper "A Mathematical Theory of Communication." Entropy and conditional entropy are the building blocks of mutual information. Mutual information measures the dependency between two variables, while entropy measures the uncertainty of a random variable.
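A generic filter based ranking of the kind described above can be sketched as follows: estimate the mutual information I(X; Y) between each (discrete or discretized) feature and the class label from empirical frequencies, then rank the features by that score, independently of any classifier. This is a plain-MI illustration of the filter idea, not the paper's simplified (SMIFS) algorithm.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        # I(X;Y) = sum over (x,y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def rank_features(columns, labels):
    """Rank named feature columns by mutual information with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature identical to the label attains I(X;Y) = H(Y), while a constant (or independent) feature scores 0, so informative features rise to the top of the ranking.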
Intrusion Detection System using SMIFS and Multi class Multi layer Perceptron
V Maheshwar Reddy, I Ravi Prakash Reddy, K Adi Narayana Reddy

The entropy of a random
</gr-replace>
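For reference, the standard Shannon definitions of the quantities named above can be written as (standard textbook forms, stated here for completeness):

```latex
% Entropy of a discrete random variable X with distribution p(x)
H(X) = -\sum_{x} p(x) \log_2 p(x)

% Conditional entropy of X given Y
H(X \mid Y) = -\sum_{x,y} p(x,y) \log_2 p(x \mid y)

% Mutual information: the reduction in uncertainty of X after observing Y
I(X;Y) = H(X) - H(X \mid Y)
       = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)}
```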