I. J. Computer Network and Information Security, 2019, 4, 43-52
Published Online April 2019 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijcnis.2019.04.06
Copyright © 2019 MECS
Intrusion Detection using Machine Learning and
Feature Selection
Prachi, Heena Malhotra
The NorthCap University, Gurgaon, India
E-mail: {prachiah1985, malhotraheena17}@gmail.com
Prabha Sharma
The NorthCap University, Gurgaon, India
E-mail: prabhasharma@ncuinda.edu
Received: 13 February 2019; Accepted: 27 February 2019; Published: 08 April 2019
Abstract—Intrusion Detection is one of the most
common approaches used in detecting malicious
activities in any network by analyzing its traffic. Machine
Learning (ML) algorithms help to study high-dimensional
network traffic and identify abnormal flows
with high accuracy. It is crucial to integrate
machine learning algorithms with dimensionality
reduction to decrease the underlying complexity of
processing huge datasets and to detect intrusions in
real time. This paper evaluates the 10 most popular ML
algorithms on the NSL-KDD dataset. Thereafter, these
algorithms are ranked to identify the best-performing
ML algorithm on the basis of several parameters such as
specificity, sensitivity, and accuracy. An analysis of the
top 4 algorithms shows that they consume a lot of time
during model building. Therefore, feature selection is applied to
detect intrusions in as little time as possible without
compromising accuracy. Experimental results
demonstrate which algorithm works best
with and without feature selection/reduction techniques in
terms of achieving high accuracy while minimizing the
time taken to build the model.
Index Terms—Network, Intrusion, Machine Learning,
NSL-KDD Dataset, Feature Selection.
I. INTRODUCTION
Huge technological advancements in the
communication industry have massively increased the volume
of data and its transmission across the globe via the
internet. However, such advancements put valuable
information and data at risk [1]. In today’s era, an intrusion
can happen within a few seconds. This gives rise to the need
for a stronger security system. An Intrusion Detection
System (IDS) [2] analyzes network traffic to identify
malicious actions. Currently available IDS are divided into
2 major categories [3], namely, anomaly-based and misuse-based
detection. Misuse detection identifies an intrusion
on the basis of already known patterns, popularly called
signatures. Therefore, misuse detection is also referred
to as signature-based IDS (e.g. Snort [4]). Anomaly
detection [5] identifies any unacceptable deviation from
normal traffic. Unlike signature-based IDS, anomaly
detection can identify zero-day attacks, but it generates a large
number of false alarms. It also faces many challenges
while dealing with huge amounts of high-dimensional data.
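The contrast between the two categories can be illustrated with a minimal Python sketch. It is not taken from the paper or from any real IDS: the signature strings, the packet-size feature, and the z-score threshold of 3.0 are all hypothetical choices made only for illustration.

```python
from statistics import mean, stdev

# Hypothetical signatures: payload substrings assumed to indicate known attacks.
SIGNATURES = ("/etc/passwd", "' OR 1=1")

def misuse_detect(payload: str) -> bool:
    """Misuse detection: flag a payload only if it matches a known pattern."""
    return any(sig in payload for sig in SIGNATURES)

def make_anomaly_detector(normal_sizes, threshold=3.0):
    """Anomaly detection: learn a profile of normal traffic (here, the mean and
    standard deviation of one assumed feature) and flag large deviations."""
    mu, sigma = mean(normal_sizes), stdev(normal_sizes)

    def detect(size: float) -> bool:
        return abs(size - mu) / sigma > threshold

    return detect

# Toy packet sizes observed during normal operation (invented values).
normal = [500, 520, 480, 510, 490, 505, 495]
anomaly_detect = make_anomaly_detector(normal)

known_attack = misuse_detect("GET /../../etc/passwd")   # matches a signature
novel_by_signature = misuse_detect("GET /new-exploit")  # no signature exists yet
novel_by_anomaly = anomaly_detect(5000)                 # far from the normal profile
benign_but_unusual = anomaly_detect(580)                # legitimate, yet flagged
```

In this toy setting the signature check misses the unseen request, while the anomaly check catches the extreme packet size but also flags a merely unusual one, which mirrors the false-alarm problem described above.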
In order to analyze huge volumes of data, most
existing IDS use Machine Learning (ML) algorithms to
identify intrusions efficiently. Although many
techniques are available for detection purposes, only a
few are effective in producing high accuracy and low
false positives for a huge amount of data [6]. Also, some
ML algorithms perform better than others in terms of
accuracy but take more training time for building models
on large datasets. Hence, there is a pressing need
to combine ML algorithms with feature
selection/reduction to obtain an accurate classification of
reduced-dimensional data while taking less time to
build the model.
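The idea of reducing dimensionality before training can be sketched as a simple filter-style feature selection step. The example below is only an assumption-laden illustration: it ranks features by absolute Pearson correlation with the class label, which is not necessarily the selection method used in this paper, and the toy data and the helper names `pearson` and `select_top_k` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information about the label
    return cov / sqrt(vx * vy)

def select_top_k(rows, labels, k):
    """Return the (sorted) indices of the k features most correlated with the label."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    top = sorted(scores, reverse=True)[:k]
    return sorted(j for _, j in top)

# Toy records with 4 features; label 1 = attack, 0 = normal (invented values).
rows = [
    [1, 5, 1, 0],
    [2, 3, 1, 0],
    [1, 4, 1, 1],
    [9, 4, 1, 1],
    [8, 3, 1, 1],
    [9, 5, 1, 1],
]
labels = [0, 0, 0, 1, 1, 1]

selected = select_top_k(rows, labels, k=2)
# Keep only the selected columns; a classifier would then train on `reduced`.
reduced = [[row[j] for j in selected] for row in rows]
```

Here the constant column and the uncorrelated column are discarded, so a model built on `reduced` processes half as many attributes per record.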
An ideal IDS should be able to spot zero-day attacks
with high accuracy and low false positives quickly so that
intrusions can be prevented as early as possible [7].
Consequently, the objective of this paper is to
design an intrusion detection model that integrates ML
algorithms with feature selection and feature
reduction methods to detect intrusions with high accuracy
and low false positives within a short span of time.
This paper evaluates the performance of the 10 most
popular ML algorithms in WEKA [8] using the NSL-KDD
dataset [9]. Thereafter, the algorithms are ranked based on
their performance on certain parameters such as
specificity, sensitivity, accuracy, and the time taken to
build the model. To achieve high accuracy, fewer
false alarms and minimum training time on large datasets,
this paper applies dimensionality reduction methods to
the best 4 ML algorithms. The performance of
these 4 algorithms is then evaluated with and without
feature selection/reduction methods in order to
build an ideal model for intrusion detection.
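For reference, the evaluation parameters named above follow the standard confusion-matrix definitions: accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The short Python sketch below computes them for a binary attack/normal labeling. It is an illustrative assumption and not the WEKA evaluation used in the paper; the labels and the function names are invented for the example.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN, treating label 1 as 'attack' and 0 as 'normal'."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def safe_ratio(numerator, denominator):
    """Avoid division by zero when a class is absent from the data."""
    return numerator / denominator if denominator else 0.0

def evaluate(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_ratio(tp, tp + fn),  # true positive rate
        "specificity": safe_ratio(tn, tn + fp),  # true negative rate
    }

# Invented ground truth and predictions for 10 connections.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
scores = evaluate(y_true, y_pred)
```

With these toy labels there are 3 true positives, 5 true negatives, 1 false positive, and 1 false negative, giving an accuracy of 0.8, a sensitivity of 0.75, and a specificity of 5/6.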
The organization of this paper is as follows. Work
related to intrusion detection is highlighted in Section II.