Journal of Computer Networks, 2017, Vol. 4, No. 1, 48-55
Available online at http://pubs.sciepub.com/jcn/4/1/5
©Science and Education Publishing
DOI:10.12691/jcn-4-1-5
Big Data in Intrusion Detection Systems and Intrusion
Prevention Systems
Lidong Wang*
Department of Engineering Technology, Mississippi Valley State University, Itta Bena, MS, USA
*Corresponding author: lwang22@students.tntech.edu
Abstract This paper introduces network attacks, intrusion detection systems, intrusion prevention systems, and
intrusion detection methods including signature-based detection and anomaly-based detection. Intrusion
detection/prevention system (ID/PS) methods are compared. Some data mining and machine learning methods and
their applications in intrusion detection are introduced. Big data in intrusion detection systems and big data
analytics for huge volumes of data, heterogeneous features, and real-time stream processing are presented. Challenges
of intrusion detection systems and challenges posed by stream processing of big data in the systems are also
discussed.
Keywords: big data, intrusion detection system (IDS), intrusion prevention system (IPS), signature-based
detection, anomaly-based detection, data mining, machine learning, network security
Cite This Article: Lidong Wang, “Big Data in Intrusion Detection Systems and Intrusion Prevention
Systems.” Journal of Computer Networks, vol. 4, no. 1 (2017): 48-55. doi: 10.12691/jcn-4-1-5.
1. Introduction
Many classes and applications of cybercrime and
terrorism involve a misrepresentation of identity or an
attempt to authenticate for access to a business or service
for which the attackers have no legitimate use. Within the
European Union, the eIDentity, Authentication & Signatures
Regulation was launched in October 2014. The initial
results of the European project CAMINO have been
presented; the primary target of the project was to
provide a realistic roadmap for improving resilience
against cyber terrorism and cybercrime [1]. An
intrusion detection system (IDS) is often regarded as a
second-line security solution after first-line measures
such as authentication, firewalls, cryptography, and
authorization techniques [2]. An IDS is software that
automates the intrusion detection process. An intrusion
prevention system (IPS) is software that has all the
capabilities of an IDS and can also attempt to stop
possible incidents. IDS and IPS technologies can offer
many of the same capabilities, but administrators can also
disable prevention features in IPS products, letting them
function as IDSs. Many intrusion detection and prevention
systems (IDPSs) can also respond to a detected threat
using several response techniques, in which the IDPS
stops the attack itself, changes the attack’s content, or
changes the security environment (e.g., by reconfiguring a
firewall) [3].
An IDS can monitor specific protocols like the Hyper
Text Transfer Protocol (HTTP) of a web server. This type
of IDS is called a protocol-based intrusion detection
system (PIDS). An IDS can also be specialized to monitor
application-specific protocols; this type is called an
application protocol-based intrusion detection system
(APIDS). An example is an APIDS that monitors a
database’s Structured Query Language (SQL) protocol.
Like the security event sources, which are heterogeneous
(e.g., networks and diverse host types), the IDSs
themselves can be heterogeneous in their types, in how
they operate, and in their alert-output formats [4].
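One common way to cope with heterogeneous alert-output formats is to map each IDS's native output onto a shared record before any further analysis. The Python sketch below illustrates the idea only. The `Alert` fields, the pipe-delimited network format, and the dictionary-based host format are all hypothetical and do not correspond to any particular product or to the approach in [4].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Common alert record shared by all sensors (hypothetical schema)."""
    source: str    # which kind of IDS produced the alert
    src_ip: str    # address the suspicious traffic came from
    dst_ip: str    # address that was targeted
    severity: int  # 1 (low) to 3 (high)
    message: str


def from_network_ids(line: str) -> Alert:
    """Parse an assumed pipe-delimited line: 'src|dst|priority|message'."""
    src, dst, priority, message = line.split("|", 3)
    return Alert("network", src, dst, int(priority), message)


def from_host_ids(record: dict) -> Alert:
    """Convert an assumed dictionary-style host alert with textual levels."""
    levels = {"low": 1, "medium": 2, "high": 3}
    return Alert("host", record["remote"], record["host"],
                 levels[record["level"]], record["event"])


# Two differently formatted alerts end up in one uniform list.
alerts = [
    from_network_ids("10.0.0.5|192.168.1.2|3|SQL injection attempt"),
    from_host_ids({"remote": "10.0.0.5", "host": "192.168.1.2",
                   "level": "medium", "event": "Unexpected login"}),
]
```

Once every sensor's output shares one schema, alerts from different IDS types can be sorted, filtered, or correlated by the same code.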
Four kinds of data can be gathered for correlation by an
IDS developed for security monitoring: IP flow
records, HTTP packets, DNS replies, and honeypot data.
For example, flow records provide invaluable data for
detecting intrusions or highlighting botnet communications.
Traces of every communication from the enterprise
network to the Internet and vice versa could be stored by
exporting NetFlow records from the core router of the
network. HTTP traffic is a well-known intrusion vector
and represents a significant portion of the traffic of Internet
users. Studying uniform resource identifiers (URIs)
embedded in HTTP packets and their payload helps detect
and prevent malicious communications. Domain Name
System (DNS) requests are performed to get IP addresses
associated with a domain and consult the associated
resource. Therefore, monitoring the DNS to identify
malicious domains is an efficient way to proactively detect
and prevent a large share of malicious communications.
A honeypot generally emulates vulnerable services and
contains fake production data. Logging honeypot
activity helps obtain data about attackers targeting a
specific network, such as the protocols, IP addresses, and
exploit files used and their scanning strategies [5].
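The correlation of these four data sources can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the system described in [5]: the tuple layouts, the `bad_domains` blocklist, the `min_sources` threshold, and the function name `correlate` are all hypothetical. An external IP address is treated as suspicious when at least `min_sources` independent sources point to it, and the flow records are then used to find the internal hosts that communicated with it.

```python
from collections import defaultdict


def correlate(flows, http_requests, dns_replies, honeypot_ips,
              bad_domains, min_sources=2):
    """Return {internal_host: set of suspicious external IPs it contacted}.

    Assumed (hypothetical) input layouts:
      flows          -- iterable of (src_ip, dst_ip, byte_count)
      http_requests  -- iterable of (src_ip, dst_ip, host_header)
      dns_replies    -- iterable of (domain, resolved_ip)
      honeypot_ips   -- iterable of attacker IPs logged by a honeypot
      bad_domains    -- set of domains already known to be malicious
    """
    evidence = defaultdict(set)  # external IP -> kinds of evidence seen

    for ip in honeypot_ips:
        evidence[ip].add("honeypot")
    for domain, ip in dns_replies:
        if domain in bad_domains:
            evidence[ip].add("dns")
    for _src, dst, host in http_requests:
        if host in bad_domains:
            evidence[dst].add("http")

    # Keep only IPs confirmed by enough independent sources.
    suspicious = {ip for ip, kinds in evidence.items()
                  if len(kinds) >= min_sources}

    # Use flow records to find which internal hosts talked to them.
    contacts = defaultdict(set)
    for src, dst, _byte_count in flows:
        if dst in suspicious:
            contacts[src].add(dst)
    return dict(contacts)


result = correlate(
    flows=[("10.0.0.5", "203.0.113.9", 1200),
           ("10.0.0.6", "198.51.100.7", 300)],
    http_requests=[("10.0.0.5", "203.0.113.9", "evil.example")],
    dns_replies=[("evil.example", "203.0.113.9"),
                 ("good.example", "198.51.100.7")],
    honeypot_ips=["198.51.100.7"],
    bad_domains={"evil.example"},
)
```

In this example, 203.0.113.9 is supported by both DNS and HTTP evidence and is therefore flagged, whereas 198.51.100.7 appears only in the honeypot log and falls below the threshold. Requiring agreement between sources is one simple way to trade detection coverage for fewer false alarms.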
There are three models of intrusion detection mechanisms:
signature-based, anomaly-based, and hybrid detection [6].
However, two approaches to attack identification are
usually used in an IDS: 1) signatures that are specific