Journal of Computer Networks, 2017, Vol. 4, No. 1, 48-55
Available online at http://pubs.sciepub.com/jcn/4/1/5
©Science and Education Publishing
DOI:10.12691/jcn-4-1-5

Big Data in Intrusion Detection Systems and Intrusion Prevention Systems

Lidong Wang *
Department of Engineering Technology, Mississippi Valley State University, Itta Bena, MS, USA
*Corresponding author: lwang22@students.tntech.edu

Abstract This paper introduces network attacks, intrusion detection systems, intrusion prevention systems, and intrusion detection methods, including signature-based detection and anomaly-based detection. Intrusion detection/prevention system (ID/PS) methods are compared. Some data mining and machine learning methods and their applications in intrusion detection are introduced. Big data in intrusion detection systems and big data analytics for huge volumes of data, heterogeneous features, and real-time stream processing are presented. Challenges of intrusion detection systems and challenges posed by stream processing of big data in these systems are also discussed.

Keywords: big data, intrusion detection system (IDS), intrusion prevention system (IPS), signature-based detection, anomaly-based detection, data mining, machine learning, network security

Cite This Article: Lidong Wang, “Big Data in Intrusion Detection Systems and Intrusion Prevention Systems.” Journal of Computer Networks, vol. 4, no. 1 (2017): 48-55. doi: 10.12691/jcn-4-1-5.

1. Introduction

Many classes and applications of cybercrime and terrorism involve a misrepresentation of identity or an attempt to authenticate for access to a business or services for which attackers have no legitimate use. Within the European Union, the eIDentity, Authentication & Signatures Regulation was launched in October 2014. The initial results of the European project CAMINO, in terms of a realistic roadmap to counter cybercrime and cyber terrorism, were presented.
The primary target for the CAMINO project was to provide a realistic roadmap for improving resilience against cyber terrorism and cybercrime [1].

An intrusion detection system (IDS) is often regarded as a second-line security solution after authentication, firewall, cryptography, and authorization techniques, which are first-line security measures [2]. An IDS is software that automates the intrusion detection process. An intrusion prevention system (IPS) is software that has all the capabilities of an IDS and can also attempt to stop possible incidents. IDS and IPS technologies offer many of the same capabilities, and administrators can disable the prevention features in IPS products so that they function as IDSs. Many intrusion detection and prevention systems (IDPS) can also respond to a detected threat using several response techniques, in which the IDPS stops the attack itself, changes the attack’s content, or changes the security environment (e.g., by reconfiguring a firewall) [3].

An IDS can monitor a specific protocol, such as the Hypertext Transfer Protocol (HTTP) of a web server. This type of IDS is called a protocol-based intrusion detection system (PIDS). An IDS can also be specialized to monitor application-specific protocols; this type is called an application protocol-based intrusion detection system (APIDS). An example is an APIDS that monitors a database’s Structured Query Language (SQL) protocol. Like the security event sources, which span networks and diverse host types, IDSs themselves can be heterogeneous in their types, in how they operate, and in their alert-output formats [4].

Four kinds of data can be gathered for correlation by a developed IDS in security monitoring: IP flow records, HTTP packets, DNS replies, and honeypot data. For example, flow records provide invaluable data for detecting intrusions or highlighting botnet communications.
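
As a rough illustration of what correlating the four kinds of data might look like, the sketch below groups events by IP address and reports addresses seen by more than one kind of source. It is a minimal example under stated assumptions, not the correlation method of the cited system: the `Event` fields, the source labels, the `correlate` function, and the `min_sources` threshold are all hypothetical names introduced here, and the sample addresses come from documentation-only ranges.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four source kinds named in the text; the string labels are illustrative.
SOURCES = ("flow", "http", "dns", "honeypot")


@dataclass(frozen=True)
class Event:
    source: str  # one of SOURCES
    ip: str      # external IP address the event refers to (hypothetical field)
    detail: str  # free-form description, e.g. a URI or a domain name


def correlate(events, min_sources=2):
    """Return IPs reported by at least `min_sources` distinct source kinds."""
    kinds_by_ip = defaultdict(set)
    for ev in events:
        if ev.source not in SOURCES:
            raise ValueError(f"unknown source: {ev.source}")
        kinds_by_ip[ev.ip].add(ev.source)
    return {
        ip: sorted(kinds)
        for ip, kinds in kinds_by_ip.items()
        if len(kinds) >= min_sources
    }


# Made-up sample events for illustration only.
events = [
    Event("flow", "203.0.113.7", "outbound TCP/6667"),
    Event("dns", "203.0.113.7", "address returned for bad.example"),
    Event("honeypot", "203.0.113.7", "scan on port 22"),
    Event("http", "198.51.100.4", "GET /index.html"),
]

suspects = correlate(events)
print(suspects)  # {'203.0.113.7': ['dns', 'flow', 'honeypot']}
```

Counting distinct source kinds per address is only one simple correlation rule; a deployed system would likely also weigh timing, volume, and the reliability of each source.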
Traces of every communication from the enterprise network to the Internet and vice versa can be stored by exporting NetFlow records from the core router of the network. HTTP traffic is a well-known intrusion vector and represents a significant portion of the traffic of Internet users. Studying the uniform resource identifiers (URIs) embedded in HTTP packets, together with the packet payloads, helps detect and prevent malicious communications. Domain Name System (DNS) requests are performed to get the IP addresses associated with a domain before the associated resource is consulted. Monitoring the DNS to identify malicious domains is therefore an efficient way to proactively detect and prevent an important part of malicious communications. A honeypot generally emulates vulnerable services and contains fake production data. Logging honeypot information helps obtain data about attackers targeting a specific network, such as the protocols used, IP addresses used, exploit files used, and scanning strategies [5].

There are three models of intrusion detection mechanisms: signature-based, anomaly-based, and hybrid detection [6]. However, two approaches of attack identification are usually used in an IDS: 1) signatures that are specific