Network Intrusion Alert Aggregation Based on PCA and Expectation Maximization Clustering Algorithm

Maheyzah Md Siraj +, Mohd Aizaini Maarof and Siti Zaiton Mohd Hashim

Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

Abstract. Most organizations deploy various security sensors to increase information security and assurance. A popular choice is the Network Intrusion Detection System (NIDS). Unfortunately, NIDSs trigger a massive number of alerts even in a single day, overwhelming security experts. Worse, a large number of these alerts are false positives, redundant warnings for the same attack, or notifications caused by erroneous activity. Such low-quality alerts have a negative impact on alert analysis. We propose an alert aggregation model based on Principal Component Analysis (PCA) coupled with an unsupervised clustering algorithm, Expectation Maximization for Gaussian Mixture (EM_GM), to aggregate similar alerts and reduce the number of alerts. Our empirical results show that the proposed model effectively clusters NIDS alerts and significantly reduces the alert volume.

Keywords: alert clustering, alert filtering, alert aggregation, alert reduction, PCA, EM_GM

1. Introduction

Network Intrusion Detection Systems (NIDSs) have been extensively used by researchers and practitioners to monitor intrusive activities in computer networks. NIDSs usually generate thousands of alerts even in a single day. Worse, those alerts are mixed with false positives, repeated warnings for the same attack, and notifications caused by erroneous activity [1]. Therefore, manually analyzing those alerts is tedious, time-consuming and error-prone [1]. A promising technique for analyzing intrusion alerts automatically is correlation.
Alert Correlation Systems (ACSs) are post-processing modules that provide high-level insight into the security state of the network and efficiently filter false positives and redundant alerts from the output of NIDSs [2]. The analysis results become important guidance for the security expert (SE) in planning and developing responsive and preventive mechanisms. Generally, correlation is of two types: structural correlation and causal correlation. In this paper, we address the structural correlation (or alert clustering) aspect of NIDS data, grouping (or aggregating) alerts with similar features.

The main problem with existing ACSs is that they require a high level of human SE involvement in creating and/or maintaining the system. For instance, the algorithm introduced by [3] requires a significant number of alerts to be managed manually (i.e., hand-clustered) beforehand. Likewise, the system by [1] requires periodic manual tuning. Moreover, on its first deployment, network properties must be encoded to assist the clustering algorithm. These approaches are time-consuming because their systems require regular setup and maintenance. These constraints make the development of supervised learning-based correlation systems less practical.

+ Corresponding author. Tel.: +607 5532245; fax: +607 5593185. E-mail address: maheyzah@utm.my

2009 International Conference on Computer Engineering and Applications, IPCSIT vol.2 (2011) © (2011) IACSIT Press, Singapore

Our goal is to minimize the intervention of the SE (i.e., to ease their burden) as much as possible, not to replace them. Therefore, an unsupervised learning-based