Processing intrusion detection alert aggregates with time series modeling

Jouni Viinikka a,*,1, Hervé Debar a,1, Ludovic Mé b,1, Anssi Lehikoinen c,2, Mika Tarvainen c,2

a France Telecom, BP 6243, FR-14066 Caen Cedex, France
b Supélec, BP 81127, FR-35511 Cesson Sévigné Cedex, France
c University of Kuopio, P.O. Box 1627, FIN-70211 Kuopio, Finland

Article info

Article history: Received 19 December 2006; received in revised form 21 January 2008; accepted 23 January 2009; available online 4 February 2009

Keywords: Network security; Intrusion detection; Alert correlation; Time series modeling; Kalman filtering

Abstract

The main use of intrusion detection systems (IDS) is to detect attacks against information systems and networks. Normal use of the network and its functioning can also be monitored with an IDS. It can be used to control, for example, the use of management and signaling protocols, or the network traffic related to some less critical aspects of system policies. These complementary usages can generate large numbers of alerts, yet in an operational environment the collection of such data may be mandated by the security policy. Processing this type of alert poses a different problem from correlating alerts directly related to attacks or filtering incorrectly issued alerts.

We aggregate individual alerts into alert flows, and then process the flows instead of individual alerts, for two reasons. First, this is necessary to cope with the large quantity of alerts, a common problem among all alert correlation approaches. Second, the relevance of an individual alert often cannot be determined, but irrelevant alerts and interesting phenomena can be identified at the flow level. This is the particularity of the alerts created by the complementary uses of IDSes.

Flows consisting of alerts related to normal system behavior can contain strong regularities. We propose to model these regularities using non-stationary autoregressive models.
Once modeled, the regularities can be filtered out to relieve the security operator from manual analysis of true but low impact alerts. We present experimental results using these models to process voluminous alert flows from an operational network.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Originally, intrusion detection systems were designed to detect violations of the monitored system's security policy. Today, complementary uses are increasingly common. In some environments the security policy mandates the tracking of all Simple Network Management Protocol (SNMP) and Internet Control Message Protocol (ICMP) traffic, and intrusion detection systems (IDSes) typically have generic signatures reacting to the mere occurrence of these protocols. The rationale for such monitoring using today's IDS, with all its limitations (e.g. [1,2]), is the ability to detect large scale problems and anomalies which might manifest themselves indirectly through management and control protocols. The purpose is not to find attacks contained in a single packet, nor complex attack scenarios executed by a skillful attacker. Detecting such attacks needs more specific signatures and, depending on the type of the attack, different correlation approaches such as [3,4]. Another motivation can be monitoring compliance with less critical aspects of system policies, such as instant messaging (IM) usage.

We focus on the particularities of alert flow analysis in Section 1.1, and the reader is referred to [5] for a more generic introduction to intrusion detection and alert correlation.

1.1. Alert flow analysis

The complementary uses of IDSes tend to create large numbers of alerts, but those alerts are often of low impact. By low impact we mean that the alert is correctly issued, but does not need immediate reaction from the security operator.
For example, Snort [6] should trigger the alert SNMP public access udp on any SNMP packet that uses the word "public" as the community string. When this alert appears, it is very likely that the IDS issued it on such a network packet. In other words, it is very likely that the alert is issued correctly. Moreover, it can be impossible to determine the significance of a single alert. Instead, the significance depends on the number of similar alerts in recent history with respect to past behavior. We illustrate these two issues with two examples, one with ICMP messages and another with known, identified malicious traffic.

Information Fusion 10 (2009) 312–324. doi:10.1016/j.inffus.2009.01.003
* Corresponding author. Tel.: +33 2 31 75 97 31; fax: +33 2 31 37 83 43. E-mail address: jouni.viinikka@orange-ftgroup.com (J. Viinikka).
1 Supported by the French National Research Agency (Agence Nationale de la Recherche) through the project ACES.
2 Supported by the Academy of Finland through the Centre of Excellence in Inverse Problems Research programme.
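
As an illustration of the flow-level view described in Section 1.1, the following Python sketch counts alerts per signature in fixed-length time bins and compares the latest count with the recent average. It is a minimal sketch under stated assumptions, not the method of this paper: the (timestamp, signature) alert format, the function names, the bin length and the moving-average baseline are all hypothetical, and the baseline is only a simple stand-in for the non-stationary autoregressive models proposed here.

```python
from datetime import datetime, timedelta


def aggregate_alerts(alerts, start, interval, n_bins):
    """Build alert flows: per-signature alert counts in consecutive time bins.

    alerts   -- iterable of (timestamp, signature) pairs (assumed format)
    start    -- datetime at which the first bin begins
    interval -- timedelta giving the length of each bin
    n_bins   -- number of bins to keep; alerts outside them are ignored
    """
    flows = {}
    for timestamp, signature in alerts:
        # Floor division of two timedeltas yields an integer bin index.
        index = (timestamp - start) // interval
        if 0 <= index < n_bins:
            flows.setdefault(signature, [0] * n_bins)[index] += 1
    return flows


def deviation_from_recent_mean(counts, window):
    """Latest count minus the mean of the `window` counts preceding it.

    Illustrative baseline only; it does not model trends or periodicity.
    """
    if window < 1 or len(counts) < window + 1:
        raise ValueError("need at least window + 1 counts")
    history = counts[-window - 1:-1]
    return counts[-1] - sum(history) / window


if __name__ == "__main__":
    start = datetime(2009, 1, 1)
    # Hypothetical alerts: three SNMP alerts spread over the first two bins.
    alerts = [
        (start + timedelta(minutes=1), "SNMP public access udp"),
        (start + timedelta(minutes=2), "SNMP public access udp"),
        (start + timedelta(minutes=12), "SNMP public access udp"),
    ]
    flows = aggregate_alerts(alerts, start, timedelta(minutes=10), 3)
    for signature, counts in flows.items():
        print(signature, counts, deviation_from_recent_mean(counts, window=2))
```

The point of the sketch is that a single alert carries little information on its own, whereas the per-bin counts of a flow can be compared against past behavior. Any real baseline would need to account for the regularities in the flow, which a plain moving average does not.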