Proceedings of the 43rd IEEE Conference on Decision and Control, Bahamas, December 2004

On the Statistical Distribution of Processing Times in Network Intrusion Detection

João B. D. Cabrera, Jaykumar Gosar, Wenke Lee and Raman K. Mehra

Scientific Systems Company, Inc., 500 West Cummings Park, Suite 3000, Woburn MA 01801 USA
Georgia Institute of Technology, College of Computing, 801 Atlantic Drive, Atlanta, GA 30332 USA

Abstract

Intrusion Detection Systems (IDSs) are relatively complex devices that monitor information systems in search of security violations. Characterizing the service times of network IDSs is a crucial step in improving their real-time performance. We analyzed about 41 million packets, organized in five data sets of 10 minutes each, collected at the entry point of a large production network and processed by Snort, a commonly used IDS. The processing times of the three main stages in Snort were measured. The main conclusions of our study were: (1) Rule checking accounts for about 75% of the total processing time in IDSs, with mean payload checking time being 4.5 times larger than mean header checking time. (2) The distribution of rule checking times is markedly bimodal, a direct consequence of the bimodality in packet composition in current high-speed Internet traffic. (3) Header processing times have a small variance and small correlation coefficients. (4) In contrast, the distribution of payload processing times displays high variance, in a form that can be generally characterized as “slightly heavy-tailed”. Explicitly, payload processing times have a Lognormal upper tail, clipped at the top 1%. This extreme upper tail is better fit by an Exponential distribution. (5) Additionally, payload processing times were shown to be highly correlated, with correlation coefficients several orders of magnitude higher than the confidence bands for the standard whiteness test.
The impact of these findings on the design of IDSs for real-time operation in networks is discussed and compared with existing results on processing times of Unix processes, which were shown to display pronounced heavy-tailed characteristics.

1 Introduction

Intrusion Detection Systems (IDSs) are relatively complex devices that monitor information systems in search of security violations [5], [19]. In network-based IDSs, data packets enter the IDS and are subjected to a number of processing steps whose ultimate objective is to determine whether the packet contains an intrusion. There are essentially three main steps in network IDSs such as Snort [6]:

Packet decoding: Decodes the header information at the different layers and creates a data structure for the packet, which is used in the next steps.

Preprocessing: Performs a number of preparatory steps on the packet, such as normalization, IP fragment reassembly, TCP stream reconstruction, etc.

Rule checking: Checks if the packet contains a particular string, or a collection of strings, associated with an intrusion. A rule consists at a minimum of a type of packet to search (protocol type), a string of content to match, and a location where that string is to be searched for [24]. Rule checking in Snort has two sub-steps: Non-Content Matching (NCM), performed on the packets' headers, and Content Matching (CM), performed on the packets' payloads.

Like any computing device operating in real time, the operational performance¹ of an IDS depends on the arrival rates of packets streaming at its input and on the service rates it provides to those packets. The two components are equally important in characterizing the performance of the IDS, and understanding both is crucial for the design of more efficient systems.
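The dependence of operational performance on both arrival rates and service times can be illustrated with a minimal sketch. It is not the measurement setup of this study: it assumes a single-server FIFO queue with a finite buffer, Poisson arrivals, and Lognormal service times with hypothetical parameters. The function name `simulate_ids_queue`, the buffer size, and all numeric values are illustrative assumptions.

```python
import random
from collections import deque


def simulate_ids_queue(arrival_rate, service_times, buffer_size, seed=0):
    """Simulate a single-server FIFO queue with a finite waiting buffer.

    arrival_rate  -- mean packet arrivals per second (Poisson arrivals assumed)
    service_times -- per-packet processing times in seconds, in arrival order
    buffer_size   -- packets that may wait while one packet is in service

    Returns (fraction of packets dropped, mean service time of accepted packets).
    """
    rng = random.Random(seed)
    now = 0.0
    server_free_at = 0.0
    in_system = deque()  # departure times of accepted, unfinished packets
    dropped = 0
    accepted = 0
    accepted_service = 0.0

    for service in service_times:
        # Exponential inter-arrival times give Poisson arrivals.
        now += rng.expovariate(arrival_rate)
        # Discard packets whose processing finished before this arrival.
        while in_system and in_system[0] <= now:
            in_system.popleft()
        # The system holds one packet in service plus buffer_size waiting.
        if len(in_system) >= buffer_size + 1:
            dropped += 1
            continue
        start = max(now, server_free_at)
        server_free_at = start + service
        in_system.append(server_free_at)
        accepted += 1
        accepted_service += service

    total = dropped + accepted
    drop_fraction = dropped / total if total else 0.0
    mean_service = accepted_service / accepted if accepted else 0.0
    return drop_fraction, mean_service


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical Lognormal service times in seconds; not fitted to any data.
    times = [rng.lognormvariate(-9.0, 1.0) for _ in range(100_000)]
    for rate in (1_000, 5_000, 20_000):
        drop, mean_s = simulate_ids_queue(rate, times, buffer_size=64)
        print(f"rate={rate}/s dropped={drop:.2%} mean service={mean_s * 1e6:.1f} us")
```

Running the script sweeps the arrival rate while holding the service-time sample fixed, which shows how the percentage of dropped packets responds to load for a given service-time distribution. Replacing `times` with measured per-packet processing times would be the natural next step.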
Footnote 1: By operational performance we mean the usual metrics of mean service time, percentage of dropped packets, etc.

The arrival rates of packets at a network IDS are the arrival rates of packets into the networking device in which it is installed, modulated by traffic shaping, if applicable. Much is known about the statistical properties of arrival rates of packets in the Internet, the result of extensive research, especially in the last decade ([7] and references therein). In contrast, very little is known about the statistical properties of service times in network IDSs. Research on network IDS evaluation has focused on measuring performance metrics as a function of the network load, traffic characteristics (balance between protocol types, presence of fragments, etc.) and complexity of the ruleset, e.g. [12], [23]. A recent study [2] measured the processing times of the various components of Snort, but no statistical characterization was attempted. The objective was to construct synthetic workloads out of real traffic, for use in IDS benchmarking. In this paper, we study the statistical properties of