Log Correlation for Intrusion Detection: A Proof of Concept *

Cristina Abad †‡ cabad@ncsa.uiuc.edu
Jed Taylor † jtaylr2@uiuc.edu
Cigdem Sengul † sengul@uiuc.edu
William Yurcik ‡ byurcik@ncsa.uiuc.edu
Yuanyuan Zhou † yyzhou@uiuc.edu
Ken Rowe § kenneth.e.rowe@saic.com

† Department of Computer Science, University of Illinois at Urbana-Champaign
‡ National Center for Supercomputing Applications (NCSA)
§ Science Applications International Corporation (SAIC)

* This research is funded in part by a grant from the Office of Naval Research (ONR) within the National Center for Advanced Secure Systems Research (NCASSR) <www.ncassr.org>.

Abstract

Intrusion detection is an important part of networked-systems security protection. Although commercial products exist, finding intrusions has proven to be a difficult task, and current techniques have limitations. Therefore, improved techniques are needed. We argue the need for correlating data among different logs to improve the accuracy of intrusion detection systems. We show how different attacks are reflected in different logs and argue that some attacks are not evident when a single log is analyzed. We present experimental results using anomaly detection for the Yaha virus. Through the use of data mining tools (RIPPER) and correlation among logs, we improve the effectiveness of an intrusion detection system while reducing false positives.

1. Introduction

Information and resource protection is a primary concern of most organizations. To secure their assets, organizations need effective intrusion detection techniques to respond to and recover from Internet attacks. An intrusion detection system (IDS) attempts to detect attacks by monitoring system or network behavior. While many existing IDSs require manual definitions of normal and abnormal behavior (intrusion signatures) [25], recent work has shown that it is possible to identify abnormalities automatically using machine learning or data mining techniques [6, 1, 2, 8, 14]. These works analyze network or system activity logs to generate models or rules, which the IDS can use to detect intrusions that can potentially compromise the system integrity or reliability.

However, most of the previous work on intrusion detection focuses on activities generated by a single source, resulting in many false positives and undetected intrusions. For example, some studies [14] detect intrusions by monitoring network traffic, such as tcpdump logs, whereas other studies [2, 8] monitor only system call logs. These studies have high false positive rates.

Addressing this problem requires correlating activities from all involved components and analyzing all logs together. This is because an intrusion typically leaves multiple signs of its presence, a fact that to date has not been exploited by security professionals. The general idea is to take advantage of attack traces by correlating information found in multiple heterogeneous logs, thus enabling IDSs to correctly identify more attacks while simultaneously reducing the number of false positives and providing stronger validation that an attack has indeed occurred. Furthermore, we posit that there are many attacks that are not evident when a single log is analyzed but may be exposed by correlating information across multiple logs.

We show in this paper that correlating log information is useful for improving both misuse detection and anomaly detection. To facilitate processing the millions of entries found in typical logs, data mining techniques are indeed useful. Specifically, we use the data mining software tool RIPPER [3] and correlation to improve anomaly detection,