Detecting Worms via Mining Dynamic Program Execution

Xun Wang, Wei Yu, Adam Champion, Xinwen Fu and Dong Xuan

Abstract—Worm attacks have been major security threats to the Internet. Detecting worms, especially new, unseen worms, is still a challenging problem. In this paper, we propose a new worm detection approach based on mining dynamic program executions. This approach captures dynamic program behavior to provide accurate and efficient detection of both seen and unseen worms. In particular, we execute a large number of real-world worms and benign programs (executables) and trace their system calls. We apply two classifier-learning algorithms (Naive Bayes and Support Vector Machine) to obtain classifiers from a large number of features extracted from the system call traces. The learned classifiers are then used to carry out rapid worm detection with low overhead on the end-host. Our experimental results demonstrate the effectiveness of our approach in detecting new worms, with a very high detection rate and a low false positive rate.

Index Terms—Worm detection, system call tracing, dynamic program analysis, data mining

I. INTRODUCTION

In this paper, we address issues related to detecting worms, especially new, unseen worms. Worms are malicious programs that propagate themselves on the Internet to infect computers by remotely exploiting vulnerabilities in those computers. Worm attacks have always been considered major threats to the Internet. There have been many cases of Internet worm attacks that caused significant damage, such as the “Code Red” worm in 2001 [1], the “Slammer” worm in 2003 [2], and the “Witty/Sasser” worms in 2004 [3]. For example, in July 2001, the Code Red worm infected more than 350,000 computers in less than 14 hours by exploiting the buffer-overflow vulnerability in version 4.0 or 5.0 of Microsoft’s Internet Information Services (IIS) web server, resulting in over $1.2 billion in damage.
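As a rough illustration of the pipeline summarized in the abstract (trace system calls, extract features, learn a classifier), the following minimal sketch trains a small Naive Bayes classifier on system call traces. It is not the authors' implementation. The use of length-3 system call subsequences (n-grams) as features, the helper `ngrams`, the class `BernoulliNB`, and the hand-written traces are all assumptions made for illustration. Real traces would come from a system call tracer run on actual executables.

```python
import math
from collections import Counter


def ngrams(trace, n=3):
    """Return the set of length-n system call subsequences in a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


class BernoulliNB:
    """Tiny Naive Bayes over binary features (n-gram present or absent)."""

    def fit(self, feature_sets, labels):
        self.vocab = set().union(*feature_sets)
        self.classes = sorted(set(labels))
        self.prior, self.prob = {}, {}
        for c in self.classes:
            docs = [f for f, y in zip(feature_sets, labels) if y == c]
            self.prior[c] = len(docs) / len(labels)
            counts = Counter(g for f in docs for g in f)
            # Laplace smoothing keeps every probability strictly in (0, 1).
            self.prob[c] = {
                g: (counts[g] + 1) / (len(docs) + 2) for g in self.vocab
            }
        return self

    def predict(self, features):
        def log_posterior(c):
            score = math.log(self.prior[c])
            for g in self.vocab:
                p = self.prob[c][g]
                score += math.log(p if g in features else 1.0 - p)
            return score

        return max(self.classes, key=log_posterior)


# Hypothetical traces (assumed for illustration, not taken from the paper).
worm_traces = [
    ["socket", "connect", "send", "socket", "connect", "send", "open", "write"],
    ["open", "write", "socket", "connect", "send", "socket", "connect", "send"],
]
benign_traces = [
    ["open", "read", "read", "close", "open", "write", "close"],
    ["socket", "connect", "recv", "recv", "close", "open", "read"],
]

X = [ngrams(t) for t in worm_traces + benign_traces]
y = ["worm"] * len(worm_traces) + ["benign"] * len(benign_traces)
model = BernoulliNB().fit(X, y)

# A trace not seen during training; n-grams outside the vocabulary are ignored.
unseen = ["socket", "connect", "send", "socket", "connect", "send", "close"]
print(model.predict(ngrams(unseen)))  # prints: worm
```

The sketch treats each trace as a set of n-grams, so the classifier only asks which call patterns occur, not how often. A Support Vector Machine could be trained on the same binary feature vectors.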
After infecting a number of computers without being detected, the worm attacker can remotely control them and use them as “stepping stones” to launch additional attacks. Consequently, as the first line of defense against worm attacks, worm detection research has become vital to the field of Internet security.

Xun Wang, Adam Champion and Dong Xuan are with the Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210. E-mail: {wangxu, champion, xuan}@cse.ohio-state.edu. Wei Yu is with the Department of Computer Science, Texas A&M University, College Station, TX 77843. E-mail: {weiyu}@cs.tamu.edu. Xinwen Fu is with the College of Business and Information Systems, Dakota State University, Madison, SD 57042. E-mail: xinwen.fu@dsu.edu.

In general, there are two types of worm detection systems: network-based detection and host-based detection. Network-based detection systems detect worms primarily by monitoring, collecting, and analyzing the scan traffic (messages to identify vulnerable computers) generated by worm attacks. Many detection schemes fall into this category [4]–[7]. Nevertheless, because of their reliance on scan traffic, these schemes are not very effective in detecting worms that spread via email systems, instant messenger (IM) or peer-to-peer (P2P) applications.

On the other hand, host-based detection systems detect worms by monitoring, collecting, and analyzing worm behaviors on end-hosts. Since worms are malicious programs that execute on these machines, analyzing the behavior of worm executables¹ plays an important role in host-based detection systems. Many detection schemes fall into this category [8], [9]. The large number of real-world worm executables accessible over the Internet gives researchers an opportunity to analyze them directly, understand their behavior and, consequently, develop more effective detection schemes.
Therefore, the focus of this paper is to use this large number of real-world worm executables to develop a host-based detection scheme that can efficiently and accurately detect new worms.

Within this category, most existing schemes have focused on static properties of executables [8], [9]. In particular, the list of called Dynamic Link Libraries (DLLs), functions and specific ASCII strings extracted from the executable headers, hexadecimal sequences extracted from the executable bodies, and other static properties are used to distinguish between malicious and benign executables. However, using static properties without program execution might not accurately distinguish between these executables, for the following two reasons. First, two different executables (e.g., one worm and one benign) can have the same static properties, i.e., they can call the same set of DLLs and even call the same set of functions. Second, worm writers can change these static properties by inserting “dummy” functions into the worm executable that will not be called during program execution, or by inserting benign-looking strings [10].

¹ In this paper, an executable means a binary that can be executed, which is different from program source code.

Authorized licensed use limited to: The Ohio State University. Downloaded on October 9, 2008 at 11:47 from IEEE Xplore. Restrictions apply.

Hence, the static properties of programs, or how they look, are not the key to distinguishing worm and benign executables. Instead, we believe the key is what programs do, i.e., their run-time behaviors or dynamic properties. Therefore, our study adopts dynamic program analysis to profile the run-time behavior of executables to efficiently and accurately detect