Signature Generation and Detection of Malware Families

V. Sai Sathyanarayan, Pankaj Kohli, and Bezawada Bruhadeshwar

Centre for Security, Theory and Algorithmic Research (C-STAR)
International Institute of Information Technology
Hyderabad - 500032, India
{satya vs,pankaj kohli}@research.iiit.ac.in, bezawada@iiit.ac.in

Abstract. Malware detection and prevention are critical for protecting computing systems across the Internet. The difficulty in detecting malware is that it evolves over time; hence, traditional signature-based malware detectors fail to detect obfuscated and previously unseen malware executables. However, as malware evolves, some semantics of the original malware are preserved, because these semantics are necessary for the malware to remain effective. Using this observation, we present a novel method for detecting malware based on the correlation between the semantics of the malware and its API calls. We construct a base signature for an entire malware class rather than for a single specimen of malware. Such a signature is capable of detecting even unknown and advanced variants that belong to that class. We demonstrate our approach on some well-known malware classes and show that any advanced variant of the malware class is detected from the base signature.

Keywords: Malware Detection, Signature Generation, Static Analysis.

1 Introduction

Malware, or malicious code, refers to the broad class of software threats to computer systems and networks. It includes any code that modifies, destroys, or steals data, allows unauthorized access, exploits or damages a system, or does something that the user does not intend. Perhaps the most sophisticated threats to computer systems are posed by malicious code that exploits vulnerabilities in applications. Pattern-based signatures are the most common technique employed for malware detection.
Implicit in a signature-based method is a priori knowledge of the distinctive patterns of malicious code. The advantage of such malware detectors lies in their simplicity and speed. While the signature-based approach is successful in detecting known malware, it does not work for new malware for which signatures have not yet been prepared, so the detector must be retrained often to keep up with new malware. Signature-based approaches most commonly fail when the malware mutates: a mutated variant no longer contains the exact pattern that its signature encodes. Such metamorphism has already been observed in the past [5, 9].

Y. Mu, W. Susilo, and J. Seberry (Eds.): ACISP 2008, LNCS 5107, pp. 336–349, 2008.
© Springer-Verlag Berlin Heidelberg 2008
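To make this weakness concrete, the following is a minimal sketch of a toy byte-pattern scanner. The signature name, the pattern, and both "specimens" are hypothetical, made-up x86-style byte strings chosen for illustration; they do not come from the paper or from real malware. The sketch shows that inserting a single no-op byte (0x90), which leaves the program's behavior unchanged, is enough to defeat an exact-match signature.

```python
"""Toy pattern-based (byte-signature) scanner.

Hypothetical example: the signature and specimens are illustrative
byte strings, not real malware code or a real signature database.
"""
from typing import List

# Hypothetical signature database: name -> distinctive byte pattern.
SIGNATURES = {
    "Toy.Dropper.A": bytes.fromhex("e8 00 00 00 00 5b 81 eb"),
}


def scan(code: bytes) -> List[str]:
    """Return the names of all signatures whose exact pattern occurs in `code`."""
    return [name for name, pattern in SIGNATURES.items() if pattern in code]


# Original specimen: contains the pattern contiguously, so it is detected.
original = bytes.fromhex("55 89 e5 e8 00 00 00 00 5b 81 eb 10 20 30 40")

# Mutated variant: a NOP (0x90) inserted after the pop instruction (0x5b).
# Behavior is unchanged, but the contiguous byte pattern is broken.
mutated = bytes.fromhex("55 89 e5 e8 00 00 00 00 5b 90 81 eb 10 20 30 40")

print(scan(original))  # ['Toy.Dropper.A']
print(scan(mutated))   # []
```

Real metamorphic engines apply far richer transformations (register reassignment, instruction substitution, code reordering), but even this trivial one is invisible to exact matching. This is why the paper bases its signatures on semantics that must survive mutation, namely the malware's API calls, rather than on raw byte sequences.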