J Comput Virol (2008) 4:323–334
DOI 10.1007/s11416-008-0082-4

ORIGINAL PAPER

An intelligent PE-malware detection system based on association mining

Yanfang Ye · Dingding Wang · Tao Li · Dongyi Ye · Qingshan Jiang

Received: 24 September 2007 / Revised: 8 January 2008 / Accepted: 13 January 2008 / Published online: 5 February 2008
© Springer-Verlag France 2008

Abstract The proliferation of malware has presented a serious threat to the security of computer systems. Traditional signature-based anti-virus systems fail to detect polymorphic/metamorphic and new, previously unseen malicious executables. Data mining methods such as Naive Bayes and Decision Tree have been studied only on small collections of executables. In this paper, building on an analysis of the Windows APIs called by PE files, we develop the Intelligent Malware Detection System (IMDS) using Objective-Oriented Association (OOA) mining based classification. IMDS is an integrated system consisting of three major modules: a PE parser, an OOA rule generator, and a rule-based classifier. An OOA_Fast_FP-Growth algorithm is adapted to efficiently generate OOA rules for classification. A comprehensive experimental study on a large collection of PE files obtained from the anti-virus laboratory of KingSoft Corporation is performed to compare various malware detection approaches. Promising experimental results demonstrate that IMDS outperforms, in both accuracy and efficiency, popular anti-virus software such as Norton AntiVirus and McAfee VirusScan, as well as previous data mining based detection systems that employed Naive Bayes, Support Vector Machine (SVM) and Decision Tree techniques. Our system has already been incorporated into the scanning tool of KingSoft's Anti-Virus software.

A short version of this paper appeared in [33]. The work is partially supported by NSF IIS-0546280 and an IBM Faculty Research Award. The authors would also like to thank the members of the anti-virus laboratory at KingSoft Corporation for their helpful discussions and suggestions.

Y. Ye
Department of Computer Science, Xiamen University, Xiamen, People's Republic of China

D. Wang · T. Li (✉)
School of Computer Science, Florida International University, Miami, FL, USA
e-mail: taoli@cs.fiu.edu

D. Ye
College of Maths and Computer Science, Fuzhou University, Fuzhou, People's Republic of China

Q. Jiang
Software School, Xiamen University, Xiamen, People's Republic of China

1 Introduction

Malicious executables are programs designed to infiltrate or damage a computer system without the owner's consent; they have become a serious threat to the security of computer systems. New, previously unseen malicious executables, polymorphic malicious executables that use encryption, and metamorphic malicious executables that adopt obfuscation techniques are more complex and harder to detect. According to its propagation method, malicious code is usually classified into the following categories [1, 7, 21]: viruses, worms, trojan horses, backdoors and spyware. Malicious executables do not always fit neatly into these categories, and malicious code that combines two or more categories can lead to powerful attacks. For instance, a worm carrying a payload can install a backdoor to allow remote access. Because of the significant losses and damage caused by malicious executables, malware detection has become one of the most critical issues in the field of computer security.

Currently, most widely used malware detection software uses signature-based methods to recognize threats [8, 9]. Signatures are short byte strings that are unique to particular programs. They can be used to identify particular viruses in executable files, boot records, or memory with