J Comput Virol (2008) 4:323–334
DOI 10.1007/s11416-008-0082-4
ORIGINAL PAPER
An intelligent PE-malware detection system based on association mining
Yanfang Ye · Dingding Wang · Tao Li · Dongyi Ye ·
Qingshan Jiang
Received: 24 September 2007 / Revised: 8 January 2008 / Accepted: 13 January 2008 / Published online: 5 February 2008
© Springer-Verlag France 2008
Abstract The proliferation of malware poses a serious threat to the
security of computer systems. Traditional signature-based anti-virus
systems fail to detect polymorphic/metamorphic and new, previously unseen
malicious executables. Data mining methods such as Naive Bayes and
Decision Tree have been studied on small collections of executables.
In this paper, based on the analysis of Windows APIs called by PE files,
we develop the Intelligent Malware Detection System (IMDS) using
Objective-Oriented Association (OOA) mining based classification. IMDS is
an integrated system consisting of three major modules: a PE parser, an
OOA rule generator, and a rule-based classifier. An OOA_Fast_FP-Growth
algorithm is adapted to efficiently generate OOA rules for classification.
A comprehensive experimental study on a large collection of PE files
obtained from the anti-virus laboratory of KingSoft Corporation is
performed to compare various malware detection approaches. Promising
experimental results demonstrate that the accuracy and efficiency of our
IMDS system outperform popular anti-virus software such as Norton
AntiVirus and McAfee VirusScan, as well as previous data mining based
detection systems which employed Naive Bayes, Support Vector Machine (SVM)
and Decision Tree techniques. Our system has already been incorporated
into the scanning tool of KingSoft’s Anti-Virus software.
A short version of the paper appeared in [33]. The work is partially
supported by NSF IIS-0546280 and an IBM Faculty Research Award.
The authors would also like to thank the members of the anti-virus
laboratory at KingSoft Corporation for their helpful discussions and
suggestions.
Y. Ye
Department of Computer Science, Xiamen University,
Xiamen, People’s Republic of China
D. Wang · T. Li (✉)
School of Computer Science, Florida International University,
Miami, FL, USA
e-mail: taoli@cs.fiu.edu
D. Ye
College of Maths and Computer Science, Fuzhou University,
Fuzhou, People’s Republic of China
Q. Jiang
Software School, Xiamen University, Xiamen,
People’s Republic of China
1 Introduction
Malicious executables are programs designed to infiltrate or damage a
computer system without the owner’s consent, and they have become a
serious threat to the security of computer systems. New, previously
unseen malicious executables, polymorphic malicious executables that use
encryption, and metamorphic malicious executables that adopt obfuscation
techniques are more complex and more difficult to detect. According to
its propagation method, malicious code is usually classified into the
following categories [1, 7, 21]: viruses, worms, trojan horses, backdoors
and spyware. Malicious executables do not always fit exactly into these
categories, and malicious code that combines two or more categories can
lead to powerful attacks. For instance, a worm containing a payload can
install a backdoor to allow remote access. Because of the significant
loss and damage caused by malicious executables, malware detection has
become one of the most critical issues in the field of computer security.
Currently, most widely used malware detection software uses a
signature-based method to recognize threats [8, 9].
Signatures are short strings of bytes that are unique
to particular programs. They can be used to identify particular
viruses in executable files, boot records, or memory with