Behavior-based Proactive Detection
of Unknown Malicious Codes
Jianguo Ding∗‡, Jian Jin†, Pascal Bouvry∗, Yongtao Hu§ and Haibing Guan¶
∗Faculty of Science, Technology and Communication (FSTC), University of Luxembourg, L-1359 Luxembourg
Email: Jianguo.Ding@ieee.org
†School of Information Science and Technology, East China Normal University, Shanghai, 200062, P. R. China
‡Software Engineering Institute, East China Normal University, Shanghai, 200062, P. R. China
§The Third Research Institute of the Ministry of Public Security, P. R. China
¶School of Information Security Engineering, Shanghai Jiao Tong University, Shanghai 200030, P. R. China
Abstract—With the rising popularity of the Internet, the
resulting increase in the number of available vulnerable machines,
and the elevated sophistication of malicious code itself, the
detection and prevention of unknown malicious code face great
challenges. Traditional anti-virus scanners employ static features
to detect malicious executables and have difficulty detecting
unknown malicious code effectively. We propose a behavior-based
dynamic heuristic analysis approach for the proactive detection of
unknown malicious code. The behavior of malicious code is
identified from the system calls captured through virtual emulation
and from changes in system resources. A statistical detection model
and a mixture of experts (MoE) model are designed to analyze the
behavior of malicious code. The experimental results demonstrate
that behavior-based proactive detection is efficient in detecting
unknown malicious executables.
I. INTRODUCTION
Malicious code (or malware) is defined as any program
(including macros and scripts) that is specifically coded to
cause an unexpected (and usually unwanted) event on a
user's PC or a server. Typical examples include viruses,
Trojan horses, worms, back doors, spyware, and adware.
One reason for the prevalence of malicious code on
today's networks is the rising popularity of the Internet and
the resulting increase in the number of vulnerable machines
operated by security-unaware users. Another reason is the
elevated sophistication of the malicious code itself [3].
One issue concerns the behavior of malicious code and its
sources. Surprisingly, the basic functionality of malware has
not changed much: the samples observed today either steal
sensitive information (key loggers, password thieves, bank
Trojans), send spam mail, or launch denial-of-service attacks.
What has changed is that malicious code increasingly uses
obfuscation techniques to make itself hard to detect and
identify. For example, a polymorphic virus changes its form
each time it infects a new victim, and a metamorphic virus
changes the structure of the virus body as well as the
decryption engine, making it impossible to obtain a signature
match [9].
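To illustrate why a fixed byte signature fails against polymorphic code, the following minimal Python sketch models only the encoding step: a toy "virus body" (the hypothetical bytes `PAYLOAD`) is XOR-encoded with a different one-byte key for each infection. The helper `xor_encode` and both keys are illustrative assumptions; real polymorphic engines also mutate the decryptor itself, which this sketch does not attempt.

```python
# Toy model of polymorphic encoding (hypothetical, for illustration only):
# the same body is XOR-encoded with a per-infection key, so the stored
# bytes differ between copies even though the decoded body is identical.
PAYLOAD = b"open_registry;write_run_key;copy_self"  # hypothetical virus body

def xor_encode(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the data."""
    return bytes(b ^ key for b in data)

copy_a = xor_encode(PAYLOAD, 0x21)  # first infection
copy_b = xor_encode(PAYLOAD, 0x5A)  # second infection, different key

# A "signature" taken from the first 8 encoded bytes of copy A.
signature = copy_a[:8]

print(signature in copy_a)                  # matches the copy it was taken from
print(signature in copy_b)                  # no match in the second infection
print(xor_encode(copy_b, 0x5A) == PAYLOAD)  # same body after decoding
```

The three lines print `True`, `False`, and `True`: the signature only recognizes the copy it was extracted from, while both copies carry the same body and would therefore exhibit the same runtime behavior.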
Meanwhile, the mapping of dark (honeypot) address spaces by
attackers, which lets them avoid these monitored sensors, is an
emerging threat. As a result, there is a need for techniques that
can accurately capture emerging threats, since good intelligence
is a prerequisite for subsequent mitigation efforts.
A traditional signature-based anti-virus scanner uses segments
of file content as its technical component, and its analytical
component is a simple comparison between these segments and
a database of signature patterns. This method yields a
false-positive rate near zero, but it performs poorly against
previously unknown malicious executables or variants of
existing ones.
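A minimal sketch of this two-part scheme follows, assuming the technical component is the raw file bytes and the analytical component is an exact substring test. The signature names, byte patterns, and the `scan` helper are hypothetical and do not come from any real anti-virus database.

```python
from typing import Dict, List

# Hypothetical signature database: name -> exact byte pattern.
SIGNATURES: Dict[str, bytes] = {
    "Toy.Dropper.A": bytes.fromhex("6a00e8deadbeef5d"),
    "Toy.Worm.B": b"copy_self_to_share",
}

def scan(content: bytes, signatures: Dict[str, bytes]) -> List[str]:
    """Return the names of signatures whose exact byte pattern occurs in content."""
    return [name for name, pattern in signatures.items() if pattern in content]

# A sample containing the known pattern, and a variant with one byte changed.
sample = b"MZ\x90\x00" + bytes.fromhex("6a00e8deadbeef5d") + b"\x00" * 16
variant = b"MZ\x90\x00" + bytes.fromhex("6a00e8deadbeef90") + b"\x00" * 16

print(scan(sample, SIGNATURES))   # ['Toy.Dropper.A']
print(scan(variant, SIGNATURES))  # []
```

Because a match requires every byte to agree, benign files almost never trigger it, which explains the near-zero false-positive rate; by the same token, a single changed byte in the variant is enough to escape detection.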
Current anti-virus scanners add static heuristics to alleviate
this problem. Instead of looking for the specific signature of
a virus, they look for virus behavior: each signature is a
generic code sequence that represents a behavior feature, and
a more complex comparison is used in the analytical component.
However, this method also derives its data from the file content
as the technical component and can therefore be defeated by
obfuscation techniques such as polymorphism and metamorphism.
Although wildcards have been added to the code sequences to
counter obfuscation, they bring a high false-positive rate as a
consequence. In addition, this method depends on auxiliary
techniques such as unpacking, decryption, and disassembly.
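The trade-off introduced by wildcards can be sketched as follows. This is a minimal illustration, assuming a hex pattern syntax in which `??` stands for any single byte (a common convention, used here as an assumption); the helper `compile_wildcard`, the generic pattern, and the byte strings are hypothetical. The pattern is translated into a bytes regular expression with Python's standard `re` module.

```python
import re

def compile_wildcard(pattern: str) -> "re.Pattern[bytes]":
    """Compile a hex signature such as '6a 00 e8 ?? ?? ?? ?? 5d' into a bytes regex.

    '??' matches any single byte; every other token must match that exact byte.
    """
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")  # wildcard: any byte (DOTALL lets '.' match 0x0a too)
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    return re.compile(b"".join(parts), re.DOTALL)

# Hypothetical generic signature: push 0; call <any rel32 target>; pop ebp
generic = compile_wildcard("6a 00 e8 ?? ?? ?? ?? 5d")
exact = bytes.fromhex("6a00e8deadbeef5d")  # the original exact signature

original = bytes.fromhex("6a00e8deadbeef5d")
mutated = bytes.fromhex("6a00e8112233445d")          # same shape, different call target
unrelated = bytes.fromhex("90906a00e8000000005dc3")  # other code that happens to fit

for name, data in [("original", original), ("mutated", mutated), ("unrelated", unrelated)]:
    # columns: sample name, exact-signature hit, wildcard-signature hit
    print(name, exact in data, bool(generic.search(data)))
```

The exact signature matches only `original`, while the wildcard signature also catches `mutated`. However, it equally matches `unrelated`, so every byte left unconstrained for the sake of robustness also widens the set of innocent files that can trigger an alarm.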
This paper uses a dynamic heuristic method to analyze the
runtime behavior of malicious code and establishes an automatic
mechanism to assist in classifying and identifying unknown
malicious code. The main contributions are summarized as
follows:
1. The characteristic behaviors of malicious code are
identified through behavior features defined by the
corresponding Win32 API calls and specific parameters.
2. An automatic executable behavior tracing system is
implemented to dynamically capture the defined behavior
features.
3. Two approaches, a statistical detection model and a mixture
of experts model, are presented for behavior analysis and for
establishing classification strategies for the proactive
detection of malicious code. Experimental results demonstrate
that the proactive strategies are efficient in detecting
previously unknown malicious executables.
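The idea behind contributions 1 and 2, mapping traced Win32 API calls and their parameters to behavior features, can be sketched as below. This is a hypothetical illustration, not the paper's feature definitions (those are given in Section 3): the feature names, the rules, the `ApiCall` record, and the assumption that the tracer has already resolved registry handles to key paths are all invented for the example. The API names themselves (`CreateFileW`, `RegSetValueExW`, `CreateRemoteThread`) are real Win32 functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

@dataclass
class ApiCall:
    """One traced Win32 API call with the parameters a (hypothetical) tracer recorded."""
    name: str
    args: Dict[str, str]

# Hypothetical feature rules: feature name -> predicate over a single call.
# Assumes the tracer has resolved registry handles to full key paths.
RULES: Dict[str, Callable[[ApiCall], bool]] = {
    "autostart_registry": lambda c: (
        c.name in ("RegSetValueExA", "RegSetValueExW")
        and "\\currentversion\\run" in c.args.get("key", "").lower()
    ),
    "write_system_dir": lambda c: (
        c.name in ("CreateFileA", "CreateFileW")
        and "GENERIC_WRITE" in c.args.get("access", "")
        and c.args.get("path", "").lower().startswith("c:\\windows\\system32\\")
    ),
    "remote_thread": lambda c: c.name == "CreateRemoteThread",
}
FEATURE_ORDER = sorted(RULES)  # fixed order for the feature vector

def extract_features(trace: List[ApiCall]) -> Set[str]:
    """Return the set of behavior features triggered anywhere in the trace."""
    return {feat for call in trace for feat, rule in RULES.items() if rule(call)}

def to_vector(features: Set[str]) -> List[int]:
    """Encode the features as a binary vector, e.g. as input to a classifier."""
    return [1 if f in features else 0 for f in FEATURE_ORDER]

trace = [
    ApiCall("CreateFileW", {"path": "C:\\Windows\\System32\\svch0st.exe",
                            "access": "GENERIC_WRITE"}),
    ApiCall("RegSetValueExW", {"key": "HKLM\\Software\\Microsoft\\Windows"
                                      "\\CurrentVersion\\Run", "value": "svch0st"}),
    ApiCall("ReadFile", {}),
]
features = extract_features(trace)
print(sorted(features))      # ['autostart_registry', 'write_system_dir']
print(to_vector(features))   # [1, 0, 1]
```

Unlike the byte-level signatures, these features depend on what the program does at run time, so they are unaffected by how its bytes are encoded on disk. A binary vector of this kind is one plausible input format for a statistical or mixture-of-experts classifier, although the paper's own encoding may differ.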
The rest of this paper is organized as follows: Section
2 describes related work on the behavior-based detection of
malicious executables. Section 3 presents the definition of
malicious behavior features. Section 4 gives the details of the
dynamic behavior analysis of malicious code. Section 5
2009 Fourth International Conference on Internet Monitoring and Protection
978-0-7695-3612-5/09 $25.00 © 2009 IEEE
DOI 10.1109/ICIMP.2009.20
72