© IJARCSMS (www.ijarcsms.com), All Rights Reserved 29 | Page
e-ISJN: A4372-3114 ISSN: 2321-7782 (Online)
p-ISJN: A4372-3115 ISSN: 2347-1778 (Print)
Impact Factor: 7.327
Volume 8, Issue 2, February 2020
International Journal of Advance Research in
Computer Science and Management Studies
Research Article / Survey Paper / Case Study
Available online at: www.ijarcsms.com
A Survey of Machine Learning Techniques for Identifying and
Classifying Malwares
Umesh V. Nikam¹
Department of Computer Science & Engineering,
P. R. M. I. T&R, Badnera.
Amravati, India
Dr. V. M. Deshmukh²
Department of Computer Science & Engineering
P. R. M. I. T&R, Badnera.
Amravati, India
Abstract: Malware is a serious threat on the internet today. As malware propagates, it changes its code, and attackers
nowadays create polymorphic and metamorphic malware. Traditional signature-based detection techniques are
ineffective against such modern threats. Different malware families exhibit different behavior patterns that reflect
their origin and purpose. These patterns can be used to detect unknown malware and classify it into families using
machine learning techniques. This survey paper provides an overview of various techniques for detecting malware and
classifying it into its respective families.
Keywords: Malware, Machine learning, Classification.
I. INTRODUCTION
Malware is a computer program written with the purpose of causing harm to the operating system. Its basic purpose is
to fulfill the harmful intent of an attacker, for example by gathering personal information about a user or host system,
thereby compromising the availability, integrity and privacy of the user's data. There is a wide range of malware types,
such as worms, viruses, Trojan horses, rootkits, backdoors, botnets, spyware and adware.
Modern antivirus software detects known software threats effectively but is inefficient at detecting novel
malware. A study by AusCERT found that 80 percent of new malware was not detected by the latest antivirus software [1].
Detection, mitigation and classification of malware is a major problem on the internet today. Malware is continuously
growing in volume, variety and velocity.
A. LIMITATIONS OF TRADITIONAL ANTIVIRUS
A traditional signature-based antivirus system is reactive in nature. In earlier days, to detect a piece of malware, an
analyst would manually generate a signature or a hash of it and add it to a database of such signatures. During every scan,
the antivirus system compares each file against the database and reports malware if there is a match. Because polymorphic
malware changes its code as it propagates, a new variant no longer matches the stored signature, so signature-based
detection fails to identify many security threats. To create a more reliable and robust system, we need to develop an
alternative to the traditional signature-based detection system.
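The weakness of hash-based signatures can be illustrated with a minimal sketch. It assumes the signature is a SHA-256 digest of the whole file; the names `SIGNATURE_DB`, `KNOWN_SAMPLE` and `is_known_malware` are hypothetical and chosen only for illustration, and the sample bytes are a harmless placeholder rather than real malware.

```python
import hashlib
from typing import Set


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest used here as the file's signature."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical known sample (harmless placeholder bytes, not real malware).
KNOWN_SAMPLE = b"\x4d\x5a\x90\x00placeholder-payload"

# Hypothetical signature database: a set of digests of known samples.
SIGNATURE_DB: Set[str] = {sha256_of(KNOWN_SAMPLE)}


def is_known_malware(data: bytes, signature_db: Set[str]) -> bool:
    """Report a match only if the exact digest is already in the database."""
    return sha256_of(data) in signature_db


# A variant that differs by a single appended byte, standing in for the
# kind of code change a polymorphic sample makes between infections.
variant = KNOWN_SAMPLE + b"\x90"

exact_match = is_known_malware(KNOWN_SAMPLE, SIGNATURE_DB)
variant_match = is_known_malware(variant, SIGNATURE_DB)
```

Here the unmodified sample is found in the database, while the one-byte variant produces a different digest and is missed. Real antivirus signatures are often more elaborate than a whole-file hash, but the same reactive limitation applies: a signature must exist before a match is possible.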
To overcome the drawbacks of signature-based systems, malware analysis techniques are used, which can be either
static or dynamic. These techniques help the analyst understand the risk associated with malicious code.
In static analysis, malicious software is analyzed without being executed. Before static analysis, it is necessary to
unpack and decrypt the executable. The detection patterns used can be byte sequences, n-grams, syntactic library calls,
control flow graphs, string signatures, etc.
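As one example of such a static feature, byte n-grams can be counted directly from the raw bytes of an already unpacked file, without executing it. The sketch below is only an illustration under that assumption; the function name `byte_ngrams` and the sample bytes are hypothetical, and the choice of n = 2 is arbitrary.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping n-byte window in the given byte string."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # range() is empty when the input is shorter than n, giving an empty Counter.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


# Hypothetical 4-byte sample: three 0x90 bytes followed by one 0xCC byte.
sample = b"\x90\x90\x90\xcc"
counts = byte_ngrams(sample, n=2)
# The windows are 90 90, 90 90 and 90 CC, so the first 2-gram occurs twice.
```

The resulting counts can serve as a feature vector for a machine learning classifier, typically after restricting them to a fixed vocabulary of the most informative n-grams.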