International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 269
DETECTING MALICIOUS URLS USING MACHINE
LEARNING TECHNIQUES: A COMPARATIVE LITERATURE REVIEW
Lekshmi A R¹, Seena Thomas²
¹M.Tech Student in Computer Science and Engineering at LBS Institute of Technology for Women, Trivandrum,
Kerala.
²Associate Professor in Computer Science and Engineering department at LBS Institute of Technology for Women,
Trivandrum, Kerala.
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - A central concern in cyber security today is
identifying the threats that cause loss of secure information.
This loss is mainly due to malicious URLs. Malicious URLs are
generated daily, and they have a short life span. Researchers
use various techniques to detect such threats in a timely
manner. The blacklist method is the best known of them. It is
simple and easy to use, which has made it the traditional
method for identifying harmful URLs. However, this method
suffers from many problems, and one of its main drawbacks is
its inability to detect newly generated malicious URLs. The
heuristic approach, an extension of the blacklist method, is
also used to identify some common attacks, but it cannot be
applied to all types of attacks, so its use is limited. To
overcome these limitations, researchers introduced machine
learning techniques. Machine learning techniques go through
several phases and detect malicious URLs accurately. They
also give details about the false positive rate. This review
paper studies the phases of machine learning techniques for
detecting malicious URLs, such as the feature extraction phase
and the feature representation phase. The different machine
learning algorithms used for such detection are also
discussed. The paper also gives a better understanding of the
advantages of using machine learning over other techniques for
detecting malicious URLs, and of the problems it suffers from.
Key Words: Blacklist, Cyber Security, Malicious URL.
1. INTRODUCTION
The advent of new communication technologies has driven the
growth and promotion of businesses across many applications,
including online banking, e-commerce, and social networking.
The use of the World Wide Web is increasing day by day. Most
of the time, malicious software (malware for short) and
attacks are propagated through the Internet. Delivering
malicious content on the web has become a common technique
for bad actors, because more than half of the world's
population now has Internet access. Explicit hacking attempts,
drive-by exploits, social engineering, phishing, watering
hole, man-in-the-middle, SQL injection, loss or theft of
devices, denial of service, distributed denial of service, and
many others are among the techniques used to implement website
attacks. The limitations of traditional security management
technologies are becoming more and more serious given this
exponential growth of new security threats, rapid changes in
IT technologies, and a significant shortage of security
professionals. Most of these attack techniques are realized by
spreading compromised URLs.
Malicious URLs are compromised URLs that are used for
cyber-attacks. Identifying such URLs is the best way to avoid
information loss, so the identification of malicious URLs has
always been an active area of information security. Drive-by
download, phishing and social engineering, and spam are the
most popular types of attacks that use malicious URLs. A
drive-by download is the downloading of malware simply by
visiting a URL. It is usually carried out by exploiting
vulnerabilities in plugins or by inserting malicious code
through JavaScript. Phishing and social engineering attacks
trick users into disclosing their sensitive information by
imitating genuine web pages. Spam is the use of unsolicited
messages for the purpose of advertising or phishing. Every
year these types of attacks cause serious problems. The main
concern today is therefore detecting such malicious URLs in a
timely manner.
In this paper we discuss the different methods used for
detecting malicious URLs. The paper focuses on the advantages
of machine learning techniques over other techniques in this
field. The classification of the feature representation stage
of machine learning techniques is explained in detail, as is
the classification of the feature extraction stage. The paper
also explains the different machine learning techniques used
for detection. In addition, it describes the difficulties
users face when applying machine learning techniques to
malicious URL detection.
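As a rough illustration of the contrast drawn here, the following Python sketch compares an exact-match blacklist lookup with the extraction of a few lexical features from a URL string. It is a minimal sketch under stated assumptions, not a method taken from the reviewed literature: the `BLACKLIST` contents, the example URLs, the function names, and the particular features chosen are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical blacklist; a real one would be large and frequently updated.
BLACKLIST = {"http://bad.example/login"}


def is_blacklisted(url: str) -> bool:
    """Exact-match lookup: only URLs already on the list are flagged."""
    return url in BLACKLIST


def lexical_features(url: str) -> dict:
    """Extract a few illustrative lexical features from the URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        # Number of non-empty segments in the path component.
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


known = "http://bad.example/login"   # already on the hypothetical blacklist
fresh = "http://bad2.example/login"  # newly generated variant, not listed

print(is_blacklisted(known))
print(is_blacklisted(fresh))
print(lexical_features(fresh))
```

The lookup misses the newly generated variant because it is not yet on the list. The feature dictionary can be computed for any URL, seen or unseen, which is what allows a trained classifier to score URLs that no list has recorded.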
2. METHODS OF DETECTING MALICIOUS URLS
The main methods for detecting malicious URLs are the blacklist
method, the heuristic method, and the machine learning method.
These are explained as follows.