International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 269

DETECTING MALICIOUS URLS USING MACHINE LEARNING TECHNIQUES: A COMPARATIVE LITERATURE REVIEW

Lekshmi A R 1, Seena Thomas 2

1 M.Tech Student in Computer Science and Engineering at LBS Institute of Technology for Women, Trivandrum, Kerala.
2 Associate Professor in the Computer Science and Engineering department at LBS Institute of Technology for Women, Trivandrum, Kerala.

Abstract - The most important concern in cyber security today is identifying the serious problems that cause the loss of secure information. This loss is mainly due to malicious URLs. Malicious URLs are generated daily, and each has a short life span. Researchers use various techniques to detect such threats in a timely manner. The best known of these is the blacklist method, which researchers use to identify harmful URLs easily. Because it is simple and easy to apply, it has become the traditional method for detecting such URLs. However, it suffers from many problems, and one of its main drawbacks is that it cannot detect newly generated malicious URLs. The heuristic approach, an extension of the blacklist method, is also used to identify some common attacks, but it cannot be applied to all types of attack, so its use is limited. To overcome these limitations, researchers introduced machine learning techniques. Machine learning techniques proceed through several phases and detect malicious URLs accurately. They also provide details about the false positive rate.
This review paper studies the phases of machine learning techniques for detecting malicious URLs, such as the feature extraction phase and the feature representation phase. The different machine learning algorithms used for such detection are also discussed. The paper also gives a better understanding of the advantages of machine learning over other techniques for detecting malicious URLs, and of the problems it suffers from.

Key Words: Blacklist, Cyber Security, Malicious URL.

1. INTRODUCTION

The advent of new communication technologies has driven the growth and promotion of businesses across many applications, including online banking, e-commerce, and social networking. Use of the World Wide Web is increasing day by day. Malicious software (malware for short) and other attacks are mostly propagated over the Internet. With more than half of the world's population now having Internet access, delivering malicious content on the web has become a common technique for bad actors. Website attacks are carried out using a variety of techniques, including explicit hacking attempts, drive-by exploits, social engineering, phishing, watering hole, man-in-the-middle, SQL injection, loss or theft of devices, denial of service, distributed denial of service, and many others. The limitations of traditional security management technologies are becoming more and more serious given the exponential growth of new security threats, rapid changes in IT technologies, and a significant shortage of security professionals.

Most of these attack techniques are realized by spreading compromised URLs. Malicious URLs are compromised URLs that are used for cyber-attacks. Identifying such URLs is the best way to avoid information loss, so the identification of malicious URLs has always been an active area of information security. Drive-by download, phishing and social engineering, and spam are the most popular types of attack using malicious URLs.
A drive-by download is the downloading of malware that occurs when a user visits a URL. It is usually carried out by exploiting vulnerabilities in plugins or by inserting malicious code through JavaScript. Phishing and social engineering attacks imitate genuine web pages to trick users into disclosing their sensitive information. Spam is the use of unsolicited messages for the purpose of advertising or phishing. Every year these types of attacks cause several problems, so the main concern today is detecting such malicious URLs in a timely manner.

In this paper we discuss the different methods used for detecting malicious URLs. The paper focuses mainly on the advantages of machine learning techniques over other techniques in this field. The classification of the feature representation stage of machine learning techniques is explained in detail, and the feature extraction stage is likewise classified methodically and explained. The paper also explains the different machine learning techniques used for detection. In addition, it describes the difficulties encountered in using machine learning techniques for malicious URL detection.

2. METHODS OF DETECTING MALICIOUS URLS

The main methods for detecting malicious URLs are the blacklist method, the heuristic method, and the machine learning method. These are explained as follows.
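Before the individual methods are taken up, the main drawback of the blacklist method noted in the abstract can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed works: the blacklist entries, the `.example` domains, and the helper names `normalize` and `is_blacklisted` are all hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist entries (host + path), for illustration only.
BLACKLIST = {"malicious.example/login", "bad.example/download.exe"}


def normalize(url: str) -> str:
    """Reduce a URL to lowercase host plus path so trivial variants match."""
    # Add a scheme when it is missing so urlsplit can locate the host.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    return host + path


def is_blacklisted(url: str) -> bool:
    """Exact-match lookup: only URLs already on the list are flagged."""
    return normalize(url) in BLACKLIST


# A URL that is already on the list is caught.
known = is_blacklisted("http://malicious.example/login")
# A newly generated variant on a different host is missed.
fresh = is_blacklisted("http://malicious-2.example/login")
```

Under these assumptions the lookup flags the listed URL but misses the new variant, because an exact-match list holds no information about URLs it has not yet seen.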