International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 02 | Feb 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1
Detecting Phishing Websites Using Machine Learning
Sagar Patil¹, Yogesh Shetye², Nilesh Shendage³
¹,²,³Department of Information Technology, Padmabhushan VasantDada Patil Pratishthan’s College of Engineering,
Sion, Mumbai, Maharashtra, India
Abstract - The goal of our project is to implement a machine learning solution to the problem of detecting phishing and
malicious web links. The end result of our project will be a software product that uses a machine learning algorithm to detect
malicious URLs. Phishing is the technique of extracting user credentials and sensitive data from users by masquerading as a
genuine website. In phishing, the user is presented with a mirror website that is identical to the legitimate one but contains
malicious code that extracts user credentials and sends them to the phishers. Phishing attacks can lead to huge financial losses
for customers of banking and financial services. The traditional approach to phishing detection has been either to use a blacklist
of known phishing links or to heuristically evaluate the attributes of a suspected phishing page to detect the presence of
malicious code. The heuristic function relies on trial and error to define the threshold that is used to separate malicious links
from benign ones. The drawbacks of this approach are poor accuracy and low adaptability to new phishing links. We plan to use
machine learning to overcome these drawbacks by implementing several classification algorithms and comparing their
performance on our dataset. We will test algorithms such as Logistic Regression, SVM, Decision Trees and Neural Networks on a
dataset of phishing links from the UCI Machine Learning Repository, and pick the best model to develop a browser plugin that can
be published as a Chrome extension.
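To make the contrast between a hand-tuned threshold and a learned classifier concrete, here is a minimal Python sketch; it is not
the project's implementation. It assumes four hypothetical URL features (has_ip_address, has_at_symbol, long_url, no_https), each
encoded as 1 (suspicious) or -1 (not suspicious), and eight invented toy rows. The heuristic counts suspicious attributes and
compares the count with a hand-picked threshold; the logistic regression (one of the algorithms listed above) learns one weight per
feature plus a bias by batch gradient descent on the log loss. The learning rate, epoch count and threshold are illustrative
assumptions.

```python
import math

# HYPOTHETICAL features for illustration only: 1 = suspicious, -1 = not suspicious.
FEATURES = ["has_ip_address", "has_at_symbol", "long_url", "no_https"]

# Invented toy rows (not taken from the UCI dataset). Label 1 = phishing, -1 = legitimate.
X = [
    [1, -1, 1, 1], [1, -1, -1, -1], [-1, 1, 1, 1], [-1, 1, -1, -1],      # phishing
    [-1, -1, 1, 1], [-1, -1, -1, -1], [-1, -1, 1, -1], [-1, -1, -1, 1],  # legitimate
]
y = [1, 1, 1, 1, -1, -1, -1, -1]


def heuristic_classify(x, threshold=3):
    """Traditional heuristic: count suspicious attributes, compare with a hand-picked threshold."""
    score = sum(1 for v in x if v == 1)
    return 1 if score >= threshold else -1


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Learn one weight per feature plus a bias by batch gradient descent on log loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            target = 1.0 if yi == 1 else 0.0  # map {-1, 1} labels to {0, 1}
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - target
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the average gradient over all rows.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def lr_classify(x, w, b):
    """Predict phishing (1) when the modelled probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else -1


def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


if __name__ == "__main__":
    # Best accuracy that any single count threshold (0..4) reaches on the toy rows.
    best = max(accuracy(lambda x, t=t: heuristic_classify(x, t), X, y)
               for t in range(len(FEATURES) + 1))
    w, b = train_logistic_regression(X, y)
    print("best heuristic accuracy:", best)  # 0.75
    print("logistic regression accuracy:", accuracy(lambda x: lr_classify(x, w, b), X, y))  # 1.0
    print(dict(zip(FEATURES, (round(v, 2) for v in w))))
```

On these toy rows no single threshold separates the classes (the best one, 3, reaches 0.75 accuracy), because one legitimate page
has more suspicious attributes than two of the phishing pages. The learned model fits all eight rows and leaves the long_url and
no_https weights at zero, since those attributes are equally common in both classes. Retraining on new labelled rows updates the
weights automatically, whereas the heuristic threshold has to be re-tuned by hand.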
Key Words: Phishing Detection (PD), Chrome Extension (CE), Random Forest (RF), Support Vector Machine (SVM), Neural Networks (NN).
1. Introduction
Financial services such as banking are now readily available over the Internet, making people's lives easier. It is therefore very
important that the security and safety of such services be maintained. One of the biggest threats to web security is phishing, the
technique of extracting user credentials by masquerading as a genuine website or service on the web. There are various types of
phishing attacks, such as spear phishing, which targets specific individuals or companies; clone phishing, in which an original
email containing an attachment or link is copied into a new email with a different (possibly malicious) attachment or link; and
whaling. Phishing can lead to huge financial losses. For example, the Microsoft Consumer Safer Index (MCSI) report for 2014
estimated the annual worldwide impact of phishing and other identity theft at nearly USD 5 billion [1]. Similarly, the IRS has
warned of a surge in phishing attacks, with reported cases increasing by over 400% [2]. Several solutions have been proposed to
combat phishing, ranging from educating web users to stronger phishing detection techniques. The conventional approach to
phishing detection has not been successful because of the diverse and evolving nature of phishing attacks. For instance, in January
2007 the total number of unique phishing reports submitted to the Anti-Phishing Working Group (APWG) was 29,930, 5% higher than
the previous peak in June 2006 [3]. This happened despite the preventive measures taken to thwart phishing. Upon investigation, it
was found that each phishing attack differed from the others. It therefore becomes imperative to adapt our phishing detection
techniques as and when new attack patterns are uncovered.

Machine learning algorithms, which let a system learn new patterns from data, are well suited to the problem of phishing
detection. Although many papers in recent years have attempted to detect phishing attacks using machine learning, we intend to go
a step further and build a software tool that can be easily deployed on end-user systems to detect phishing attacks.

For our project, we will experiment with three machine learning algorithms on a dataset of features that represent attributes
commonly associated with phishing pages, choose the best model based on its performance, and build a web browser plugin that will
eventually be deployed to end users. The rest of this report is organized as follows: the Previous Work section describes the
traditional approaches to phishing detection and some of the machine learning approaches attempted in recent years, the Proposed
Approach section describes our approach in detail and the end product of our project, the Dataset section describes the dataset we
are using along with the list of features used in our project, the Machine Learning Algorithms section describes the different
algorithms we have tested on our dataset, the Chrome Plugin Implementation section describes the architecture of our phishing
detection system and the various software modules in it, the Results section gives the results of our experiments with