Performance Analysis of Boosting Techniques for
Classification and Detection of Malicious Websites
W. Regis Anne¹, S. Carolin Jeeva²
{wra.amcs@psgtech.ac.in¹, caroljeeva@gmail.com²}
¹Assistant Professor, Department of Applied Mathematics and Computational Sciences, PSG College
of Technology, Coimbatore, India
²Associate Professor, Department of Digital Sciences, Karunya Institute of Technology and
Sciences, Coimbatore, India
Abstract. Phishing is a social engineering technique that deceives web users into revealing
sensitive information, such as user names and passwords, on websites without their
knowledge. The end user provides personal and financial information, believing the site to
be the authentic service provider. A URL (Uniform Resource Locator) identifies the address
of a file on a server. URLs can be categorized as benign or malicious. Malicious URLs are
created to attack users and cause loss, and they pose a great threat to victims. Machine
learning offers a wide range of algorithms to detect malicious websites. These approaches
represent a URL as a set of lexical, host-based and content features and train a model to
classify it as malicious or benign. Boosting is a family of algorithms that combine weak
learning classifiers to build a strong classifier. In this paper, boosting algorithms are
applied to the classification of URLs as malicious or benign. The boosting algorithms LGBM,
XGBoost and Gradient Boosting are used to predict phishing URLs. Feature selection is
performed to identify the important features. The selected features are then classified by
a Random Forest classifier, giving an accuracy of 99%.
Keywords: Malicious, Benign, Machine learning, Boosting, Cyber Security, LGBM,
XGBoost and Gradient Boosting, Accuracy, Precision, Recall and Support.
1 Introduction
Phishing is a social engineering technique that deceives web users into revealing sensitive
information, such as user names and passwords, on websites without their knowledge. The end
user provides personal and financial information, believing the site to be the authentic
service provider. At present, the Internet is part of daily life for everyone. Internet
services are used to communicate and to run mission-critical systems for various businesses.
As a result, cyber-crime has increased, and security companies are developing new techniques
to protect their assets from hackers. A phisher creates a fake webpage that resembles a
legitimate one and thereby prompts the user to enter sensitive details such as user name and
password, which are then transferred to the hacker's server. A URL (Uniform Resource Locator)
identifies the address of a file on a server. URLs can be categorized as benign or malicious.
Malicious URLs are created to attack users and cause loss, and they pose a great threat to
victims. Spamming, phishing, denial of service, malware, attack pages and SQL injection are
categories of malicious attack. Benign URLs are associated with webpages that do not cause a
phishing attack.
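To illustrate how a URL can be treated as an address with parts that a classifier can inspect, the following minimal Python sketch splits a URL with the standard library and derives a few lexical features. The function name `lexical_features` and the particular features chosen are illustrative assumptions; they are not the feature set used in this paper.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Derive a few simple lexical features from a URL string.

    The feature names are hypothetical and chosen for illustration only.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is None when the URL has no network location
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Number of non-empty path segments, e.g. "/a/b" has two
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


if __name__ == "__main__":
    print(lexical_features("http://example.com/a/b"))
```

A dictionary of this kind, computed for every URL in a labelled data set, could serve as one row of the feature table on which a benign or malicious classifier is trained.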
ICCAP 2021, December 07-08, Chennai, India
Copyright © 2021 EAI
DOI 10.4108/eai.7-12-2021.2314506