e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
(Peer-Reviewed, Open Access, Fully Refereed International Journal)
Volume:06/Issue:07/July-2024 Impact Factor- 7.868 www.irjmets.com

DETECTION OF PHISHING ATTACKS USING MACHINE LEARNING TECHNIQUES

Samaila Kasimu Ahmad*1, Babagana Ali Dapshima*2, Yasmin Chuupa Essa*3
*1 Department of Computer Science and Application, Sharda University, Greater Noida, UP, India.
*2,3 Department of Computer Science and Engineering, Sharda University, Greater Noida, UP, India.
DOI: https://www.doi.org/10.56726/IRJMETS60054

ABSTRACT
The study evaluated several machine learning techniques for detecting phishing attacks: Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Decision Tree (DT), and Logistic Regression (LR). Two datasets were used: one from PhishTank and another from the UCI machine learning repository. Results showed that the Random Forest model achieved the highest accuracy across multiple metrics. On the PhishTank dataset, RF had the best K-fold cross-validation accuracy at 99.55%, feature selection accuracy at 99.00%, and hyperparameter tuning accuracy at 99.45%. The XGBoost model also performed well, with 99.16% K-fold accuracy on PhishTank. On the UCI dataset, XGBoost had the highest K-fold accuracy at 97.16%, while RF still achieved the highest accuracy for feature selection and hyperparameter tuning. Logistic Regression consistently showed the lowest accuracy across datasets and metrics. The proposed approach was validated against other researchers' work on PhishTank, achieving 98.80% accuracy, which compared favorably. ROC curves further illustrated the strong performance, especially for the top-performing models.
The study demonstrated that using selected features and hyperparameter tuning could enhance detection accuracy. The machine learning algorithms, particularly Random Forest, outperformed other state-of-the-art techniques in accurately identifying phishing attacks. The high accuracy metrics indicate the proposed framework's effectiveness in detecting phishing attempts.

Keywords: Detection, Phishing Attack, PhishTank, Hyper-Parameter, Machine Learning.

I. INTRODUCTION
Phishing continues to be a widespread and persistent cyber threat, and for good reason. For nearly three decades, it has proven to be a highly efficient method for breaching a company's defenses, primarily because it manipulates individuals so effectively. Despite increased awareness and dedicated resources aimed at preventing phishing, it remains as prevalent as ever [1]. The effectiveness of phishing does not depend solely on the credibility of the bait; it also relies heavily on social engineering techniques that entice individuals to act. Cybercriminals excel in the art of persuasion, and phishing methods constantly adapt to influence people's decisions. Ultimately, human cognition and behavior underpin the success of phishing [2] [3].

Criminals target businesses and organizations to gain access to sensitive information, which they then use against not only their immediate victims but also the customers and constituents of those victims. Phishing attempts often imitate trusted authorities and common websites in ways that may not raise suspicion among those who are inexperienced in recognizing such scams. The attack typically succeeds once the attacker gains access, which is often unintentionally granted [4]. Efforts to defend against phishing through technology-based methods frequently fall short, both because of the limitations of those methods and because the human factor cannot always be relied upon to provide sufficient support.
User behavior-based protection and prevention strategies also tend to falter due to inadequate or infrequent training, an inability to measure the right metrics, and situations that put users in lose-lose scenarios [5].

Today, people are interconnected online through a wide range of hardware and software, and this connectivity is gradually extending to all aspects of life. Currently, 16% of the global population are internet users. While the internet offers numerous advantages, its misuse can have severe consequences for cybersecurity [6]. Malicious actors on the web trick users into trusting fraudulent websites and guide them towards actions that expose their information. The solution is not to shun the internet altogether but to learn about these threats and exercise caution to avoid falling prey to such attacks.