Predicting Phishing Websites using Classification Mining Techniques with Experimental Case Studies

Maher Aburrous
Dept. of Computing, University of Bradford, Bradford, UK
mrmaburr@bradford.ac.uk

M. A. Hossain
Dept. of Computing, University of Bradford, Bradford, UK
m.a.hossain1@bradford.ac.uk

Keshav Dahal
Dept. of Computing, University of Bradford, Bradford, UK
k.p.dahal@bradford.ac.uk

Fadi Thabtah
MIS Department, Philadelphia University, Amman, Jordan
ffayez@philadelpha.edu.jo

Abstract

Classification Data Mining (DM) techniques can be a very useful tool for detecting and identifying e-banking phishing websites. In this paper, we present a novel approach to overcoming the difficulty and complexity of detecting and predicting e-banking phishing websites. We propose an intelligent, resilient and effective model based on association and classification Data Mining algorithms. These algorithms were used to characterize and identify all the factors and rules needed to classify a phishing website, as well as the relationships that correlate these factors with each other. We implemented six different classification algorithms and techniques to extract classification criteria from the phishing training data sets and determine website legitimacy. We also compared their performance in terms of accuracy, number of rules generated and speed. A phishing case study was applied to illustrate the website phishing process. The rules generated by the associative classification model showed the relationship between important characteristics, such as the URL and Domain Identity criteria and the Security and Encryption criteria, in determining the final phishing detection rate. The experimental results demonstrated the feasibility of using Associative Classification techniques in real applications and their better performance compared with other traditional classification algorithms.

Key Words: Classification, Association, Data Mining, Fuzzy Logic, Machine Learning.

1. Introduction

A phishing website is a semantic attack that targets the user rather than the computer. Phishing is a relatively new Internet crime compared with other forms such as viruses and hacking. It is a hard problem to solve because it is very easy for an attacker to create an exact replica of a legitimate banking site that looks very convincing to users. The word "phishing", as in "website phishing", is a variation on the word "fishing": bait is thrown out in the hope that a user will grab it and bite, just like a fish. In most cases, the bait is an e-mail or an instant message that takes the user to a hostile phishing website [7].

The motivation behind this study is to create a resilient and effective method that uses Data Mining algorithms and tools to detect e-banking phishing websites with Artificial Intelligence techniques. Associative and classification algorithms can be very useful in predicting phishing websites: they can reveal which e-banking phishing website characteristics and indicators are the most important and how they relate to each other. Comparing different Data Mining classification and association methods is also a goal of this investigation, since only a few studies compare different data mining techniques for predicting phishing websites.

The paper is organized as follows: Section 2 presents the literature review, Section 3 presents the case studies, Section 4 describes the data mining phishing approach, Section 5 describes the phishing website research methodology, Section 6 describes the use of the DM classification techniques, Section 7 reports the experimental results of applying the classification data mining techniques to the phishing training data sets, and Section 8 gives conclusions and future work.

2. Literature Review

Despite growing efforts to educate users and create better detection tools, users are still very susceptible to phishing attacks. Unfortunately, due to the nature of the attacks, it is very difficult to estimate the number of people who actually fall victim. A report by Gartner estimated the costs at $1,244 per victim, an increase over