International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 07 Issue: 05 | May 2020 | www.irjet.net
© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 7625

Feasible Performance Comparison of E-mail Spam Classification Based on Machine Learning Techniques

Mitu Pal 1, Bristi Rani Roy 2

1 mitu151350@gmail.com, Lecturer, Dept. of CSE, Haji Abul Hossain Institute of Technology, Bangladesh
2 bristiranyroy@gmail.com, Lecturer, Dept. of CSE, Bangladesh Army University of Engineering & Technology, Bangladesh

Abstract - Email is a common, fast, and relatively low-cost means of communication worldwide. Without filtering, however, mailboxes fill up with unsolicited bulk and junk email, known as spam. Many financial and electronic businesses promote themselves through email, which is very annoying to users. The volume of spam email is increasing rapidly day after day. For that reason, filtering is an essential and popular way to stop spam email. Machine learning (ML) approaches have achieved high success rates in filtering spam email. In this paper, we give an overview of six ML classification algorithms used for learning the features of spam emails: K-Nearest Neighbor (KNN), Naive Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Multilayer Perceptron (MLP). We compare the performance of these six classifiers in terms of accuracy, recall, and precision, using the confusion matrix obtained under 10-fold cross-validation. The main goal of this article is to determine the better classification techniques for spam detection.
Key Words: K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Logistic Regression, Random Forest, Multilayer Perceptron, Accuracy, Precision, Recall, ROC Curve Analysis.

1. INTRODUCTION

Nowadays the internet has become an integral part of our daily life, and it is growing rapidly day by day. We exchange information over the internet using different tools because doing so is fast, efficient, and inexpensive. Email is one of the most widely used tools for information exchange. It offers several advantages over other methods, such as data security during information exchange, negligible time delay, and low cost. However, some issues spoil the experience of using email, and spam is a prime example. Unsolicited bulk or junk email is called spam, and it is a massive problem on the internet. According to recent statistics, 40% of all emails are spam, which amounts to about 15.4 billion emails per day and costs internet users about $355 million per year [1]. Because spam is very cheap to send, large volumes of it reach users. When users receive a large number of spam emails, it becomes hard to distinguish spam from ham (legitimate) email, deleting it takes time, and in the meantime the volume may crash the mail server. Spam causes many problems for users, such as wasted time, storage, and computational power, as well as money laundering. Spam filtering is one of the effective ways to detect spam email, but spammers nowadays use tricky methods to get past filters. However, knowledge engineering and machine learning approaches are still more effective than such filtering at detecting spam email. The machine learning approach does not require any rules to be specified, which is why it is more efficient than the knowledge engineering approach [2]. The main goal of this article is to detect spam email with high accuracy using different machine learning (ML) classification approaches.
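The evaluation criteria used in this paper (accuracy, precision, and recall, all derived from a confusion matrix) can be illustrated with a minimal Python sketch. The function name `confusion_metrics` and the toy labels are assumptions made for illustration only. They are not taken from the experiments, and the sketch does not cover the 10-fold cross-validation procedure.

```python
from collections import Counter

def confusion_metrics(y_true, y_pred, positive="spam"):
    """Return (accuracy, precision, recall) for a binary spam/ham task.

    Hypothetical helper for illustration; "spam" is treated as the
    positive class.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted spam: true positive if it really is spam, else false positive.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted ham: false negative if it was spam, else true negative.
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Toy labels (made up for this example): 4 spam and 6 ham messages.
y_true = ["spam"] * 4 + ["ham"] * 6
y_pred = ["spam", "spam", "spam", "ham", "spam", "spam", "ham", "ham", "ham", "ham"]

acc, prec, rec = confusion_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
# prints: accuracy=0.70 precision=0.60 recall=0.75
```

In this toy case the classifier catches 3 of the 4 spam messages (recall 0.75) but also flags 2 ham messages as spam (precision 0.60), which shows why accuracy alone is not enough to compare spam classifiers.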
The rest of the article is organized as follows. Section 2 summarizes related work. Section 3 describes the dataset. Section 4 discusses the different ML classification techniques. Section 5 analyzes the experimental results. Section 6 compares the different ML techniques. Section 7 concludes the paper.

2. RELATED RESEARCH WORK

Much research has been done on spam email detection using ML and other techniques. Here we summarize some related work on spam classification. In [1], the authors used several ML classification techniques for the spam classification task: the SVM, NB, KNN, AIS, NN, and RS algorithms. In [3], the authors proposed an SVM-based model for the classification task. They analyze sender behavior, assign each sender a trust value, and classify spam email based on this trust value. They also show that the SVM classifier is more effective than Random Forest. In [4], the authors used a neural network approach for the spam email classification task. Their results show that an ANN achieves good accuracy and is suitable for spam classification, but that it is not efficient as a spam filtering tool when used alone.