International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 05 | May 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 7625
Feasible Performance Comparison of E-mail Spam Classification Based
on Machine Learning Techniques
Mitu Pal¹, Bristi Rani Roy²
¹mitu151350@gmail.com, Lecturer, Dept. of CSE, Haji Abul Hossain Institute of Technology, Bangladesh
²bristiranyroy@gmail.com, Lecturer, Dept. of CSE, Bangladesh Army University of Engineering & Technology,
Bangladesh
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Email is a common, fast, and relatively low-cost
way to communicate worldwide. Without filtering, however,
mailboxes fill up with unsolicited bulk and junk email, known
as spam email. Many financial and electronic businesses
promote themselves through email, which is very annoying
to users. The volume of spam email is increasing rapidly day
after day. For that reason, filtering is an essential and popular
way to stop spam email. Machine learning (ML) approaches
have proved more successful at filtering spam email. In this
paper, we give an overview of six ML classification algorithms
used for learning the features of spam emails: K-Nearest
Neighbor (KNN), Naive Bayes (NB), Support Vector Machine
(SVM), Logistic Regression (LR), Random Forest (RF), and
Multilayer Perceptron (MLP). We use the confusion matrix
under 10-fold cross-validation to compare the performance
of these six ML classifiers based on accuracy, recall, and
precision. The main goal of this article is to determine the
better classification technique for spam detection.
Key Words: K-Nearest Neighbors, Naïve Bayes, Support
Vector Machine, Logistic Regression, Random Forest,
Multilayer Perceptron, Accuracy, Precision, Recall, ROC
Curve Analysis.
1. INTRODUCTION
Nowadays the internet has become an integral part of our
daily life, and it is growing rapidly day by day. We exchange
information over the internet using different tools because
it takes less time, is efficient, and costs little. Email is one of
the most widely used tools for information exchange. It
offers several advantages over other methods, such as data
security during information exchange, negligible time delay,
and low cost. But some issues spoil the pleasure of using
email efficiently, and spam is a prime example. Unsolicited
bulk junk email is called spam email, and it is a massive
problem on the internet. According to recent statistics, 40%
of all emails are spam, which amounts to about 15.4 billion
emails per day and costs internet users about $355 million
per year [1]. Spam email is very cheap to send, so a large
number of spam emails are sent to users. When users receive
a large number of spam emails, it becomes very hard to
distinguish spam from ham (legitimate) email, deleting them
takes time, and in the meantime the volume may crash the
server. Spam causes many problems for users, such as wasted
time, storage, and computational power, as well as money
laundering. Spam filtering is one of the effective ways to
detect spam email, but spammers nowadays use tricky
methods to pass through filters successfully. However,
knowledge engineering and machine learning are still more
effective than filtering for detecting spam email. A machine
learning approach does not require any rules to be specified,
which is why it is more efficient than the knowledge
engineering approach [2]. The main goal of this article is to
detect spam email with high accuracy using different
Machine Learning (ML) classification approaches.
The rest of the article is organized as follows. Section 2
summarizes related work. Section 3 explains the dataset.
Section 4 discusses the different ML classification
techniques. Section 5 analyzes the experimental results.
Section 6 compares the different ML techniques. Section 7
concludes the paper.
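To make the comparison criteria concrete, the following minimal Python sketch shows how accuracy, precision, and recall can be derived from confusion-matrix counts, and how sample indices can be split into 10 folds for cross-validation. It is an illustration under stated assumptions, not the code used in this paper: the function names, the toy labels, the contiguous (unshuffled) fold split, and the choice of "spam" as the positive class are all hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Assumption: no shuffling or stratification; real experiments usually
    shuffle (and often stratify) before splitting.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # spam correctly flagged
            else:
                fp += 1  # ham wrongly flagged as spam
        else:
            if truth == positive:
                fn += 1  # spam that slipped through
            else:
                tn += 1  # ham correctly passed
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the paper.
    y_true = ["spam", "spam", "ham", "ham", "spam",
              "ham", "ham", "spam", "ham", "ham"]
    y_pred = ["spam", "ham", "ham", "spam", "spam",
              "ham", "ham", "spam", "ham", "ham"]
    print(summarize(*confusion_counts(y_true, y_pred)))
```

In a 10-fold setting, each fold in turn serves as the test set while the remaining nine are used for training; the per-fold counts can then be summed before calling `summarize`, or the per-fold metrics can be averaged.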
2. RELATED RESEARCH WORK
Much research has been done on spam email detection using
ML and other techniques. Here we summarize some related
work on spam classification.
In [1], the authors used different ML classification techniques
for the spam classification task. They used the SVM, NB, KNN,
AIS, NN, and RS algorithms for spam detection.
In [3], the authors proposed a model that uses SVM for the
classification task. They analyze sender behavior and assign
a trust value; based on this trust value, they classify spam
email. They also show that the SVM classifier is more
effective than Random Forest.
In [4], the authors used a neural network approach for the
spam email classification task. Their results show that,
although an ANN achieves good accuracy and is suitable for
spam classification, it is not efficient as a spam filtering tool
when used alone.