Logistic Regression Based Classification of Spam and
Non-Spam Emails
Shahbaz Ahmad Khanday¹, Suraiya Parveen²
{shahbazshaban10@gmail.com¹, husainsuraiya@gmail.com²}
Jamia Hamdard University, New Delhi, India¹,²
Abstract. An email client receives emails from different websites, portals and domains,
some of which may be advertisements. Receiving emails in bulk can cause serious
damage, such as the suspension of a particular email ID. An email client is mostly exposed to
malicious senders when an email account is registered with a web portal, which in
turn sends emails in bulk. One way to escape spam emails is to
develop a decision-based system that classifies emails as spam or non-spam. This
can be achieved by applying different machine learning and deep learning
algorithms to the emails received by
an email client. Machine learning approaches such as SVM, the naive
Bayesian classifier, artificial neural networks and random forests can be of considerable help
in identifying spam emails. Once the source of a spam email has been identified, a user can locate, block
and report that source, for example a spam-bot.
Keywords: machine learning, decision tree, support vector machine (SVM), logistic
regression, artificial neural networks, naive Bayesian classifier, spam-bots.
1 Introduction
An ordinary user can receive a huge number of emails in a day. These emails come from
different sources related to day-to-day activities such as social networking,
file sharing, online shopping, e-billing, e-commerce and other applications. A user should be
able to distinguish important and useful emails from spam or junk emails. Once a user
is exposed to spam and malicious sources, he or she will receive a large number of emails from
various unknown sources. It therefore becomes a tedious and time-consuming task for an email
user to sort through all the received emails, some of which may contain an
important piece of data or information. The situation becomes very risky when an email client
is drawn into a malicious act, because the security and privacy of the system could then be breached.
The email user could be trapped in a phishing attack initiated by cyber criminals. It is very
hard to recover from such situations, and email users are often attracted to
spam emails and respond to them. In most cases, blocking and reporting these spam
email sources is useless, because the senders change their location continuously. One
alternative is to track the IP addresses from which an email user receives
these spam emails, but the task becomes harder when there are many such IP addresses rather
than a few. The main difficulty arises when the senders change their locations and targets. One of the
ICIDSSD 2020, February 27-28, New Delhi, India
Copyright © 2021 EAI
DOI 10.4108/eai.27-2-2020.2303291