2022 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI)
979-8-3503-3384-8/22/$31.00 © 2022 IEEE
Analysis of Spam Messages Using Various Machine
Learning Classifier
Nagaraj. P
Department of Computer Science and
Engineering
Kalasalingam Academy of Research and
Education
Krishnankoil, Virudhunagar, India
nagaraj.p@klu.ac.in
Gopal. R
Department of Information Science and
Engineering
Bannari Amman Institute of Technology
Sathyamangalam, Erode, India
gopalr@bitsathy.ac.in
Sunethra B
Department of Computer Science and
Engineering
Kalasalingam Academy of Research and
Education
Krishnankoil, Virudhunagar, India
sunethraboganatham9@gmail.com
Sumathi. R
Department of Computer Science and
Engineering
Kalasalingam Academy of Research and
Education
Krishnankoil, Virudhunagar, India
r.sumathi@klu.ac.in
Muneeswaran. V
Department of Electronics and
Communication Engineering
Kalasalingam Academy of Research and
Education
Krishnankoil, Virudhunagar, India
munees.klu@gmail.com
Vignesh. K
Department of Computer Science and
Engineering
Kalasalingam Academy of Research and
Education
Krishnankoil, Virudhunagar, India
vignesh.k@klu.ac.in
Abstract—
Background:
As the number of social media users grows, so does the volume of
data they generate, and that data may be safe or unsafe. On
applications such as Twitter and email, users receive many tweets
and emails, some useful and some dangerous. To stay safe from
these threats, we need a filter that separates useful messages
from spam and keeps users from falling into a trap. This paper
explains one such approach, based on the Naïve Bayes classifier.
It also compares Naïve Bayes, KNN, and Logistic Regression on the
same spam filtering problem, in combination with term
frequency-inverse document frequency (TF-IDF).
Keywords— Machine Learning, Naïve Bayes Classifier, Bag
of Words, K nearest neighbours, Logistic Regression
I. INTRODUCTION
Spam can arrive in many forms, including email and SMS messages,
and its volume is growing as the number of internet users
increases. Most of the spam or unwanted information we receive
comes through the internet, since most applications nowadays work
over it. Spam takes several forms: spam that attacks devices and
spreads viruses, spam that tries to steal money, spam that
misleads users by spreading wrong information, spam that lures
users with false information, and more [1].
The people who generate spam have also become more sophisticated:
they craft messages so that clicking on a link, message, or email
spreads malware automatically, leaving us no chance to even read
it and check whether it is spam.
In such situations the user has no other recourse, so spam
filtering helps by segregating spam information and protecting us
from the dangers it carries [2].
As technology grows day by day, it brings both boons and banes.
We can work faster and share our views more easily, and tasks
become simpler. Email plays a major role in our lives, and almost
everyone today uses it. Email acts as an interface through which
people communicate, interact, and share their views.
Most official business is conducted by email. Many industries and
organizations use email services to communicate with their
employees, and email usage has increased further during the
pandemic, when people of all ages, from children to the elderly,
have been using email.
Spam can affect these users' mailboxes, and people who are not
aware of this kind of deception may think that spam emails are
useful and harmless. For such people, spam filtering helps
considerably by making them aware that the emails marked as spam
are dangerous: some can harm their devices and others can cause
financial loss.
Many people are affected by spam. Raising awareness alone is not
enough, because sometimes merely clicking on a link causes losses
we never anticipated [2]. By classifying such messages as spam,
we can warn the user that they are spam and that care is needed
when opening them. This cannot solve the problem fully, but to
some extent it helps users escape this type of threat.
In this paper, spam detection is performed on the spam/ham
dataset. The paper also gives an overview of the algorithms that
can be used for classification.
Three algorithms are used here. The first is the Naïve Bayes
classifier, which has been studied in many published papers,
works very well for spam filtering, and is one of the best
algorithms for this task. The second is the K-nearest neighbours
classifier, which also gave good results, though not as good as
Naïve Bayes; this is explained later in the paper. The third is
Logistic Regression, which yields a line (a decision boundary)
that separates spam from non-spam text.
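To make the Naïve Bayes approach named above concrete, the following is a minimal sketch of a bag-of-words Naïve Bayes spam filter in plain Python. It is an illustration under stated assumptions, not the implementation used in this paper: the function names (tokenize, train_naive_bayes, predict), the whitespace tokenizer, the add-one (Laplace) smoothing, and the six toy messages are all hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a bag-of-words view)."""
    return text.lower().split()


def train_naive_bayes(messages, labels):
    """Collect per-class word counts, class frequencies, and the vocabulary."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    class_counts = Counter(labels)      # label -> number of messages
    vocabulary = set()
    for text, label in zip(messages, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return word_counts, class_counts, vocabulary


def predict(text, word_counts, class_counts, vocabulary):
    """Return the label with the highest log-posterior score."""
    total_messages = sum(class_counts.values())
    vocab_size = len(vocabulary)
    best_label, best_score = None, -math.inf
    for label, n_messages in class_counts.items():
        # Log prior: fraction of training messages with this label.
        score = math.log(n_messages / total_messages)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not the spam/ham dataset used in the paper.
train_messages = [
    "win free prize now",
    "free money click link",
    "claim your free prize",
    "meeting at noon tomorrow",
    "see you at lunch",
    "project report due tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_naive_bayes(train_messages, train_labels)
print(predict("free prize link", *model))         # prints "spam"
print(predict("lunch meeting tomorrow", *model))  # prints "ham"
```

Working in log space keeps the product of many small word probabilities from underflowing, and the smoothing term lets a class score a message even when one of its words never appeared in that class during training.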
DOI: 10.1109/ICDSAAI55433.2022.10028952