ISSN 2394-3777 (Print)
ISSN 2394-3785 (Online)
Available online at www.ijartet.com
International Journal of Advanced Research Trends in Engineering and Technology (IJARTET)
Vol. 9, Issue 8, August 2022
All Rights Reserved © 2022 IJARTET
Review Paper on Bio Inspired Metaheuristic
Algorithms for Detecting Spam Email
Naresh Vinod Wankhade, Ranjit. R. Keole, Dhiraj S Kalyankar
Email: wankhadenaresh@gmail.com, Scholar, DRGIT&R, Amravati, India
Email: ranjitkeole@gmail.com, Professor & Head of the Department, Information Technology, HVPM’s CET, Amravati, India
Email: dhiraj.kalyankar50@gmail.com, Head of the Department, Computer Science & Engineering, DRGIT&R, Amravati, India
Abstract: The increasing volume of unsolicited bulk e-mail
(also known as spam) has generated a need for
reliable anti-spam filters. Machine learning techniques
are nowadays used to filter spam e-mail automatically with
a high success rate. In this paper we review some of the
most popular machine learning methods (Bayesian
classification, k-NN, ANNs, SVMs, artificial immune
systems and rough sets) and their applicability to the
problem of spam e-mail classification. Descriptions of the
algorithms are presented, together with a comparison of their
performance on the SpamAssassin spam corpus.
Electronic mail has eased communication
methods for many organizations as well as individuals.
Spammers exploit this medium for fraudulent gain
by sending unsolicited emails. This article aims to
present a method for detecting spam emails with
machine learning algorithms that are optimized with
bio-inspired methods. A literature review is carried out to explore
the efficient methods applied to different datasets to
achieve good results. Extensive research was done to
implement machine learning models using Naïve Bayes,
Support Vector Machine, Random Forest, Decision Tree
and Multi-Layer Perceptron on seven different email
datasets, along with feature extraction and pre-processing.
Bio-inspired algorithms such as Particle Swarm
Optimization and Genetic Algorithm were implemented to
optimize the performance of the classifiers. Multinomial
Naïve Bayes with Genetic Algorithm performed best
overall. A comparison of our results with other machine
learning and bio-inspired models, intended to identify the most
suitable model, is also discussed.
Keywords: ANN, Data Extraction, URL, Machine
Learning, IP Filtration
I. INTRODUCTION
Recently, unsolicited commercial/bulk e-mail, also
known as spam, has become a major problem on the internet.
Spam wastes time, storage space and communication
bandwidth. The problem of spam e-mail has been
increasing for years. According to recent statistics, 40% of all emails
are spam, which amounts to about 15.4 billion emails per day and
costs internet users about $355 million per year. Automatic
e-mail filtering seems to be the most effective method for
countering spam at the moment and a tight competition
between spammers and spam-filtering methods is going
on. Only a few years ago, most spam could be
reliably dealt with by blocking e-mails coming from
certain addresses or filtering out messages with certain
subject lines. Spammers then began to use several
tricks to evade these filters, such as using
random sender addresses and/or appending random characters
to the beginning or the end of the message subject line
[11]. Knowledge engineering and machine learning are the
two general approaches used in e-mail filtering. In the
knowledge engineering approach, a set of rules has to be
specified according to which emails are categorized as
spam or ham. Such a set of rules should be created either
by the user of the filter, or by some other authority (e.g.
the software company that provides a particular rule-based
spam-filtering tool). This method does not show
promising results because the rules must be
constantly updated and maintained, which is time-consuming
and inconvenient for most users. The machine
learning approach is more efficient than the knowledge
engineering approach because it does not require specifying any
rules [4]. Instead, it relies on a set of training samples, which
are pre-classified e-mail messages. A specific
algorithm is then used to learn the classification rules from
these e-mail messages. The machine learning approach has
been widely studied, and many algorithms can be
used in e-mail filtering. They include Naïve Bayes,
support vector machines, neural networks, k-nearest
neighbor, rough sets and the artificial immune system.
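
The contrast between the two approaches can be illustrated with a minimal sketch. It assumes a toy, hand-made training set, a plain whitespace tokenizer, and a hypothetical keyword list; none of these come from the paper or from any corpus. The function names (rule_based_is_spam, train, classify) are illustrative only. The first function stands for a hand-specified rule, while the other two learn word statistics from pre-classified messages using multinomial Naïve Bayes with add-one smoothing.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple assumption)."""
    return text.lower().split()

# Knowledge engineering: a hand-written rule that someone must maintain.
BLOCKED_WORDS = {"free", "prize"}  # hypothetical keyword list

def rule_based_is_spam(text):
    return any(word in BLOCKED_WORDS for word in tokenize(text))

# Machine learning: statistics are learned from pre-classified messages.
def train(messages):
    """messages: list of (text, label) pairs with label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab

def classify(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, class_counts, vocab = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: fraction of training messages with this label.
        score = math.log(class_counts[label] / total_messages)
        n_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data, invented for illustration only.
TRAINING = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]

model = train(TRAINING)
print(classify("claim your free money", model))   # spam
print(classify("monday project meeting", model))  # ham
```

With the rule-based function, any change in spammer vocabulary requires editing BLOCKED_WORDS by hand, whereas the learned model only needs to be retrained on newly labelled messages.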
Machine learning models have been utilized for
multiple purposes in the field of computer science, from
resolving network traffic issues to detecting malware.
Emails are used regularly by many people for
communication and for socializing. Security breaches that
compromise customer data allow ‘spammers’ to spoof a
compromised email address to send illegitimate (spam)
emails. This is also exploited to gain unauthorized access
to a user's device by tricking the user into clicking the spam