Knowledge and Information Systems
https://doi.org/10.1007/s10115-018-1271-1

REGULAR PAPER

A modified content-based evolutionary approach to identify unsolicited emails

Shrawan Kumar Trivedi 1 · Shubhamoy Dey 2

Received: 12 May 2017 / Revised: 7 December 2017 / Accepted: 26 May 2018
© Springer-Verlag London Ltd., part of Springer Nature 2018

Abstract
This computational research seeks to classify unsolicited versus legitimate emails. A modified version of an existing genetic programming (GP) classifier, termed modified genetic programming (MGP), is implemented to build an ensemble of classifiers to identify unsolicited emails. The proposed classifier is assessed using informative features extracted from two corpora (Enron and SpamAssassin) with the help of the greedy stepwise feature search method. A comparative study is then performed with other popular classifiers: Bayesian network, naïve Bayes, decision tree, random forest (RF), support vector machine (SVM), and GP. The results are validated with 20-fold cross-validation and a paired t-test. They show that the proposed classifier performs better in terms of accuracy and false-positive detection than the other machine learning classifiers tested in this study. Using different training and testing sets of email files from the Enron corpus, ensemble-based classifiers (boosted SVM, boosted Bayesian, boosted naïve Bayesian, RF, and the proposed MGP classifier) are tested and compared on all metrics, including training and testing time. The findings suggest that the MGP classifier with the greedy stepwise feature search method offers an improvement over alternative methods in detecting unsolicited emails.
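The abstract compares classifiers on accuracy, F value, and false-positive rate. As a minimal sketch of how such metrics are commonly derived from a binary confusion matrix, the Python snippet below assumes that spam is the positive class, so a false positive is a legitimate email flagged as spam. The function name `spam_metrics` and the counts in the usage note are hypothetical illustrations, not taken from the paper, and the paper's own metric definitions may differ in detail.

```python
def spam_metrics(tp, fp, tn, fn):
    """Compute common evaluation metrics from confusion-matrix counts.

    Assumption (not from the paper): spam is the positive class, so
    tp = spam correctly flagged, fp = legitimate email flagged as spam,
    tn = legitimate email correctly passed, fn = spam that was missed.
    """
    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        # F value: harmonic mean of precision and recall.
        "f_value": ratio(2 * precision * recall, precision + recall),
        # False-positive rate: share of legitimate emails flagged as spam.
        "fp_rate": ratio(fp, fp + tn),
    }
```

For example, with the made-up counts `spam_metrics(tp=40, fp=5, tn=45, fn=10)`, accuracy is 0.85 and the false-positive rate is 0.10. A low false-positive rate matters in spam filtering because each false positive is a legitimate message the user may never see.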
Keywords Modified genetic programming · Machine learning classifiers · Unsolicited emails · Ensemble · Accuracy · F value · False-positive rate · Training and testing time

Shrawan Kumar Trivedi
shrawan@iimisirmaur.ac.in

Shubhamoy Dey
shubhamoy@iimidr.ac.in

1 Indian Institute of Management Sirmaur, Sirmaur, HP, India
2 Indian Institute of Management Indore, Indore, MP, India

1 Introduction

In today’s automated world, information sharing between organizations and their units is necessary to create a competitive and sustainable business environment. Email is an important tool for rapid and economical communication; however, spam (unsolicited email) is seen