An Approach to Detect Spam Emails by Using Majority Voting

Roohi Hussain
Department of Computer Engineering, National University of Science and Technology, H-12 Islamabad, Pakistan

Usman Qamar
Faculty, Department of Computer Engineering, National University of Science and Technology, H-12 Islamabad, Pakistan

ABSTRACT
Internet usage has become intensive during the last few decades; this has given rise to the use of email, which is one of the fastest yet cheapest modes of communication. The growing demand for email communication has in turn given rise to spam email, also known as unsolicited mail. In this paper we propose an ensemble model that uses majority voting on top of several classifiers to detect spam. The classification algorithms used for this purpose are Naïve Bayesian, Support Vector Machines, Random Forest, Decision Stump and k-Nearest Neighbor. Majority voting produces the final decision of the ensemble by selecting the class that receives the most votes from the classifiers. The sample dataset used for this task is taken from UCI, and the tool RapidMiner is used to validate the results.

KEYWORDS
Spam email, filtering, Naïve Bayesian, SVM, Random Forest, Decision tree, RapidMiner

1 INTRODUCTION
Internet usage has become intensive during the last few decades; this has given rise to the use of email, which is one of the fastest yet cheapest modes of communication. However, the rise in email and internet users has resulted in a striking increase in unsolicited bulk (spam) emails. Spam emails are junk emails that are sent to numerous undisclosed recipients and that contain an identical message for everyone. A botnet, which is a group of programs communicating with other similar programs, is used specifically to send spam emails and is known for its malicious implications. The enormous amount of spam data affects Information Technology based businesses and costs organizations billions of dollars in lost output [1].
In the last few years, spam emails have become a means of intruding on sensitive data, and this poses a serious threat to the security of many departments [2]. Researchers have used classification that focuses on three levels of the email, i.e. the email address, the subject line and the body content. Content-based spam detection is the most effective of the three. The aim of this paper is to propose an ensemble that uses a majority voting approach in combination with filtering algorithms for spam detection.

1.1 Spam Features
Spam emails have the following features [3]. The emails are sent to undisclosed recipients to advertise services, products or offensive material. The aim is to deceive innocent people by obtaining the personal data of the masses and abusing it. The majority of spam emails do not offer an unsubscribe option.

Proceedings of the International Conference on Data Mining, Internet Computing, and Big Data, Kuala Lumpur, Malaysia, 2014 ISBN: 978-1-941968-02-4 ©2014 SDIWC 76
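
The majority voting step that the proposed ensemble relies on can be illustrated with a minimal Python sketch. It is not the RapidMiner setup used in the paper: the function name `majority_vote`, the labels "spam" and "ham", the tie-breaking rule and the per-classifier predictions are all assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(votes, tie_break="ham"):
    """Return the label chosen by most classifiers.

    `votes` is a list of labels, one per classifier.
    `tie_break` is a hypothetical fallback used when the two most
    common labels receive the same number of votes; the paper does
    not state a tie-breaking rule.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    # A tie at the top means there is no majority; fall back to the default.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]


# Hypothetical predictions of the five base classifiers for one email.
predictions = {
    "naive_bayes": "spam",
    "svm": "spam",
    "random_forest": "ham",
    "decision_stump": "spam",
    "knn": "ham",
}

decision = majority_vote(list(predictions.values()))
print(decision)  # three of the five classifiers vote "spam"
```

With five base classifiers and two classes a tie cannot occur, so the fallback label only matters if a classifier abstains or the number of voters is even.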