International Journal of Computer Applications (0975 – 8887)
International Conference on Innovations in Computing Techniques (ICICT 2015)

Improving E-Mail Spam Classification using Ant Colony Optimization Algorithm

D.Karthika Renuka
Assistant Professor
Department of IT
PSG College of Technology

P.Visalakshi
Professor
Department of ECE
PSG College of Technology

T.Sankar
PG Scholar
Department of IT
PSG College of Technology

ABSTRACT
Electronic mail is a store-and-forward mechanism for exchanging documents across computer networks through the Internet. Spam is unwanted mail containing unsolicited and often harmful data that is irrelevant to the recipient. In the proposed system, spam classification is implemented using a Naive Bayes classifier, a probabilistic classifier based on conditional probability that is also applicable to more complex classification problems. Feature selection using hybrid Ant Colony Optimization makes the proposed system more efficient and gives good results.

Keywords
E-mail, Spam, Spam classification, Spambase dataset, Naive Bayes classifier

1. INTRODUCTION
E-mail is the transmission of messages or documents over electronic networks such as the Internet. It is a system for sending, receiving and storing electronic messages. E-mail can also be exchanged between users of online service providers and over networks other than the Internet, both public and private. It is a method of exchanging digital messages from an author to one or more recipients.

2. E-MAIL SPAM
E-mail spam, or junk e-mail, is one of the major problems of today’s Internet, causing financial damage to companies and annoying individual users. It is the sending of unwanted e-mail messages with commercial content to an indiscriminate set of recipients. Junk or spam mail reduces the reliability of e-mail.
Spam detection and classification are the techniques used to keep spam messages out of the inbox.

2.1 Spam Classification
Spam classification filters spam e-mail out of the inbox and moves it to the spam folder; that is, it separates spam from ham (legitimate) mail. The combination of the Naive Bayes classifier and the Ant Colony Optimization (ACO) algorithm for spam classification has two phases: a training phase and a testing phase. The training phase indexes two known datasets, one of spam mails and one of ham mails. The testing phase indexes the query message and retrieves the closest message from the training datasets. The incoming message is indexed according to the feature set in use, and the resulting query vector is compared with the vectors of the training messages. If the closest message belongs to the spam training set, the incoming message is classified as spam; otherwise it is classified as ham.

2.2 Machine Learning Algorithm
Machine learning and knowledge engineering are the two common approaches to e-mail filtering. Machine learning is about learning to make predictions from examples of desired behavior or past observations. The machine learning approach is more efficient than the knowledge engineering approach because it does not require any rules to be specified [2]. Instead, it uses a set of training samples, which are pre-classified e-mail messages. A specific machine learning algorithm is then used to learn the classification rules from these messages. Many machine learning algorithms can be used in e-mail filtering, including Naive Bayes, Neural Networks, Support Vector Machines, Rough Sets and K-nearest neighbor. In the e-mail filtering task, features could come from analysis of the subject line or from groups of words. Thus, the input to an e-mail classifier can be viewed as a two-dimensional matrix whose axes are the features and the messages.
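
To make the messages-by-features matrix concrete, the following is a minimal sketch of a Naive Bayes classifier trained on such a matrix. It is an illustration only and not the implementation used in this work: the toy word counts, the feature names, the functions `train_naive_bayes` and `classify`, and the choice of a multinomial model with Laplace smoothing (`alpha`) are all assumptions made for the example.

```python
import math

# Hypothetical toy data: each row is a message, each column a feature
# (a word count). Values are made up for illustration.
FEATURES = ["free", "money", "meeting", "report"]
X_train = [
    [3, 2, 0, 0],  # spam
    [2, 1, 0, 0],  # spam
    [0, 0, 2, 1],  # ham
    [0, 1, 1, 2],  # ham
]
y_train = ["spam", "spam", "ham", "ham"]


def train_naive_bayes(X, y, alpha=1.0):
    """Estimate log priors and per-feature log likelihoods for each class.

    Assumes a multinomial model with Laplace smoothing (alpha).
    """
    n_features = len(X[0])
    log_prior = {}
    log_likelihood = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        # Total count of each feature over all messages of class c.
        totals = [sum(row[j] for row in rows) for j in range(n_features)]
        denom = sum(totals) + alpha * n_features
        log_likelihood[c] = [math.log((t + alpha) / denom) for t in totals]
    return log_prior, log_likelihood


def classify(model, x):
    """Return the class with the highest posterior log score for message x."""
    log_prior, log_likelihood = model
    scores = {
        c: log_prior[c] + sum(n * lp for n, lp in zip(x, log_likelihood[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)


model = train_naive_bayes(X_train, y_train)
print(classify(model, [2, 1, 0, 0]))  # spam
print(classify(model, [0, 0, 1, 1]))  # ham
```

Working in log space avoids numerical underflow when many features are multiplied together, and the smoothing term keeps a word that never appeared in one class from forcing that class's probability to zero.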
The e-mail classifier then uses this matrix to classify each message as ham or spam.

2.3 Feature Selection
Feature selection is the process of selecting a subset of important features and removing redundant, irrelevant and noisy features to obtain a simpler and more accurate data representation. Feature selection algorithms are commonly divided into two categories according to how they process and evaluate features: subset selection methods and feature ranking methods [3]. Subset selection methods search the set of features for the optimal subset. Feature ranking methods rank each feature by a metric and eliminate all features that do not achieve an adequate score. In this research work, feature selection is done using the ACO algorithm.

3. OBJECTIVE
The main objective of the proposed system is to develop an efficient e-mail spam classification system. The system aims to classify the input set of e-mails into spam and ham. The overall objective is a system that executes faster and achieves better classification performance with higher accuracy.

4. SCOPE
In the proposed system, the spam classification technique is applied to the chosen spam dataset. More efficient results will be achieved when it is applied to real-time data, where a real-time mail server will not provide 100% accuracy in classification. The proposed system has a wide range of