Abstract—Classification is an important data mining technique and can be used for data filtering in artificial intelligence. Because classification applies to all kinds of data, it is used in nearly every field of modern life. Classification groups items according to the features judged interesting and useful. In this paper, we compare two classification methods, Naïve Bayes and ADTree, used to detect spam e-mail. This choice is motivated by the fact that the Naïve Bayes algorithm is based on probability calculus, while the ADTree algorithm is based on decision trees. The parameters of both classifiers are set to maximize the true positive rate and minimize the false positive rate. The experimental results present classification accuracy and a cost analysis with a view to choosing the optimal classifier for spam detection. We also point out the number of attributes that gives a tradeoff between attribute count and classification accuracy.

Keywords—Classification, data mining, spam filtering, naive Bayes, decision tree.

I. INTRODUCTION

DATA mining has allowed the development of a new research field, “Big Data”. The term “Big Data” is the successor of the term “information explosion”, and it was used for the first time by John Mashey in 1998 [1]. “Big data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze [2]. This new field tries to answer how a huge number of databases and information repositories can be organized and analyzed, and how information can be retrieved from these data. These questions clearly create a pressing need for methods that help users efficiently navigate, summarize, and organize the data so that it can be used for applications such as market analysis and fraud detection [3]. The development of the Internet has brought new techniques for storing data on remote servers, called clouds.
E-mail is so widely used that total worldwide e-mail traffic, including both professional and personal e-mail, was estimated at over 144 billion e-mails per day at the end of 2012. Mail traffic is expected to reach more than 192 billion e-mails per day in 2016 [4]. Some of these e-mails are promotions and may be considered uninteresting, and therefore spam. In this paper, we analyze some known data; the results may uncover important data patterns.

Thanh Nguyen and Andrei Doncescu are with the University of Toulouse, Toulouse, France (e-mail: tnguyen@laas.fr, andrei.doncescu@laas.fr). Pierre Siegel is with the LIF-AIX Marseille University, Marseille, France (e-mail: pierre.siegel@cim.univ-mrs.fr).

II. DATA MINING

Data mining is an analytical process designed to extract or explore hidden and predictive information from large databases. It can also be described as the process of searching for valuable information in large volumes of data [5]. Data mining is a form of knowledge discovery essential for solving problems in a specific domain; it is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data in databases [6]. Data mining is widely used in diverse areas such as financial data analysis, the telecommunication industry, biological data analysis, intrusion detection, and other scientific applications. Data mining refers to the analysis of, and extraction of knowledge from, the large quantities of data stored in computers, networks, and the Internet [3].
Data mining should be applicable to any kind of information repository, from simple numerical measurements and text documents to more complex information such as spatial data, multimedia channels, hypertext documents, relational databases, object-relational databases, object-oriented databases, data warehouses, transaction databases, unstructured and semi-structured repositories such as the World Wide Web, multimedia databases, and time-series databases [7]. The main functions of data mining include clustering, classification, prediction, association, and sequential patterns [8]. In this paper, we focus on spam data classification and on measuring the performance of two classifier algorithms, ADTree and Naive Bayes, based on the True Positive Rate (TP Rate) and False Positive Rate (FP Rate) obtained when the algorithms are applied to the Spambase data set.

III. SPAM CLASSIFIERS

Classification consists of predicting a certain outcome based on a given input. To predict the outcome, the algorithm processes a training set containing a set of attributes and the respective outcome, called the prediction attribute. The algorithm analyzes relationships between the attributes that would make it possible to predict the outcome. Next, the algorithm is given a data set not seen before, called the prediction set, which contains the same set of attributes except for the prediction attribute, which is not yet known. The algorithm analyzes the input and produces a prediction [9].

In this section, we present two types of algorithm for comparison: the Naive Bayes classifier and the ADTree decision tree algorithm.
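The training-set and prediction-set workflow described above, together with the TP Rate and FP Rate measures, can be illustrated with a minimal sketch. It is not the experimental setup of this paper: the data are a toy example invented for illustration (two hypothetical binary attributes, not Spambase), the classifier is a simple Bernoulli naive Bayes with add-one smoothing, and the function names `train_bernoulli_nb`, `predict`, and `tp_fp_rates` are assumptions introduced here.

```python
import math

def train_bernoulli_nb(rows, labels):
    """Fit a Bernoulli naive Bayes model with add-one (Laplace) smoothing."""
    n_features = len(rows[0])
    model = {}
    for cls in set(labels):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        prior = len(cls_rows) / len(rows)
        # Smoothed estimate of P(attribute_j = 1 | class).
        probs = [
            (sum(r[j] for r in cls_rows) + 1) / (len(cls_rows) + 2)
            for j in range(n_features)
        ]
        model[cls] = (prior, probs)
    return model

def predict(model, row):
    """Return the class with the highest log-posterior for one row."""
    best_cls, best_score = None, -math.inf
    for cls, (prior, probs) in model.items():
        score = math.log(prior)
        for x, p in zip(row, probs):
            score += math.log(p if x else 1.0 - p)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

def tp_fp_rates(actual, predicted, positive=1):
    """TP rate = TP / (TP + FN); FP rate = FP / (FP + TN)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate

# Toy training set (hypothetical): attributes are
# [contains "free", contains "meeting"]; label 1 = spam, 0 = not spam.
train_rows = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
train_labels = [1, 1, 1, 0, 0, 0]

# Prediction set: same attributes; the true labels are kept aside
# and used only to score the predictions.
test_rows = [[1, 0], [0, 1], [1, 1], [0, 0]]
test_labels = [1, 0, 0, 1]

model = train_bernoulli_nb(train_rows, train_labels)
predicted = [predict(model, row) for row in test_rows]
tp_rate, fp_rate = tp_fp_rates(test_labels, predicted)
print(predicted, tp_rate, fp_rate)
```

On this toy prediction set, one of the two spam messages is caught and one of the two legitimate messages is wrongly flagged, so both rates are 0.5. A classifier tuned as described in the abstract would aim for a TP Rate near 1 and an FP Rate near 0.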
Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering
Thanh Nguyen, Andrei Doncescu, Pierre Siegel
World Academy of Science, Engineering and Technology, International Journal of Mathematical and Computational Sciences, Vol:10, No:5, 2016, publications.waset.org/10004544/pdf

The comparison is made on accuracy, sensitivity and specificity using true positive and