An Empirical Evaluation of Data Mining Classification Algorithms

Prof. Hetal Bhavsar 1, Dr. Amit Ganatra 2
1 Assistant Professor, Department of Computer Science and Engineering, The M. S. University of Baroda, Vadodara, Gujarat, India
2 Dean, Faculty of Technology and Engineering, CHARUSAT, Changa, Gujarat, India

Abstract: Data mining is the process of extracting interesting knowledge from large datasets by combining methods from statistics and artificial intelligence with database management. Classification, one of the main functionalities in the field of data mining, is a form of data analysis that can be used to extract models describing important data classes. Well-known classification methods include decision tree classification, neural network classification, Naïve Bayes classification, k-nearest neighbour classification and Support Vector Machine (SVM) classification. In this paper, we present a comparison of five classification algorithms: J48, which is based on C4.5 decision tree learning; Multilayer Perceptron (MLP), which uses the multilayer feed-forward neural network approach; Instance-Based K-nearest neighbour (IBK); Naive Bayes (NB); and Sequential Minimal Optimization (SMO), an extension of the support vector machine. The performance of these classification algorithms is compared with respect to classifier accuracy, error rates, classifier building time and other statistical measures using the WEKA tool. The results show that there is no universal classification algorithm that works best for all datasets.

Keywords: Classification, supervised learning, decision tree, Naive Bayes, support vector machine

I. Introduction: The tremendous amount of information stored in databases and data repositories cannot simply be analyzed manually for valuable decision making. Therefore, humans need assistance in their analysis capacity [2].
This requirement has generated an urgent need for automated tools that can assist us in transforming that vast amount of data into useful information and knowledge. Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. It integrates multiple fields, including statistical models, mathematical algorithms, information retrieval, databases, pattern recognition and machine learning methods. Data mining can be performed with a large number of algorithms and techniques, including classification, clustering, regression, association mining, artificial intelligence, neural networks, genetic algorithms, etc. Classification, one of the main functionalities of data mining, can be described as a supervised learning task, as it assigns class labels to data objects based on the relationship between the data items and a pre-defined class label. Classification techniques are used to learn a model from a set of training data and to classify a test instance into one of the classes [1]. WEKA (Waikato Environment for Knowledge Analysis) [3] is an open source data mining tool which includes implementations of various classification algorithms such as decision trees, Naïve Bayes, lazy learning, neural networks, etc. To observe the performance of the different classification algorithms, this research conducted a comparison study of the J48, MLP, NB, IBK, and SMO algorithms using seven datasets available in the UCI dataset repository [4]. The datasets considered for this research are: Breast Cancer, Diabetes, Vote, Car Evaluation, Spambase, Audiology, and Nursery. The rest of the paper is organized as follows: Section 2 covers the related work in this area. Section 3 describes the classification method and its phases. Experimental results and evaluations are presented in Section 4. Finally, Section 5 gives the conclusions of the research. II.
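To make the learn-then-classify idea above concrete, the following is a minimal sketch (not the paper's WEKA experiment) of an instance-based k-nearest-neighbour classifier, in the spirit of IBK: it stores labelled training examples and classifies a new point by majority vote among its k nearest neighbours under Euclidean distance. The tiny dataset and the "benign"/"malignant" labels are illustrative assumptions only.

```python
from collections import Counter
import math

def knn_classify(train, test_point, k=3):
    """Classify test_point by majority vote among its k nearest
    training examples under Euclidean distance (as IBK does)."""
    neighbours = sorted(
        train,
        key=lambda ex: math.dist(ex[0], test_point)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Tiny illustrative labelled training set: (features, class label).
train = [
    ((1.0, 1.0), "benign"),
    ((1.2, 0.8), "benign"),
    ((5.0, 5.0), "malignant"),
    ((4.8, 5.2), "malignant"),
]

print(knn_classify(train, (1.1, 0.9)))  # a point near the "benign" cluster
```

Because IBK defers all work to query time (lazy learning), its "classifier building time" in comparisons like the one in this paper is essentially the cost of storing the training set.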
Related Work: The top 10 data mining classification algorithms (C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes and CART) are described in [5], including their impact and new research issues. A study of a large number of techniques based on artificial intelligence, perceptron-based techniques and statistics showed that, with a better understanding of the strengths and weaknesses of each method, it is possible to integrate two or more algorithms together to solve a problem [6]. Despite their advantages, these ensemble methods have weaknesses such as increased storage, increased computation, and decreased comprehensibility. In [7],
International Journal of Computer Science and Information Security (IJCSIS), Vol. 14, No. 5, May 2016, ISSN 1947-5500, https://sites.google.com/site/ijcsis/
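The integration of two or more algorithms mentioned in [6] can be sketched, in its simplest form, as majority voting over the predictions of several base classifiers. The threshold-rule "classifiers" below are hypothetical stand-ins, not real J48, NB or IBK models; they only illustrate the combination scheme.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine base classifiers by simple majority vote over
    their individual predictions (one basic ensemble scheme)."""
    predictions = [clf(x) for clf in classifiers]
    return Counter(predictions).most_common(1)[0][0]

# Three toy stand-in "classifiers" (hypothetical threshold rules)
# that label a number as "pos" or "neg".
clf_a = lambda x: "pos" if x > 0 else "neg"
clf_b = lambda x: "pos" if x > 1 else "neg"
clf_c = lambda x: "pos" if x > -1 else "neg"

print(majority_vote([clf_a, clf_b, clf_c], 0.5))  # two of three say "pos"
```

The weaknesses noted above are visible even in this sketch: the ensemble must store and evaluate every base classifier, and its combined decision boundary is harder to interpret than any single rule.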