Hybrid decision tree and naïve Bayes classifiers for multi-class classification tasks

Dewan Md. Farid a, Li Zhang a,*, Chowdhury Mofizur Rahman b, M.A. Hossain a, Rebecca Strachan a

a Computational Intelligence Group, Department of Computer Science and Digital Technology, Northumbria University, Newcastle upon Tyne, UK
b Department of Computer Science & Engineering, United International University, Bangladesh

Keywords: Data mining; Classification; Hybrid; Decision tree; Naïve Bayes classifier

Abstract

In this paper, we introduce two independent hybrid mining algorithms to improve the classification accuracy rates of decision tree (DT) and naïve Bayes (NB) classifiers for the classification of multi-class problems. Both DT and NB classifiers are useful, efficient and commonly used for solving classification problems in data mining. Since the presence of noisy contradictory instances in the training set may cause the generated decision tree to suffer from overfitting and reduced accuracy, in our first proposed hybrid DT algorithm we employ an NB classifier to remove the noisy troublesome instances from the training set before the DT induction. Moreover, it is extremely computationally expensive for an NB classifier to compute class conditional independence for a dataset with high-dimensional attributes. Thus, in the second proposed hybrid NB classifier, we employ DT induction to select a comparatively more important subset of attributes before applying the naïve assumption of class conditional independence. We tested the performance of the two proposed hybrid algorithms against that of the existing DT and NB classifiers respectively, using classification accuracy, precision, sensitivity–specificity analysis, and 10-fold cross-validation on 10 real benchmark datasets from the UCI (University of California, Irvine) machine learning repository.
The experimental results indicate that the proposed methods produce impressive results in the classification of real-life challenging multi-class problems. They are also able to automatically extract the most valuable training datasets and identify the most effective attributes for the description of instances from noisy, complex training databases with large numbers of attributes.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

During the past decade, a substantial number of data mining algorithms have been proposed by computational intelligence researchers for solving real-world classification and clustering problems (Farid et al., 2013; Liao, Chu, & Hsiao, 2012; Ngai, Xiu, & Chau, 2009). Generally, classification is a data mining function that describes and distinguishes data classes or concepts. The goal of classification is to accurately predict the class labels of instances whose attribute values are known but whose class values are unknown. Clustering is the task of grouping a set of instances in such a way that instances within a cluster are highly similar to one another but very dissimilar to instances in other clusters. It analyzes instances without consulting a known class label; instances are clustered on the principle of maximizing the intraclass similarity and minimizing the interclass similarity. The performance of data mining algorithms in most cases depends on dataset quality, since low-quality training data may lead to the construction of overfitting or fragile classifiers. Thus, data preprocessing techniques, in which the data are prepared for mining, are needed. Preprocessing can improve the quality of the data, thereby helping to improve the accuracy and efficiency of the mining process.
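As a concrete illustration of this kind of preprocessing, the paper's first hybrid idea — using an NB classifier to discard noisy, contradictory training instances before tree induction — can be sketched in a few lines. The sketch below is ours, not the authors' implementation: it assumes purely categorical attributes, uses Laplace smoothing, and the function names (`train_nb`, `predict_nb`, `filter_noisy`) are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(instances, labels):
    """Train a categorical naive Bayes model (counts only)."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v] = occurrences of value v for attribute j in class c
    feature_counts = {c: defaultdict(Counter) for c in class_counts}
    for x, y in zip(instances, labels):
        for j, v in enumerate(x):
            feature_counts[y][j][v] += 1
    return class_counts, feature_counts

def predict_nb(model, x):
    """Return the class maximizing log P(c) + sum_j log P(x_j | c)."""
    class_counts, feature_counts = model
    n = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, cc in class_counts.items():
        score = math.log(cc / n)
        for j, v in enumerate(x):
            counts = feature_counts[c][j]
            # Laplace smoothing over the values observed for attribute j
            num_values = len(counts) + 1
            score += math.log((counts[v] + 1) / (cc + num_values))
        if score > best_score:
            best, best_score = c, score
    return best

def filter_noisy(instances, labels):
    """Keep only the instances the NB classifier labels correctly."""
    model = train_nb(instances, labels)
    return [(x, y) for x, y in zip(instances, labels)
            if predict_nb(model, x) == y]
```

On a toy weather-style dataset, an instance whose label contradicts otherwise identical instances is misclassified by the model trained on the full set and therefore removed, which is the cleaning effect the first hybrid algorithm relies on before DT induction.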
There are a number of data preprocessing techniques available, such as (a) data cleaning: removal of noisy data; (b) data integration: merging data from multiple sources; (c) data transformation: normalization of data; and (d) data reduction: reducing the data size by aggregating and eliminating redundant features. This paper presents two independent hybrid algorithms for scaling up the classification accuracy of decision tree (DT) and naïve Bayes (NB) classifiers in multi-class classification problems. The DT is a classification tool commonly used in data mining, with well-known induction algorithms including ID3 (Quinlan, 1986), ID4 (Utgoff, 1989), ID5 (Utgoff, 1988), C4.5 (Quinlan, 1993), C5.0 (Bujlow, Riaz, & Pedersen, 2012), and CART (Breiman, Friedman, Stone, & Olshen, 1984). The goal of a DT is to create a model that predicts the value of a target class for an unseen test instance based on several input features (Loh & Shih, 1997; Safavian & Landgrebe, 1991; Turney, 1995). Amongst other data mining methods, DTs have various advantages: (a) simple to understand, (b) easy to implement, and (c) requiring little prior knowledge.

Expert Systems with Applications 41 (2014) 1937–1946. http://dx.doi.org/10.1016/j.eswa.2013.08.089
* Corresponding author. Tel.: +44 191 243 7089. E-mail address: li.zhang@northumbria.ac.uk (L. Zhang).
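The prediction step described above — following attribute tests from the root of an induced tree down to a class-labelled leaf for an unseen instance — can be sketched as follows. The tree, its attribute names, and the `classify` helper below are invented for illustration; the paper's experiments use established induction algorithms such as C4.5 rather than this hand-built structure.

```python
# A decision tree as nested dicts: internal nodes test one attribute,
# leaves are plain class-label strings.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": "yes",
    },
}

def classify(node, instance):
    """Descend the branch matching each tested attribute until a leaf."""
    while isinstance(node, dict):
        value = instance[node["attribute"]]
        node = node["branches"][value]
    return node
```

For example, `classify(tree, {"outlook": "sunny", "humidity": "high"})` walks root → "sunny" → "high" and returns the leaf label "no".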