A new approach to classification based on association rule mining

Guoqing Chen *, Hongyan Liu, Lan Yu, Qiang Wei, Xing Zhang

Department of Management Science and Engineering, School of Economics and Management, Tsinghua University, Beijing 100084, China

Received 19 February 2004; received in revised form 9 March 2005; accepted 9 March 2005
Available online 25 July 2005

Abstract

Classification is one of the key issues in the fields of decision sciences and knowledge discovery. This paper presents a new approach for constructing a classifier, based on an extended association rule mining technique in the context of classification. The approach has three characteristics: first, it applies the information gain measure to the generation of candidate itemsets; second, it integrates the process of frequent itemset generation with the process of rule generation; third, it incorporates strategies for avoiding rule redundancy and conflicts into the mining process. The corresponding mining algorithm, namely GARC (Gain based Association Rule Classification), produces a classifier with satisfactory classification accuracy compared with other classifiers (e.g., C4.5, CBA, SVM, NN). Moreover, in terms of association rule based classification, GARC can filter out many candidate itemsets in the generation process, resulting in a much smaller set of rules than that of CBA.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Data mining; Association rule; Classification; Information gain

1. Introduction

Classification is one of the key issues in the field of decision sciences, a field which plays an important role in supporting business and scientific decision-making. In recent years, it has also been one of the focal points in data mining and knowledge discovery. Classification involves building a classifier from training datasets with predetermined targets, fine-tuning it with test datasets, and using it to classify other datasets of interest.
* Corresponding author. Tel.: +86 10 62772940; fax: +86 10 62785876. E-mail address: chengq@em.tsinghua.edu.cn (G. Chen).
0167-9236/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.dss.2005.03.005
Decision Support Systems 42 (2006) 674–689, www.elsevier.com/locate/dsw

There exist various ways of constructing classifiers in the form of, for example, rules, decision trees, Bayesian networks, support vector machines, etc. [12,14–16,21,24,26,29–31]. Decision tree classifiers, such as Quinlan’s C4.5/5.0 classifier and its extensions [30], have received considerable attention due to their speed and understandability. Moreover, a number of efforts have focused on improving various aspects of these classifiers [5,9,25,33]. Another type of classification technique that has attracted increasing attention in recent years finds classification rules using association rule mining techniques, e.g., Refs. [4,20–23,29]. A classification rule is of the form X ⇒ C, where X is a set of data items and C is a class (label), i.e., a predetermined target. With such a rule, a transaction or data record t in a given database could be classified into class C if t contains X. Apparently, a classifica-