Association Rule and Decision Tree based Methods for Fuzzy Rule Base Generation

Ferenc Peter Pach and Janos Abonyi
Pannon University, Department of Process Engineering, Veszprem, P.O. Box 158, H-8201, Hungary, http://www.fmt.vein.hu/softcomp abonyij@fmt.vein.hu

Abstract— This paper focuses on the data-driven generation of fuzzy IF...THEN rules. The resulting fuzzy rule base can be used to build a classifier, a model for prediction, or a decision support system. Among the wide range of possible approaches, decision tree and association rule based algorithms are reviewed, and two new approaches are presented that rely on a priori fuzzy clustering based partitioning of the continuous input variables. An application study is also presented, in which the developed methods are tested on the well-known Wisconsin Breast Cancer classification problem.

I. INTRODUCTION

Human logic can be represented well by logical expressions in the syntax of rules, each with an antecedent and a consequent part. A short example: if somebody has left her/his umbrella at home and it is pouring with rain, then the chances are that she/he will get soaked. A set of logical rules is called a rule base, which is a simple and useful representation of the knowledge of a given area. "Various types of logical rules can be discussed in the context of the decision borders these rules create in multidimensional feature space. The standard crisp propositional IF...THEN rules provide overlapping hyperrectangular covering areas, threshold logic rules are equivalent to separating hyperplanes, while fuzzy rules based on real-valued predicate functions" (from the prologue to [52]).

Accordingly, many rule-based methods have been developed for extracting knowledge from databases. The paper [40] introduces a genetic programming (GP) and fuzzy logic based algorithm that extracts explanatory rules from microarray data.
A hybrid approach is proposed in [7], where standard GP is combined with a heuristic hierarchical crisp rule-base construction. A fuzzy mining algorithm based on Srikant and Agrawal's method [48] is proposed for extracting generalized rules with the use of taxonomies [51]. In [34], compact fuzzy rule extraction is based on adaptive data approximation using B-splines.

Rule bases are used efficiently in many areas, but this paper concentrates primarily on prediction applications. Rule bases have been applied successfully, for example, in stock exchange estimation [37], weather forecasting [32], and future sales forecasting [19]. High prediction accuracy of the applied model (built from the extracted rules) is very important, but the interpretability of the model can be just as critical in many areas. It is very useful to know what lies behind the decisions, and rules can be edited or changed by specialists of the application area. Compact and comprehensible predictive models, together with visualization possibilities, can support better human decisions. The paper [52] presents many computational intelligence techniques (based on decision trees, neural networks, etc.) that are very useful tools for rule extraction and data understanding.

In developing new rule-based methods for prediction applications, besides retaining and enhancing the achieved accuracy (in classification problems), one of the most important objectives is to increase the interpretability of the rules. One possible way to address this is to adopt fuzzy logic. Besides representing the discovered rules in a form that is far more natural for humans, fuzzy logic yields more robust predictive models (classifiers) in the presence of erroneous, inconsistent, and missing data.

In this paper a fuzzy decision tree (Section II-B) and a fuzzy association rule based method (Section III-B) are introduced for fuzzy rule base generation.
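To make the contrast between crisp and fuzzy rules concrete, the following minimal Python sketch evaluates the same conjunctive antecedent ("a is high AND b is high") with a crisp threshold and with a fuzzy predicate. The function names, the ramp-shaped membership function, its breakpoints, and the use of the minimum operator as conjunction are all assumptions made for illustration; they are not the partitioning or inference scheme of the methods introduced in this paper.

```python
def crisp_high(x, threshold=5.0):
    """Crisp predicate: 1.0 if x exceeds the threshold, else 0.0."""
    return 1.0 if x > threshold else 0.0


def fuzzy_high(x, low=3.0, high=7.0):
    """Fuzzy predicate: membership rises linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def rule_strength(memberships):
    """Firing strength of a conjunctive antecedent using the minimum operator."""
    return min(memberships)


# Two hypothetical inputs whose first feature lies just below / just above
# the crisp threshold (illustrative values, not taken from any data set).
for a, b in [(4.9, 8.0), (5.1, 8.0)]:
    crisp = rule_strength([crisp_high(a), crisp_high(b)])
    fuzzy = rule_strength([fuzzy_high(a), fuzzy_high(b)])
    print(f"a={a}, b={b}: crisp={crisp:.2f}, fuzzy={fuzzy:.3f}")
```

Under these assumptions, a small perturbation of the first feature flips the crisp rule between not firing and firing fully, whereas the fuzzy firing strength changes only slightly. This is one way to read the robustness claim made for noisy or inconsistent data.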
Our main goal is to show how to construct compact fuzzy rule bases that can be used for data analysis, classification, or prediction. Therefore, prediction accuracy (for classification problems) and interpretability are both in focus during the rule extraction steps of both algorithms. The classification effectiveness of the proposed methods is tested on the Wisconsin Breast Cancer problem. The results are summarized in a short application study (Section IV).

II. FUZZY DECISION TREE BASED METHODS

A. Existing decision tree induction algorithms

Decision tree based methods are widely used in data mining and decision support applications. A decision tree is fast and easy to use for rule generation and classification problems; moreover, it is an excellent tool for representing decisions. The popularity and spread of decision trees stem from the ID3 algorithm of Quinlan [46]. Many studies have been written on the induction and analysis of decision trees [54], [47], [35], [36], [55]. The application areas of decision trees are also very broad [6], [45], [15], [50], [49], [38].

PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY VOLUME 13 MAY 2006 ISSN 1307-6884 © 2006 WASET.ORG
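As background on the ID3 algorithm cited in Section II-A: ID3 chooses the attribute to split on by information gain, that is, the reduction in Shannon entropy of the class labels after the split. The sketch below computes this quantity for a single categorical attribute. The function names and the four-sample toy data are hypothetical and serve only as an illustration; this is not a full tree induction procedure and not the fuzzy variant discussed in this paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four samples, one attribute, two classes.
attribute = ["low", "low", "high", "high"]
classes = ["benign", "benign", "malignant", "malignant"]
print(information_gain(attribute, classes))
```

In this toy case the attribute separates the two classes perfectly, so the gain equals the full entropy of the labels. An attribute that is unrelated to the class would yield a gain close to zero and would not be selected for the split.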