Copyright (c) 2010 IEEE. Personal use is permitted. For any other purposes, Permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

NMEEF-SD: Non-dominated Multi-objective Evolutionary algorithm for Extracting Fuzzy rules in Subgroup Discovery

Cristóbal J. Carmona, Pedro González, María José del Jesus, and Francisco Herrera, Member, IEEE

Abstract—A Non-dominated Multi-objective Evolutionary Algorithm for Extracting Fuzzy rules in Subgroup Discovery (NMEEF-SD) is described and analysed in this paper. This algorithm, based on the hybridisation of fuzzy logic and genetic algorithms, deals with subgroup discovery problems in order to extract interesting, novel and interpretable fuzzy rules. The evolutionary fuzzy system NMEEF-SD is based on the well-known NSGA-II model, but is oriented towards the subgroup discovery task, using specific operators to promote the extraction of interpretable, high-quality subgroup discovery rules. The proposal includes different mechanisms to improve diversity in the population, and permits the use of different combinations of quality measures in the evolutionary process. An elaborate experimental study, reinforced by the use of nonparametric tests, was performed to verify the validity of the proposal by comparing it with other subgroup discovery methods. The results show that NMEEF-SD obtains the best results among the algorithms studied.

Index Terms—Descriptive rule induction, genetic fuzzy system, multi-objective evolutionary algorithm, subgroup discovery, fuzzy rules.

I. INTRODUCTION

Data mining encompasses both supervised and non-supervised learning approaches. Generally, supervised learning methods have a predictive nature, while non-supervised ones have a descriptive nature.
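The abstract describes NMEEF-SD as based on NSGA-II, with several quality measures acting as objectives. As a hedged, generic illustration (not NMEEF-SD's own code), the following sketch shows NSGA-II-style fast non-dominated sorting, which ranks candidates into Pareto fronts. It assumes each candidate rule is reduced to a tuple of quality values, all to be maximised; the function names and the scores are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better
    on at least one (all objectives are maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_fronts(scores: Sequence[Sequence[float]]) -> List[List[int]]:
    """Fast non-dominated sorting in the style of NSGA-II.

    Returns a list of fronts (lists of indices into scores); front 0 holds
    the candidates that no other candidate dominates.
    """
    n = len(scores)
    dominated = [[] for _ in range(n)]  # dominated[p]: indices that p dominates
    count = [0] * n                     # count[p]: how many candidates dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(scores[p], scores[q]):
                dominated[p].append(q)
            elif dominates(scores[q], scores[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated[p]:
                count[q] -= 1  # p is set aside, so q loses one dominator
                if count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical (coverage, confidence) scores of four candidate rules.
scores = [(0.5, 0.75), (0.2, 0.9), (0.4, 0.6), (0.1, 0.5)]
print(non_dominated_fronts(scores))  # [[0, 1], [2], [3]]
```

Rules 0 and 1 trade generality for accuracy, so neither dominates the other and both lie on the first front. NMEEF-SD's additional operators, such as its diversity mechanisms, are not modelled here.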
Currently, several techniques lie halfway between descriptive and predictive data mining, such as subgroup discovery (SD) [1], contrast set mining [2], and emerging pattern mining [3], and they have aroused the interest of researchers. These techniques are known as "Supervised Descriptive Rule Induction" [4] because they combine the features of both types of induction, and their main objective is to extract descriptive knowledge from the data concerning a property of interest.

This paper focuses on SD, a form of supervised inductive learning of subgroup descriptions in which, given a data set and a property of interest to the user, the algorithm attempts to locate the subgroups that are "most interesting" for the user. The objective of SD is to discover interesting properties of subgroups by obtaining simple rules with high generality, accuracy and interest. Nowadays, SD is being applied to problems in a variety of fields such as medicine [5], [6], marketing [7] and e-learning [8].

C.J. Carmona, P. González, and M.J. del Jesus are with the Department of Computer Science, University of Jaen, 23071 Jaen, Spain (e-mail: {ccarmona|pglez|mjjesus}@ujaen.es). F. Herrera is with the Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain (e-mail: herrera@decsai.ugr.es). This work was supported by the Spanish Ministry of Education, Social Policy and Sports under projects TIN-2008-06681-C06-01 and TIN-2008-06681-C06-02, and by the Andalusian Research Plan under project TIC-3928.

In recent years, new algorithms for SD have been developed using soft-computing techniques such as fuzzy rules [9] and genetic algorithms (GAs) [10]. The combination of these techniques is called genetic fuzzy systems (GFSs) [11], [12], which have attracted considerable attention in the computational intelligence community. Several software tools support the SD task; see, for instance, the KEEL data mining tool [13].
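The generality and accuracy of an SD rule can be made concrete for fuzzy rules. The following minimal sketch is a hedged illustration, not NMEEF-SD's definition: it assumes triangular fuzzy sets, the minimum t-norm to combine the conditions of a rule's antecedent, and two commonly used fuzzy measures: coverage (the summed compatibility degree of all examples divided by the number of examples) and confidence (the share of that summed degree contributed by examples of the target class). The names `Triangular`, `compatibility` and `fuzzy_quality`, and the toy data, are hypothetical; the quality measures actually used by NMEEF-SD are introduced in Section II.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Triangular:
    """Hypothetical triangular fuzzy set: feet at a and c, peak at b."""
    a: float
    b: float
    c: float

    def membership(self, x: float) -> float:
        """Degree in [0, 1] to which x belongs to this fuzzy set."""
        if x <= self.a or x >= self.c:
            return 1.0 if x == self.b else 0.0  # outside the support (or a shoulder peak)
        if x <= self.b:
            return (x - self.a) / (self.b - self.a)  # rising edge
        return (self.c - x) / (self.c - self.b)      # falling edge


Example = Dict[str, float]


def compatibility(antecedent: Dict[str, Triangular], example: Example) -> float:
    """Degree to which an example satisfies all conditions (assumed min t-norm)."""
    return min(fs.membership(example[var]) for var, fs in antecedent.items())


def fuzzy_quality(antecedent: Dict[str, Triangular], target: str,
                  data: List[Tuple[Example, str]]) -> Tuple[float, float]:
    """Return (coverage, confidence) of the rule 'antecedent -> target'."""
    degrees = [compatibility(antecedent, x) for x, _ in data]
    covered = sum(degrees)
    in_target = sum(d for d, (_, label) in zip(degrees, data) if label == target)
    coverage = covered / len(data)                             # generality
    confidence = in_target / covered if covered > 0 else 0.0  # accuracy
    return coverage, confidence


# Toy data (hypothetical): rule "age is young AND income is high -> buy".
young = Triangular(0, 20, 40)
high_income = Triangular(30, 60, 90)
rule = {"age": young, "income": high_income}
data = [
    ({"age": 20, "income": 60}, "buy"),  # compatibility 1.0
    ({"age": 30, "income": 45}, "buy"),  # compatibility 0.5
    ({"age": 30, "income": 75}, "no"),   # compatibility 0.5
    ({"age": 50, "income": 60}, "no"),   # compatibility 0.0
]
print(fuzzy_quality(rule, "buy", data))  # (0.5, 0.75)
```

Here the rule covers the four examples to a total degree of 2.0 (coverage 0.5), and 1.5 of that degree comes from the target class (confidence 0.75). Raising one measure by widening or narrowing the fuzzy sets typically lowers the other, which is why several measures are treated as separate objectives.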
The induction of rules describing subgroups can be considered a multi-objective problem, since there are different quality measures that can be used to evaluate an SD rule. Multi-objective evolutionary algorithms (MOEAs) are therefore well suited to this task, as they are designed to solve problems in which several objectives must be optimized simultaneously [14], [15]. In particular, NSGA-II [16] is a prominent algorithm of this type and is widely used in GFSs [17].

This paper describes a proposal based on the NSGA-II approach for the induction of fuzzy rules that describe subgroups: the Non-dominated Multi-objective Evolutionary algorithm for Extracting Fuzzy rules in Subgroup Discovery (NMEEF-SD). As a novelty, this algorithm permits the selection of different combinations of quality measures as objectives in the evolutionary process, and introduces an operator to promote diversity in the process.

In order to verify the validity of the model presented, an elaborate experimental study of SD was performed with the evolutionary SD algorithms NMEEF-SD, SDIGA [7] and MESDIF [18], and the classical SD algorithms CN2-SD [1] and Apriori-SD [19]. These studies were reinforced by the use of nonparametric tests for comparison, and show good results for NMEEF-SD both in the analysis of quality measures and in the analysis of interpretability. Furthermore, an analysis of scalability and time complexity is performed for NMEEF-SD, CN2-SD and Apriori-SD.

The paper is organised as follows: Section II provides a short description of SD, the quality measures used and a presentation of GFSs for SD. Section III describes the proposed NMEEF-SD algorithm.
Section IV discusses the tests conducted on the data sets for the compared algorithms: Subsection IV-A shows the experimental framework; Subsection IV-B contains the study of different combinations of quality measures for NMEEF-SD; Subsection IV-C includes a study of the evolutionary algorithms for SD; Subsection IV-D
Authorized licensed use limited to: UNIVERSIDAD DE GRANADA. Downloaded on August 03,2010 at 14:39:17 UTC from IEEE Xplore. Restrictions apply.