QAR-CIP-NSGA-II: A New Multi-Objective Evolutionary Algorithm to Mine Quantitative Association Rules

D. Martín (a), A. Rosete (a), J. Alcalá-Fdez (b,*), F. Herrera (b)

(a) Dept. Artificial Intelligence and Infrastructure of Informatic Systems, Higher Polytechnic Institute José Antonio Echeverría, Cujae, 19390 La Habana, Cuba
(b) Department of Computer Science and Artificial Intelligence, University of Granada, CITIC-UGR, 18071 Granada, Spain

Abstract

Some researchers have framed the extraction of association rules as a multi-objective problem, jointly optimizing several measures to obtain a set with more interesting and accurate rules. In this paper, we propose a new multi-objective evolutionary model which maximizes the comprehensibility, interestingness and performance of the objectives in order to mine a set of quantitative association rules with a good trade-off between interpretability and accuracy. To accomplish this, the model extends the well-known Multi-objective Evolutionary Algorithm Non-dominated Sorting Genetic Algorithm II to perform an evolutionary learning of the intervals of the attributes and a condition selection for each rule. Moreover, this proposal introduces an external population and a restarting process to the evolutionary model in order to store all the nondominated rules found and improve the diversity of the rule set obtained. The results obtained over real-world datasets demonstrate the effectiveness of the proposed approach.

Keywords: Data Mining, Quantitative Association Rules, Multi-Objective Evolutionary Algorithms, NSGA-II

1. Introduction

Association discovery is one of the most common Data Mining (DM) techniques used to extract interesting knowledge from large datasets [34]. Association rules identify dependencies between items in a dataset [65] and are defined as expressions of the type X → Y, where X and Y are sets of items and X ∩ Y = ∅ [1, 2].
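The definition above can be illustrated with a minimal Python sketch. It checks that the antecedent X and the consequent Y are disjoint and computes support and confidence in their standard sense (the fraction of transactions containing X ∪ Y, and that fraction divided by the fraction containing X). The function names and the toy transactions are assumptions made for illustration only; they are not taken from the paper.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str],
               consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Confidence of the rule antecedent -> consequent.

    Raises ValueError if the two sides share an item, since the
    definition of an association rule requires X and Y to be disjoint.
    """
    if antecedent & consequent:
        raise ValueError("X and Y must be disjoint")
    sup_x = support(antecedent, transactions)
    if sup_x == 0.0:
        return 0.0  # the rule never applies; report zero confidence
    return support(antecedent | consequent, transactions) / sup_x


# Hypothetical toy dataset (illustrative only).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

X = frozenset({"bread"})
Y = frozenset({"milk"})

rule_support = support(X | Y, transactions)
rule_confidence = confidence(X, Y, transactions)
```

In this toy dataset, two of the four transactions contain both bread and milk, and three contain bread, so the rule {bread} → {milk} has support 0.5 and confidence 2/3.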
Many previous studies on mining association rules focused on datasets with binary or discrete values; however, the data in real-world applications usually consists of quantitative values. Thus, designing DM algorithms able to deal with various types of data is a challenge in this field [6, 13, 36, 56, 61]. A commonly used method to handle continuous domains in the extraction of association rules is to partition the domains of the attributes into intervals. For instance, an association rule could be Income ∈ [1200, 2000] → MortgageExpenses ∈ [360, 600]. These kinds of rules are known as quantitative association rules (QARs) [56].

In recent years, Evolutionary Algorithms (EAs), particularly Genetic Algorithms (GAs) [23], have been used by many researchers to mine QARs from datasets with quantitative values [4, 8]. The main motivation for applying GAs to knowledge extraction tasks is that they are robust and adaptive search algorithms that perform a global search in the space of candidate solutions (for instance, rules or other forms of knowledge representation).

* Corresponding author. Tel.: +34-958-241000, Ext. 46080
Email addresses: dmartin@ceis.cujae.edu.cu (D. Martín), rosete@ceis.cujae.edu.cu (A. Rosete), jalcala@decsai.ugr.es (J. Alcalá-Fdez), herrera@decsai.ugr.es (F. Herrera)
Preprint submitted to Information Sciences, September 5, 2013

Recently, some researchers have presented the extraction of association rules as a multi-objective problem (instead of a single-objective one), removing some of the limitations of the current approaches. Several objectives are considered in the process of extracting association rules, obtaining a set with more interesting and accurate rules [5, 33]. In this way, we can jointly optimize measures such as support, confidence, and so on, which can present different degrees of trade-off depending on the dataset used and the type of information that can be extracted from it. Since this approach