A Method for Mining Quantitative Association Rules

MARÍA N. MORENO, SADDYS SEGRERA, VIVIAN F. LÓPEZ AND M. JOSÉ POLO
Department of Computing and Automatic
University of Salamanca
Plaza Merced s/n, 37008 Salamanca
SPAIN

Abstract: Association rule mining is a significant research topic in the knowledge discovery area. In recent years a great number of algorithms have been proposed to address various drawbacks in the generation of association rules. One of the main problems is obtaining interesting rules from continuous numeric attributes. In this paper, a method for mining quantitative association rules is proposed. It deals with the problem of discretizing continuous data in order to discover a manageable number of high-confidence association rules that cover a high percentage of the examples in the data set. The method was validated by applying it to data from software project management metrics.

Key-Words: Association rules, discretization, clustering

1 Introduction
Association analysis is a useful data mining technique exploited in multiple application domains. One of the best known is the business field, where the discovery of purchase patterns or associations between products that clients tend to buy together is used to develop effective marketing. The attributes used in this domain are mainly categorical, which simplifies the procedure of mining the rules. In recent years the application areas involving other types of attributes have increased significantly. Some examples of recent applications are finding patterns in biological databases, extracting knowledge from software engineering metrics [14], and obtaining user profiles for web system personalization [15] [16]. Associative models have even been used in classification problems as the basis of some efficient classifiers [11] [16].
Numerous methods for association rule mining have been proposed; however, many of them discover too many rules, which represent weak associations and uninteresting patterns. The improvement of association rule algorithms is the subject of many works in the literature. Most research efforts have been oriented toward simplifying the rule set, generating strong and interesting patterns, and improving algorithm performance. When the attributes used for inducing the rules take continuous values, these three objectives can be achieved by means of an efficient data discretization procedure such as the one proposed in this paper.

The strength of an association rule of the form “If X then Y” is mainly quantified by the following factors:
• Confidence or predictability. A rule has confidence c if c% of the transactions in D that contain X also contain Y. A rule is said to hold on a dataset D if the confidence of the rule is greater than a user-specified threshold.
• Support or prevalence. The rule has support s in D if s% of the transactions in D contain both X and Y.

The interestingness issue refers to finding rules that are interesting and useful to users [12]. It can be assessed by means of objective measures such as support (statistical significance) and confidence (goodness), defined above, but subjective measures are also needed. Liu et al. [12] suggest the following ones:
• Unexpectedness: Rules are interesting if they are unknown to the user or contradict the user’s existing knowledge.
• Actionability: Rules are interesting if users can do something with them to their advantage.

Actionable rules are either expected or unexpected, but the latter are the most interesting because they are unknown to the user and lead to more valuable decisions. Most approaches for finding interesting rules in a subjective way require the user’s participation to articulate his or her knowledge or to express which rules are of interest.
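As a brief illustration of the two objective measures defined above, the following minimal sketch computes the support and confidence of a rule “If X then Y” directly from their definitions. It is not the method proposed in this paper; the function names, the representation of transactions as sets of item labels, and the toy transaction set D are assumptions introduced only for this example.

```python
from typing import FrozenSet, Sequence

Transaction = FrozenSet[str]


def support(transactions: Sequence[Transaction],
            x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Percentage of transactions in D that contain both X and Y."""
    if not transactions:
        return 0.0
    both = x | y
    hits = sum(1 for t in transactions if both <= t)
    return 100.0 * hits / len(transactions)


def confidence(transactions: Sequence[Transaction],
               x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Percentage of the transactions containing X that also contain Y."""
    covered = [t for t in transactions if x <= t]
    if not covered:
        # X never occurs, so the rule cannot be evaluated; report 0 by convention.
        return 0.0
    hits = sum(1 for t in covered if y <= t)
    return 100.0 * hits / len(covered)


# Hypothetical toy data set D (illustrative only, not taken from the paper).
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]
X = frozenset({"bread"})
Y = frozenset({"milk"})

print(support(D, X, Y))     # 60.0: 3 of the 5 transactions contain bread and milk
print(confidence(D, X, Y))  # 75.0: 3 of the 4 transactions with bread contain milk
```

With a user-specified confidence threshold of, say, 70%, the rule “If bread then milk” would hold on this toy data set, since its confidence is 75%.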
Unfortunately these subjective factors cannot be easily obtained in some application areas such as project management, especially when a large number of quantitative attributes are involved

Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization, Lisbon, Portugal, September 22-24, 2006 173