Intelligent Data Analysis 4 (2000) 229–240
IOS Press

Reducing redundancy in characteristic rule discovery by using integer programming techniques

Tom Brijs*, Koen Vanhoof and Geert Wets
Department of Applied Economics, Limburg University Centre, B-3590 Diepenbeek, Belgium
E-mail: {tom.brijs, koen.vanhoof, geert.wets}@luc.ac.be

* Corresponding author. Tel.: +32 11 268621; http://hyper.luc.ac.be.

Received 20 October 1999
Revised 2 December 1999
Accepted 12 December 1999

1088-467X/00/$8.00 © 2000 – IOS Press. All rights reserved

Abstract. The discovery of characteristic rules is a well-known data mining task and has led to several successful applications. However, because of the descriptive nature of characteristic rules, typically a (very) large number of them is discovered during the mining stage. This makes monitoring and control of these rules, in practice, extremely costly and difficult. Therefore, a selection of the most promising subset of rules is desirable. Some heuristic rule selection methods have been proposed in the literature that deal with this issue. In this paper, we propose an integer programming model to solve the problem of optimally selecting the most promising subset of characteristic rules. Moreover, the proposed technique makes it possible to enforce a user-defined level of overall model quality while maximally reducing the redundancy present in the original ruleset. We use real-world data to empirically evaluate the benefits and performance of the proposed technique against the well-known RuleCover heuristic. Results demonstrate that the proposed integer programming techniques are able to significantly reduce the number of retained rules and the level of redundancy in the final ruleset. Moreover, the results demonstrate that the overall quality, in terms of the discriminant power of the final ruleset, slightly increases if integer programming methods are used.

Keywords: Redundancy reduction, rule selection, characteristic rules, artificial intelligence

1. Introduction

Data mining is the automated search for hidden, previously unknown and potentially useful information in large databases. Moreover, data mining is a crucial phase in the KDD (Knowledge Discovery in Databases) process [7]. Two important goals of KDD can be identified: prediction, i.e. the use of training data to construct a model that predicts unknown values of future instances, and description, i.e. the search for interesting patterns and their (re)presentation in an easy, human-understandable format. In this paper, we are primarily interested in the latter objective, description, without, however, neglecting the former, i.e. predictive power.

One of the best-known data mining tasks for extracting descriptive information from data is the discovery of characteristic rules. Briefly, characteristic rules express characteristics or properties of a