Mining Top-K Non-Redundant Association Rules

Philippe Fournier-Viger¹ and Vincent S. Tseng²

¹ Dept. of Computer Science, University of Moncton, Canada
² Dept. of Computer Science and Info. Engineering, National Cheng Kung University, Taiwan
philippe.fournier-viger@umoncton.ca, tsengsm@mail.ncku.edu.tw

Abstract. Association rule mining is a fundamental data mining task. However, depending on the choice of the thresholds, current algorithms can become very slow and generate an extremely large number of results, or generate too few results, omitting valuable information. Furthermore, it is well known that a large proportion of the association rules generated are redundant. In previous work, these two problems have been addressed separately. In this paper, we address both of them at the same time by proposing an approximate algorithm named TNR for mining top-k non-redundant association rules.

Keywords: association rules, top-k, non-redundant rules, algorithm

1. Introduction

Association rule mining [1] consists of discovering associations between sets of items in transactions. It is one of the most important data mining tasks. It has been integrated into many commercial data mining software packages and has numerous applications [2].

The problem of association rule mining is stated as follows. Let $I = \{a_1, a_2, \ldots, a_n\}$ be a finite set of items. A transaction database is a set of transactions $T = \{t_1, t_2, \ldots, t_m\}$, where each transaction $t_j \subseteq I$ ($1 \le j \le m$) represents a set of items purchased by a customer at a given time. An itemset is a set of items $X \subseteq I$. The support of an itemset $X$ is denoted as $sup(X)$ and is defined as the number of transactions that contain $X$. An association rule $X \rightarrow Y$ is a relationship between two itemsets $X, Y$ such that $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The support of a rule $X \rightarrow Y$ is defined as $sup(X \rightarrow Y) = sup(X \cup Y) / |T|$. The confidence of a rule $X \rightarrow Y$ is defined as $conf(X \rightarrow Y) = sup(X \cup Y) / sup(X)$.
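The definitions of support and confidence translate directly into code. The following Python sketch only illustrates these formulas and is not part of TNR: itemsets are represented as `frozenset`s, and the function names and the four-transaction toy database are hypothetical (it is not the database of Figure 1). Following the definitions, $sup(X)$ is an absolute count, whereas the support of a rule is relative to $|T|$.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support_count(itemset: Itemset, transactions: List[Itemset]) -> int:
    """sup(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)


def rule_support(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """sup(X -> Y) = sup(X ∪ Y) / |T|, a relative frequency."""
    if x & y:
        raise ValueError("X and Y must be disjoint")
    return support_count(x | y, transactions) / len(transactions)


def rule_confidence(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """conf(X -> Y) = sup(X ∪ Y) / sup(X); undefined when sup(X) = 0."""
    if x & y:
        raise ValueError("X and Y must be disjoint")
    sup_x = support_count(x, transactions)
    if sup_x == 0:
        raise ValueError("conf(X -> Y) is undefined when sup(X) = 0")
    return support_count(x | y, transactions) / sup_x


# Hypothetical toy database with |T| = 4 (not the database of Figure 1).
T: List[Itemset] = [
    frozenset(s) for s in ({"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"})
]

a, b = frozenset({"a"}), frozenset({"b"})
print(support_count(a, T))       # 3
print(rule_support(a, b, T))     # 0.5 (2 of 4 transactions contain {a, b})
print(rule_confidence(a, b, T))  # 0.6666666666666666 (2 / 3)
```

Each call rescans the whole database, which is acceptable for illustration but is not how practical mining algorithms compute support for many itemsets.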
The problem of mining association rules [1] is to find all association rules in a database having a support no less than a user-defined threshold minsup and a confidence no less than a user-defined threshold minconf. For instance, Figure 1 shows a transaction database (left) and some association rules found for minsup = 0.5 and minconf = 0.5 (right).

Although much research has been done on association rule mining, an important issue that has been overlooked is how users should choose the minsup and minconf thresholds to generate a desired number of rules [3, 4, 5, 6]. This is an important problem because, in practice, users have limited resources (time and storage space) for analyzing the results and are thus often interested in discovering only a certain number of rules, and fine-tuning the parameters is time-consuming. Depending on the choice of the thresholds, current algorithms can become very slow and generate an extremely large number of results, or generate none or too few results, omitting valuable information. To address this problem, it was proposed to replace the task of association rule mining with the task of top-k association rule mining, where k is the number of association rules to be found and is set by the user [3, 4, 5, 6]. Several top-k rule