Minimal k-Free Representations of Frequent Sets

Toon Calders¹ and Bart Goethals²

¹ University of Antwerp, Belgium
² Helsinki Institute for Information Technology, Finland

Abstract. Due to the potentially immense number of frequent sets that can be generated from transactional databases, recent studies have demonstrated the need for concise representations of all frequent sets. These studies resulted in several successful algorithms that only generate a lossless subset of the frequent sets. In this paper, we present a unifying framework encapsulating most known concise representations. Because of the deeper understanding of the different proposals thus obtained, we are able to provide new, provably more concise, representations. These theoretical results are supported by several experiments showing their practical applicability.

1 Introduction

The frequent itemset mining problem is by now well known [1]. We are given a set of items ℐ and a database D of subsets of ℐ. The elements of D are called transactions. An itemset I ⊆ ℐ is some set of items; its support in D, denoted support(I, D), is defined as the number of transactions in D that contain all items of I. An itemset is called s-frequent in D if its support in D exceeds s. The database D and the minimal support s are omitted when they are clear from the context. The goal, given a minimal support threshold and a database, is to find all frequent itemsets. The set of all frequent itemsets is denoted F(D, s); the set of infrequent sets is denoted F̄(D, s).

Recent studies on frequent itemset mining algorithms resulted in significant performance improvements. However, if the minimal support threshold is set too low, or the data is highly correlated, the number of frequent itemsets itself can be extremely large.
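The definitions of support and s-frequency above can be made concrete with a small sketch. The code below is a naive enumeration for illustration only, not one of the mining algorithms discussed in this paper; the function names and the toy database `D` are hypothetical. Note that the empty set counts as an itemset here, since it is a subset of ℐ.

```python
from itertools import combinations


def support(itemset, database):
    """Number of transactions in `database` that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for transaction in database if itemset <= transaction)


def frequent_itemsets(database, s):
    """Naively enumerate all s-frequent itemsets, i.e. those with support > s."""
    items = sorted(set().union(*database))
    result = {}
    for size in range(len(items) + 1):
        for candidate in combinations(items, size):
            count = support(candidate, database)
            if count > s:  # "exceeds s" is a strict inequality
                result[frozenset(candidate)] = count
    return result


# Hypothetical toy database over the items a, b, c (not from the paper).
D = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("abc")]

F1 = frequent_itemsets(D, s=1)  # every subset of {a, b, c} is 1-frequent here
F2 = frequent_itemsets(D, s=2)  # {b, c} and {a, b, c} drop out
```

Because the enumeration visits every subset of the items, its cost grows exponentially with the number of items, which is exactly why a low threshold on correlated data quickly becomes impractical.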
To overcome this problem, several proposals have recently been made to construct a concise representation [13] of the frequent itemsets, instead of mining all frequent itemsets: Closed sets [2, 4, 14–16], Free sets [5], Disjunction-Free Sets [6, 10], Generalized Disjunction-Free Generators [11, 12], and Non-Derivable Itemsets [8]. A concise representation of frequent sets is a subset of all frequent sets, together with their supports, that contains enough information to reconstruct all frequent sets and their supports. Therefore, based on the representation, for each itemset I, we must be able to (a) decide whether I is frequent, and (b) if I is frequent, produce its support. Mannila et al. [13] introduced the notion of a concise representation in a more general context. Our definition resembles theirs, but for simplicity we concentrate only on exact representations of frequent itemsets.
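As an illustration of requirements (a) and (b), the sketch below uses closed sets, one of the representations cited above. It relies on the standard property that an itemset is closed when no proper superset has the same support, and that the support of a frequent itemset equals the largest support among its closed frequent supersets. The function names and the toy database are hypothetical, and the brute-force construction is for exposition only.

```python
from itertools import combinations


def support(itemset, database):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for transaction in database if itemset <= transaction)


def closed_frequent_sets(database, s):
    """Brute-force the closed s-frequent itemsets with their supports."""
    items = sorted(set().union(*database))
    frequent = {}
    for size in range(len(items) + 1):
        for candidate in combinations(items, size):
            count = support(candidate, database)
            if count > s:
                frequent[frozenset(candidate)] = count
    # Keep an itemset only if no proper superset has the same support.
    # Such a superset would itself be frequent, so searching `frequent` suffices.
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(itemset < other and frequent[other] == count for other in frequent)
    }


def derive(itemset, representation):
    """Return the support of `itemset` if it is frequent, otherwise None."""
    itemset = frozenset(itemset)
    supports = [count for closed, count in representation.items() if itemset <= closed]
    return max(supports) if supports else None


# Hypothetical toy database over the items a, b, c (not from the paper).
D = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("abc")]
R = closed_frequent_sets(D, s=1)
```

On this toy database, `R` stores four itemsets instead of the eight frequent ones, yet `derive` answers both questions for every itemset: `derive("b", R)` recovers the support of {b} from the closed superset {a, b}, and a result of `None` signals that the itemset is infrequent.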