Finding All Frequent Patterns Starting from the Closure

Mohammad El-Hajj and Osmar R. Zaïane
Department of Computing Science, University of Alberta, Edmonton AB, Canada
{mohammad, zaiane}@cs.ualberta.ca

Abstract. Efficient discovery of frequent patterns from large databases is an active research area in data mining with broad applications in industry and deep implications in many areas of data mining. Although many efficient frequent-pattern mining techniques have been developed in the last decade, most of them assume relatively small databases, leaving extremely large but realistic datasets out of reach. A practical and appealing direction is to mine for closed itemsets. Closed itemsets form a subset of all frequent patterns but are good representatives of them, since they eliminate what are known as redundant patterns. In this paper we introduce an algorithm to discover closed frequent patterns efficiently in extremely large datasets. Our implementation shows that our approach outperforms similar state-of-the-art algorithms when mining extremely large datasets by at least one order of magnitude in terms of both execution time and memory usage.

1 Introduction

Discovering frequent patterns is a fundamental problem in data mining. Many efficient algorithms have been published on this problem in the last 10 years. Most of the existing methods operate on comparatively small databases. Given different small datasets with different characteristics, it is difficult to say which approach would be a winner. Moreover, on the same dataset with different support thresholds, different winners could be proclaimed. Differences in performance become clear only when dealing with very large datasets. Novel algorithms, otherwise victorious with small and medium datasets, can perform poorly with extremely large datasets.
The question that we ask in this work is whether it is possible to mine efficiently for frequent itemsets in extremely large transactional databases, that is, databases on the order of millions of transactions and thousands of items, such as those of big stores and companies similar to Wal-Mart, UPS, etc. With the billions of radio-frequency identification (RFID) chips expected to be used to track and access every single product sold in the market, the sizes of transactional databases will be overwhelming even to current

This research is partially supported by a research grant from the Natural Sciences and Engineering Research Council of Canada.

X. Li, S. Wang, and Z.Y. Dong (Eds.): ADMA 2004, LNAI 3584, pp. 67–74, 2005. © Springer-Verlag Berlin Heidelberg 2005