The VLDB Journal (2008) 17:1321–1344
DOI 10.1007/s00778-007-0078-6

REGULAR PAPER

Mining top-k frequent patterns in the presence of the memory constraint

Kun-Ta Chuang · Jiun-Long Huang · Ming-Syan Chen

Received: 16 January 2006 / Revised: 11 March 2007 / Accepted: 8 August 2007 / Published online: 7 November 2007
© Springer-Verlag 2007

Abstract We explore in this paper a practically interesting mining task: retrieving the top-k (closed) itemsets in the presence of a memory constraint. Specifically, as opposed to most previous works, which concentrate on improving the mining efficiency or on reducing the memory size by best effort, we make a first attempt to specify an upper bound on the memory that can be used for mining frequent itemsets. To comply with this upper bound on memory consumption, two efficient algorithms, called MTK and MTK_Close, are devised for mining frequent itemsets and closed itemsets, respectively, without requiring a minimum support, which is subtle to set. Instead, users only need to give a more human-understandable parameter, namely the desired number of frequent (closed) itemsets k. In practice, it is quite challenging to constrain the memory consumption while also retrieving the top-k itemsets efficiently. To achieve this, MTK and MTK_Close are devised as level-wise search algorithms in which the number of candidates generated and tested in each database scan is limited. A novel search approach, called δ-stair search, is utilized in MTK and MTK_Close to assign the available memory effectively to testing candidate itemsets of various lengths, which leads to a small number of required database scans. As demonstrated in the empirical study on real data and synthetic data, instead of only providing the flexibility of striking a compromise between execution efficiency and memory consumption, MTK and MTK_Close both achieve high efficiency and respect a constrained memory bound, which makes them practical algorithms for mining frequent patterns.

K.-T. Chuang (✉) · M.-S. Chen
Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, ROC
e-mail: doug@arbor.ee.ntu.edu.tw

M.-S. Chen
e-mail: mschen@cc.ee.ntu.edu.tw

J.-L. Huang
Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, ROC
e-mail: jlhuang@cs.nctu.edu.tw

1 Introduction

The discovery of frequent relationships in a huge database is known to be useful in selective marketing, decision analysis, and business management [14]. A popular application area is market basket analysis, which studies the buying behavior of customers by searching for sets of items that are frequently purchased together. Specifically, let $I = \{x_1, x_2, \ldots, x_m\}$ be a set of items. A set $X \subseteq I$ with $m = |X|$ is called an m-itemset or simply an itemset. Formally, an itemset X is called a frequent itemset or a large itemset if the support of X, i.e., the fraction of transactions in the database that contain X, is larger than the minimum support threshold, indicating that the presence of itemset X is significant in the database.

However, it is reported that discovering frequent itemsets suffers from two inherent obstacles, namely, (1) the subtle determination of the minimum support [22] and (2) the unbounded memory consumption [11]. Specifically, the critical question "What is the appropriate minimum support?" is usually left to users in previous works, even when they lack the specific knowledge needed to answer it. Note that setting the minimum support is quite subtle, since a small minimum support may result in an extremely large number of frequent itemsets at the cost of execution efficiency. Conversely, a large minimum support may generate only a few itemsets, which cannot provide enough information for marketing decisions. In