The VLDB Journal (2008) 17:1321–1344
DOI 10.1007/s00778-007-0078-6
REGULAR PAPER
Mining top-k frequent patterns in the presence of the memory constraint
Kun-Ta Chuang · Jiun-Long Huang · Ming-Syan Chen
Received: 16 January 2006 / Revised: 11 March 2007 / Accepted: 8 August 2007 / Published online: 7 November 2007
© Springer-Verlag 2007
Abstract We explore in this paper a practically interesting mining task: retrieving top-k (closed) itemsets in the presence of a memory constraint. Specifically, as opposed to most previous works, which concentrate on improving the mining efficiency or on reducing the memory size on a best-effort basis, we first attempt to specify the upper bound on the memory that can be utilized for mining frequent itemsets. To comply with this upper bound on memory consumption, two efficient algorithms, called MTK and MTK_Close, are devised for mining frequent itemsets and closed itemsets, respectively, without specifying the subtle minimum support. Instead, users only need to give a more human-understandable parameter, namely the desired number of frequent (closed) itemsets k. In practice, it is quite challenging to constrain the memory consumption while also efficiently retrieving top-k itemsets. To achieve this, MTK and MTK_Close are devised as level-wise search algorithms in which the number of candidates generated and tested in each database scan is limited. A novel search approach, called δ-stair search, is utilized in MTK and MTK_Close to effectively allocate the available memory for testing candidate itemsets of various lengths, which leads to a small number of required database scans.
As demonstrated in the empirical study on real data and synthetic data, rather than merely providing the flexibility to trade execution efficiency against memory consumption, MTK and MTK_Close achieve high efficiency while keeping memory consumption bounded, which makes them practical algorithms for mining frequent patterns.

K.-T. Chuang (✉) · M.-S. Chen
Department of Electrical Engineering,
National Taiwan University, Taipei, Taiwan, ROC
e-mail: doug@arbor.ee.ntu.edu.tw
M.-S. Chen
e-mail: mschen@cc.ee.ntu.edu.tw
J.-L. Huang
Department of Computer Science,
National Chiao Tung University, Hsinchu, Taiwan, ROC
e-mail: jlhuang@cs.nctu.edu.tw

1 Introduction
The discovery of frequent relationships in a huge database has been known to be useful in selective marketing, decision analysis, and business management [14]. A popular application area is market basket analysis, which studies the buying behavior of customers by searching for sets of items that are frequently purchased together. Specifically, let $I = \{x_1, x_2, \ldots, x_m\}$ be a set of items. A set $X \subseteq I$ with $m = |X|$ is called an m-itemset or simply an itemset. Formally, an itemset X is referred to as a frequent itemset or a large itemset if the support of X, i.e., the fraction of transactions in the database that contain X, is larger than the minimum support threshold, indicating that the presence of itemset X is significant in the database.
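
The definition of support above can be illustrated with a minimal Python sketch. It is not part of MTK or MTK_Close: the function names `support` and `frequent_itemsets`, the toy transactions, and the brute-force enumeration are all assumptions chosen for clarity, and the enumeration is only feasible for a handful of items.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets whose support is
    strictly larger than `min_support` (illustrative only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s > min_support:
                result[frozenset(candidate)] = s
    return result


# Hypothetical market-basket data: one frozenset per transaction.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"butter", "milk"}),
    frozenset({"bread", "butter"}),
]

print(support({"bread", "milk"}, transactions))
print(len(frequent_itemsets(transactions, 0.4)))
print(len(frequent_itemsets(transactions, 0.5)))
```

On this toy data, changing the threshold from 0.4 to 0.5 changes which itemsets qualify, because the comparison is strict ("larger than") as in the definition.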
However, it is reported that discovering frequent itemsets suffers from two inherent obstacles, namely (1) the subtle determination of the minimum support [22] and (2) the unbounded memory consumption [11]. Specifically, without specific knowledge of the data, the critical question "What is the appropriate minimum support?" is usually left for users to answer in previous works. Setting the minimum support is quite subtle: a small minimum support may result in an extremely large number of frequent itemsets at the cost of execution efficiency. Conversely, a large minimum support may generate only a few itemsets, which cannot provide enough information for marketing decisions. In