SMC 2009
Mining High Average-Utility Itemsets
Tzung-Pei Hong
Dept. of Computer Science and
Information Engineering
National University of Kaohsiung
Kaohsiung, Taiwan
tphong@nuk.edu.tw
Cho-Han Lee
Institute of Electrical Engineering
National University of Kaohsiung
Kaohsiung, Taiwan
prescott2005@hotmail.com
Shyue-Liang Wang
Dept. of Information Management
National University of Kaohsiung
Kaohsiung, Taiwan
slwang@nuk.edu.tw
Abstract—This paper adopts the average-utility measure, which reflects the utility of combining several items better than the original utility measure does. A mining algorithm is then proposed to efficiently find high average-utility itemsets. The algorithm works in two phases and overestimates the actual average utility of an itemset with an upper bound: the sum, over all transactions that include the itemset, of the maximal utility among the items in each transaction. As expected, under the same threshold the proposed approach mines fewer high average-utility itemsets than there are high utility itemsets. Experimental results also show the performance of the proposed algorithm.
Keywords—utility mining, average utility, two-phase mining,
downward closure
I. INTRODUCTION
In the past, Liu et al. presented a two-phase algorithm for quickly discovering all high utility itemsets [2, 3]. In this paper, we propose a new way to evaluate the utilities of itemsets.
Traditionally, the utility of an itemset is the sum of its utilities in all the transactions, regardless of its length. The utility of an itemset in a transaction therefore increases with its length; that is, longer itemsets in a transaction have higher utility values. Using the same minimum threshold to judge itemsets of different lengths is thus not fair. To alleviate the effect of itemset length and identify itemsets with genuinely high utility, this paper adopts the average-utility measure, which reflects the utility of combining several items better than the original utility measure does. The average utility of an itemset is defined as its total utility divided by the number of items it contains. The average utility of an itemset is then compared with a threshold to decide whether it is a high average-utility itemset. An algorithm is also proposed to find all the high average-utility itemsets.
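In symbols, the definition can be written as follows. The notation is introduced here for illustration: D is the transaction database, q(i, t) is the quantity of item i in transaction t, and p(i) is its unit profit, assuming the quantity-times-profit utility model of Yao et al. [8] reviewed in Section II.

```latex
u(i, t) = q(i, t)\,p(i), \qquad
u(X, t) = \sum_{i \in X} u(i, t), \qquad
au(X) = \frac{1}{|X|} \sum_{t \in D,\; X \subseteq t} u(X, t)
```

An itemset X is then a high average-utility itemset when au(X) reaches the user-defined threshold.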
Like two-phase mining for high utility itemsets, the proposed algorithm for high average-utility itemsets uses average-utility upper bounds that overestimate the actual average utilities of itemsets, so that the downward-closure property holds. The average-utility upper bound of an itemset is defined here as the sum, over all transactions that include the itemset, of the maximal utility among the items in each transaction. Candidates are generated level by level, and only combinations of itemsets whose average-utility upper bounds exceed the user-defined threshold are added to the candidate set. In this way, the downward-closure property is maintained. Finally, the performance of the proposed mining algorithm is evaluated on real-world market data.
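The two phases described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the unit profits, the sample database SAMPLE_DB and all function names are hypothetical, the threshold is an absolute value compared with >= (the paper's exact comparison and threshold form may differ), and candidates are formed by an Apriori-style join. The bound is valid because, in any transaction containing an itemset, the average of the itemset's item utilities cannot exceed the largest item utility in that transaction; it is downward closed because a superset never occurs in more transactions than its subsets do.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Transaction = Dict[str, int]  # item -> quantity sold

# Hypothetical unit profits and database, for illustration only.
PROFIT: Dict[str, int] = {"A": 3, "B": 10, "C": 1, "D": 6, "E": 5}
SAMPLE_DB: List[Transaction] = [
    {"A": 1, "B": 1, "C": 4, "D": 1},
    {"A": 2, "C": 3},
    {"B": 1, "D": 2},
]


def item_utilities(t: Transaction) -> Dict[str, int]:
    """Utility of each purchased item in t: quantity * unit profit."""
    return {i: q * PROFIT[i] for i, q in t.items() if q > 0}


def contains(t: Transaction, itemset: FrozenSet[str]) -> bool:
    """True if every item of the itemset was bought in transaction t."""
    return all(t.get(i, 0) > 0 for i in itemset)


def upper_bound(itemset: FrozenSet[str], db: List[Transaction]) -> int:
    """Sum, over transactions containing the itemset, of the maximal item utility in each."""
    return sum(max(item_utilities(t).values()) for t in db if contains(t, itemset))


def average_utility(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Actual average utility: total utility of the itemset divided by its length."""
    total = 0
    for t in db:
        if contains(t, itemset):
            u = item_utilities(t)
            total += sum(u[i] for i in itemset)
    return total / len(itemset)


def mine(db: List[Transaction], threshold: float) -> Set[FrozenSet[str]]:
    items = sorted({i for t in db for i, q in t.items() if q > 0})
    # Phase 1: level-wise candidates whose upper bound reaches the threshold.
    level = {frozenset([i]) for i in items if upper_bound(frozenset([i]), db) >= threshold}
    candidates = set(level)
    k = 2
    while level:
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Downward closure: drop any k-itemset with a (k-1)-subset that failed.
        joined = {c for c in joined
                  if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in joined if upper_bound(c, db) >= threshold}
        candidates |= level
        k += 1
    # Phase 2: one more pass keeps candidates whose actual average utility qualifies.
    return {c for c in candidates if average_utility(c, db) >= threshold}
```

With a threshold of 15 on SAMPLE_DB, phase 1 keeps {A}, {B}, {C}, {D}, {A, C} and {B, D} as candidates, and phase 2 returns {B}, {D} and {B, D}. The candidate {A, C}, for instance, has an upper bound of 16 but an actual average utility of only 8.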
II. REVIEW OF RELATED MINING ALGORITHMS
Agrawal and Srikant proposed the Apriori algorithm [1] to mine association rules from a set of transactions. In each pass, Apriori uses the downward-closure (anti-monotone) property to prune impossible candidates, which makes identifying frequent itemsets more efficient. Many other algorithms based on this property have since been proposed to discover frequent itemsets rapidly [4-7].
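The join-and-prune step that exploits this property can be sketched as follows. It is a generic textbook version of Apriori candidate generation written for illustration, not code from [1]; representing itemsets as sorted tuples and support as an absolute count are our own choices.

```python
from itertools import combinations
from typing import List, Set, Tuple

Itemset = Tuple[str, ...]  # items kept in sorted order


def apriori_gen(frequent: Set[Itemset], k: int) -> Set[Itemset]:
    """Build size-k candidates from the frequent (k-1)-itemsets."""
    candidates: Set[Itemset] = set()
    for a in frequent:
        for b in frequent:
            # Join: same first k-2 items, and a's last item precedes b's.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                c = a + (b[-1],)
                # Prune: every (k-1)-subset must be frequent (downward closure).
                if all(s in frequent for s in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates


def frequent_itemsets(db: List[Set[str]], min_count: int) -> Set[Itemset]:
    """Level-wise search: supports are counted only for candidates that survive pruning."""
    def support(c: Itemset) -> int:
        return sum(1 for t in db if set(c) <= t)

    level = {(i,) for i in sorted(set().union(*db)) if support((i,)) >= min_count}
    result = set(level)
    k = 2
    while level:
        level = {c for c in apriori_gen(level, k) if support(c) >= min_count}
        result |= level
        k += 1
    return result
```

For example, the candidate (A, B, C) is generated only if (A, B), (A, C) and (B, C) are all frequent, so its support is never counted when any of those pairs fails.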
Traditional association-rule mining does not, however, consider the quantities sold in transactions or the profit of each item sold, both of which are important in some applications. Yao et al. thus proposed the utility model to measure how “useful” an itemset is by considering both the quantities and the profits of items [8]. In utility mining, the downward-closure property no longer holds: as the number of items in an itemset grows, its utility within a transaction increases monotonically while its frequency decreases monotonically, and these two opposite tendencies invalidate the property. Barber and Hamilton thus proposed the Zero pruning (ZP) and Zero subset pruning (ZSP) approaches to exhaustively search for all high utility itemsets in a database [9]. Li et al. then proposed the FSM, ShFSM and DCG methods [10, 11] to discover all high utility itemsets by taking advantage of the level-closure property. In addition, Yao proposed a framework for mining high utility itemsets based on mathematical properties of utility constraints [12]. Liu et al. then presented a two-phase algorithm for quickly discovering all high utility itemsets [2, 3]. The approach proposed in this paper builds on this two-phase approach.
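A small hypothetical example makes the failure of the downward-closure property concrete. The sketch below uses invented profits and transactions and function names of our own; it computes total utility under the quantity-times-profit model of [8] alongside support.

```python
from typing import Dict, FrozenSet, List

Transaction = Dict[str, int]  # item -> quantity sold

# Hypothetical unit profits and transactions, invented for this illustration.
PROFIT: Dict[str, int] = {"A": 1, "B": 9}
DB: List[Transaction] = [{"A": 1, "B": 1}, {"A": 1}, {"A": 1}]


def occurs_in(itemset: FrozenSet[str], t: Transaction) -> bool:
    """True if every item of the itemset was bought in transaction t."""
    return all(t.get(i, 0) > 0 for i in itemset)


def support(itemset: FrozenSet[str]) -> int:
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in DB if occurs_in(itemset, t))


def utility(itemset: FrozenSet[str]) -> int:
    """Total utility: sum of quantity * profit over transactions containing the itemset."""
    return sum(
        sum(t[i] * PROFIT[i] for i in itemset) for t in DB if occurs_in(itemset, t)
    )


if __name__ == "__main__":
    for s in (frozenset({"A"}), frozenset({"A", "B"})):
        print(sorted(s), "support", support(s), "utility", utility(s))
```

With a utility threshold of 5, {A, B} (utility 10) is a high utility itemset although its subset {A} (utility 3) is not, so pruning every superset of a low-utility itemset would wrongly discard {A, B}. Support, by contrast, falls from 3 to 1 as expected.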
III. MINING HIGH AVERAGE-UTILITY ITEMSETS
Traditionally, the utility of an itemset is the sum of its utilities in all the transactions, regardless of its length. The utility of an itemset in a transaction therefore increases with its length; that is, longer itemsets in a transaction have higher utility values. For example, consider the transaction shown in Table 1. Five items, denoted A to E, are listed, and the value attached to each item is the quantity sold in the transaction.
TABLE 1. AN EXAMPLE TRANSACTION (QUANTITIES SOLD).
TID  A  B  C  D  E
tx   1  1  4  1  0
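The growth in utility with length can be checked on the Table 1 transaction. Table 1 gives quantities only, so the unit profits below are hypothetical, as are the function names. The sketch computes the utility and the average utility (utility divided by the number of items) of the nested itemsets {A}, {A, B}, {A, B, C} and {A, B, C, D}.

```python
from fractions import Fraction
from typing import Dict, List

# Quantities from Table 1 (transaction tx); E has quantity 0, so it was not bought.
TX: Dict[str, int] = {"A": 1, "B": 1, "C": 4, "D": 1, "E": 0}
# Hypothetical unit profits: Table 1 does not give any.
PROFIT: Dict[str, int] = {"A": 3, "B": 10, "C": 1, "D": 6, "E": 5}


def utility(itemset: List[str]) -> int:
    """Utility of the itemset in tx: sum of quantity * unit profit."""
    return sum(TX[i] * PROFIT[i] for i in itemset)


def average_utility(itemset: List[str]) -> Fraction:
    """Utility divided by the number of items, kept exact as a fraction."""
    return Fraction(utility(itemset), len(itemset))


if __name__ == "__main__":
    for k in range(1, 5):
        prefix = ["A", "B", "C", "D"][:k]
        print(prefix, utility(prefix), average_utility(prefix))
```

With these profits the utilities are 3, 13, 17 and 23, rising with every added item, whereas the average utilities are 3, 13/2, 17/3 and 23/4, which do not rise monotonically.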
Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics
San Antonio, TX, USA - October 2009
978-1-4244-2794-9/09/$25.00 ©2009 IEEE
2600