J. ICT Res. Appl., Vol. 10, No. 2, 2016, 153-176
Received October 30th, 2015, Revised May 9th, 2016, Accepted for publication May 31st, 2016.
Copyright © 2016 Published by ITB Journal Publisher, ISSN: 2337-5787, DOI: 10.5614/itbj.ict.res.appl.2016.10.2.5
Mining High Utility Itemsets with Regular Occurrence
Komate Amphawan¹,*, Philippe Lenca², Anuchit Jitpattanakul³ & Athasit Surarerks⁴
¹Burapha University, Computational Innovation Laboratory, 20131 Chonburi, Thailand
²Institut Telecom, Telecom Bretagne, UMR CNRS 3192 Lab-STICC, France
³Faculty of Applied Science, KNUTNB, 10800 Bangkok, Thailand
⁴Chulalongkorn University, ELITE Laboratory, 10330 Bangkok, Thailand
*E-mail: komate@gmail.com
Abstract. High utility itemset mining (HUIM) plays an important role in the data
mining community and in a wide range of applications. For example, in retail
business it is used for finding sets of sold products that give high profit, low cost,
etc. These itemsets can help improve marketing strategies, make promotions/
advertisements, etc. However, since HUIM only considers utility values of
items/itemsets, it may not be sufficient to observe product-buying behavior of
customers such as information related to “regular purchases of sets of products
having a high profit margin”. To address this issue, the occurrence behavior of
itemsets (in terms of regularity) was investigated together with their utility
values. The problem of mining high utility itemsets with regular occurrence
(MHUIR) was then considered: finding sets of co-occurring items that have high
utility values and occur regularly in a database. An efficient single-pass
algorithm, called MHUIRA, was introduced. A new modified utility-list
structure, called NUL, was designed to efficiently maintain utility values and
occurrence information and to increase the efficiency of computing the utility of
itemsets. Experimental studies on real and synthetic datasets and complexity
analyses are provided to show the efficiency of MHUIRA combined with NUL
in terms of time and space usage for mining interesting itemsets based on
regularity and utility constraints.
Keywords: association rule mining; data mining; high utility itemsets; occurrence
behavior; regularity constraint; utility-list structure.
1 Introduction
Association rule mining (ARM) [1,2] is a fundamental task of data mining and
data analysis. It aims to discover relationships between objects or events, each
expressed as a rule of the form X → Y. For example, from the purchasing data of
a retail business, ARM may discover the rule “Beer → Diaper [s: 30%, c: 60%]”,
which expresses the buying behavior of customers: 30% of customers bought
Beer together with Diaper (the support, s), and 60% of the customers who bought
Beer also bought Diaper at the same time (the confidence, c). ARM can be applied in several areas such
as retail marketing, web clickstream analysis and DNA analysis. ARM consists