The Journal of Systems and Software 112 (2016) 110–121
Efficient discovery of periodic-frequent patterns in very large databases
R. Uday Kiran a,b,∗, Masaru Kitsuregawa a,c, P. Krishna Reddy d
a The University of Tokyo, Japan
b National Institute of Information and Communications Technology, Japan
c National Institute of Informatics, Japan
d International Institute of Information Technology-Hyderabad, India
article info
Article history:
Received 20 February 2015
Revised 12 August 2015
Accepted 26 October 2015
Available online 2 November 2015
Keywords:
Data mining
Knowledge discovery
Frequent patterns
abstract
Periodic-frequent patterns (or itemsets) are an important class of regularities that exist in a transactional
database. Finding these patterns involves discovering all frequent patterns that satisfy the user-specified
maximum periodicity constraint. This constraint controls the maximum inter-arrival time of a pattern in
a database. Measuring the periodicity of a pattern takes O(n) time, where n is the number
of timestamps at which the pattern appears in the database. Because n is usually large
in voluminous databases, determining the periodicity of every candidate pattern in the
itemset lattice makes periodic-frequent pattern mining computationally expensive. This
paper introduces a novel approach to address this problem. Our approach determines the
periodic interestingness of a pattern by adopting greedy search. The basic idea is to
discover all periodic-frequent patterns by eliminating aperiodic patterns based on
suboptimal solutions. The best- and worst-case time complexities of our approach for
determining the periodic interestingness of a frequent pattern are O(1) and O(n),
respectively. We introduce two pruning techniques and propose a pattern-growth algorithm
to find these patterns efficiently. Experimental results show that our algorithm is
runtime-efficient and highly scalable.
© 2015 Elsevier Inc. All rights reserved.
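As an illustration of the cost discussed in the abstract, the following Python sketch computes the periodicity of a pattern as its largest inter-arrival time and contrasts it with a check that stops at the first gap exceeding the threshold. It is a minimal sketch, not the algorithm proposed in this paper: the function names `periodicity` and `is_periodic`, the parameters `db_start` and `db_end`, and the convention of also counting the gaps to both ends of the database are assumptions made for the example.

```python
from typing import Sequence


def periodicity(timestamps: Sequence[int], db_end: int, db_start: int = 0) -> int:
    """Return the largest inter-arrival time of a pattern.

    `timestamps` must be sorted in ascending order. The gaps from `db_start`
    to the first occurrence and from the last occurrence to `db_end` are
    counted as well (an assumption of this sketch). Always scans all n
    timestamps, so the cost is O(n).
    """
    largest = 0
    previous = db_start
    for ts in timestamps:
        largest = max(largest, ts - previous)
        previous = ts
    return max(largest, db_end - previous)


def is_periodic(timestamps: Sequence[int], max_per: int,
                db_end: int, db_start: int = 0) -> bool:
    """Check whether every inter-arrival time is at most `max_per`.

    Stops at the first gap that violates the threshold, so an aperiodic
    pattern may be rejected after a single comparison, while a periodic
    pattern still needs a full O(n) scan.
    """
    previous = db_start
    for ts in timestamps:
        if ts - previous > max_per:
            return False  # early exit: the pattern cannot be periodic
        previous = ts
    return db_end - previous <= max_per


# Hypothetical occurrence list of one pattern in a database of 10 transactions.
occurrences = [1, 3, 4, 9, 10]
print(periodicity(occurrences, db_end=10))             # 5 (gap between 4 and 9)
print(is_periodic(occurrences, max_per=2, db_end=10))  # False
print(is_periodic([6, 7], max_per=2, db_end=10))       # False, after one comparison
```

The early exit is what makes a constant-time best case possible: a pattern whose first gap already exceeds the threshold is rejected immediately, whereas confirming that a pattern is periodic still requires visiting every timestamp.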
1. Introduction
Frequent pattern (or itemset) mining is an important knowledge
discovery technique. It typically involves finding all patterns that are
occurring frequently in a transactional database. Frequent patterns
play a key role in discovering associations (Agrawal et al., 1993), cor-
relations (Brin et al., 1997; Omiecinski, 2003), episodes (Mannila,
1997), multi-dimensional patterns (Lent et al., 1997), diverse pat-
terns (Srivastava et al., 2011), emerging patterns (Dong and Li, 2009),
and so on. The popular adoption and successful industrial application of
frequent patterns have been hindered by a major obstacle: “frequent
pattern mining often generates too many patterns, and majority of them
may be found insignificant depending on application or user requirements.”
When confronted with this problem in real-world applications,
researchers have tried to reduce the number of generated patterns by finding
user interest-based frequent patterns such as maximal frequent patterns
(Gouda and Zaki, 2001), demand-driven patterns (Wang et al., 2005),
utility patterns (Yao et al., 2004), constraint-based patterns
(Pei et al., 2004), diverse-frequent patterns (Swamy et al., 2014),
top-k patterns (Han et al., 2002) and periodic-frequent patterns
(Tanbeer et al., 2009). This paper focuses on efficient discovery of
periodic-frequent patterns.
∗ Corresponding author at: The University of Tokyo, Japan. Tel.: +810354526254.
E-mail addresses: uday.rage@gmail.com, uday_rage@tkl.iis.u-tokyo.ac.jp (R.U. Kiran),
kitsure@tkl.iis.u-tokyo.ac.jp (M. Kitsuregawa), pkreddy@iiit.ac.in (P.K. Reddy).
URL: http://researchweb.iiit.ac.in/~uday_rage/index.html (R.U. Kiran),
http://www.tkl.iis.u-tokyo.ac.jp/Kilab/Members/memo/kitsure_e.html (M. Kitsuregawa),
http://faculty.iiit.ac.in/~pkreddy/index.html (P.K. Reddy)
An important criterion for assessing the interestingness of a frequent
pattern is its temporal occurrence behavior within a database, that is,
whether the pattern occurs periodically, irregularly, or mostly at
specific time intervals. Frequent patterns that occur periodically
within a database are known as periodic-frequent patterns. These
patterns are ubiquitous and play a key role in many applications such as
finding co-occurring genes in biological datasets (Zhang et al., 2007),
improving the performance of recommender systems (Stormer, 2007),
intrusion detection in computer networks (Ma and Hellerstein, 2001) and
discovering events in Twitter (Kiran et al., 2015). A classic application
that illustrates the usefulness of these patterns is market-basket
analysis, which examines how regularly a set of items is purchased by
customers. An example of a periodic-frequent pattern is as follows:
{Bed, Pillow} [support = 10%, periodicity = 1 hour].
The above pattern says that 10% of customers have purchased the
items ‘Bed’ and ‘Pillow’ at least once in every hour. The basic model
of periodic-frequent patterns is as follows (Tanbeer et al., 2009):
http://dx.doi.org/10.1016/j.jss.2015.10.035
0164-1212/© 2015 Elsevier Inc. All rights reserved.