The Journal of Systems and Software 112 (2016) 110–121
journal homepage: www.elsevier.com/locate/jss

Efficient discovery of periodic-frequent patterns in very large databases

R. Uday Kiran a,b,*, Masaru Kitsuregawa a,c, P. Krishna Reddy d

a The University of Tokyo, Japan
b National Institute of Information and Communications Technology, Japan
c National Institute of Informatics, Japan
d International Institute of Information Technology-Hyderabad, India

Article history: Received 20 February 2015; Revised 12 August 2015; Accepted 26 October 2015; Available online 2 November 2015

Keywords: Data mining; Knowledge discovery; Frequent patterns

Abstract

Periodic-frequent patterns (or itemsets) are an important class of regularities that exist in a transactional database. Finding these patterns involves discovering all frequent patterns that satisfy the user-specified maximum periodicity constraint. This constraint controls the maximum inter-arrival time of a pattern in a database. Measuring the periodicity of a pattern takes O(n) time, where n is the number of timestamps at which the pattern has appeared in the database. Because n is usually large in voluminous databases, determining the periodicity of every candidate pattern in the itemset lattice makes periodic-frequent pattern mining computationally expensive. This paper introduces a novel approach to address this problem. Our approach determines the periodic interestingness of a pattern by adopting greedy search. The basic idea is to discover all periodic-frequent patterns by eliminating aperiodic patterns based on suboptimal solutions. The best and worst case time complexities of our approach to determine the periodic interestingness of a frequent pattern are O(1) and O(n), respectively.
We introduce two pruning techniques and propose a pattern-growth algorithm to find these patterns efficiently. Experimental results show that our algorithm is runtime efficient and highly scalable.

© 2015 Elsevier Inc. All rights reserved.

* Corresponding author at: The University of Tokyo, Japan. Tel.: +810354526254.
E-mail addresses: uday.rage@gmail.com, uday_rage@tkl.iis.u-tokyo.ac.jp (R.U. Kiran), kitsure@tkl.iis.u-tokyo.ac.jp (M. Kitsuregawa), pkreddy@iiit.ac.in (P.K. Reddy).
URL: http://researchweb.iiit.ac.in/~uday_rage/index.html (R.U. Kiran), http://www.tkl.iis.u-tokyo.ac.jp/Kilab/Members/memo/kitsure_e.html (M. Kitsuregawa), http://faculty.iiit.ac.in/~pkreddy/index.html (P.K. Reddy)

1. Introduction

Frequent pattern (or itemset) mining is an important knowledge discovery technique. It typically involves finding all patterns that occur frequently in a transactional database. Frequent patterns play a key role in discovering associations (Agrawal et al., 1993), correlations (Brin et al., 1997; Omiecinski, 2003), episodes (Mannila, 1997), multi-dimensional patterns (Lent et al., 1997), diverse patterns (Srivastava et al., 2011), emerging patterns (Dong and Li, 2009), and so on. The popular adoption and successful industrial application of frequent patterns have been hindered by a major obstacle: “frequent pattern mining often generates too many patterns, and majority of them may be found insignificant depending on application or user requirements.” When confronted with this problem in real-world applications, researchers have tried to reduce the desired set by finding user interest-based frequent patterns such as maximal frequent patterns (Gouda and Zaki, 2001), demand driven patterns (Wang et al., 2005), utility patterns (Yao et al., 2004), constraint-based patterns (Pei et al., 2004), diverse-frequent patterns (Swamy et al., 2014), top-k patterns (Han et al., 2002) and periodic-frequent patterns (Tanbeer et al., 2009).
This paper focuses on efficient discovery of periodic-frequent patterns.

An important criterion to assess the interestingness of a frequent pattern is its temporal occurrences within a database, that is, whether the pattern occurs periodically, irregularly, or mostly at specific time intervals. Frequent patterns that occur periodically within a database are known as periodic-frequent patterns. These patterns are ubiquitous and play a key role in many applications such as finding co-occurring genes in biological datasets (Zhang et al., 2007), improving performance of recommender systems (Stormer, 2007), intrusion detection in computer networks (Ma and Hellerstein, 2001) and discovering events in Twitter (Kiran et al., 2015). A classic application to illustrate the usefulness of these patterns is market-basket analysis, which analyzes how regularly sets of items are purchased by customers. An example of a periodic-frequent pattern is as follows:

{Bed, Pillow} [support = 10%, periodicity = 1 hour].

The above pattern says that 10% of customers have purchased the items ‘Bed’ and ‘Pillow’ together, and that such a purchase has occurred at least once in every hour. The basic model of periodic-frequent patterns is as follows (Tanbeer et al., 2009):

http://dx.doi.org/10.1016/j.jss.2015.10.035
0164-1212/© 2015 Elsevier Inc. All rights reserved.
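As an informal preview of the two measures used in the {Bed, Pillow} example, the following minimal Python sketch computes the support and periodicity of a pattern on a toy database. Everything in it is an assumption made for illustration: the toy transactions, the function names, the use of the transaction index as the timestamp, and the choice to count the gaps before the first and after the last occurrence as periods. The early-exit check only illustrates why a periodicity test can cost O(1) in the best case and O(n) in the worst case; it is not the greedy approach or the pattern-growth algorithm proposed in this paper.

```python
from typing import List, Sequence, Set


def occurrence_timestamps(db: Sequence[Set[str]], pattern: Set[str]) -> List[int]:
    """Return the 1-based timestamps of the transactions that contain `pattern`."""
    return [ts for ts, items in enumerate(db, start=1) if pattern <= items]


def periodicity(timestamps: Sequence[int], db_end: int) -> int:
    """Largest inter-arrival time; visits all n timestamps, so O(n).

    Assumption: the gap from time 0 to the first occurrence and the gap
    from the last occurrence to `db_end` are counted as periods too.
    """
    points = [0, *timestamps, db_end]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def is_periodic(timestamps: Sequence[int], db_end: int, max_per: int) -> bool:
    """Early-exit check: stop at the first gap that exceeds `max_per`."""
    previous = 0
    for ts in timestamps:
        if ts - previous > max_per:
            return False  # can already fire on the very first gap
        previous = ts
    return db_end - previous <= max_per


# Hypothetical toy database: one transaction per timestamp 1..10.
toy_db = [
    {"bed", "pillow"}, {"lamp"}, {"bed", "pillow", "lamp"}, {"bed"},
    {"bed", "pillow"}, {"pillow"}, {"bed", "pillow", "rug"}, {"rug"},
    {"bed", "pillow"}, {"lamp"},
]
ts_list = occurrence_timestamps(toy_db, {"bed", "pillow"})
support = len(ts_list) / len(toy_db)

print(ts_list)                                    # [1, 3, 5, 7, 9]
print(support)                                    # 0.5
print(periodicity(ts_list, db_end=len(toy_db)))   # 2
print(is_periodic(ts_list, len(toy_db), max_per=1))  # False
```

With a maximum periodicity of 2 the pattern in this toy database would count as periodic, while with a threshold of 1 the check stops at the second occurrence without looking at the remaining timestamps.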