Full length article

A two-phase approach to mine short-period high-utility itemsets in transactional databases

Jerry Chun-Wei Lin a,*, Jiexiong Zhang a, Philippe Fournier-Viger b, Tzung-Pei Hong c,d, Ji Zhang e

a School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
b School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
c Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
d Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
e School of Agricultural, Computational and Environmental Sciences, University of Southern Queensland, Australia

Article info
Article history: Received 3 May 2016; received in revised form 9 February 2017; accepted 29 April 2017
Keywords: High-utility itemsets; Periodic high-utility itemsets; SPHUIs; Two-phase; Data mining

Abstract
The discovery of high-utility itemsets (HUIs) in transactional databases has attracted much interest from researchers in recent years, since it can uncover hidden information that is useful for decision making, and it is widely used in many domains. Nonetheless, traditional methods for high-utility itemset mining (HUIM) use the utility measure as the sole criterion to determine which itemsets should be presented to the user. These methods ignore the timestamps of transactions and do not consider the period constraint. Hence, these algorithms often find HUIs that are profitable but seldom occur in transactions. In this paper, we address this limitation of previous methods by pushing the period constraint into the HUI mining process. A new framework called short-period high-utility itemset mining (SPHUIM) is designed to identify patterns in a transactional database that appear regularly, are profitable, and also yield a high utility under the period constraint.
The aim of discovering short-period high-utility itemsets (SPHUIs) is hence to identify patterns that are interesting in terms of both period and utility. The paper proposes a baseline two-phase short-period high-utility itemset (SPHUI_TP) mining algorithm that mines SPHUIs in a level-wise manner. Then, to reduce the search space of the SPHUI_TP algorithm and speed up the discovery of SPHUIs, two pruning strategies are developed and integrated into the baseline algorithm. The resulting algorithms are denoted SPHUI_MT and SPHUI_TID, respectively. Extensive experiments on both real-life and synthetic datasets show that the three proposed algorithms can efficiently and effectively discover the complete set of SPHUIs, and that considering the short-period constraint together with the utility measure can greatly reduce the number of patterns found.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Association rule mining (ARM) [1–3] plays an important role in data mining. Its main objective is to discover interesting associations or patterns in transaction databases. It is performed by first extracting the sets of items that appear frequently in a database according to a minimum support threshold, called frequent itemsets (FIs). The FIs are then used to derive association rules (ARs) that satisfy a minimum confidence threshold [1]. Although traditional frequent itemset mining (FIM) and ARM techniques are useful, they have some important limitations. Two of the main ones are that an item is not allowed to occur more than once in a transaction, and that all items are considered equally important. In real-life situations, however, these assumptions often do not hold [4]. Moreover, FIM and ARM are mainly used to discover patterns that appear frequently in databases, and they do not take into account other criteria such as the importance, unit profits, or weights of items.
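The contrast drawn above between frequency and utility, and the two-phase, level-wise strategy mentioned in the abstract, can be illustrated with a small Python sketch. It is not the authors' SPHUI_TP algorithm: it follows the classic Two-Phase model of HUIM, where phase 1 generates candidates level-wise using the transaction-weighted utilization (TWU) as an upper bound on utility, and phase 2 computes exact utilities. The toy database, the unit profits, the thresholds `min_util` and `max_per`, and the definition of the maximum period (the largest gap between consecutive transactions containing an itemset, counting the start and end of the database) are assumptions made for illustration, since this section does not define them.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Database = List[Dict[str, int]]  # each transaction maps item -> purchased quantity

# Hypothetical toy data, listed in timestamp order (list index + 1 = time position).
DB: Database = [
    {"a": 1, "b": 2},          # T1
    {"a": 2, "c": 1},          # T2
    {"b": 1, "c": 3},          # T3
    {"a": 1, "b": 1, "c": 1},  # T4
    {"d": 1},                  # T5
]
PROFIT: Dict[str, int] = {"a": 5, "b": 1, "c": 2, "d": 20}  # hypothetical unit profits


def positions(x: Itemset, db: Database) -> List[int]:
    """1-based positions of the transactions that contain every item of x."""
    return [k + 1 for k, t in enumerate(db) if x.issubset(t)]


def max_period(x: Itemset, db: Database) -> int:
    """Assumed definition: largest gap between consecutive occurrences of x,
    counting the database start (0) and end (len(db)) as boundaries."""
    bounds = [0] + positions(x, db) + [len(db)]
    return max(b - a for a, b in zip(bounds, bounds[1:]))


def utility(x: Itemset, db: Database, profit: Dict[str, int]) -> int:
    """Exact utility: quantity * unit profit of x's items in every transaction containing x."""
    return sum(t[i] * profit[i] for t in db if x.issubset(t) for i in x)


def twu(x: Itemset, db: Database, profit: Dict[str, int]) -> int:
    """Transaction-weighted utilization: total utility of whole transactions containing x."""
    return sum(sum(q * profit[i] for i, q in t.items()) for t in db if x.issubset(t))


def two_phase_sphuis(db: Database, profit: Dict[str, int], min_util: int,
                     max_per: int) -> Tuple[List[Itemset], List[List[str]]]:
    # Phase 1: level-wise candidate generation. TWU never increases and the maximum
    # period never decreases when an itemset grows, so both filters prune safely.
    level = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    candidates: List[Itemset] = []
    while level:
        kept = [x for x in level
                if twu(x, db, profit) >= min_util and max_period(x, db) <= max_per]
        candidates.extend(kept)
        kept_set = set(kept)
        nxt = set()
        for x, y in combinations(kept, 2):
            z = x | y
            # Join two k-itemsets differing by one item; keep z only if all its k-subsets survived.
            if len(z) == len(x) + 1 and all(z - {i} in kept_set for i in z):
                nxt.add(z)
        level = sorted(nxt, key=sorted)
    # Phase 2: compute the exact utility of each surviving candidate.
    found = sorted(sorted(x) for x in candidates if utility(x, db, profit) >= min_util)
    return candidates, found


if __name__ == "__main__":
    cands, found = two_phase_sphuis(DB, PROFIT, min_util=20, max_per=2)
    print(len(cands), "candidates;", "SPHUIs:", found)
```

With these hypothetical thresholds, {d} has a utility of 20 but occurs only in the last transaction, so its maximum period of 5 excludes it (a profitable but rare pattern); {a, c} passes phase 1 with a TWU of 20 but has an exact utility of 19, so phase 2 discards it. Only {a} is reported, and the script prints `4 candidates; SPHUIs: [['a']]`.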
Although FIM and ARM were initially designed to analyze customer transactions for market basket analysis, these tasks are defined quite generally and have hence been applied in many other fields of science and engineering [1,5,6]. For example, an important application of ARM is clickstream analysis [7], that is, the analysis of the behavior of people visiting a website or using a piece of software. A clickstream is defined as a set of records indicating the webpages or user interface elements that each user has visited or clicked. Analyzing clickstreams allows researchers to discover interesting and useful information about the behavior or

Advanced Engineering Informatics 33 (2017) 29–43
http://dx.doi.org/10.1016/j.aei.2017.04.007
ISSN 1474-0346
* Corresponding author.
E-mail addresses: jerrylin@ieee.org (J.C.-W. Lin), jiexiongzhang@ikelab.net (J. Zhang), philfv@hitsz.edu.cn (P. Fournier-Viger), tphong@nuk.edu.tw (T.-P. Hong), ji.zhang@usq.edu.au (J. Zhang).