On-shelf utility mining with negative item values

Guo-Cheng Lan a,⇑, Tzung-Pei Hong b,c, Jen-Peng Huang d, Vincent S. Tseng a

a Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City 701, Taiwan
b Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung City 811, Taiwan
c Department of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung City 804, Taiwan
d Department of Information Management, Southern Taiwan University of Science and Technology, Tainan City 710, Taiwan

Article info

Keywords: Data mining; Utility mining; On-shelf utility mining; High on-shelf utility itemset; Negative profit

Abstract

On-shelf utility mining has recently received interest in the data mining field due to its practical considerations. On-shelf utility mining considers not only profits and quantities of items in transactions but also their on-shelf time periods in stores. Profit values of items in traditional on-shelf utility mining are assumed to be positive. However, in real-world applications, items may be associated with negative profit values. This paper proposes a three-scan mining approach to efficiently find high on-shelf utility itemsets with negative profit values from temporal databases. In particular, an effective itemset generation method is developed to avoid generating a large number of redundant candidates and to effectively reduce the number of data scans in mining. Experimental results for several synthetic and real datasets show that the proposed approach has good performance in pruning effectiveness and execution efficiency.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Data mining techniques can extract useful information from databases. Among various techniques in data mining, association-rule mining is important due to its consideration of the co-occurrence relationship of items.
That is, association-rule mining techniques can be applied to find items with high frequency in a set of transactions (Agrawal, Imielinski, & Swami, 1993), and thus have many practical applications, such as analyzing purchasing behaviors in retail stores and traversal behaviors on websites. Agrawal and Srikant proposed Apriori, the best-known algorithm for mining association rules from a transaction database (Agrawal & Srikant, 1994).

However, considering only the occurrences of items is insufficient to evaluate the significance of items in a database, because a transaction usually also records the purchased quantities and the prices of items. Association-rule mining assumes the same significance for all items in a database, and thus the actual significance of an itemset cannot be easily recognized. To address this problem, Chan et al. proposed utility mining (Chan, Yang, & Shen, 2003), which considers both the profits and quantities of products (items) in a set of transactions to evaluate the actual utility values of product combinations (itemsets). In their study, itemsets whose actual utility values are larger than or equal to a predefined minimum utility threshold are output as high-utility itemsets. Several studies have extended utility mining, for example by improving its performance and by developing incremental utility mining and stream utility mining. The profits of items in these studies were assumed to be positive values.

Temporal data mining has attracted a lot of attention due to its practicality (Ale & Rossi, 2000; Chang, Chen, & Lee, 2002; Lee, Lin, & Chen, 2001; Li, Ning, Wang, & Jajodia, 2003; Ozden, Ramaswamy, & Silberschatz, 1998; Roddick & Spiliopoulou, 2002). For example, consider the product combination {overcoats, stockings}. This combination may not be frequent throughout the entire database, but may have high frequency in winter.
Mining time-related knowledge is thus interesting and useful. Note, however, that because some products in a store may be put on the shelf and taken off it repeatedly, the discovered association rules may be biased. It is thus necessary to consider the on-shelf time periods of products. To address this problem, Lan et al. introduced a new problem named on-shelf utility mining (Lan, Hong, & Tseng, 2011) to obtain more accurate utility values of itemsets in temporal databases. In Lan et al.'s study (Lan et al., 2011), on-shelf utility mining considered not only the quantities and profits of items in transactions but also the on-shelf time periods of the items. Using on-shelf time periods, the actual utility values of itemsets in a temporal database can be evaluated more accurately. They also designed a two-phase algorithm, named TP-HOU, to find high on-shelf utility itemsets in temporal databases.

⇑ Corresponding author. Tel.: +886 920 231609.
E-mail addresses: rrfoheiay@gmail.com (G.-C. Lan), tphong@nuk.edu.tw (T.-P. Hong), jehuang@mail.stust.edu.tw (J.-P. Huang), tsengsm@mail.ncku.edu.tw (V.S. Tseng).

Expert Systems with Applications 41 (2014) 3450–3459
http://dx.doi.org/10.1016/j.eswa.2013.10.049
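To make the notions of utility and on-shelf periods in the introduction concrete, the following Python sketch may help. It is only an illustration under stated assumptions: the toy data, the names PROFIT, ON_SHELF and TRANSACTIONS, and the helper functions are hypothetical and are not taken from the paper, and the absolute threshold used here is a simplification, since this section does not give the formal definition of the on-shelf utility measure. The sketch sums quantity times profit (which may be negative) over transactions that fall in the periods where every item of the itemset is on the shelf.

```python
# Hypothetical toy data; none of these values come from the paper.
PROFIT = {"A": 3, "B": 10, "C": -2}  # item C has a negative profit value

# Assumed representation: the time periods in which each item is on the shelf.
ON_SHELF = {"A": {1, 2}, "B": {1, 2}, "C": {2}}

# Each transaction is (period, {item: purchased quantity}).
TRANSACTIONS = [
    (1, {"A": 2, "B": 1}),
    (1, {"A": 1}),
    (2, {"A": 1, "B": 2, "C": 5}),
    (2, {"B": 1, "C": 1}),
]


def itemset_utility(itemset, quantities):
    """Utility of an itemset in one transaction: sum of quantity * profit.

    Returns 0 when the transaction does not contain every item of the itemset.
    """
    if not all(item in quantities for item in itemset):
        return 0
    return sum(quantities[item] * PROFIT[item] for item in itemset)


def common_periods(itemset):
    """Periods in which all items of the itemset are on the shelf together."""
    return set.intersection(*(ON_SHELF[item] for item in itemset))


def on_shelf_utility(itemset):
    """Total utility of the itemset, restricted to its common on-shelf periods."""
    periods = common_periods(itemset)
    return sum(
        itemset_utility(itemset, quantities)
        for period, quantities in TRANSACTIONS
        if period in periods
    )


def is_high_utility(itemset, min_utility):
    """Simplified check against an absolute minimum utility threshold."""
    return on_shelf_utility(itemset) >= min_utility
```

With this toy data, the itemset {A, C} is evaluated only in period 2, where C is on the shelf, and the negative profit of C makes its total utility negative, which is the kind of situation that motivates handling negative item values explicitly.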