Mining Productive Emerging Patterns and Their Application in Trend Prediction

Vincent Mwintieru Nofong
School of Information Technology and Mathematical Science, University of South Australia, GPO Box 2471, Adelaide, SA 5001
Email: vincent.nofong@mymail.unisa.edu.au

Abstract

Emerging pattern mining is an important data mining task that supports many kinds of decision making. However, it often produces a large number of emerging patterns, most of which are not useful because their emergence is due to the random occurrence of items. Such emerging patterns would most often be detrimental in decision making where inherent relationships between the items of emerging patterns are relevant. Additionally, most studies on emerging pattern mining focus on mining interesting categories of emerging patterns for classification and seldom discuss their application in trend prediction. To mine the set of emerging patterns with inherent item relations for decision making such as trend prediction, we apply a correlation test to the items of emerging patterns and introduce productive emerging patterns as the set of emerging patterns with inherent item relations. We subsequently propose and develop PEPs, an efficient framework for mining our proposed productive emerging patterns. We further discuss and show the possible application of emerging patterns in trend prediction. Our experimental results show that PEPs is efficient, and that the productive emerging pattern set, which is smaller than the set of all emerging patterns, shows potential in trend prediction.

Keywords: Frequent Patterns, Emerging Patterns, Productiveness Measure, Trend Prediction.

1 Introduction

Emerging Patterns (EPs), the patterns whose frequencies increase from one dataset to another, are vital to many kinds of decision making. In static datasets such as those with classes (male vs. female, cured vs.
not cured), emerging patterns can reveal useful and hidden contrast patterns between datasets to support decision making such as classifier construction (Dong and Li 1999, Li et al. 2001), disease likelihood prediction (Li et al. 2003), discovering patterns in gene expression data (Li and Wong 2001), and so on. In sequential datasets, emerging patterns are useful in decision making such as studying and understanding customers' behaviour (Tsai and Shieh 2009), predicting future purchases (Nofong et al. 2014), and so on.

Though emerging pattern mining is an important data mining task, it is a difficult one, as the downward closure property exploited in frequent pattern mining does not hold in emerging pattern mining (Cheng et al. 2010, Dong and Li 1999, Poezevara et al. 2011): a superset of a pattern that is not emerging may itself be emerging, so the search space cannot be pruned in the same way.

Copyright © 2015, Australian Computer Society, Inc. This paper appeared at the Thirteenth Australasian Data Mining Conference, Sydney, Australia. Conferences in Research and Practice in Information Technology, Vol. 168. Md Zahidul Islam, Ling Chen, Kok-Leong Ong, Yanchang Zhao, Richi Nayak, Paul Kennedy, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

Over the past years, however, various approaches have been proposed for efficiently mining emerging patterns (Dong and Li 2005, Li et al. 2003, Li and Wong 2001) and interesting emerging patterns (Fan and Ramamohanarao 2003, 2006, 2002, Li et al. 2001, Terlecki and Walczak 2007, Soulet et al. 2004). Though these works have been useful in mining emerging patterns for various decision making, they face the following challenges:

• They often present too large or too small a number of emerging patterns for decision making. Reporting a large number of emerging patterns makes it difficult to identify the useful ones, as some might be: i.) redundant, or ii.) emerging due to the random occurrence of items.
Such redundant emerging patterns, or those due to the random occurrence of items, would most often be detrimental in decision making where non-redundancy or inherent relationships between the items of an emerging pattern are vital. On the other hand, reporting a small number of emerging patterns may result in missing some useful emerging patterns that are needed in decision making.
• They often focus on mining interesting sets of emerging patterns for classification and seldom discuss their application in trend prediction. Though emerging patterns can reveal useful emerging trends in time-stamped datasets for trend prediction, this application remains unexplored, as it is hardly mentioned in existing works on emerging pattern mining.
• Though some categories of emerging patterns, such as jumping emerging patterns (Fan and Ramamohanarao 2006, Terlecki and Walczak 2007) and essential emerging patterns (Fan and Ramamohanarao 2002), are very useful in classifier construction, they are not ideal for trend prediction. This is because, per their definitions, such emerging patterns in time-stamped datasets are more likely to be spikes or noise than emerging trends.
• Though the emerging patterns reported in (Fan and Ramamohanarao 2003) can be applicable in trend prediction, some useful emerging patterns needed in decision making might be missed. For instance, on a Twitter dataset, (Fan and Ramamohanarao 2003) misses some interesting and useful emerging hashtags such as,

Proceedings of the 13-th Australasian Data Mining Conference (AusDM 2015), Sydney, Australia 109