Searching Frequent Pattern and Prefix Trees for Higher Order Rules

Ping Liang, John F. Roddick and Denise de Vries
School of Computer Science, Engineering and Mathematics
Flinders University, PO Box 2100, Adelaide, South Australia 5001
{ping.liang, john.roddick, denise.devries}@flinders.edu.au

Abstract

Since the search for rules that can inform business decision making is the ultimate goal of data mining technology, problems such as the interpretation of interestingness for discovered rules are an important issue. However, the search for rules that adhere to a user’s definition of interesting remains somewhat elusive, in part because rules are commonly supplied in a low, instance-level format.

In this paper we argue that rules with more useable semantics can be obtained by searching for patterns in intermediate data structures such as frequent pattern or prefix trees. This paper discusses this approach and presents a proof-of-concept system, Horace, that shows that the approach is both useable and efficient.

1 Introduction

Since the early work of Agrawal, Srikant and others (Agrawal et al. 1993, Agrawal & Srikant 1994, Srikant & Agrawal 1995), association mining research has become a mature field (Ceglar & Roddick 2006) and has been applied in a variety of industry sectors including commerce, defence, health, manufacturing, exploration and engineering. Association mining algorithms, one form of data mining algorithm, have the capacity to rapidly discover sets of co-occurring items or events in very large databases, and the time complexity of most algorithms is generally close to linear in the size of the dataset (Zaki & Ogihara 1998). A variety of extensions have been proposed that enable, for example, temporal (Ale & Rossi 2000, Li et al. 2003, Rainsford & Roddick 1999) and spatial (Han et al. 1997, Koperski & Han 1995) semantics to be accommodated, closed sets to be identified (Pasquier et al.
1999, Zaki 2000), fuzzy and incomplete data to be handled (Chan & Au 1997, Kuok et al. 1998), domain-specific concept hierarchies to be accommodated (Cheung, Ng & Tam 1996, Fortin & Liu 1996, Han & Fu 1995, Shen & Shen 1998), and visualisation techniques to be applied (Ong et al. 2002).

Copyright © 2013, Australian Computer Society, Inc. This paper appeared at the Eleventh Australasian Data Mining Conference (AusDM 2013), Canberra, 13-15 November 2013. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 146, Peter Christen, Paul Kennedy, Lin Liu, Kok-Leong Ong, Andrew Stranieri and Yanchang Zhao, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

Clearly, in order to create useable systems, problems such as the interpretation of interestingness for discovered rules are an important issue and need to be resolved. Unfortunately, the search for rules that adhere to a user’s definition of interesting (and indeed, even the user’s definition of interesting) remains somewhat elusive (Geng & Hamilton 2006), in part because rules are generally supplied in an instance-level format, such as

    DigitalTV, DVDPlayer → Cables    σ(20%) γ(65%)    (1)

where the σ (support) and γ (confidence) values are examples of quality metrics for the rule.

Such low-level rules, while useful, provide knowledge only about the coincidence of elementary values and can be termed zero-order rules. Higher order semantics can be derived when sets of rules are inspected to determine patterns of interest between rules. For example, two competitor items a and b may be discovered by observing a set of rules such that:

    {a} → {c}       σ(x)    (2)
    {b} → {c}       σ(y)    (3)
    {a, b} → {c}    σ(z)    (4)
    σ(z) ≪ σ(x) × σ(y)      (5)

That is, as Eq. (5) states, the observed support σ(z) is considerably lower than one would expect if the items a and b were independent. Other patterns of rules, including catalysts, can also be detected in this way.
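The competitor condition in Eqs. (2) to (5) can be illustrated with a minimal Python sketch. This is not the algorithm developed in this paper: it scans the raw transactions rather than a tree, and the function names `support` and `find_competitors` and the `ratio` threshold (one possible reading of "considerably lower") are assumptions introduced purely for illustration.

```python
from itertools import combinations

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def find_competitors(transactions, consequent, ratio=0.5):
    """Return pairs (a, b) for which the support of {a, b} -> {consequent}
    falls below `ratio` times the product of the single-item rule supports.
    `ratio` is a hypothetical stand-in for "considerably lower"."""
    candidates = sorted({i for t in transactions for i in t} - {consequent})
    found = []
    for a, b in combinations(candidates, 2):
        x = support(transactions, {a, consequent})     # support of rule (2)
        y = support(transactions, {b, consequent})     # support of rule (3)
        z = support(transactions, {a, b, consequent})  # support of rule (4)
        # Eq. (5): joint support well below the product of the two supports.
        if x > 0 and y > 0 and z < ratio * x * y:
            found.append((a, b))
    return found
```

Here the support of a rule is taken to be the support of the union of its antecedent and consequent, which is the usual convention; pairs in which either single-item rule never occurs are skipped, since the comparison would be meaningless.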
In the past, specific algorithms have been developed to search for each case. For example, Teng (2002) outlines a mechanism for learning dissociations (also known as competitors) from source data.

Since frequent pattern and prefix trees are (generally speaking) isomorphic with the resulting ruleset, our approach here is to search such data structures directly for patterns. The (higher order) semantics are expressed as an FP-tree pattern, and our algorithm is thus able to find a variety of higher order rules in one pass of the FP-tree or prefix tree.

The rest of the paper is organised as follows. Section 2 discusses other work in higher order mining and Section 3 provides a description of some basic notation and concepts used in this paper. Section 4 defines patterns in rulesets, while Section 5 outlines our approach in broad terms. Section 6 discusses a particular variant in which FP-trees are searched. Section 7 discusses Horace, our proof-of-concept system, and Section 8 provides a discussion of future work.
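To make the correspondence between a prefix tree and a set of itemsets more concrete, the following Python sketch stores each itemset as a path of items in sorted order and recovers every itemset with its count in a single depth-first pass. It is an illustrative assumption only: the names `Node`, `build_prefix_tree` and `walk` are hypothetical, the structure is a plain itemset prefix tree rather than an FP-tree (which orders items by frequency and links nodes for the same item), and it is not the Horace implementation.

```python
class Node:
    """One node of the prefix tree; `count` is the support count of the
    itemset spelled out by the path from the root to this node."""
    def __init__(self):
        self.count = 0
        self.children = {}

def build_prefix_tree(itemset_counts):
    """Insert each itemset as a path of its items in sorted order."""
    root = Node()
    for itemset, count in itemset_counts.items():
        node = root
        for item in sorted(itemset):
            node = node.children.setdefault(item, Node())
        node.count = count
    return root

def walk(node, prefix=()):
    """Yield (itemset, count) for every counted node in one depth-first pass."""
    for item, child in sorted(node.children.items()):
        path = prefix + (item,)
        if child.count:
            yield path, child.count
        yield from walk(child, path)
```

Because each counted node corresponds to exactly one itemset, a single traversal such as `walk` visits every itemset from which rules could be formed, which is the property a tree-based pattern search relies on.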