A Framework of Numerical Basket Analysis

Takashi Washio, Atsushi Fujimoto and Hiroshi Motoda
The Institute of Scientific and Industrial Research, Osaka University
8-1, Mihogaoka, Ibarakishi, Osaka, 567-0047, Japan
washio@ar.sanken.osaka-u.ac.jp

Abstract

In this paper, Basket Analysis is mathematically characterized and extended to search families of sets. The resulting theory indicates the possibility of various new approaches to data mining. We demonstrate this potential by proposing a novel approach, QARMINT. It performs complete mining of generic QARs within a low time complexity, which has not been well addressed in past work. Its performance evaluation shows high practicality.

1. Introduction

Since an algorithm for Basket Analysis was proposed by Agrawal and Srikant [1], a large number of studies on more efficient Basket Analysis have been presented in the field of data mining. The basic principle underlying all of these algorithms is the bottom-up building of candidate itemsets in a lattice under the downward closure property of itemsets, i.e., “if any given itemset is not large, any superset of it will also not be large.” The most representative measure that yields the downward closure property of itemsets is “support,” i.e., the occurrence frequency of an itemset in the given transaction data. If an itemset occurs more often than a threshold value, i.e., the “minimum support,” it is called a “frequent itemset.” When two itemsets that share elements are both frequent, their join is a candidate frequent itemset.

Some issues remain in the current Basket Analysis, where transactions and itemsets are limited to finite Boolean sets. The aforementioned basic principle has wider applicability that is not limited to search on the finite Boolean lattice, because it requires only a search space having (1) a join operation between two sets and (2) a downward closure property among sets.
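The bottom-up principle described above can be sketched for the ordinary Boolean case as follows. This is a minimal illustration and not the authors' implementation: the function names `support` and `levelwise_frequent_itemsets` are hypothetical, the example transactions are invented, and an itemset is treated as frequent when its support is at least `min_support`, which is an assumption about how the threshold is applied.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def levelwise_frequent_itemsets(transactions, min_support):
    """Bottom-up lattice search using (1) a join and (2) downward closure.

    Assumption: "frequent" means support >= min_support.
    Returns a dict mapping each frequent itemset to its support.
    """
    items = {item for t in transactions for item in t}
    # Level 1: frequent single items.
    level = {
        frozenset([i])
        for i in items
        if support(frozenset([i]), transactions) >= min_support
    }
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        # (1) Join: merge two frequent k-itemsets that share k-1 items.
        candidates = {
            a | b for a, b in combinations(level, 2) if len(a | b) == k + 1
        }
        # (2) Downward closure: drop a candidate if any k-subset is not
        # frequent, without counting the candidate's own support.
        candidates = {
            c
            for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k))
        }
        level = {
            c for c in candidates if support(c, transactions) >= min_support
        }
        k += 1
    return frequent


# Invented toy data, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
result = levelwise_frequent_itemsets(transactions, min_support=2)
```

With this toy data, the candidate {bread, milk, butter} is discarded by the downward closure check alone, because its subset {milk, butter} occurs in only one transaction. Nothing in the loop depends on the sets being Boolean itemsets beyond the join and the closure test, which is the observation the generalization builds on.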
In spite of this wide applicability, the framework of the Basket Analysis has not been extended to address more generic tasks.

Another issue is the analysis of transaction data that include items with numeric values, such as “ ” and “ .” These items are called “numeric items,” whereas items having categorical values, such as “ ,” are called “categorical items.” The clause of an item such as “ ” is called an “attribute.” Some categorical items may consist only of a clause, as in “ ,” without a value. An association rule in which every numeric item has appropriate intervals of its value is called a “quantitative association rule” (QAR). An example QAR is “ and ” which states that “a person who is in their thirties and married owns two cars.” Since Srikant and Agrawal proposed an approach to mine QARs [2], a number of studies on QAR mining have been made. However, the problem of mining a complete set of QARs in generic form under representative mining measures is known to be NP-complete [4]. The state of the art has not addressed the complete mining of generic QARs within tractable time complexity.

In this paper, first, we extend the framework of the Basket Analysis to searching families of sets, based on the mathematical characterization of the aforementioned basic principle. Second, we propose a novel approach and its implementation for complete mining of generic QARs within a low time complexity based on the extension where is the number of transactions in the data. This approach is called QAR mining by Monotonic INTerval (QARMINT) after the nature of its mining criterion. Its low time complexity in terms of the amount of data is essential for mining large data. Third, its performance evaluation is presented to show its practicality.

2. Extension of Basket Analysis

As mentioned earlier, the basic principle of the Basket Analysis requires only a search space having (1) a join operation between two sets and (2) a downward closure property among sets.
The operation (1) introduces a structure on the search space. Let be a “family of sets” in which ele-