K.G. Mehrotra et al. (Eds.): IEA/AIE 2011, Part I, LNAI 6703, pp. 95–104, 2011. © Springer-Verlag Berlin Heidelberg 2011

A Generic Approach for Mining Indirect Association Rules in Data Streams

Wen-Yang Lin 1, You-En Wei 1, and Chun-Hao Chen 2

1 Dept. of Computer Science and Information Engineering, National University of Kaohsiung, Taiwan
wylin@nuk.edu.tw, waiewing@gmail.com
2 Dept. of Computer Science and Information Engineering, Tamkang University, Taiwan
chchen@mail.tku.edu.tw

Abstract. An indirect association refers to an infrequent itempair, each item of which co-occurs frequently with a frequent itemset called the “mediator”. Although indirect associations have been recognized as powerful patterns for revealing interesting information hidden in many applications, such as recommendation ranking, substitute or competitive items, and common web navigation paths, almost no work, to our knowledge, has investigated how to discover this type of pattern from streaming data. In this paper, we consider the problem of mining indirect associations from data streams. Unlike contemporary research on stream data mining, which investigates the problem separately for each type of streaming model, we treat the problem in a generic way. We propose a generic window model that can represent all classical streaming models and retains user flexibility in defining new models. In this context, we develop a generic algorithm that guarantees no false positive rules and a bounded support error as long as the window model is specifiable by the proposed generic model. Comprehensive experiments on both synthetic and real datasets have shown the effectiveness of the proposed approach as a generic way of finding indirect association rules over streaming data.
1 Introduction

Recently, the problem of mining interesting patterns or knowledge from large volumes of continuous, fast-growing datasets over time, so-called data streams, has emerged as one of the most challenging issues for the data mining research community [1, 3]. Although a large volume of literature over the past few years has addressed mining frequent patterns, such as itemsets, maximal itemsets, and closed itemsets, no work, to our knowledge, has endeavored to discover indirect associations, a recently coined type of infrequent pattern. The term indirect association, first proposed by Tan et al. in 2000 [21], refers to an infrequent itempair, each item of which co-occurs frequently with a frequent itemset called the “mediator”. Indirect associations have been recognized as powerful patterns for revealing interesting information hidden in many applications, such as recommendation ranking [14], common web navigation paths [20], and substitute (or competitive) items [23]. For example, Coca-Cola and Pepsi are competitive products and could be replaced by each other. So it is very likely that