K.G. Mehrotra et al. (Eds.): IEA/AIE 2011, Part I, LNAI 6703, pp. 95–104, 2011.
© Springer-Verlag Berlin Heidelberg 2011
A Generic Approach for Mining Indirect Association Rules in Data Streams
Wen-Yang Lin¹, You-En Wei¹, and Chun-Hao Chen²
¹ Dept. of Computer Science and Information Engineering, National University of Kaohsiung, Taiwan
wylin@nuk.edu.tw, waiewing@gmail.com
² Dept. of Computer Science and Information Engineering, Tamkang University, Taiwan
chchen@mail.tku.edu.tw
Abstract. An indirect association refers to an infrequent itempair, each item of which co-occurs strongly with a frequent itemset called a “mediator”. Although indirect associations have been recognized as powerful patterns for revealing interesting information hidden in many applications, such as recommendation ranking, substitute or competitive items, and common web navigation paths, almost no work, to our knowledge, has investigated how to discover this type of pattern from streaming data. In this paper, we consider the problem of mining indirect associations from data streams. Unlike contemporary research on stream data mining, which investigates the problem separately for each type of streaming model, we treat the problem in a generic way. We propose a generic window model that can represent all classical streaming models while retaining the flexibility for users to define new models. In this context, we develop a generic algorithm that guarantees no false-positive rules and a bounded support error as long as the window model can be specified by the proposed generic model. Comprehensive experiments on both synthetic and real datasets have shown the effectiveness of the proposed approach as a generic way of finding indirect association rules over streaming data.
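To make the informal definition concrete, the sketch below checks whether an itempair {a, b} is indirectly associated through a mediator itemset on a small static transaction set. It is a minimal illustration under stated assumptions, not the authors' streaming algorithm: the three-threshold formulation (itempair support below t_s, mediator support at least t_f, dependence at least t_d), the choice of the IS (cosine) measure as the dependence measure, the threshold values, the function names, and the toy transactions are all hypothetical choices made here for illustration.

```python
from math import sqrt

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def is_measure(x, y, transactions):
    """IS (cosine) dependence between itemsets x and y (assumed measure)."""
    sx, sy = support(x, transactions), support(y, transactions)
    if sx == 0 or sy == 0:
        return 0.0
    return support(set(x) | set(y), transactions) / sqrt(sx * sy)

def is_indirect_association(a, b, mediator, transactions,
                            t_s=0.2, t_f=0.3, t_d=0.5):
    """Hypothetical check: is {a, b} indirectly associated via `mediator`?

    t_s, t_f, t_d are illustrative thresholds, not values from the paper.
    """
    mediator = frozenset(mediator)
    # 1. The itempair itself must be infrequent.
    pair_infrequent = support({a, b}, transactions) < t_s
    # 2. Each item, together with the mediator, must be frequent.
    mediator_frequent = (support(mediator | {a}, transactions) >= t_f and
                         support(mediator | {b}, transactions) >= t_f)
    # 3. Each item must depend strongly on the mediator.
    dependent = (is_measure({a}, mediator, transactions) >= t_d and
                 is_measure({b}, mediator, transactions) >= t_d)
    return pair_infrequent and mediator_frequent and dependent

# Toy data (hypothetical): the two colas are never bought together,
# but each is often bought with chips.
transactions = [
    {"coke", "chips"}, {"coke", "chips"}, {"coke", "chips"},
    {"pepsi", "chips"}, {"pepsi", "chips"}, {"pepsi", "chips"},
    {"bread"}, {"bread", "milk"},
]

print(is_indirect_association("coke", "pepsi", {"chips"}, transactions))  # True
print(is_indirect_association("coke", "bread", {"chips"}, transactions))  # False
```

Here {coke, pepsi} never co-occurs, yet each cola appears with chips in 3 of 8 transactions, so chips acts as the mediator, mirroring the competitive-products example in the introduction. A streaming miner would have to maintain these support counts incrementally over a window rather than rescanning a fixed list.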
1 Introduction
Recently, the problem of mining interesting patterns or knowledge from large volumes of continuous, fast-growing datasets over time, so-called data streams, has emerged as one of the most challenging issues for the data mining research community [1, 3]. Although a large body of literature has appeared over the past few years on mining frequent patterns, such as itemsets, maximal itemsets, and closed itemsets, no work, to our knowledge, has endeavored to discover indirect associations, a recently coined type of infrequent pattern. The term indirect association, first proposed by Tan et al. in 2000 [21], refers to an infrequent itempair, each item of which co-occurs strongly with a frequent itemset called a “mediator”. Indirect associations have been recognized as powerful patterns for revealing interesting information hidden in many applications, such as recommendation ranking [14], common web navigation paths [20], and substitute (or competitive) items [23]. For example, Coca-Cola and Pepsi are competitive products and can replace each other. So it is very likely that