A Dynamic Bit-vector Approach for Efficiently Mining
Inter-sequence Patterns
Bay Vo, Minh-Thai Tran
Information Technology College
Ho Chi Minh City, Viet Nam
{vdbay,minhthai}@itc.edu.vn
Hieu Nguyen
Yellow Pepper Viet Nam
Ho Chi Minh City, Viet Nam
minhhieu052@gmail.com
Tzung-Pei Hong
Department of CSIE
National University of Kaohsiung
Kaohsiung City, Taiwan, R.O.C
tphong@nuk.edu.tw
Bac Le
Department of Computer Science
University of Science
Ho Chi Minh, Viet Nam
lhbac@fit.hcmus.edu.vn
Abstract—The inter-sequence pattern (ISP) mining method
can be used to mine sequential patterns inside a
transaction and inter-transaction patterns in several
transactions. Consequently, the ISP mining method is
more general than two traditional sequence mining
methods. This paper proposes an algorithm that uses a
dynamic bit-vector (DBV) data structure to efficiently
mine ISPs. The DBV-ISP algorithm uses the divide-and-
conquer method to reduce the required storage space and
execution time. Experimental results show that DBV-ISP
is more efficient than the EISP-Miner algorithm in terms
of execution time and memory usage.
Keywords—BitTable, Dynamic bit vector, Inter-sequence
pattern, Sequence pattern, Vertical data format.
I. INTRODUCTION
Sequential pattern mining from sequence databases is an
important issue in data mining [4, 5, 7, 12]. Many
algorithms have thus been proposed for mining sequential
patterns in sequence databases [1-2, 5-6, 8-12, 15-16].
However, these algorithms treat sequences independently
without considering the relationship between sequences.
Several algorithms for mining frequent sequential patterns
across many transactions in sequence databases have produced
remarkable results. However, these algorithms do not
consider the order of items within a transaction; they
treat the items as an unordered set.
Wang and Lee [14] proposed the EISP-Miner algorithm, which
mines frequent inter-sequence patterns across several
transactions in sequence
databases. The EISP-Miner algorithm considers the items
within several transactions in sequence databases as an
ordered set. Hence, it is more general than existing
algorithms. However, this algorithm consumes a lot of
memory for storing transaction identifiers in a tree and it
requires a lot of time to find extended sequences when
creating new patterns. To solve these two problems, this
study proposes an algorithm that uses a dynamic bit vector
(DBV) data structure to efficiently mine inter-sequence
patterns. The proposed DBV-ISP algorithm uses the
compressed sequence mechanism and a divide-and-conquer
method to reduce the required storage space and execution
time.
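This section does not define the DBV layout, so the following Python fragment is only a minimal sketch under a stated assumption: the bit vector is stored without its leading and trailing zero bytes, together with the index of the first non-zero byte. The class name `DBV` and its methods are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DBV:
    """Assumed layout: a bit vector without leading/trailing zero bytes."""
    start: int   # index of the first non-zero byte in the full vector
    data: bytes  # bytes from the first to the last non-zero byte

    @classmethod
    def from_ids(cls, ids):
        """Build a DBV whose set bits are the given non-negative ids."""
        ids = list(ids)
        if not ids:
            return cls(0, b"")
        full = bytearray(max(ids) // 8 + 1)  # last byte is non-zero
        for i in ids:
            full[i // 8] |= 1 << (i % 8)
        first = next(k for k, b in enumerate(full) if b)
        return cls(first, bytes(full[first:]))

    def __and__(self, other):
        """Intersect two DBVs, touching only the overlapping byte range."""
        lo = max(self.start, other.start)
        hi = min(self.start + len(self.data), other.start + len(other.data))
        if lo >= hi:
            return DBV(0, b"")
        out = bytes(self.data[k - self.start] & other.data[k - other.start]
                    for k in range(lo, hi))
        trimmed = out.lstrip(b"\x00")
        first = lo + len(out) - len(trimmed)
        trimmed = trimmed.rstrip(b"\x00")
        return DBV(first, trimmed) if trimmed else DBV(0, b"")

    def count(self):
        """Number of set bits (the support, if bits mark transactions)."""
        return sum(bin(b).count("1") for b in self.data)

    def ids(self):
        """Return the ids of all set bits in increasing order."""
        return [(self.start + k) * 8 + j
                for k, b in enumerate(self.data)
                for j in range(8) if (b >> j) & 1]


a = DBV.from_ids([17, 18, 40])
b = DBV.from_ids([18, 40, 90])
common = a & b
```

Under this assumption, the intersection only visits bytes where both vectors can be non-zero, which is the kind of saving in space and time that the introduction attributes to the DBV structure.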
II. PRELIMINARY CONCEPTS
Consider a sequence database with a set of items
$I = \{i_1, i_2, \ldots, i_n\}$, where $i_j$ is an item
($1 \le j \le n$). A sequence $S = \langle t_1, t_2, \ldots, t_m \rangle$
is an ordered list of itemsets, where $t_j$ is an itemset
for $1 \le j \le m$. A sequence database is
$D = \{s_1, s_2, \ldots, s_{|D|}\}$, where $|D|$ is the number of
sequences in $D$ and $s_i$ ($1 \le i \le |D|$) is a
transaction in the form <Dat, sequence>, where Dat is a
domain attribute of $s_i$ used to describe contextual
information such as time. Consider the sequence with Dat 1
in Table 1. It indicates that a customer buys item C and
then items A and B together.
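The definitions above can be made concrete with a small Python sketch. It assumes that every item is a single uppercase letter, as in Table 1, and that parentheses group the items of one itemset; the helper name `parse_sequence` is hypothetical.

```python
def parse_sequence(text):
    """Parse '<C(AB)>' into a tuple of itemsets (one-letter items assumed)."""
    body = text.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("a sequence must be written as <...>")
    body = body[1:-1]
    itemsets, i = [], 0
    while i < len(body):
        if body[i] == "(":
            # Items between parentheses belong to the same itemset.
            j = body.index(")", i)
            itemsets.append(frozenset(body[i + 1:j]))
            i = j + 1
        else:
            # A bare letter is an itemset with a single item.
            itemsets.append(frozenset(body[i]))
            i += 1
    return tuple(itemsets)


# A sequence database maps each Dat value to its sequence.
database = {
    1: parse_sequence("<C(AB)>"),
    2: parse_sequence("<C(ABC)A>"),
}
```

Here `database[1]` is the ordered pair of itemsets {C} and {A, B}: the order of itemsets is kept by the tuple, while the items inside one itemset are unordered.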
Table 1: Sequence database
Let $t_1$ and $t_2$ be the Dat values of sequences $s_1$
and $s_2$, respectively. If $t_1$ is taken as the reference
point, the span between $s_1$ and $s_2$ is defined as
$[t_2 - t_1]$. Sequence $s_2$ at domain attribute $t_2$ with
respect to $t_1$ is called an extended sequence
(e-sequence) and is denoted as $s_2[t_2 - t_1]$. For
Dat   Sequence      Megasequence (maxspan = 1)
1     <C(AB)>       <C(AB)>[0]<C(ABC)A>[1]
2     <C(ABC)A>     <C(ABC)A>[0]<AD>[1]
3     <AD>          <AD>[0]<A>[1]
4     <A>           <A>[0]<AC>[1]
5     <AC>          <AC>[0]<BC>[1]
6     <BC>          <BC>[0]<(AB)C>[1]
7     <(AB)C>       <(AB)C>[0]<E>[1]
8     <E>           <E>[0]
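The span and e-sequence notation can be illustrated with a short Python sketch. The formal definition of a megasequence is not given in this passage, so the construction below is inferred from the Megasequence column of Table 1: for a reference Dat value, it concatenates the e-sequences whose span lies between 0 and maxspan. The function names `span` and `megasequence` are hypothetical.

```python
def span(t_ref, t):
    """Span of the sequence at Dat value t relative to the reference t_ref."""
    return t - t_ref


def megasequence(db, t_ref, maxspan):
    """Concatenate e-sequences s[t - t_ref] for spans 0..maxspan.

    Assumption: Dat values are integers, and a Dat value that is
    missing from the database contributes nothing.
    """
    parts = []
    for t in range(t_ref, t_ref + maxspan + 1):
        if t in db:
            parts.append(f"{db[t]}[{span(t_ref, t)}]")
    return "".join(parts)


# Sequences of Table 1, keyed by Dat value.
db = {
    1: "<C(AB)>", 2: "<C(ABC)A>", 3: "<AD>", 4: "<A>",
    5: "<AC>", 6: "<BC>", 7: "<(AB)C>", 8: "<E>",
}

first_row = megasequence(db, 1, maxspan=1)
```

With the reference point at Dat 1 and maxspan = 1, `first_row` combines the sequence at Dat 1 with span 0 and the sequence at Dat 2 with span 1.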
2012 Third International Conference on Innovations in Bio-Inspired Computing and Applications
978-0-7695-4837-1/12 $26.00 © 2012 IEEE
DOI 10.1109/IBICA.2012.31