I.J. Information Technology and Computer Science, 2015, 07, 77-89
Published Online June 2015 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijitcs.2015.07.09
Mining Sequential Patterns from mFUSP-Tree
Ashin Ara Bithi
Asian University of Bangladesh, Dhaka, Bangladesh
E-mail: ashincse@yahoo.com
Abu Ahmed Ferdaus
University of Dhaka, Dhaka, Bangladesh
E-mail: ferdaus1167@gmail.com
Abstract—Mining sequential patterns from a sequence database
plays an important role in data mining, as it can find
associations within ordered lists of events. Mining methods
based on the pattern growth approach, such as PrefixSpan, find
sequential patterns efficiently, but generating a projection
database for each pattern is regarded as the bottleneck of these
methods. Lin (2008) first introduced a tree structure to
sequential pattern mining, known as the fast updated sequential
pattern tree (FUSP-tree). However, the link information stored in
each node of the FUSP-tree complicates this method, because the
links must be updated. In this paper, we first propose a modified
fast updated sequential pattern tree (called an mFUSP-tree) that
stores the complete set of sequences, restricted to frequent
items, together with their frequencies and the relations among
items in each sequence, in a compact data structure. Unlike the
FUSP-tree, this structure does not store in each node a link to
the next node, in a following branch of the tree, that carries
the same item. We then show, by means of a mining method, that
the mFUSP-tree is sufficient to find the complete set of frequent
sequential patterns from sequence databases without generating
any intermediate projected tree and without repeatedly scanning
the original database during mining. Our experimental results
indicate that the proposed mFUSP-tree mining approach performs
considerably better than existing algorithms such as GSP,
PrefixSpan and FUSP-tree based mining.
Index Terms—Intermediate Projected Tree, Projection
Database, Sequential Pattern Mining, Frequent Pattern,
Sequence Database, Tree-Based Mining.
I. INTRODUCTION
Data mining (sometimes called data or knowledge
discovery) is the process of examining data to extract
useful information and knowledge from large
databases. This information can support decision
making, and extracting it from large databases has
become an important research field. Within this field,
sequential pattern mining in large transactional
databases plays an important part. Sequential
pattern mining is the process of finding the complete
set of frequently occurring ordered events or subsequences
in a set of sequences, or sequence database. The
advantage of finding sequential patterns is that we can
examine customers' purchase sequences and predict the
probability that a customer buys certain items in later
transactions. For instance, if a customer bought egg and
sugar in one transaction, we can predict the probability
that this customer buys milk in the next one: that is,
if {egg, sugar} then {milk}. Sequential pattern mining is
widely applied in the analysis of customer purchase
patterns, web access patterns, sequencing or time-related
processes such as scientific experiments and natural
disasters, DNA sequences, and so on. Agrawal and Srikant
first introduced sequential pattern mining in 1995 [1].
Based on their study, sequential pattern mining is stated
as follows:
“Given a sequence database or a set of sequences where
each sequence is an ordered list of events or elements and
each event or element is a set of items, and given a
user-specified minimum support threshold or min_sup,
sequential pattern mining is the process of finding the
complete set of frequent subsequences, that is, the
subsequences whose occurrence frequency in the set of
sequences or sequence database is greater than or equal
to min_sup.” Past studies developed two major classes of
sequential pattern mining methods: apriori based mining
algorithms and pattern growth based mining methods. GSP
(Generalized Sequential Pattern) [2] is an apriori based
algorithm that finds the complete set of frequent
sequential patterns by level-wise candidate sequence
generation and testing. This algorithm scans the whole
sequence database multiple times to find the support
count, or frequency, of each candidate pattern. Because
of these repeated scans, the cost of GSP grows as the
database becomes larger. PrefixSpan [3] is a pattern
growth based approach which is similar to FP-growth [4].
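As a small illustration of the min_sup definition quoted above, on which both classes of methods rely, the following Python sketch counts the support of a candidate pattern in a toy sequence database. The database contents and the function names (`is_subsequence`, `support`) are assumptions made for illustration; they are not taken from [1], [2] or [3].

```python
from typing import List, Set

Sequence = List[Set[str]]  # a sequence is an ordered list of itemsets (events)


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """Return True if every element of `pattern` is contained in a distinct
    element of `sequence`, with the order of elements preserved."""
    pos = 0
    for element in pattern:
        # Advance to the earliest remaining event that contains this element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later event
    return True


def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Number of sequences in the database that contain `pattern`."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


# Hypothetical toy database: one purchase sequence per customer.
database = [
    [{"egg", "sugar"}, {"milk"}],
    [{"egg", "sugar", "bread"}, {"bread"}, {"milk"}],
    [{"egg", "sugar"}, {"bread"}],
    [{"milk"}, {"egg", "sugar"}],
]
min_sup = 2

rule = [{"egg", "sugar"}, {"milk"}]      # "if {egg, sugar} then {milk}"
reverse = [{"milk"}, {"egg", "sugar"}]   # same items, opposite order

rule_support = support(rule, database)        # 2, so frequent
reverse_support = support(reverse, database)  # 1, so not frequent
print(rule_support >= min_sup, reverse_support >= min_sup)
```

Each call to `support` is one full pass over the database, so an approach that tests many candidates this way pays for repeated scans.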
PrefixSpan does not generate the large number of useless
candidate sets that apriori based methods produce. However,
to find the sequential patterns, PrefixSpan recursively creates
a series of small projected databases from the large database.
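The idea of a projected database can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the algorithm of [3]: it assumes every event holds a single item, so a sequence is just a list of items, and the names `project` and `prefix_growth` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sequence = List[str]  # simplifying assumption: one item per event


def project(database: List[Sequence], item: str) -> List[Sequence]:
    """Projected database for `item`: for every sequence that contains it,
    keep only the suffix after its first occurrence."""
    return [seq[seq.index(item) + 1:] for seq in database if item in seq]


def prefix_growth(database: List[Sequence], min_sup: int,
                  prefix: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], int]:
    """Grow patterns from `prefix` by recursing into projected databases."""
    patterns: Dict[Tuple[str, ...], int] = {}
    counts: Counter = Counter()
    for seq in database:
        counts.update(set(seq))  # count each item once per sequence
    for item, count in sorted(counts.items()):
        if count >= min_sup:
            grown = prefix + (item,)
            patterns[grown] = count
            # A new, smaller database is materialised at every recursive
            # step; this is the cost referred to in the text.
            patterns.update(prefix_growth(project(database, item), min_sup, grown))
    return patterns


# Hypothetical toy database and threshold.
toy_db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["a", "b"]]
found = prefix_growth(toy_db, min_sup=2)
print(found)
```

On this toy input the sketch reports six frequent patterns, for example `("a", "c")` with support 3. Note that `project` is called once for every frequent pattern found.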
To do this, the algorithm first scans the original database
to get the frequent items and their corresponding counts,
and then starts the mining operation. In the mining process,
it first finds the subsequences for every prefix, i.e.,
every frequent item. The algorithm then finds the
sequential patterns in the projected database
produced for each prefix sequence, and
recursively creates a set of small projected databases for
every frequent subsequence. In this approach, the
sequences grow from short to large with recursively