I.J. Information Technology and Computer Science, 2015, 07, 77-89
Published Online June 2015 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijitcs.2015.07.09
Copyright © 2015 MECS

Mining Sequential Patterns from mFUSP-Tree

Ashin Ara Bithi
Asian University of Bangladesh, Dhaka, Bangladesh
E-mail: ashincse@yahoo.com

Abu Ahmed Ferdaus
University of Dhaka, Dhaka, Bangladesh
E-mail: ferdaus1167@gmail.com

Abstract: Mining sequential patterns from sequence databases plays an important role in data mining because it can find associations in ordered lists of events. Mining methods based on the pattern growth approach, such as PrefixSpan, are efficient at discovering sequential patterns, but generating a projected database for each pattern is regarded as the bottleneck of these methods. Lin (2008) first introduced a tree structure to sequential pattern mining, known as the fast updated sequential pattern tree (FUSP-tree). However, the link information stored in each node of the FUSP-tree increases the complexity of this method because of its link updating process. In this paper, we first propose a modified fast updated sequential pattern tree (called an mFUSP-tree) that stores the complete set of sequences, with only the frequent items, their frequencies, and the relations among items in the given sequences, in a compact data structure. Unlike the FUSP-tree, this structure does not store, in each node, link information to the next node in a following branch of the tree that carries the same item. We then show, by means of a mining method, that the mFUSP-tree is sufficient to discover the complete set of frequent sequential patterns from sequence databases without generating any intermediate projected tree and without repeatedly scanning the original database during mining.
Our experimental results show that the proposed mFUSP-tree mining approach performs considerably better than existing algorithms such as GSP, PrefixSpan, and FUSP-tree based mining.

Index Terms: Intermediate Projected Tree, Projection Database, Sequential Pattern Mining, Frequent Pattern, Sequence Database, Tree-Based Mining.

I. INTRODUCTION

Data mining (sometimes called data or knowledge discovery) is the process of examining data to extract useful information and knowledge from large databases. This information can support decision making. Extracting useful information and knowledge from large databases has become an important research field, and within it, sequential pattern mining in large transactional databases plays an important part. Sequential pattern mining is the process of finding the complete set of frequently occurring ordered events or subsequences in a set of sequences, or sequence database. The advantage of finding sequential patterns is that we can examine customers' purchase sequences and predict the probability that a customer purchases certain items in later transactions. For instance, if a customer bought egg and sugar in one transaction, we can predict the probability that this customer buys milk in the next: that is, if {egg, sugar} then {milk}. Sequential pattern mining is widely applied in the analysis of customer purchase patterns and web access patterns, of sequencing or time-related processes such as science experiments and natural disasters, and of DNA sequences. Agrawal and Srikant first introduced sequential pattern mining in 1995 [1].
Based on their study, sequential pattern mining is stated as follows: “Given a sequence database or a set of sequences where each sequence is an ordered list of events or elements and each event or element is a set of items, and given a user-specified minimum support threshold or min_sup, sequential pattern mining is the process of finding the complete set of frequent subsequences, that is, the subsequences whose occurrence frequency in the set of sequences or sequence database is greater than or equal to min_sup.”

Past studies developed two major classes of sequential pattern mining methods: apriori-based mining algorithms and pattern growth based mining methods. GSP (Generalized Sequential Pattern) [2] is an apriori-based algorithm that determines the complete set of frequent sequential patterns through level-wise candidate sequence generation and testing. The algorithm scans the whole sequence database multiple times to find the support count, or frequency, of each candidate pattern. Because of these repeated scans, the cost of GSP grows as the database gets larger.

PrefixSpan [3] is a pattern growth based approach similar to FP-growth [4]. It does not generate the large number of useless candidate sets that apriori-based methods produce. To find the sequential patterns, however, PrefixSpan recursively creates a series of smaller projected databases from the large database. The algorithm first scans the original database to get the frequent items and their corresponding counts, and then starts the mining operation. During mining, it first finds the subsequences for every prefix, i.e., every frequent item. It then finds the sequential patterns in the projected database produced for each prefix sequence and recursively creates a set of smaller projected databases for every frequent subsequence.
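The PrefixSpan steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation from [3]: it assumes that every event holds a single item (the general definition allows a set of items per event), and the function names `project`, `prefixspan`, and `support`, as well as the toy database `db`, are hypothetical names introduced here.

```python
from collections import Counter


def project(database, item):
    """Projected database for `item`: the suffix that follows the first
    occurrence of `item` in every sequence that contains it."""
    projected = []
    for seq in database:
        if item in seq:
            projected.append(seq[seq.index(item) + 1:])
    return projected


def prefixspan(database, min_sup, prefix=()):
    """Grow `prefix` by every item that is frequent in `database`.

    Simplifying assumption: each event is a single item, so a sequence
    is just a tuple of items.
    """
    # Count each item at most once per sequence (support, not occurrences).
    counts = Counter(item for seq in database for item in set(seq))
    patterns = {}
    for item, count in sorted(counts.items()):
        if count < min_sup:
            continue
        pattern = prefix + (item,)
        patterns[pattern] = count
        # Recurse on the smaller projected database for the grown prefix.
        patterns.update(prefixspan(project(database, item), min_sup, pattern))
    return patterns


def support(database, pattern):
    """Support of `pattern` found by one full scan of the database,
    which is the kind of scan an apriori-based method repeats per level."""
    def contains(seq):
        remaining = iter(seq)
        # `item in remaining` consumes the iterator, so order is respected.
        return all(item in remaining for item in pattern)
    return sum(contains(seq) for seq in database)


# Toy sequence database (hypothetical data, one item per event).
db = [
    ("a", "b", "c"),
    ("a", "c"),
    ("b", "a", "c"),
    ("b", "c"),
]
patterns = prefixspan(db, min_sup=2)
for pattern, count in patterns.items():
    print(pattern, count)
```

With `min_sup=2`, the sketch reports the patterns (a), (b), (c), (a, c), and (b, c) for this toy database. Each recursive call works on a new projected database, which is the cost that the abstract identifies as the bottleneck of pattern growth methods.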
In this approach, the sequences grow from short to large with recursively