International Journal of Innovative Research in Engineering & Management (IJIREM)
ISSN: 2350-0557, Volume-3, Issue-1, January-2016

An Efficient Implementation of an Algorithm for Mining Locally Frequent Patterns

Fokrul Alom Mazarbhuiya
College of Computer Science & IT, Albaha University, Albaha, KSA
fokrul_2005@yahoo.com

ABSTRACT
Mining patterns from large datasets is an interesting data mining problem. Many methods have been developed for this purpose. Most of them treat the time attribute as an ordinary attribute. However, when the time attribute is handled separately, patterns can be extracted that ordinary methods cannot find. These patterns are termed temporal patterns. Some work has already been done on mining temporal patterns. An algorithm for mining locally frequent patterns from temporal datasets was proposed by Anjana et al. In this article, we propose a hash-tree based implementation of that algorithm. We also show that the hash-tree based implementation outperforms other implementations.

Keywords
Data Mining, Frequent patterns, Temporal patterns, Locally frequent patterns.

1. INTRODUCTION
Frequent itemset mining is a well-researched problem in data mining and is closely associated with association rule mining in market basket data [1]. A number of algorithms have been proposed for mining such datasets; the A-priori algorithm [2] is one of the most popular. Market basket data, however, are usually temporal in nature: when a transaction happens, its time is recorded along with it. By taking the time features of such datasets into account, some interesting patterns can be extracted that otherwise cannot be.
In [3], Ale et al. devised a method for extracting association rules that hold throughout the lifetime of an itemset, where the lifetime of an itemset is defined as the time period between the first and the last transaction containing it; this period need not coincide with that of the dataset. Although the algorithm of [3] extracts many more rules than the ordinary A-priori algorithm, it has some limitations. The method works well if the items are uniformly distributed over the transactions throughout their lifetime. In practice, however, some items are not uniformly distributed, e.g. seasonal items such as cold drinks. If the whole lifetime of such an item is taken into consideration, the item may not turn out to be frequent because of the long periods in which it is absent from the transactions, or, if it is frequent, it will have a very small support value. Considering the seasonal behavior of certain items, B. Ozden et al. [4] put forward a method for finding cyclic association rules, with algorithms that extract all such rules holding within a user-specified time period. Once the user chooses the time period, it remains fixed throughout the execution of the algorithm. The authors of [5, 6] tried to address the above limitations. The frequent itemsets extracted by [5, 6] are known as locally frequent itemsets. This paper focuses mainly on the implementation side of [5, 6]. We propose a hash-tree based implementation of the algorithm of [5, 6]. The advantage of this implementation is that it stores the candidate itemsets in a hash-tree, which reduces the number of comparisons.

This paper is organized as follows. In section-2, we present the definitions and notation used in [5, 6]. In section-3, we discuss the algorithm of [5, 6]. In section-4, we describe the implementation in detail.
Conclusions and directions for future work are discussed in section-5.

2. TERMS, DEFINITIONS AND NOTATIONS USED
Let $T = \langle t_0, t_1, \ldots \rangle$ be a sequence of time-stamps over which a linear ordering $<$ is defined, where $t_i < t_j$ means that $t_i$ denotes a time earlier than $t_j$. Let $I$ denote a finite set of items. The transaction dataset $D$ is a collection of transactions, where each transaction has two parts: a subset of the item set $I$, and a time-stamp indicating the time at which the transaction took place. We assume that $D$ is ordered in ascending order of the time-stamps. For time intervals we always consider closed intervals of the form $[t_1, t_2]$, where $t_1$ and $t_2$ are time-stamps. We say that a transaction is in the time interval $[t_1, t_2]$ if its time-stamp $t$ satisfies $t_1 \le t \le t_2$.

We define the local support of an itemset in a time interval $[t_1, t_2]$ as the ratio of the number of transactions in $[t_1, t_2]$ containing the itemset to the total number of transactions of $D$ in $[t_1, t_2]$. We use the notation $\mathrm{Supp}_{[t_1, t_2]}(X)$ to denote the support of the itemset $X$ in the time interval $[t_1, t_2]$. Given a threshold $\sigma$ (a percentage), we say that an itemset $X$ is frequent in the time interval $[t_1, t_2]$ if $\mathrm{Supp}_{[t_1, t_2]}(X) \ge \sigma/100$, that is, if the number of transactions in $[t_1, t_2]$ containing $X$ is at least $(\sigma/100) \cdot tc$, where $tc$ denotes the total number of transactions in $D$ that are in the time interval $[t_1, t_2]$.

3. FINDING LOCALLY FREQUENT ITEMSETS WITH ASSOCIATED TIME INTERVALS
For convenience, we describe here the algorithm used in [5, 6] for finding locally frequent itemsets. While locally frequent itemsets are constructed, each one is associated with a list of the time intervals in which it is frequent. Two thresholds, minthd1 and minthd2, are given as input. During execution, while making a pass through the database, if for a particular itemset the time gap between its