International Journal of Advanced Computer Research (ISSN (print): 2249-7277, ISSN (online): 2277-7970), Volume-4 Number-3 Issue-16 September-2014

An Efficient Lattice-Based Approach for Generator Mining

Pham Quang Huy 1, Truong Chi Tin 2

Abstract

Mining frequent closed itemsets and their corresponding generators appears to be the most effective way to mine frequent itemsets and association rules from large datasets, since it reduces the risks of low performance, large storage requirements, and redundancy. However, generator mining has not been studied as much as frequent closed itemset mining, and it has not yet been fully optimized. In this paper, we cast the problem of enumerating generators from the lattice of frequent closed itemsets as the problem of “distributing M machines to solve N jobs”, which offers a familiar and readable point of view. From this formulation, several interesting mathematical results follow readily and make the problem easy to solve. Our proposed algorithm, GDP, efficiently finds all generators with very low complexity and without duplicated or useless consideration. Experiments show that our approach is reasonable and effective.

Keywords

Generator, minimal generator, generator mining, lattice of closed frequent itemsets, lattice-based algorithm, dynamic programming algorithm, parallel algorithm.

1. Introduction

Association rules (ARs) are known to play an important role in data mining and have many real-world applications. Association rule mining is usually divided into two subproblems: mining all frequent itemsets (FIs) from the data, and deriving association rules from the mined frequent itemsets [1]. However, the number of FIs is often very large, since it grows exponentially with the number of items. Algorithms that mine FIs or ARs directly from the data therefore usually face performance and storage challenges, as well as the problem of generating duplicated candidates.

Manuscript received August 24, 2014.
Pham Quang Huy, Department of Mathematics and Informatics, DaLat University, Vietnam.
Truong Chi Tin, Department of Mathematics and Informatics, DaLat University, Vietnam.

A more effective approach is to mine only the class of all frequent closed itemsets (FCIs), because they are commonly far fewer than the FIs and they carry the essential information for deriving all FIs as well as all ARs. Indeed, a closed itemset (also called a closure) is the largest itemset among those contained in the same set of transactions. Based on FCIs, we can partition all FIs or ARs into equivalence classes. Then, together with their corresponding generators, it is possible to derive all FIs and ARs without repetition and without losing their support and confidence [2, 3, 4, 5, 6].

As stated in [7], Charm [6] and FPClose [8] are among the best and most well-known FCI mining algorithms. Charm’s search space is an IT-tree, in which each node is a pair consisting of an itemset and its tidset (the list of identifiers of the transactions containing that itemset). In contrast, the search space of FPClose is a set of FP-trees, each of which is a compressed representation of a conditional dataset. FCIs can also be mined by analyzing the lattice of concepts (e.g., the Titanic algorithm [9]). There are also parallel algorithms for FCI mining, such as PLCM QS [10] and AFOPT-close [11].

Generators, on the other hand, are the minimal itemsets in each class [4, 12]. This definition is equivalent to the notion of “minimal generator” in [6]. An FCI and its generators are the key to deriving all other FIs in their class. For instance, the authors of [2] proposed a structure of the FIs in each class in terms of its closure and generators, which allows them to be generated quickly and without duplication. FCIs and generators also help to divide ARs into equivalence classes such that, in each class, only the basic rules need to be mined; the consequent rules can then be easily derived along with their support and confidence.
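The notions of tidset, closure, and generator described above can be illustrated with a minimal brute-force sketch in Python. This is not the GDP algorithm proposed in this paper; the toy dataset and the function names (`tidset`, `closure`, `generators`) are assumptions introduced only for illustration, and the exhaustive enumeration is practical only for very small itemsets.

```python
from itertools import combinations

# Hypothetical toy dataset (an assumption): transaction id -> set of items.
TRANSACTIONS = {
    1: {"a", "c", "d"},
    2: {"b", "c", "e"},
    3: {"a", "b", "c", "e"},
    4: {"b", "e"},
    5: {"a", "b", "c", "e"},
}


def tidset(itemset, transactions):
    """Identifiers of the transactions that contain every item of `itemset`."""
    return frozenset(t for t, items in transactions.items() if itemset <= items)


def closure(itemset, transactions):
    """Largest itemset contained in exactly the same transactions as `itemset`."""
    tids = tidset(itemset, transactions)
    if not tids:
        # Convention: an itemset with empty support closes to the set of all items.
        return frozenset().union(*transactions.values())
    return frozenset.intersection(*(frozenset(transactions[t]) for t in tids))


def generators(closed, transactions):
    """Minimal subsets of `closed` whose closure equals `closed` (brute force)."""
    found = []
    items = sorted(closed)
    # Enumerate candidates by increasing size so that minimal ones are found first.
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # A superset of a known generator cannot be minimal.
            if any(g <= candidate for g in found):
                continue
            if closure(candidate, transactions) == closed:
                found.append(candidate)
    return found
```

In this toy dataset, the itemsets {b} and {e} occur in exactly the same transactions, so both belong to the class whose closure is {b, e} and both are generators of that class.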
For example, in [4], Pasquier et al. proposed basic rules of the form G → C\G, where C is a closure and G is a generator (G ⊂ C). Zaki [6] introduced the concept of the most general rule, of the form G → {m}, in which G is a generator and m is an item. If G and G ∪ {m} have the same closure, they are exact rules; otherwise they are approximate rules. In [2, 5], based on FCIs and their generators, the authors partitioned the class of all ARs into equivalence classes, where each class is represented by a pair of FCIs, [L, S] (with L ⊆ S). Then,