A binary based approach for generating association rules
Med El Hadi Benelhadj (1), Khedija Arour (2), Mahmoud Boufaida (1) and Yahya Slimani (3)
(1) LIRE Laboratory, Computer Science Department, Mentouri University, Constantine, Algeria
(2) National Institute of Applied Science and Technology, Tunis, Tunisia
(3) Computer Science Department, Faculty of Sciences, Tunis, Tunisia
Abstract - Advanced database application areas, such as computer aided design, office automation, digital libraries, data mining, hypertext and multimedia systems, need to handle complex data structures with set-valued attributes. These information systems contain implicit data that must be extracted and exploited using data mining techniques. To exploit the data in these systems, the choice of appropriate storage structures is essential. In this paper, we propose a new compact structure for representing a transaction database, called a signature tree, which speeds up the scanning of the signature file. Constructing this tree requires only a single access to the transaction database. The tree is then used to compute the maximum support, extract frequent itemsets and generate association rules.
Keywords: Data Mining, Frequent itemset, Signature file, Signature tree.
1 Introduction
Knowledge Discovery in Databases (KDD) involves the extraction of implicit, previously unknown and potentially useful information stored in large databases. The volume of collected data keeps growing, and its analysis is becoming more tedious. Data mining is an essential step in a KDD process. Searching large databases efficiently to extract knowledge that contributes to decision making is vital for any expert. Several methods and techniques are used in a KDD process to extract knowledge from large databases. One of these techniques is association rule mining, which aims to find interesting associations or correlations among large amounts of data.
An association rule R is defined as an implication of the form R: S → T such that S ⊂ I, T ⊂ I and S ∩ T = ∅, where I is a set of items. Generating association rules involves two steps: 1. the extraction of frequent itemsets (with support ≥ Minsup), and 2. the generation of association rules (with confidence ≥ Minconf). Using this technique, we can generate, from a set of transactions, the frequent itemsets (itemsets whose support is at least a minimum fixed by the user) and then derive the association rules from these itemsets. The first step is the most expensive, with high demands on computation and data access [1] [3]. For that reason, this paper focuses on frequent itemset counting. We propose a binary approach for generating frequent itemsets. The transaction database is represented using a new compact data structure based on signature trees. The use of signatures provides a low storage cost and fast binary operations. Several algorithms that generate association rules, such as Apriori [1], are based on two sub-steps, "Generate" and "Verify". However, this phase is the most expensive, because it requires multiple accesses to the transaction database. Other algorithms have tried to improve on Apriori. The Partition algorithm [10] divides the transaction database into partitions, which increases the number of locally frequent itemsets that are globally rare and thus wastes time on redundant computation [13]. The Dynamic Itemset Counting algorithm [4] is a generalization of Apriori. FP-Growth [9] extracts the frequent itemsets without generating candidate itemsets. It is based on an FP-Tree structure, which must be completely rebuilt at each update. DFPMT-A [12] mines frequent itemsets based on the Apriori algorithm and uses a dynamic approach similar to Longest Common Subsequence.
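As a rough illustration of the two steps described above (frequent itemset extraction, then rule generation), the following Python sketch may help. It is not the signature-based method proposed in this paper: it enumerates itemsets naively and rescans the transactions for every support count, which is exactly the cost that compact structures aim to avoid. The toy transactions and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(s, t, transactions):
    """Confidence of the rule S -> T: support(S ∪ T) / support(S)."""
    s, t = frozenset(s), frozenset(t)
    if s & t:
        raise ValueError("S and T must be disjoint")
    sup_s = support(s, transactions)
    return support(s | t, transactions) / sup_s if sup_s else 0.0

def frequent_itemsets(transactions, minsup):
    """Step 1 (naive enumeration): itemsets with support >= minsup."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            sup = support(combo, transactions)
            if sup >= minsup:
                result[frozenset(combo)] = sup
    return result

def association_rules(frequent, transactions, minconf):
    """Step 2: rules S -> T with confidence >= minconf."""
    rules = []
    for itemset in frequent:
        # Split each frequent itemset into a non-empty S and T = itemset \ S.
        for k in range(1, len(itemset)):
            for s in combinations(sorted(itemset), k):
                s = frozenset(s)
                t = itemset - s
                conf = confidence(s, t, transactions)
                if conf >= minconf:
                    rules.append((s, t, conf))
    return rules

if __name__ == "__main__":
    freq = frequent_itemsets(TRANSACTIONS, minsup=0.5)
    for s, t, conf in association_rules(freq, TRANSACTIONS, minconf=0.8):
        print(sorted(s), "->", sorted(t), "confidence", conf)
```

Every call to `support` here scans the whole transaction list, so the number of database accesses grows with the number of candidate itemsets; this is the bottleneck that the first step is said to suffer from.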
To compute the support of a collection of itemsets, it is necessary to access the transaction database. As the transaction database is generally large, one way to avoid repetitive and costly accesses is to represent it using a compact structure. Examples of such compact structures include BitMap [8], FP-Tree [9], Patricia tree [11] and Transposed Form [12]. Because standard data structures do not scale with data size or deliver adequate performance on large databases, we adopt a binary, compact structure to improve performance and reduce the search space. In this paper, we propose an approach that uses a binary tree structure to represent the transaction database. Each transaction is represented by a binary signature. The set of signatures forms a signature file, which is represented as a signature tree. In the process of generating frequent itemsets, a signature S_I is associated with each itemset I and is constructed in the same way as the transaction signatures. Each S_I is associated with the identifier (Tid) of the transaction that generates it. This process constructs the transaction signature tree, a compact structure, and finds all the frequent itemsets, based on a maximum support, using only one access to the transaction database. The remainder of the paper is organized as follows: Section 2 gives an overview of the concept of a signature tree. Section 3 presents our proposed structure called STT (Signature