An Accelerator for Frequent Itemset Mining from Data Streams with Parallel Item Tree

Kasho Yamamoto, Tsunaki Sadahisa, Dahoo Kim, Eric S. Fukuda, Tetsuya Asai, and Masato Motomura
Graduate School of IST, Hokkaido University
Kita 14, Nishi 9, Kita-ku, Sapporo 060-0814, Japan (yamamoto@lalsie.ist.hokudai.ac.jp)

Abstract— Frequent itemset mining finds the subsets of items that occur frequently in a transaction database. In this era of big data, demand for frequent itemset mining is increasing, so implementations that are both fast and memory-efficient are needed, especially for stream data. In response, we optimize an online algorithm, the Skip LC-SS algorithm [1], for hardware. In this paper, we present an efficient architecture based on this algorithm.

Keywords—data mining; frequent itemsets; stream processing; hardware accelerator

I. Introduction

Data stream mining for frequent itemsets (DSM-FI) is one of the most important and fundamental challenges of online data stream mining. Because the stream is treated as continuous, DSM-FI cannot store the entire stream in memory and must process it in a single scan. Counting the occurrences of all itemsets is also unrealistic because memory capacity is limited. Therefore, substantial research has focused on one-pass approximation algorithms [3]. For example, lossy counting [3] eliminates infrequent itemsets for each transaction. The drawback of this algorithm is that a sudden burst of stream data can cause memory overflow. In contrast, the space-saving [4] and Skip LC-SS algorithms avoid this overflow by fixing the number of stored itemsets. Unfortunately, this approach requires frequent memory accesses, which tend to become a bottleneck when the algorithm runs on a CPU.
We propose a coprocessor architecture that enables high-speed frequent itemset mining (FIM) on an FPGA, which lacks large memories but provides many small memories that can be accessed simultaneously.

II. Background and Preliminary Work

A. RELATED WORK

FIM implementations on FPGAs have already been studied. Baker et al. [5] proposed the first architecture for Apriori. In 2013, Zhang et al. [6] presented an architecture for Eclat. However, these architectures cannot handle sudden bursts of stream data.

B. SKIP LC-SS ALGORITHM

The Skip LC-SS algorithm stores a fixed number of itemsets, so it requires only O(k) memory, where the constant k is the number of stored itemsets. Each itemset e is stored as a tuple ⟨e, count(e)⟩ in the entry table D, where count(e) is the number of occurrences of e. To speed up processing, the algorithm skips part of the procedure under certain conditions. A brief outline of the baseline algorithm follows.

1. For each itemset e_i in E = {e_1, e_2, ..., e_{2^{|T_i|}-1}}, the set of non-empty subsets of transaction T_i:
   (a) if ⟨e_i, count(e_i)⟩ ∈ D, count(e_i) += 1,
   (b) else if |D| < k, store a new entry ⟨e_i, 1⟩ in D,
   (c) else add e_i to the replacement candidate set (cs).
2. Replace entries in me, the entries eligible for replacement (in Fig. 1, those with the smallest count), with itemsets from cs. Each new entry is written ⟨c, Δ(i)+1⟩, where c ∈ cs and Δ(i) is the error count, with Δ(1) = 0.
3. Update Δ(i+1) as follows:
   A: if |me| > |cs|, Δ(i+1) = count(cs(i)),
   B: if |me| ≤ |cs|, Δ(i+1) = Δ(i) + 1.

Example: Fig. 1 shows how the entry table is updated by this algorithm. At i = 2, the replacement target can be freely selected from me. At i = 3, although cs is larger than me, only one element of cs is stored, because me contains a single entry.

i = 1, Δ(1) = 0
  Itemset        count
  a              1
  a,b            1
  b              1

i = 2, Δ(2) = 0    cs: {a,c}    me: {b}, {c}, {a,b}
  Itemset        count
  a              2
  b              1
  a,b -> a,c     1 (Δ(2)+1)
  c              1

i = 3, Δ(3) = 1    cs: {d}, {a,d}, {c,d}, {a,c,d}    me: {b}
  Itemset        count
  a,c            2
  a              3
  c              2
  b -> d         2 (Δ(3)+1)

Fig. 1.
Entry table update for a stream S consisting of three transactions, {a, b}, {a, c}, and {a, c, d}, with k = 4.
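
The baseline steps outlined in this section can be sketched in software as follows. This is a minimal Python sketch for illustration only; it is neither the hardware architecture proposed in this paper nor the reference implementation of [1]. The names `nonempty_subsets`, `process_transaction`, and `run_stream` are hypothetical. Several points are assumptions rather than statements from the text: me is taken to be the entries holding the minimum count (as Fig. 1 suggests), Δ is left unchanged when cs is empty, count(cs(i)) in case A is read as the count written for an inserted candidate, and ties in me are broken by evicting the largest itemset first, which happens to match the choices drawn in Fig. 1.

```python
from itertools import combinations


def nonempty_subsets(transaction):
    """Yield every non-empty subset of a transaction as a frozenset."""
    items = sorted(transaction)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def process_transaction(table, delta, transaction, k):
    """Apply steps 1-3 to one transaction and return the new error count."""
    candidates = []  # cs: itemsets that found no free entry
    for itemset in nonempty_subsets(transaction):
        if itemset in table:          # step 1(a): already stored
            table[itemset] += 1
        elif len(table) < k:          # step 1(b): free entry available
            table[itemset] = 1
        else:                         # step 1(c): table is full
            candidates.append(itemset)
    if not candidates:
        return delta  # assumption: delta is unchanged when cs is empty

    # Assumption: me is the set of entries holding the minimum count.
    lowest = min(table.values())
    minimum_entries = [e for e, c in table.items() if c == lowest]
    # Hypothetical tie-break: evict the largest itemset first.
    minimum_entries.sort(key=lambda e: (-len(e), sorted(e)))

    # Step 2: overwrite min(|me|, |cs|) entries with count delta + 1.
    for victim, newcomer in zip(minimum_entries, candidates):
        del table[victim]
        table[newcomer] = delta + 1

    # Step 3: update the error count.
    if len(minimum_entries) > len(candidates):  # case A
        # count(cs(i)), read here as the count of an inserted candidate
        return table[candidates[0]]
    return delta + 1                            # case B


def run_stream(stream, k):
    """Process a list of transactions, starting from delta(1) = 0."""
    table, delta = {}, 0
    for transaction in stream:
        delta = process_transaction(table, delta, transaction, k)
    return table, delta


if __name__ == "__main__":
    final_table, final_delta = run_stream(
        [{"a", "b"}, {"a", "c"}, {"a", "c", "d"}], k=4
    )
    for itemset, count in final_table.items():
        print(sorted(itemset), count)
    print("delta:", final_delta)
```

Under these assumptions, running the sketch on the stream of Fig. 1 with k = 4 leaves the entries {a}, {c}, {a,c}, and {d} in the table. Note that enumerating all non-empty subsets costs 2^|T_i| - 1 table lookups per transaction, which illustrates why memory access dominates on a CPU.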