Two-tier Bloom filter to achieve faster membership testing

M. Jimeno, K.J. Christensen and A. Roginsky

Testing for element membership in a Bloom filter requires hashing of a test element (e.g. a string) and multiple lookups in memory. A design of a new two-tier Bloom filter with on-chip hash functions and cache is described. For elements with a heavy-tailed distribution for popularity, membership testing time can be significantly reduced.

Introduction: Bloom filters [1] are a space-efficient, probabilistic data structure for representing a list of elements (for example, a list of strings). A Bloom filter is an array of $m$ bits. A string is mapped into a Bloom filter by inputting it to a group of $k$ hash functions, resulting in $k$ array positions. Each indexed array position is set to 1. A string is tested for membership by inputting it to the same group of $k$ hash functions. If all $k$ generated array positions are set to 1, then the string is probably a member. False positives can occur with $\Pr[\text{false positive}] = (1 - (1 - 1/m)^{kn})^{k}$ for $n$ elements mapped into a Bloom filter. Bloom filters are widely used, with many applications in the domain of networks [2]. One application of interest is representing large file lists, for example lists of shared files in servers or caches, to enable determination of whether a given file name is in a list of shared files.

The key performance measures for a Bloom filter are membership testing time (also called computation time in [3]), memory requirements, and probability of false positive. Membership testing time is the time to determine if an element belongs to the set represented by the Bloom filter. The key motivation for using Bloom filters (over more conventional data structures) is reduced memory requirement and faster membership testing [3]. Membership testing time is a function of the time to (a) compute up to $k$ hashes and (b) perform up to $k$ lookups in memory where the Bloom filter array is stored.
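As an illustration of the mapping and membership test described above, the following is a minimal sketch of a Bloom filter, not an implementation taken from this Letter. The choice of hash functions is an assumption: the $k$ array positions are derived from a single SHA-256 digest by double hashing, and the names `BloomFilter`, `positions` and `false_positive_probability` are hypothetical. The false positive probability uses the standard expression $(1 - (1 - 1/m)^{kn})^{k}$.

```python
import hashlib


class BloomFilter:
    """Array of m bits addressed by k hash functions (hash scheme is an assumption)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits, all initially 0

    def positions(self, element):
        # Assumption: derive k array positions from one SHA-256 digest
        # using double hashing (h1 + i * h2) mod m.
        digest = hashlib.sha256(element.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, element):
        # Map the element: set each of its k array positions to 1.
        for pos in self.positions(element):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, element):
        # Probably a member only if all k array positions are set to 1.
        for pos in self.positions(element):
            if not self.bits[pos // 8] & (1 << (pos % 8)):
                return False
        return True


def false_positive_probability(m, n, k):
    """Standard estimate (1 - (1 - 1/m)^(k n))^k for n elements in m bits."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k
```

A mapped element always tests positive; an unmapped element tests positive only when all $k$ of its positions happen to have been set by other elements.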
Memory lookup times depend on the type of memory in which the Bloom filter is stored (e.g. high-speed localised static random access memory (SRAM) or slower main memory dynamic random access memory (DRAM)). In this Letter, the membership testing time for a Bloom filter is reduced by implementing hashing directly in specialised hardware and by introducing a second-tier cache Bloom filter to reduce the number of accesses required into slower main memory.

[Figure: block diagram; labels: element, hashing circuitry, cache Bloom filter, hashes, cache hit, main hit]
Fig. 1 Design of two-tier Bloom filter. If cache hit is false then hashes are used for lookup in external Bloom filter in main memory. If main hit is true then cache learns the element.

Two-tier Bloom filter: A two-tier Bloom filter design is proposed to reduce membership test time. Fig. 1 shows the basic design of a single component (such as a specialised chip) containing hashing circuitry and a Bloom filter. The two tiers might exist in a hardware implementation of a Bloom filter, or between processor cache and main memory in a computer. An on-chip Bloom filter can be implemented using fast SRAM, but it is limited in size to about $m = 4$ Mb [4]. Using $m/n = 32$ and $k = 22$ (as recommended in [5]) to achieve a low probability of false positive, the number of elements that can be mapped into the on-chip Bloom filter is $n = 131\,072$. For applications with more than 131 072 elements, the on-chip Bloom filter can be used as a cache for a larger off-chip Bloom filter stored in the main memory of a system (typically implemented in DRAM, with slower lookup time than SRAM).

The two-tier Bloom filter takes as input the element (e.g. a string) to be mapped or tested for membership, and outputs the $k$ hash values and a single test output that indicates whether the element being tested was found (or 'hit') in the on-chip cache Bloom filter. If the cache hit output is false, then the computed hash values are used to test the external Bloom filter in main memory.
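The lookup path of Fig. 1 can be sketched in software as follows. This is an illustrative model under stated assumptions, not the hardware design itself: the hash scheme (SHA-256 with double hashing), the reduction of the same $k$ hash values modulo each array size, the `cache_capacity` limit on learning, and all class and function names are hypothetical.

```python
import hashlib


def hash_values(element, k):
    """Stand-in for the hashing circuitry: compute k hash values once (assumed scheme)."""
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [h1 + i * h2 for i in range(k)]


class BitArrayFilter:
    """Bloom filter bit array that is set and tested with precomputed hash values."""

    def __init__(self, m):
        self.m = m
        self.bits = bytearray((m + 7) // 8)

    def set_all(self, hashes):
        for h in hashes:
            pos = h % self.m  # assumption: reduce each hash modulo the array size
            self.bits[pos // 8] |= 1 << (pos % 8)

    def test_all(self, hashes):
        for h in hashes:
            pos = h % self.m
            if not self.bits[pos // 8] & (1 << (pos % 8)):
                return False
        return True


class TwoTierBloomFilter:
    def __init__(self, m_cache, m_main, k, cache_capacity):
        self.k = k
        self.cache = BitArrayFilter(m_cache)  # small, fast tier
        self.main = BitArrayFilter(m_main)    # large, slow tier holding all elements
        self.cache_capacity = cache_capacity  # assumed limit on learned elements
        self.cached = 0
        self.main_lookups = 0                 # counts accesses to the slow tier

    def add(self, element):
        # All elements are mapped into the main memory Bloom filter.
        self.main.set_all(hash_values(element, self.k))

    def test(self, element):
        hashes = hash_values(element, self.k)  # hashed once, reused for both tiers
        if self.cache.test_all(hashes):        # cache hit: no main memory access
            return True
        self.main_lookups += 1
        main_hit = self.main.test_all(hashes)
        if main_hit and self.cached < self.cache_capacity:
            self.cache.set_all(hashes)         # cache learns the element
            self.cached += 1
        return main_hit
```

With the sizing given in the text, the cache tier would use `m_cache = 4 * 2**20`, `k = 22` and `cache_capacity = 131072`; the counter `main_lookups` makes the saving in main memory accesses observable.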
If the element is found in the external Bloom filter, the main hit input causes the cache Bloom filter to learn the element (if the cache is not yet full). Thus, the cache Bloom filter contains a subset of the elements mapped into the main memory Bloom filter (the main memory Bloom filter contains mappings for all elements). How well the cache learns the most popular elements determines the performance (measured in membership testing time) of the two-tier Bloom filter.

Membership testing time ($T_{test}$) is a function of hashing time ($t_{hash}$), cache Bloom filter testing time ($t_{cache}$), main memory Bloom filter testing time ($t_{main}$), and probability of a successful membership test in the cache Bloom filter ($p_{cache}$). The membership testing time is $T_{test} = t_{hash} + t_{cache} + (1 - p_{cache})\,t_{main}$ for the two-tier Bloom filter and $T_{test} = t_{hash} + t_{main}$ for a single Bloom filter in main memory. In order for the two-tier Bloom filter to have a smaller membership testing time than that of a single Bloom filter in main memory, $t_{cache} - p_{cache}\,t_{main} < 0$ must hold. The speed-up ($S$) of $T_{test}$ is the ratio of the time required by a single Bloom filter stored in main memory to the time required by the two-tier Bloom filter. The speed-up expresses the relative (percentage) reduction in membership testing time by using the two-tier Bloom filter. Speed-up is

$$S = \frac{t_{hash} + t_{main}}{t_{hash} + t_{cache} + (1 - p_{cache})\,t_{main}} \qquad (1)$$

Application to file search: The specific target application for the two-tier Bloom filter is membership testing for a file system containing millions of files, each file with a unique identifier (e.g. path plus file name).
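The membership testing time expressions and the speed-up of (1) translate directly into a few functions. The sketch below only restates those formulas; the function names are hypothetical and any timing values passed in are illustrative inputs in arbitrary units, not measurements.

```python
def t_test_single(t_hash, t_main):
    """Membership testing time for a single Bloom filter in main memory."""
    return t_hash + t_main


def t_test_two_tier(t_hash, t_cache, t_main, p_cache):
    """Membership testing time for the two-tier Bloom filter."""
    return t_hash + t_cache + (1.0 - p_cache) * t_main


def two_tier_is_faster(t_cache, t_main, p_cache):
    """Condition t_cache - p_cache * t_main < 0 for the two-tier design to win."""
    return t_cache - p_cache * t_main < 0


def speed_up(t_hash, t_cache, t_main, p_cache):
    """Speed-up S of (1): single-filter time divided by two-tier time."""
    return t_test_single(t_hash, t_main) / t_test_two_tier(
        t_hash, t_cache, t_main, p_cache
    )
```

For example, with the made-up values `t_hash = 10`, `t_cache = 2`, `t_main = 50` and `p_cache = 0.5`, the two-tier time is 37 against 60 for the single filter, so $S = 60/37$. At the break-even point $p_{cache} = t_{cache}/t_{main}$ the speed-up is 1.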
It is well known that the distribution of requests for files in some applications, such as P2P (peer-to-peer) file sharing and web caching, follows a Zipf-like distribution [6, 7], where the probability of requesting the element ranked $j$th in popularity among a population of $N$ elements is $\Pr[j] = \Omega / j^{\alpha}$, where $\Omega$ is the normalisation constant and $\alpha$ is the shape parameter. The normalisation constant is the inverse of the sum of $1/j^{\alpha}$ for $j = 1, 2, \ldots, N$. For $\alpha = 0$ the distribution is uniform, and as $\alpha$ increases the distribution becomes skewed and heavy tailed. For P2P file sharing, $\alpha$ values between 0.60 and 0.83 have been measured [6].

Performance evaluation: In this performance evaluation we use simulation to study the speed-up, $S$, of the two-tier Bloom filter compared to a single Bloom filter stored in main memory. The speed-up of the two-tier Bloom filter for a stream of Zipf-distributed membership tests is a function of $t_{hash}$, $t_{cache}$, $t_{main}$ and $p_{cache}$ (the subscripts cache and main describe the parameter as applying to the cache and main memory Bloom filters, respectively). The probability $p_{cache}$ is a function of the size of the cache ($n_{cache}$), the popularity of the elements (modelled here with a Zipf distribution with parameters $n_{main} = N$ and $\alpha$), and how well the cache learns the $n_{cache}$ most popular elements. If the cache Bloom filter learns (and thus represents) the $n_{cache}$ most popular elements of the population of $N$ elements (called a 'perfect cache' in this Letter), then the probability of a cache hit is the cumulative probability mass of elements $1, 2, \ldots, n_{cache}$:

$$p_{cache} = \sum_{j=1}^{n_{cache}} \frac{\Omega}{j^{\alpha}} \qquad (2)$$

In reality, the cache will learn a set of elements that is less than perfect. This is called the 'realistic cache' in this Letter and its cumulative probability mass can be computed as follows. Given $N$ distinct elements that are sampled with replacement, let $x_j$ be the probability of drawing an element of type $j$.
Sampling continues until $M$ elements of different types are sampled (this corresponds to the cache being fully loaded). Here $M$ is $n_{cache}$ and $N$ is $n_{main}$. The cumulative probability, $T$, corresponding to $p_{cache}$ for a realistic cache is

$$T = \sum \left( x_{u_1} + x_{u_2} + \cdots + x_{u_M} \right) \times \Pr[\text{subset } G \text{ is sampled first of all } M\text{-subsets of } N \text{ types}] \qquad (3)$$

where $G$ is the subset $\{u_1, u_2, \ldots, u_M\}$ and where the summation is taken over all $M$-subsets of $N$ types of elements. This summation is likely to be intractable to compute given the very large number of subsets possible for a large $N$. Given this intractability, a simulation model of the two-tier Bloom filter was created. From this simulation model, $p_{cache}$ for a realistic cache could be experimentally estimated.

ELECTRONICS LETTERS 27th March 2008 Vol. 44 No. 7
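To make the difference between the perfect and realistic cache concrete, the sketch below computes the perfect-cache probability of (2) and estimates the realistic-cache probability by Monte Carlo sampling of the process described above (draw with replacement until $M$ distinct types have been seen, then sum their probabilities). It is a simplified stand-in and not the authors' simulation model; the function names, the seed and the number of runs are assumptions.

```python
import random
from itertools import accumulate


def zipf_pmf(n, alpha):
    """Pr[j] = omega / j**alpha for ranks j = 1..n (list index 0 holds rank 1)."""
    weights = [1.0 / (j ** alpha) for j in range(1, n + 1)]
    omega = 1.0 / sum(weights)  # normalisation constant
    return [omega * w for w in weights]


def p_cache_perfect(pmf, n_cache):
    """Probability mass of the n_cache most popular elements, as in (2)."""
    return sum(pmf[:n_cache])


def p_cache_realistic_once(pmf, n_cache, rng):
    """One run: sample with replacement until n_cache distinct types are learned."""
    cum = list(accumulate(pmf))
    population = range(len(pmf))
    learned = set()
    while len(learned) < n_cache:
        learned.add(rng.choices(population, cum_weights=cum, k=1)[0])
    # Probability mass of the learned subset G = {u_1, ..., u_M}.
    return sum(pmf[j] for j in learned)


def estimate_p_cache_realistic(pmf, n_cache, runs, seed=1):
    """Average over independent runs; approximates T of (3) without enumerating subsets."""
    rng = random.Random(seed)
    total = sum(p_cache_realistic_once(pmf, n_cache, rng) for _ in range(runs))
    return total / runs
```

Because the perfect cache holds the $n_{cache}$ most probable elements, its probability mass is an upper bound on the realistic estimate; for instance, `zipf_pmf(1000, 0.8)` with `n_cache = 100` can be used to compare the two for a shape parameter within the range measured in [6].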