Two-tier Bloom filter to achieve faster membership testing

M. Jimeno, K.J. Christensen and A. Roginsky

Testing for element membership in a Bloom filter requires hashing of a test element (e.g. a string) and multiple lookups in memory. A design of a new two-tier Bloom filter with on-chip hash functions and cache is described. For elements with a heavy-tailed distribution for popularity, membership testing time can be significantly reduced.

Introduction: Bloom filters [1] are a space-efficient, probabilistic data structure for representing a set of elements (for example, a list of strings). A Bloom filter is an array of $m$ bits, all initially set to 0. A string is mapped into a Bloom filter by inputting it to a group of $k$ hash functions, resulting in $k$ array positions. Each indexed array position is set to 1. A string is tested for membership by inputting it to the same group of $k$ hash functions. If all $k$ generated array positions are set to 1, then the string is probably a member; if any position is 0, the string is definitely not a member. For $n$ elements mapped into a Bloom filter, the probability that a given bit is still 0 is $(1 - 1/m)^{kn}$, so false positives can occur with

$$\Pr[\text{false positive}] = \left(1 - (1 - 1/m)^{kn}\right)^{k}$$

Bloom filters are widely used, with many applications in the domain of networks [2]. One application of interest is representing large file lists, for example lists of shared files in servers or caches, to determine whether a given file name is in a list of shared files. The key performance measures for a Bloom filter are membership testing time (also called computation time in [3]), memory requirements, and probability of false positive. Membership testing time is the time to determine if an element belongs to the set represented by the Bloom filter. The key motivation for using Bloom filters (over more conventional data structures) is reduced memory requirement and faster membership testing [3]. Membership testing time is a function of the time to (a) compute up to $k$ hashes and (b) perform up to $k$ lookups in the memory where the Bloom filter array is stored.
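The mapping and testing operations described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `BloomFilter`, the helper `false_positive_probability` and the use of double hashing over a SHA-256 digest are assumptions made here, since the Letter does not specify the hash functions. The helper evaluates the standard false-positive estimate $(1 - (1 - 1/m)^{kn})^{k}$.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an array of m bits probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits, all initially 0

    def positions(self, element: str) -> list:
        """Return the k array positions for an element.

        Assumption: the k hash functions are derived by double hashing
        from one SHA-256 digest; the Letter does not specify them.
        """
        digest = hashlib.sha256(element.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, element: str) -> None:
        """Map an element into the filter by setting its k positions to 1."""
        for pos in self.positions(element):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, element: str) -> bool:
        # all() stops at the first 0 bit, so a miss may need fewer
        # than k memory lookups.
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self.positions(element))


def false_positive_probability(m: int, n: int, k: int) -> float:
    """Estimate (1 - (1 - 1/m)**(k*n))**k for n elements in m bits."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k
```

A filter built this way never reports a mapped element as absent; only false positives are possible, at a rate that `false_positive_probability` estimates from $m$, $n$ and $k$.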
Memory lookup times depend on the type of memory in which the Bloom filter is stored (e.g. high-speed localised static random access memory (SRAM) or slower main memory dynamic random access memory (DRAM)). In this Letter, the membership testing time for a Bloom filter is reduced by implementing hashing directly in specialised hardware and by introducing a second-tier cache Bloom filter to reduce the number of accesses required into slower main memory.

Fig. 1 Design of two-tier Bloom filter (blocks: hashing circuitry, cache Bloom filter; signals: element, hashes, cache hit, main hit)
If cache hit is false then hashes are used for lookup in external Bloom filter in main memory. If main hit is true then cache learns the element

Two-tier Bloom filter: A two-tier Bloom filter design is proposed to reduce membership testing time. Fig. 1 shows the basic design of a single component (such as a specialised chip) containing hashing circuitry and a Bloom filter. The two tiers might exist in a hardware implementation of a Bloom filter, or between processor cache and main memory in a computer. An on-chip Bloom filter can be implemented using fast SRAM, but it is limited in size to about $m = 4$ Mb [4]. Using $m/n = 32$ and $k = 22$ (as recommended in [5]) to achieve a low probability of false positive, the number of elements that can be mapped into the on-chip Bloom filter is $n = 4 \times 2^{20} / 32 = 131\,072$. For applications with more than 131 072 elements, the on-chip Bloom filter can be used as a cache for a larger off-chip Bloom filter stored in the main memory of a system (typically implemented in DRAM, with slower lookup time than SRAM).

The two-tier Bloom filter takes as input the element (e.g. a string) to be mapped into the filter or tested for membership. It outputs the $k$ hash values and a single test output that indicates whether the element being tested was found (a 'hit') in the on-chip cache Bloom filter.
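The signal flow of Fig. 1 can be sketched in software as follows. This is an illustration under stated assumptions rather than the hardware design itself: the names `raw_hashes`, `BitArray` and `TwoTierBloomFilter` are hypothetical, hashing is modelled by double hashing over a SHA-256 digest, each tier reduces the same $k$ hash values modulo its own array size, and new elements are mapped into the main-memory filter only, so that the cache is populated purely by learning on a main hit.

```python
import hashlib


def raw_hashes(element: str, k: int) -> list:
    """Stand-in for the hashing circuitry: k 64-bit hash values.

    Assumption: double hashing over a SHA-256 digest; the Letter does
    not specify the hash functions.
    """
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % 2**64 for i in range(k)]


class BitArray:
    """An m-bit Bloom filter array addressed by hash value modulo m."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.bits = bytearray((m + 7) // 8)

    def set_all(self, hashes: list) -> None:
        for h in hashes:
            pos = h % self.m
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def test_all(self, hashes: list) -> bool:
        return all(self.bits[(h % self.m) >> 3] & (1 << ((h % self.m) & 7))
                   for h in hashes)


class TwoTierBloomFilter:
    """Cache Bloom filter in front of a larger main-memory Bloom filter."""

    def __init__(self, m_cache: int, m_main: int, k: int,
                 cache_capacity: int) -> None:
        self.k = k
        self.cache = BitArray(m_cache)
        self.main = BitArray(m_main)
        self.cache_capacity = cache_capacity  # n_cache
        self.cache_count = 0                  # elements learned so far
        self.main_lookups = 0                 # tests that reached main memory

    def add(self, element: str) -> None:
        """Map an element into the main-memory filter (assumed behaviour)."""
        self.main.set_all(raw_hashes(element, self.k))

    def test(self, element: str) -> bool:
        hashes = raw_hashes(element, self.k)  # hashing is done once
        if self.cache.test_all(hashes):       # cache hit: no main access
            return True
        self.main_lookups += 1                # cache miss: reuse the hashes
        main_hit = self.main.test_all(hashes)
        if main_hit and self.cache_count < self.cache_capacity:
            self.cache.set_all(hashes)        # cache learns the element
            self.cache_count += 1
        return main_hit
```

With the on-chip figures quoted above, the cache tier would be created with `m_cache = 4 * 2**20` and `cache_capacity = m_cache // 32`, i.e. 131 072 elements. The counter `main_lookups` is included only to make the saving in main-memory accesses observable.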
If the cache hit output is false, then the computed hash values are used to test the external Bloom filter in main memory. If the element is found in the external Bloom filter, the main hit input causes the cache Bloom filter to learn the element (if the cache is not yet full). Thus, the cache Bloom filter contains a subset of the elements mapped into the main memory Bloom filter (the main memory Bloom filter contains mappings for all elements). How well the cache learns the most popular elements determines the performance (measured in membership testing time) of the two-tier Bloom filter.

Membership testing time ($T_{test}$) is a function of hashing time ($t_{hash}$), cache Bloom filter testing time ($t_{cache}$), main memory Bloom filter testing time ($t_{main}$), and probability of a successful membership test in the cache Bloom filter ($p_{cache}$). The membership testing time is $T_{test} = t_{hash} + t_{cache} + (1 - p_{cache})\,t_{main}$ for the two-tier Bloom filter and $T_{test} = t_{hash} + t_{main}$ for a single Bloom filter in main memory. In order for the two-tier Bloom filter to have a smaller membership testing time than that of a single Bloom filter in main memory, $t_{cache} - p_{cache}\,t_{main} < 0$ must hold. The speed-up ($S$) of $T_{test}$ is the ratio of the time required by a single Bloom filter stored in main memory to the time required by the two-tier Bloom filter, so a value of $S$ greater than 1 means that the two-tier Bloom filter reduces membership testing time. Speed-up is

$$S = \frac{t_{hash} + t_{main}}{t_{hash} + t_{cache} + (1 - p_{cache})\,t_{main}} \tag{1}$$

Application to file search: The specific target application for the two-tier Bloom filter is membership testing for a file system containing millions of files, each file with a unique identifier (e.g. path plus file name).
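The membership testing time expressions and the speed-up of equation (1) can be checked numerically with a short sketch. The function names are hypothetical and the timing values used in the note after the code are illustrative only (arbitrary units); they are not measurements from the Letter.

```python
def two_tier_test_time(t_hash: float, t_cache: float, t_main: float,
                       p_cache: float) -> float:
    """T_test for the two-tier filter: main memory is reached on a cache miss."""
    return t_hash + t_cache + (1.0 - p_cache) * t_main


def single_test_time(t_hash: float, t_main: float) -> float:
    """T_test for a single Bloom filter stored in main memory."""
    return t_hash + t_main


def speed_up(t_hash: float, t_cache: float, t_main: float,
             p_cache: float) -> float:
    """Equation (1): single-filter time divided by two-tier time."""
    return (single_test_time(t_hash, t_main)
            / two_tier_test_time(t_hash, t_cache, t_main, p_cache))


def break_even_hit_probability(t_cache: float, t_main: float) -> float:
    """Value of p_cache at which t_cache - p_cache * t_main equals 0.

    The two-tier filter is faster only when p_cache exceeds this value.
    """
    return t_cache / t_main
```

For example, with $t_{hash} = 1$, $t_{cache} = 1$, $t_{main} = 10$ and $p_{cache} = 0.5$, the two-tier time is 7 against 11 for the single filter, so $S = 11/7$. The break-even hit probability for these values is $t_{cache}/t_{main} = 0.1$, below which the cache tier adds time rather than saving it.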
It is well known that the distribution of the requests for files in some applications, such as P2P (peer-to-peer) file sharing and web caching, follows a Zipf-like distribution [6, 7], where the probability of requesting the element ranked $j$th in popularity among a population of $N$ elements is $\Pr[j] = \Omega / j^{\alpha}$, where $\Omega$ is the normalisation constant and $\alpha$ is the shape parameter. The normalisation constant is the inverse of the sum of $1/j^{\alpha}$ for $j = 1, 2, \ldots, N$. For $\alpha = 0$ the distribution is uniform, and as $\alpha$ increases the distribution becomes skewed and heavy tailed. For P2P file sharing, $\alpha$ values between 0.60 and 0.83 have been measured [6].

Performance evaluation: In this performance evaluation we use simulation to study the speed-up, $S$, of the two-tier Bloom filter compared to a single Bloom filter stored in main memory. The speed-up of the two-tier Bloom filter for a stream of Zipf-distributed membership tests is a function of $t_{hash}$, $t_{cache}$, $t_{main}$ and $p_{cache}$ (the subscripts cache and main describe the parameter as applying to the cache and main memory Bloom filters, respectively). The probability $p_{cache}$ is a function of the size of the cache ($n_{cache}$), the popularity of the elements (modelled here with a Zipf distribution with parameters $n_{main} = N$ and $\alpha$), and how well the cache learns the $n_{cache}$ most popular elements. If the cache Bloom filter learns (and thus represents) the $n_{cache}$ most popular elements of the population of $N$ elements, called a 'perfect cache' in this Letter, then the probability of a cache hit is the cumulative probability mass of elements $1, 2, \ldots, n_{cache}$:

$$p_{cache} = \sum_{j=1}^{n_{cache}} \frac{\Omega}{j^{\alpha}} \tag{2}$$

In reality, the cache will learn a less-than-perfect set of elements. This is called the 'realistic cache' in this Letter and its cumulative probability mass can be computed as follows. Given $N$ distinct elements that are sampled with replacement, let $x_j$ be the probability of drawing an element of type $j$.
Sampling continues until $M$ elements of different types are sampled (this corresponds to the cache being fully loaded). Here $M$ is $n_{cache}$ and $N$ is $n_{main}$. The cumulative probability, $T$, corresponding to $p_{cache}$ for a realistic cache is

$$T = \sum_{G} \left(x_{u_1} + x_{u_2} + \cdots + x_{u_M}\right) \Pr[\text{subset } G \text{ is sampled first of all } M\text{-subsets of } N \text{ types}] \tag{3}$$

where $G$ is the subset $\{u_1, u_2, \ldots, u_M\}$ and where the summation is taken over all $M$-subsets of $N$ types of elements. This summation is likely to be intractable to compute given the very large number of subsets possible for a large $N$. Given this intractability, a simulation model of the two-tier Bloom filter was created. From this simulation model, $p_{cache}$ for a realistic cache could be experimentally estimated.

ELECTRONICS LETTERS 27th March 2008 Vol. 44 No. 7
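The perfect-cache hit probability of equation (2) and a Monte Carlo estimate of the realistic-cache hit probability can be sketched as below. This is not the authors' simulation model, whose details are not given here. It assumes that the cache learns the first $n_{cache}$ distinct elements requested, that every request is for a member of the main filter, and that Bloom filter false positives are ignored. All function and parameter names are hypothetical.

```python
import random
from itertools import accumulate


def zipf_probabilities(n_main: int, alpha: float) -> list:
    """Request probabilities for ranks j = 1..n_main, proportional to 1/j**alpha."""
    weights = [1.0 / j ** alpha for j in range(1, n_main + 1)]
    norm = 1.0 / sum(weights)  # normalisation constant
    return [norm * w for w in weights]


def perfect_cache_hit_probability(probs: list, n_cache: int) -> float:
    """Equation (2): probability mass of the n_cache most popular elements."""
    return sum(probs[:n_cache])


def realistic_cache_hit_probability(probs: list, n_cache: int,
                                    runs: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the realistic-cache hit probability.

    Each run samples requests with replacement until n_cache distinct
    elements have been drawn (the cache is then full) and records the
    probability mass of the learned set. The estimate is the mean over runs.
    """
    rng = random.Random(seed)
    cumulative = list(accumulate(probs))
    population = range(len(probs))
    total = 0.0
    for _ in range(runs):
        learned = set()
        while len(learned) < n_cache:
            # One request at a time, so the cache never overshoots n_cache.
            learned.add(rng.choices(population, cum_weights=cumulative)[0])
        total += sum(probs[j] for j in learned)
    return total / runs
```

Because the $n_{cache}$ most popular elements carry the largest possible probability mass of any set of that size, the realistic estimate cannot exceed the perfect-cache value. Either value can be substituted for $p_{cache}$ in equation (1) to obtain the corresponding speed-up.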