Quasi-perfect Hashing

ZBIGNIEW J. CZECH

Institute of Computer Science, Silesia University, 41–200 Sosnowiec, Poland
Email: zjc@silesia.pl

The idea of quasi-perfect hashing is introduced and applied to solve the static dictionary problem. Given a universe U and a set S of n distinct keys belonging to U, we propose a quasi-perfect hash function which allows one to find a key from S, stored in the hash table of size m, $m \ge n$, in O(1) time. While looking up a key, at most two probes in the hash table are made. Our main motivation is to minimize the memory requirement for representing the hashing scheme, retaining a high probability of finding quasi-perfect hash functions for arbitrary sets S. If we compare the method of quasi-perfect hashing to Fredman, Komlós and Szemerédi's two-level hashing for the bounded universe U, we find that it is superior with regard to both space and speed.

Received May 4, 1998; revised October 1, 1998

1. INTRODUCTION

Given a set S of n distinct keys belonging to a universe $U = \{1, 2, \ldots, u - 1\}$, we would like to store the keys of S in some data structure so that membership queries of the form 'Is x in S?' can be answered quickly. This searching problem, also called the dictionary problem, is ubiquitous in computer science applications. If no deletion or insertion of elements in S occurs, then the dictionary problem is called static. Perfect hashing is one of the best methods to solve this problem. An overview of perfect hashing is given in [1, 2] and the area is surveyed in [3]. There are several approaches to constructing perfect hash functions (PHFs). Fredman, Komlós and Szemerédi (FKS) [4] proposed a two-level hashing scheme based on segmentation. Although the FKS scheme can be constructed for arbitrary n and u, its memory requirement is relatively high. In this paper we propose quasi-perfect hashing as a novel approach to solve the static dictionary problem.
Our main motivation is to minimize the memory requirement for representing the hashing scheme, retaining a high probability of finding quasi-perfect hash functions for arbitrary sets S. If we compare the method of quasi-perfect hashing to the FKS approach for the bounded universe U, we find that it is superior with regard to both space and speed.

The rest of the paper is organized as follows. Section 2 contains the basic definitions regarding perfect hashing. In Section 3 we present one of the approaches to construct perfect hash functions called segmentation. Section 4 introduces the idea of quasi-perfect hashing. In Section 5 we discuss the problem of finding quasi-perfect hash functions. Section 6 concludes the work.

2. PERFECT HASHING

Let $U = \{1, 2, \ldots, u - 1\}$ be the universe for some positive integer u. For convenience we assume that u is a prime number. Let S be a set of n distinct elements, or keys, belonging to U. A hash function is a function $h : U \to M$ that maps the keys from S into some given interval of integers M, say $[0, m - 1]$. Given a key $x \in S$, the hash function computes an address, i.e. an integer in $[0, m - 1]$, for the storage or retrieval of x. The storage area used to store keys is known as a hash table. Keys for which the same address is computed are called synonyms. Due to the existence of synonyms, a situation called collision may arise, in which two different keys have the same address. A perfect, or 1-probe, hash function (PHF) for S is an injection $h : U \to [0, m - 1]$, i.e. for all keys $x, y \in S$ such that $x \ne y$ we have $h(x) \ne h(y)$, which implies that $m \ge n$. If m = n and h is perfect, then we say that h is a minimal perfect hash function (MPHF). It follows from the definition that a PHF transforms each key of S into a unique address in the hash table. Since no collisions occur, each key can be retrieved from the table in a single probe.

3. PERFECT HASHING WITH SEGMENTATION

An important approach to construct a PHF is segmentation.
It divides an input set of keys into a number of subsets. For each subset a PHF is determined separately. One of the methods of finding PHFs based on segmentation was proposed by Fredman et al. [4]. The method comprises two steps. First, given a set S of n distinct keys belonging to the universe U, a partition of S into n subsets $B_i$, $i = 0, 1, \ldots, n - 1$, of size $b_i = |B_i|$ is obtained by using a primary hash function

$$h : x \mapsto ((ax) \bmod u) \bmod n$$

where $a \in [1, u - 1]$ is a multiplier, and u is assumed to be prime. The subsets are also called collision buckets. Then for each collision bucket a secondary perfect hash function is constructed, of the form

$$h_i : x \mapsto ((a_i x) \bmod u) \bmod c_i$$

THE COMPUTER JOURNAL, Vol. 41, No. 6, 1998
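The two steps described above can be illustrated with a minimal Python sketch. It is not the construction procedure of [4]. The passage gives only the form of the primary and secondary functions, so several choices here are assumptions made for illustration: the secondary table size is taken as c_i = b_i^2, multipliers are found by random retry, and the names `primary_hash`, `partition`, `find_secondary`, `build` and `lookup` are hypothetical.

```python
import random


def primary_hash(x, a, u, n):
    """Primary hash from the text: h(x) = ((a*x) mod u) mod n."""
    return ((a * x) % u) % n


def partition(keys, a, u):
    """Split the n keys into n collision buckets B_0..B_{n-1}."""
    n = len(keys)
    buckets = [[] for _ in range(n)]
    for x in keys:
        buckets[primary_hash(x, a, u, n)].append(x)
    return buckets


def find_secondary(bucket, u, c, rng, max_tries=1000):
    """Search for a_i such that h_i(x) = ((a_i*x) mod u) mod c is
    injective on the bucket. Random retry is an assumption of this sketch."""
    for _ in range(max_tries):
        a_i = rng.randrange(1, u)  # multiplier drawn from [1, u-1]
        addresses = {((a_i * x) % u) % c for x in bucket}
        if len(addresses) == len(bucket):
            return a_i
    raise RuntimeError("no perfect secondary multiplier found")


def build(keys, u, seed=0):
    """Build the two-level structure for keys drawn from {1, ..., u-1}, u prime."""
    rng = random.Random(seed)
    a = rng.randrange(1, u)
    secondary = []
    for bucket in partition(keys, a, u):
        c = max(1, len(bucket) ** 2)  # assumed size c_i = b_i^2
        a_i = find_secondary(bucket, u, c, rng) if bucket else 1
        table = [None] * c
        for x in bucket:
            table[((a_i * x) % u) % c] = x
        secondary.append((a_i, c, table))
    return a, secondary


def lookup(x, a, secondary, u):
    """Answer 'Is x in S?' by applying the primary, then the secondary hash."""
    a_i, c, table = secondary[primary_hash(x, a, u, len(secondary))]
    return table[((a_i * x) % u) % c] == x
```

For example, with the prime u = 101 and keys [3, 17, 29, 42, 58, 73, 88, 95], `build(keys, 101)` returns the primary multiplier together with one (a_i, c_i, table) triple per bucket, and `lookup` then finds every stored key and rejects every other element of U.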