DISTRIBUTED LINEAR HASHING AND PARALLEL PROJECTION IN MAIN MEMORY DATABASES

C. Severance, S. Pramanik & P. Wolberg
Computer Science Department, Michigan State University
East Lansing, Michigan 48824

ABSTRACT

This paper extends the concepts of the distributed linear hashed main memory file system with the objective of supporting higher level parallel database operations. The basic distributed linear hashing technique provides a high speed hash based dynamic file system on a NUMA architecture multi-processor system. Distributed linear hashing has been extended to include the ability to perform high speed parallel scans of the hashed file. The fast scan feature provides load balancing to compensate for uneven distributions of records and uneven processing speed among different processors. These extensions are used to implement a parallel projection capability. The performance of distributed linear hashing and parallel projection is investigated.

1. INTRODUCTION

The availability of multi-processor computers with large main memories has made main memory database applications feasible. With a large number of processing nodes, these systems can have a large amount of memory at relatively low cost. The aggregate data transfer rate between the memories and the processing nodes is also very large for these systems.

While the aggregate performance and memory size of these systems are very high, the central control structures used in traditional database systems will prevent the system from achieving high levels of performance. Distributed linear hashing provides a technique for implementing parallel main memory database systems which minimize the adverse effect of these architectural constraints, while exploiting the NUMA architecture to enhance performance of key based access to individual records stored in a hash based file system.
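As background for the key based access mentioned above, the following is a minimal single-process sketch of classical linear hashing, the scheme that distributed linear hashing builds on. It is not the distributed implementation described in this paper: the class name `LinearHashFile`, the parameters `initial_buckets` and `bucket_capacity`, and the split trigger (average load exceeding the bucket capacity) are assumptions made for illustration only.

```python
class LinearHashFile:
    """Illustrative single-node linear hash file (hypothetical names)."""

    def __init__(self, initial_buckets=2, bucket_capacity=4):
        self.n0 = initial_buckets        # bucket count before any doubling
        self.level = 0                   # number of completed doublings
        self.split = 0                   # index of the next bucket to split
        self.capacity = bucket_capacity  # assumed target records per bucket
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0                   # total records stored

    def _address(self, key):
        # Use the current-level hash function first.
        h = hash(key)
        addr = h % (self.n0 << self.level)
        # Buckets before the split pointer were already split in this
        # round, so their keys are addressed with the next-level function.
        if addr < self.split:
            addr = h % (self.n0 << (self.level + 1))
        return addr

    def insert(self, key, value):
        self.buckets[self._address(key)].append((key, value))
        self.count += 1
        # Assumed policy: split one bucket when the average load is too high.
        if self.count > self.capacity * len(self.buckets):
            self._split_one()

    def _split_one(self):
        # Only one bucket is reorganized per split; the file grows by
        # exactly one bucket and the rest of the file is left untouched.
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])
        self.split += 1
        if self.split == self.n0 << self.level:
            self.level += 1              # round complete: file has doubled
            self.split = 0
        for k, v in old:
            self.buckets[self._address(k)].append((k, v))

    def lookup(self, key):
        for k, v in self.buckets[self._address(key)]:
            if k == key:
                return v
        return None


table = LinearHashFile()
for i in range(1000):
    table.insert(i, i * i)
```

The point of the sketch is that the file grows one bucket at a time, so no insert ever triggers a reorganization of the whole file, and a lookup needs only the level and the split pointer to locate the bucket for a key.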
To provide high speed access for operations which must access all records in a database, fast scan exploits the locality of data by using primarily local memory references.

Relational projection can be viewed as composed of two sub-tasks: scanning the input relation and creating the result relation. In the process of creating the output relation, any duplicate records created as a result of the projection must be removed. The duplicate elimination phase of projection is usually the time consuming part of the projection operation.

Distributed linear hashing is implemented on the BBN Butterfly. Fast scan has been added to support database-wide operations. We have implemented parallel projection with duplicate removal as an example application using this file system. The performance of the hash file system and parallel projection is shown.

1.1. Previous Work

Several hashing methods [Ghos 86, Wied 87] for dynamic files have been proposed since the mid 1970's, including extendible hashing [Fagi 79], linear hashing [Litw 80], and dynamic hashing [Lars 78]. Complete reorganization of the data file is avoided in these techniques by allowing the directories to adjust to the records of the overflowing buckets. These hashing methods reduce the search time by minimizing the number of disk accesses.

Linear hashing as a search structure for databases was developed by Litwin [Litw 80]. A solution for concurrent linear hashing was proposed by C. S. Ellis [Elli 87]. Concurrent linear hashing adds a locking protocol and extends the data structures to enhance concurrency of

Proceedings of the 16th VLDB Conference, Brisbane, Australia 1990