A Hardware Assisted High Performance PHK Memory Manager

Wentong Li (Email-ID: wl@cse.unt.edu)
Saraju P. Mohanty (Email-ID: smohanty@cse.unt.edu)
Krishna Kavi (Email-ID: kavi@cse.unt.edu)
Dept. of Computer Science and Engineering, University of North Texas, Denton, TX 76203.

Abstract

Complex mechanisms for dynamic memory management and garbage collection are needed in modern imperative programming languages. Implementing memory management functions efficiently, in terms of both memory usage and execution performance, is therefore important for programs written in such languages. In this paper, we introduce a memory allocator that uses hardware assistance to improve the performance of an existing software allocator (the PHK allocator). On average, our design reduces the execution time of memory management functions by 58.9%.

1 Introduction and Our Contribution

A significant amount of execution time in programs written in modern imperative languages like C++/Java is spent on dynamic memory management. In current systems, memory management functions are performed purely in software. In some applications, as much as 42% of the execution time is spent on memory management [1]. Thus, there is a need for a low-cost allocator that has both good execution performance and good memory locality in order to build efficient systems for memory-intensive applications.

Software allocators search through lists of free memory chunks during allocation, and this search is on the critical path of allocator performance. Hardware allocators can search the lists of available memory in parallel. Moreover, a hardware allocator can easily hide the execution latency of freeing objects, since freeing can run concurrently with application execution. A hardware allocator can also coalesce free chunks of memory in the background, while the application is not using that portion of the memory. Hardware units can also perform garbage collection in the background.
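As a rough illustration of the sequential search that software allocators perform, the following is a minimal sketch of a first-fit walk over a singly linked free list. It is not the PHK or Doug Lea implementation: the `free_chunk` layout and the `first_fit` function are hypothetical names introduced only for this example. The loop visits one chunk per iteration, which is the kind of work a hardware allocator could instead perform across many entries in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-chunk descriptor (an assumption for illustration,
 * not the layout used by PHK or any other real allocator). */
struct free_chunk {
    size_t size;              /* usable bytes in this chunk */
    struct free_chunk *next;  /* next chunk on the free list */
};

/*
 * First-fit search: walk the list until a chunk of at least `size`
 * bytes is found, unlink it, and return it. Returns NULL if no chunk
 * is large enough. The cost grows with the number of chunks visited,
 * so this loop sits on the critical path of every allocation.
 */
struct free_chunk *first_fit(struct free_chunk **head, size_t size)
{
    assert(head != NULL);
    for (struct free_chunk **link = head; *link != NULL;
         link = &(*link)->next) {
        struct free_chunk *chunk = *link;
        if (chunk->size >= size) {
            *link = chunk->next;  /* unlink the chunk from the list */
            chunk->next = NULL;
            return chunk;
        }
    }
    return NULL;
}
```

With a list holding chunks of 8, 16, and 64 bytes, a request for 32 bytes visits all three chunks before it succeeds, and a request larger than 64 bytes traverses the whole list only to fail.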
The major disadvantages of a hardware-only allocator are the hardware complexity of implementing complex allocators and the lack of flexibility in changing allocation strategies.

In this paper, we present a new software/hardware co-design. Our design is based on the PHK [2] allocation method used in the FreeBSD system and on Chang's hardware allocator [3]. We aim to balance hardware complexity against performance by using both hardware and software. To support our claims, we compare our design with a hardware-only allocator in terms of hardware complexity and with a software-only allocator in terms of performance. We have prototyped the hardware components on an FPGA. Our proposed hardware-software allocator can find important uses in applications written in programming languages like C++/Java, where a significant amount of time is spent on memory management.

The rest of the paper is organized as follows. We summarize the background and related research in Section 2; present the proposed software-hardware co-design dynamic allocator and its FPGA prototype in Section 3; compare our design with existing hardware-only and software-only allocators in Section 4; and present our conclusions in Section 5.

2 Background and Related Research

Our research deals with memory allocators, so in this section we briefly introduce both the available software-only allocators and the available hardware-only allocators.

The two most popular open-source software allocators are Doug Lea's allocator [4], used in the Linux system, and PHK, used in the FreeBSD system. Berger et al. [1] have shown that general-purpose allocators such as Doug Lea's [4] or PHK [2] work well across a wide range of applications. Feng et al. [5] show that the performance difference between these two general-purpose allocators on the four most allocation-intensive benchmarks in the SPEC 2000 suite [12] is less than 3%.
Thus, if one were to implement a memory allocator in hardware, one should consider one of these two general-purpose allocators.

A few hardware allocator designs have been reported [3] [6] [7]. All of them are based on the buddy system described by Knuth [8]. None of them has found practical or commercial acceptance, because of excessive hardware complexity. All of these allocators target embedded applications, where only physical addresses are used. This is a serious limitation for most general-purpose systems, which use virtual addresses.

To take advantage of the speed of a hardware-only al-