A Bitmask-based Code Compression Technique for Embedded Systems

Seok-Won Seong
Dept. of Computer & Information Sc. & Engg.
University of Florida, Gainesville, FL 32611, USA
sseong@cise.ufl.edu

Prabhat Mishra
Dept. of Computer & Information Sc. & Engg.
University of Florida, Gainesville, FL 32611, USA
prabhat@cise.ufl.edu

ABSTRACT

Embedded systems are constrained by the available memory. Code compression techniques address this issue by reducing the code size of application programs. Dictionary-based code compression techniques are popular because they offer both a good compression ratio and a fast decompression scheme. Recently proposed techniques [8, 9] improve standard dictionary-based compression by considering mismatches. This paper makes two important contributions: i) it provides a cost-benefit analysis framework for improving the compression ratio by creating more matching patterns, and ii) it develops an efficient code compression technique using bitmasks to improve the compression ratio without introducing any decompression penalty. To demonstrate the usefulness of our approach, we have used applications from various domains, compiled for a wide variety of architectures. Our approach outperforms the existing dictionary-based techniques by an average of 15%, giving a compression ratio of 55% - 65%.

1. INTRODUCTION

Memory is one of the key driving factors in embedded system design, since a larger memory indicates an increased chip area, more power dissipation, and higher cost. As a result, memory imposes constraints on the size of the application programs. Code compression techniques address this problem by reducing the program size. Figure 1 shows the traditional code compression and decompression flow, where the compression is done off-line (prior to execution) and the compressed program is loaded into the memory. The decompression is done during program execution (online).
[Figure 1: Traditional Code Compression Methodology]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICCAD '06 November 5-9, San Jose, CA. Copyright 2006 ACM 1-59593-389-1/06/0011 ...$5.00.

The first code compression technique for embedded processors was proposed by Wolfe and Chanin [1]. The idea of using a dictionary to store the frequently occurring instruction sequences has been explored by various researchers [2, 12]. Lekatsas and Wolf [6] proposed SAMC, a statistical method for code compression using arithmetic coding and a Markov model. There has been a significant amount of research in the area of code compression for VLIW and EPIC processors. The technique proposed by Ishiura and Yamaguchi [10] splits a VLIW instruction into multiple fields, and each field is compressed using a dictionary-based scheme. Nam et al. [13] also use a dictionary-based scheme to compress fixed-format VLIW instructions. Xie et al. [14] used Tunstall coding to perform a variable-to-fixed compression. Lin et al. [3] proposed an LZW-based code compression for VLIW processors using a variable-sized-block method.

Dictionary-based code compression techniques are popular because they provide both a good compression ratio and a fast decompression mechanism. The basic idea is to take advantage of commonly occurring instruction sequences by using a dictionary. Recently proposed techniques [8, 9] improve the dictionary-based compression technique by considering mismatches.
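The dictionary idea, together with mismatch handling via bitmasks, can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact encoding: the names, the tagged-tuple output format, and the choice of a single 4-bit mask at nibble-aligned offsets are all assumptions made for the example.

```python
from collections import Counter

MASK_BITS = 4  # assumption: one 4-bit mask, placed at nibble-aligned offsets

def build_dictionary(words, size):
    """Keep the most frequent 32-bit instruction words as dictionary entries."""
    return [w for w, _ in Counter(words).most_common(size)]

def bitmask_match(word, entry):
    """Return (offset, mask) if `word` differs from `entry` only within one
    aligned MASK_BITS-wide field, i.e. word == entry ^ (mask << offset)."""
    diff = word ^ entry  # bit positions where the two words disagree
    for offset in range(0, 32, MASK_BITS):
        mask = (diff >> offset) & ((1 << MASK_BITS) - 1)
        if diff == mask << offset:   # every differing bit fits in one mask
            return offset, mask
    return None

def compress(words, dictionary):
    """Encode each word as one of:
       ('dict', index)              -- exact dictionary match
       ('mask', index, off, mask)   -- match after applying one bitmask
       ('raw', word)                -- stored uncompressed"""
    out = []
    for w in words:
        for i, e in enumerate(dictionary):
            if w == e:
                out.append(('dict', i))
                break
            m = bitmask_match(w, e)
            if m is not None:
                out.append(('mask', i, m[0], m[1]))
                break
        else:  # no entry matched, exactly or via a bitmask
            out.append(('raw', w))
    return out
```

Decompression mirrors this in hardware: a dictionary lookup plus, for masked entries, a single XOR (entry ^ (mask << offset)); this is why only one extra XOR stage at the output is needed over a plain dictionary scheme. Note the sketch greedily takes the first matching entry; a real compressor would prefer an exact match over a masked one.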
The basic idea is to create instruction matches by remembering a few bit positions. However, the efficiency of these techniques is limited by the number of bit changes (Hamming distance) used during compression. The cost of storing the information for more bit positions offsets the advantage of generating more repeating instruction sequences. Studies [9] have shown that it is not profitable to consider more than three bit changes when 32-bit vectors are used for compression. Section 2 presents a detailed cost-benefit analysis for creating matching instructions.

Compression ratio, the widely accepted primary metric for measuring the efficiency of code compression, is defined as:

    Compression Ratio = (Compressed program size) / (Original program size)    (1)

We propose an efficient code compression technique to improve the compression ratio further by aggressively creating more matching sequences using bitmask patterns. We place the decompression engine between the instruction cache and the processor, which increases cache hits and reduces bus bandwidth. Our design of the decompression unit is analogous to the one-cycle decompression hardware proposed by Lekatsas et al. [4], except for one additional XOR at the output to handle the use of bitmasks. We have used applications from various domains (MediaBench and MiBench) and compiled them for a wide variety of architectures including TI TMS320C6x, MIPS, and SPARC. Our experimental results demonstrate that our approach outperforms the existing dictionary-based compression techniques by an average of 15% without introducing any additional decompression penalty.

The rest of the paper is organized as follows. Section 2 describes our cost-benefit analysis framework for creating more repeating patterns. Section 3 presents our code compression algorithm and decompression mechanism, followed by a case study in Section 4. Finally, Section 5 concludes the paper.
2. COST-BENEFIT ANALYSIS

We have studied how to match more bit positions without adding significant information in the compressed code. We have considered 32-bit