IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 8-10 September 2003, Lviv, Ukraine
0-7803-8138-6/03/$17.00 ©2003 IEEE

Code Compression for the Embedded ARM/THUMB Processor

Xianhong Xu, Simon Jones
Faculty of Engineering and Design, University of Bath, BA2 7AY, UK, {x.xu, s.r.jones}@bath.ac.uk, http://www.bath.ac.uk/engineering

Abstract: Previous code compression research on embedded systems was based on typical RISC instruction code. THUMB from ARM Ltd is a compact 16-bit instruction set with greater code density than the original 32-bit ARM instruction set. Our research shows that THUMB code is compressible, and that a further 10-15% code size reduction on THUMB code can be expected using our proposed architecture, the Code Compressed THUMB Processor. In our proposal, a Level 2 cache or additional RAM space is introduced to serve as temporary storage for decompressed program blocks. A software implementation of the architecture is proposed, and we have implemented a software prototype based on the ARM922T processor, which runs on the ARMulator.

Keywords: ARM, THUMB, Memory, Code, Compression

1. INTRODUCTION

Memory is usually the main contributor to the system cost of an embedded system. Applying lossless data compression to the program code [1-5] is an efficient way to reduce the main memory size and therefore the system cost. The existing research was based on contemporary RISC architectures, which use 32- or 64-bit instruction sets. ARM and MIPS have introduced 16-bit ISAs, namely THUMB [6] in 1995 and MIPS16 [7] in 1997, to improve the code density of their original 32-bit ISAs. Our research started with ARM's THUMB instruction set. We analysed the compressibility of THUMB program code and found that further compression of THUMB code is achievable within an appropriate architecture. We have implemented a software demonstrator in C, which shows that the architecture is practicable.

In the rest of this paper, Section 2 outlines previous research in the code compression area. Section 3 summarizes the results of our study on the compressibility of THUMB code. Section 4 details our architectural approach. Section 5 describes experimental details of the software implementation of the architecture. Section 6 summarizes our current research and identifies future work.

2. RELATED WORK

RISC processors are widely used in embedded systems. In recent years, the code density problem associated with RISC architectures has worsened. Several methods have been proposed to improve the code density of typical RISC instruction sets. One of them is the code compression approach, that is, storing the program code in compressed format and decompressing the instructions before the processor executes them. For clarity, the Compression Ratio (CR) in this paper is calculated using the following equation, so a smaller CR indicates better compression:

$$ \mathrm{CR} = \frac{\text{Compressed Size}}{\text{Original Size}} \tag{1} $$

There are two main types of code compression approaches:

1) Block Compression: Wolfe and Chanin [1] applied compression techniques to instruction code by introducing the Compressed Code RISC Processor (CCRP) architecture. Within this architecture, original program blocks of the same size as the cache line length (32 bytes) are compressed at compile time and stored in the instruction memory. The compressed instruction blocks are decompressed and fetched into L1 cache lines when cache misses occur. In their proposal, a static Huffman algorithm was used to compress the program blocks. Their experiment showed an overall compression ratio of 0.73. Lekatsas and Wolf [2] investigated new compression algorithms to replace the classical Huffman algorithm in CCRP. IBM CodePack [3] is another block compression approach, based on the 32-bit IBM PowerPC processor.

2) Reuse of Common Sequences of Instructions: Approaches of this type [4-5] are also called dictionary code compression architectures.
The main idea is based on the fact that certain sequences of instructions occur repeatedly in the program image. A dictionary holds all the common instruction sequences, and the common sequences in the program are replaced with short codes that refer to the dictionary entries. Liao et al. [5] proposed using a sub-routine call, and Lefurgy et al. [4] proposed using a codeword, to replace each occurrence of a dictionary entry. The codeword method achieved compression ratios of 0.61 on PowerPC and 0.66 on ARM.

In contrast to these compression approaches, some chip companies introduced compact ISAs to improve the code density of their original RISC ISAs. ARM announced a 16-bit ISA, THUMB, as a replacement for the typical 32-bit ARM instruction set [6], which results in an average program size saving of 30%. MIPS likewise launched its 16-bit ISA, MIPS16 [7]. A disadvantage of these compact ISAs is that they increase the number of instructions in the user program, which results in slower execution. It was reported in [6] that THUMB programs run 15%-20% slower than ARM programs. However, the reported poorer performance was measured on an architecture without a cache. Contemporary high-performance system cores are usually integrated with L1 cache memories. As a THUMB instruction is half the size of an ARM instruction, a cache line holds twice as many THUMB instructions as ARM instructions. A consequence of this is a higher cache-hit rate, which results in higher performance.
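
As an illustration of the dictionary approach outlined earlier in this section and of Equation (1), the following C sketch builds a small dictionary and estimates the resulting compression ratio. It is a simplified sketch under stated assumptions, not the scheme of [4] or [5]: dictionary entries are single 32-bit instructions rather than instruction sequences, and the encoding costs (a 1-byte codeword for a dictionary hit, a 1-byte escape plus the raw instruction for a miss) are assumed for illustration. The names `DICT_MAX`, `build_dictionary`, `compressed_size` and `compression_ratio` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DICT_MAX 16 /* hypothetical dictionary capacity, chosen for illustration */

/* Equation (1): CR = compressed size / original size (smaller is better). */
double compression_ratio(size_t compressed_bytes, size_t original_bytes)
{
    assert(original_bytes > 0);
    return (double)compressed_bytes / (double)original_bytes;
}

/* Return the index of word in dict[0..used), or -1 if it is absent. */
static int dict_lookup(const uint32_t *dict, size_t used, uint32_t word)
{
    for (size_t d = 0; d < used; d++)
        if (dict[d] == word)
            return (int)d;
    return -1;
}

/* Collect instructions that occur at least twice, in order of first
 * appearance, until the dictionary is full. Returns the entry count. */
size_t build_dictionary(const uint32_t *code, size_t n, uint32_t dict[DICT_MAX])
{
    size_t used = 0;
    for (size_t i = 0; i < n && used < DICT_MAX; i++) {
        if (dict_lookup(dict, used, code[i]) >= 0)
            continue; /* already a dictionary entry */
        size_t count = 0;
        for (size_t j = 0; j < n; j++)
            if (code[j] == code[i])
                count++;
        if (count >= 2)
            dict[used++] = code[i];
    }
    return used;
}

/* Assumed encoding: a dictionary hit costs a 1-byte codeword, a miss costs
 * a 1-byte escape plus the raw 4-byte instruction, and the dictionary
 * itself costs 4 bytes per entry. */
size_t compressed_size(const uint32_t *code, size_t n,
                       const uint32_t *dict, size_t used)
{
    size_t bytes = used * sizeof(uint32_t); /* dictionary storage */
    for (size_t i = 0; i < n; i++)
        bytes += (dict_lookup(dict, used, code[i]) >= 0)
                     ? 1
                     : 1 + sizeof(uint32_t);
    return bytes;
}
```

For a hypothetical eight-instruction program in which one instruction occurs four times, another occurs twice, and two occur once, the sketch stores a two-entry dictionary (8 bytes), six codewords (6 bytes) and two escaped instructions (10 bytes). That is 24 bytes against 32 original bytes, so Equation (1) gives CR = 0.75. Counting the dictionary in the compressed size is a design choice of this sketch; leaving it out would give a lower, more optimistic ratio.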