IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications
8-10 September 2003, Lviv, Ukraine
0-7803-8138-6/03/$17.00 ©2003 IEEE
Code Compression for the Embedded ARM/THUMB Processor
Xianhong Xu, Simon Jones
Faculty of Engineering and Design, University of Bath, BA2 7AY, UK,
{x.xu, s.r.jones}@bath.ac.uk, http://www.bath.ac.uk/engineering
Abstract: Previous code compression research on
embedded systems was based on typical RISC instruction
code. THUMB from ARM Ltd is a compact 16-bit
instruction set with greater code density than the
original 32-bit ARM instruction set. Our research shows
that THUMB code is compressible and that a further 10-15%
code size reduction on THUMB code can be expected
using our proposed new architecture, the Code Compressed
THUMB Processor. In our proposal, a Level 2 cache or
additional RAM space is introduced to serve as
temporary storage for decompressed program blocks. A
software implementation of the architecture is proposed,
and we have implemented a software prototype based on
the ARM922T processor, which runs on the ARMulator.
Keywords: ARM, THUMB, Memory, Code,
Compression
1. INTRODUCTION
Memory is usually the largest part of the system cost of
an embedded system. Applying lossless data
compression to the program code [1-5] is an efficient way
to reduce the main memory size and, therefore, the
system cost. Existing research has been based on
contemporary RISC architectures, which use 32- or 64-bit
instruction sets. ARM and MIPS have introduced 16-bit
ISAs, namely THUMB [6] in 1995 and MIPS16 [7] in
1997, to improve the code density of their original 32-bit
ISAs. Our research started with ARM’s THUMB
instruction set. We analysed the compressibility of
THUMB program code and showed that further code
compression of THUMB code is achievable within an
appropriate architecture. We have implemented a
software demonstrator in C showing that the architecture is
practicable.
The rest of this paper is organized as follows. Section 2 outlines
previous research in the code compression area.
Section 3 summarizes our study of the compressibility of
THUMB code. Section 4 details our architectural
approach. Section 5 describes experimental details of the
software implementation of the architecture. Section 6
summarizes our current research and identifies future
work.
2. RELATED WORK
RISC processors are widely used in embedded
systems. In recent years, the code density problem linked
with RISC architectures has worsened. Several methods
have been proposed to improve the code density of the
typical RISC instruction sets. One of them is the code
compression approach, that is, storing the program code
in compressed format and decompressing the instructions
before the processor executes them.
To clarify, the Compression Ratio (CR) in this paper
will be calculated using the following equation:
$$\mathrm{CR} = \frac{\text{Compressed Size}}{\text{Original Size}} \qquad (1)$$
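As a small illustration of Eq. (1), the ratio can be computed as in the following C sketch. The function names `compression_ratio` and `size_saving` are hypothetical and are not taken from the authors' demonstrator; the point is only that, under this definition, a smaller CR means better compression and the size saving is 1 - CR.

```c
#include <assert.h>
#include <stddef.h>

/* Compression ratio as defined in Eq. (1): compressed size divided by
 * original size. Values below 1.0 mean the code became smaller.
 * (Hypothetical helper name, for illustration only.) */
double compression_ratio(size_t compressed_size, size_t original_size)
{
    assert(original_size > 0);  /* the ratio is undefined for empty input */
    return (double)compressed_size / (double)original_size;
}

/* Fraction of the original size that was saved, i.e. 1 - CR. */
double size_saving(size_t compressed_size, size_t original_size)
{
    return 1.0 - compression_ratio(compressed_size, original_size);
}
```

For example, a 100-byte block that compresses to 73 bytes has a CR of about 0.73, which corresponds to a saving of about 27%.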
There are two main types of code compression
approaches:
1) Block Compression: Wolfe and Chanin [1]
applied compression techniques to instruction code by
introducing the Compressed Code RISC Processor (CCRP)
architecture. Within this architecture, original program
blocks of the same size as the cache line (32
bytes) are compressed at compile time and stored in the
instruction memory. The compressed instruction blocks
are decompressed and fetched into L1 cache lines
when cache misses occur. In their proposal, a static
Huffman algorithm was used to compress the program
blocks. Their experiments showed an overall compression
ratio of 0.73. Lekatsas and Wolf [2] investigated new
compression algorithms to replace the classical Huffman
algorithm in CCRP. IBM CodePack [3] is another block
compression approach based on the 32-bit IBM PowerPC
processor.
2) Reuse of Common Sequences of Instructions:
Approaches of this type [4-5] are also called dictionary
code compression architectures. The main idea is based
on the fact that certain sequences of instructions are
found repeatedly in the program image. A dictionary is
used to hold all the common instruction sequences, and
the common sequences in the program are replaced with
short codes that refer to the dictionary entries. Liao et
al. [5] proposed using a sub-routine call, and Lefurgy et
al. [4] a codeword, to replace each occurrence of a
dictionary entry. The codeword method
achieved compression ratios of 0.61 and 0.66 on PowerPC
and ARM, respectively.
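The dictionary idea can be sketched in C as follows. This is a deliberately simplified illustration, not the scheme of [4] or [5]: the dictionary holds single 16-bit instructions rather than instruction sequences, and the byte-oriented encoding (a one-byte index for a dictionary hit, or an `ESCAPE` byte followed by the raw halfword) is an assumption made only for this example. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ESCAPE 0xFFu  /* assumed marker: next two bytes are a raw halfword */

/* Return the dictionary index of insn, or -1 if it is not in the dictionary. */
static int dict_lookup(const uint16_t *dict, size_t dict_len, uint16_t insn)
{
    for (size_t i = 0; i < dict_len; i++)
        if (dict[i] == insn)
            return (int)i;
    return -1;
}

/* Compress n halfwords into out (which must hold at least 3*n bytes).
 * Returns the number of bytes written. */
size_t dict_compress(const uint16_t *code, size_t n,
                     const uint16_t *dict, size_t dict_len, uint8_t *out)
{
    size_t w = 0;
    assert(dict_len <= 255);  /* index 0xFF is reserved for ESCAPE */
    for (size_t i = 0; i < n; i++) {
        int idx = dict_lookup(dict, dict_len, code[i]);
        if (idx >= 0) {
            out[w++] = (uint8_t)idx;               /* hit: 1 byte */
        } else {
            out[w++] = ESCAPE;                     /* miss: 3 bytes */
            out[w++] = (uint8_t)(code[i] & 0xFFu); /* low byte first */
            out[w++] = (uint8_t)(code[i] >> 8);
        }
    }
    return w;
}

/* Decompress in_len bytes back into halfwords.
 * Returns the number of halfwords written; stops early on truncated input. */
size_t dict_decompress(const uint8_t *in, size_t in_len,
                       const uint16_t *dict, uint16_t *code)
{
    size_t r = 0, n = 0;
    while (r < in_len) {
        uint8_t b = in[r++];
        if (b == ESCAPE) {
            if (r + 2 > in_len)
                break;                             /* truncated stream */
            code[n++] = (uint16_t)(in[r] | ((unsigned)in[r + 1] << 8));
            r += 2;
        } else {
            code[n++] = dict[b];
        }
    }
    return n;
}
```

With this encoding, a hit saves one byte and a miss costs one extra byte, so the scheme only pays off when dictionary entries cover most of the program. Using the four halfwords `{0x4770, 0xB500, 0x4770, 0x1234}` and the dictionary `{0x4770, 0xB500}`, the 8-byte input becomes 6 bytes, a CR of 0.75.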
Unlike these compression approaches, some
chip companies have introduced compact ISAs to improve
the code density of their original RISC ISAs. ARM
announced a 16-bit ISA, THUMB, to replace the typical
32-bit ARM instruction set [6], which results in an average
program size saving of 30%. MIPS also launched its own
16-bit ISA, MIPS16 [7]. A disadvantage of these
compact ISAs is that they increase the number of
instructions in the user program, which results in slower
execution. It was reported in [6] that THUMB
programs run 15%-20% slower than ARM programs.
However, the reported performance loss was measured on
an architecture without a cache.
Contemporary high-performance system cores are usually
integrated with L1 cache memories. As a THUMB
instruction is half the size of an ARM instruction, a cache
line holds twice as many THUMB instructions
as ARM instructions. A consequence of this is a
higher cache-hit rate, which results in higher performance.
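The cache-line argument is simple arithmetic, sketched below in C. The helper `insns_per_line` and the two size macros are hypothetical names, and the 32-byte line used in the example is an assumption borrowed from the CCRP description in this section rather than a property of any particular core.

```c
#include <assert.h>
#include <stddef.h>

#define ARM_INSN_BYTES   4u  /* one 32-bit ARM instruction */
#define THUMB_INSN_BYTES 2u  /* one 16-bit THUMB instruction */

/* Number of whole instructions that fit in one cache line.
 * (Hypothetical helper, for illustration only.) */
size_t insns_per_line(size_t line_bytes, size_t insn_bytes)
{
    assert(insn_bytes > 0);
    return line_bytes / insn_bytes;
}
```

For a 32-byte line, `insns_per_line(32, ARM_INSN_BYTES)` gives 8 and `insns_per_line(32, THUMB_INSN_BYTES)` gives 16, so each line fill brings in twice as many THUMB instructions.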