1160 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 10, OCTOBER 2007
Code Compression for VLIW Embedded Systems
Using a Self-Generating Table
Chang Hong Lin, Student Member, IEEE, Yuan Xie, Member, IEEE, and Wayne Wolf, Fellow, IEEE
Abstract—We propose a new class of methods for VLIW code
compression using variable-sized branch blocks with self-generating tables. Code compression traditionally works on fixed-size blocks, with efficiency limited by their small size. A branch
block, a series of instructions between two consecutive possible
branch targets, provides larger blocks for code compression.
We compare three methods for compressing branch blocks:
table-based, Lempel–Ziv–Welch (LZW)-based and selective code
compression. Our approaches are fully adaptive and generate the
coding table on-the-fly during compression and decompression.
When a branch target is encountered, the coding table is cleared to
ensure correctness. Decompression requires a simple table lookup
and updates the coding table when necessary. When decoding
sequentially, the table-based method produces 4 bytes per itera-
tion while the LZW-based methods provide 8 bytes peak and 1.82
bytes average decompression bandwidth. Compared to Huffman’s
1 byte and variable-to-fixed (V2F)’s 13-bit peak performance,
our methods have higher decoding bandwidth and a comparable
compression ratio. Parallel decompression can also be applied to our methods, making them more suitable for VLIW architectures.
Index Terms—Code compression, VLIW architecture.
I. INTRODUCTION
EMBEDDED systems have become increasingly important in the past decade, as almost all electronic devices contain them. The complexity and performance requirements
for embedded systems grow rapidly as system-on-chip (SoC)
architectures become the trend. Embedded systems are cost and
power sensitive, and their memory systems often consume a
large portion of chip area and system cost. Program size tends
to grow as applications become increasingly complex; even for the same application, code size grows when RISC, superscalar, or VLIW architectures are adopted. Code compression has been proposed as a solution to reduce both program size and memory usage in embedded systems. It refers to compressing program code offline and decompressing it on-the-fly during execution. The idea was first proposed by Wolfe and Chanin in the early 1990s [1], and much research has been
done to reduce the code size for RISC machines [2]–[7]. As
instruction level parallelism (ILP) becomes common in modern
system-on-chip architectures, a high-bandwidth instruction
fetch mechanism is required to supply multiple instructions
per cycle. Under these circumstances, reducing code size and
Manuscript received May 3, 2006; revised May 2, 2007.
C. H. Lin is with the Department of Electrical Engineering, Princeton Uni-
versity, Princeton, NJ 08544 USA (e-mail: chlin@princeton.edu).
W. Wolf is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA.
Y. Xie is with the School of Electrical and Computer Engineering, Georgia
Institute of Technology, GA 30332 USA.
Digital Object Identifier 10.1109/TVLSI.2007.904097
providing fast decompression speed are both critical challenges
when applying code compression to VLIW machines.
This paper introduces branch-block-based code compression
methods and evaluates them on benchmarks for Texas Instru-
ments’ TMS320C6x VLIW processors. Our schemes use an
adaptive self-generating table to avoid the overhead of storing
the decoding table and have the advantage of fast decompres-
sion with little overhead, which is suitable for the VLIW archi-
tectures. Code compression methods must be lossless; otherwise, the decompressed instructions would differ from the original program. Since a decompression engine is needed to decompress code at runtime, its overhead has to be tolerable. Unlike text compression, compressed programs must support random access, since the execution flow may be altered by branch, jump, or call instructions. The compressed
blocks may not be byte aligned, so additional padding bits are
needed after compressed blocks when bit addressable memory
is not available.
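The byte-alignment requirement above can be illustrated with a minimal sketch (our naming; the actual padding policy depends on the memory interface): a compressed block is padded with zero bits up to the next byte boundary when bit-addressable memory is not available.

```python
def pad_to_byte(bits: str) -> str:
    """Pad a compressed bit string with zeros to the next byte boundary.

    `bits` is a string of '0'/'1' characters standing in for the
    compressed block; (-len) % 8 gives the number of padding bits needed.
    """
    pad = (-len(bits)) % 8
    return bits + "0" * pad
```

For example, a 7-bit compressed block gains one padding bit, while an already aligned block is left unchanged.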
Previous code compression methods use small, equally sized blocks as their basic compression units; each block can be decompressed independently, using at most a small amount of information from other blocks. When the execution flow changes, decompression can restart at the new position with little or no penalty.
However, not all instructions are possible destinations of a jump or branch, and the targets are determined once the program
is compiled. We define a branch block as the instructions be-
tween two consecutive possible branch targets and use them as
our basic compression units. A branch block may contain sev-
eral basic blocks in the control flow graph representation. Com-
piler methods can also be used to increase the distance between
branch targets to enlarge the size of branch blocks. Since the
size is much larger than the blocks used in previous work, we
have more freedom in choosing compression algorithms. The
concept of using Lempel–Ziv–Welch (LZW) methods in code
compression first appeared in our previous work [8]. In this paper, we refine the definition of a branch block and extend the code compression algorithms.
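The branch-block partitioning and the self-generating-table idea can be sketched as follows. This is a byte-oriented LZW toy for illustration only: the schemes in this paper operate on 32-bit VLIW instruction words with bounded table sizes, and all function names here are ours, not the paper's.

```python
def branch_blocks(instructions, branch_targets):
    """Split an instruction sequence into branch blocks: runs of
    instructions between consecutive possible branch targets."""
    blocks, current = [], []
    for addr, insn in enumerate(instructions):
        if addr in branch_targets and current:
            blocks.append(current)   # a branch target starts a new block
            current = []
        current.append(insn)
    if current:
        blocks.append(current)
    return blocks

def lzw_compress(data):
    """Byte-oriented LZW: the table is seeded with all single bytes and
    grows on the fly, so no coding table is stored with the output."""
    table = {bytes([i]): i for i in range(256)}
    codes, w = [], b""
    for byte in data:
        wb = w + bytes([byte])
        if wb in table:
            w = wb
        else:
            codes.append(table[w])
            table[wb] = len(table)   # self-generating table entry
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes

def lzw_decompress(codes):
    """Rebuild the same table while decoding: each step is one table
    lookup plus, when necessary, one table update.  Assumes a valid
    code stream (the only unseen code is the next table entry)."""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for k in codes[1:]:
        entry = table[k] if k in table else w + w[:1]
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)

def compress_program(instructions, branch_targets):
    """Compress each branch block independently; clearing the table at
    every branch target keeps the compressed blocks randomly accessible."""
    return [lzw_compress(bytes(block))
            for block in branch_blocks(instructions, branch_targets)]
```

Because each block's table is regenerated from scratch during decoding, a jump to any branch target can begin decompression immediately, without consulting state from earlier blocks.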
This paper is organized as follows. Section II reviews pre-
vious related work. Section III describes the general concept of
our code compression approaches using self-generating tables.
We introduce the table-based and LZW-based code compres-
sion in Sections IV and V, and the selective code compression
in Section VI. Experimental results are presented in Section VII
and, finally, Section VIII concludes the paper.
II. RELATED WORK
Wolfe and Chanin were the first to apply code compression to
embedded systems [1]. Their compressed code RISC processor
(CCRP) uses Huffman coding to compress MIPS programs. A
1063-8210/$25.00 © 2007 IEEE