1160 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 10, OCTOBER 2007

Code Compression for VLIW Embedded Systems Using a Self-Generating Table

Chang Hong Lin, Student Member, IEEE, Yuan Xie, Member, IEEE, and Wayne Wolf, Fellow, IEEE

Abstract—We propose a new class of methods for VLIW code compression using variable-sized branch blocks with self-generating tables. Code compression traditionally works on fixed-sized blocks, with its efficiency limited by their small size. A branch block, a series of instructions between two consecutive possible branch targets, provides larger blocks for code compression. We compare three methods for compressing branch blocks: table-based, Lempel–Ziv–Welch (LZW)-based, and selective code compression. Our approaches are fully adaptive and generate the coding table on the fly during compression and decompression. When a branch target is encountered, the coding table is cleared to ensure correctness. Decompression requires a simple table lookup and updates the coding table when necessary. When decoding sequentially, the table-based method produces 4 bytes per iteration, while the LZW-based methods provide 8 bytes peak and 1.82 bytes average decompression bandwidth. Compared to Huffman coding's 1-byte and variable-to-fixed (V2F) coding's 13-bit peak performance, our methods have higher decoding bandwidth and a comparable compression ratio. Parallel decompression can also be applied to our methods, which makes them more suitable for VLIW architectures.

Index Terms—Code compression, VLIW architecture.

I. INTRODUCTION

EMBEDDED systems have become increasingly important in the past decade, as almost all electronic devices contain them. The complexity and performance requirements of embedded systems grow rapidly as system-on-chip (SoC) architectures become the trend. Embedded systems are cost and power sensitive, and their memory systems often consume a large portion of chip area and system cost.
Program size tends to grow as applications become more complex, and even for the same application, program size grows when RISC, superscalar, or VLIW architectures are used. Code compression has been proposed as a solution to reduce program size and memory usage in embedded systems. It refers to compressing program code offline and decompressing it on the fly during execution. The idea was first proposed by Wolfe and Chanin in the early 1990s [1], and much research has since been done to reduce code size for RISC machines [2]–[7]. As instruction-level parallelism (ILP) becomes common in modern system-on-chip architectures, a high-bandwidth instruction fetch mechanism is required to supply multiple instructions per cycle. Under these circumstances, reducing code size and providing fast decompression speed are both critical challenges when applying code compression to VLIW machines.

This paper introduces branch-block-based code compression methods and evaluates them on benchmarks for Texas Instruments' TMS320C6x VLIW processors. Our schemes use an adaptive self-generating table to avoid the overhead of storing the decoding table, and they offer fast decompression with little overhead, which suits VLIW architectures. Code compression methods have to be lossless; otherwise, the decompressed instructions will not be the same as the original program.

Manuscript received May 3, 2006; revised May 2, 2007. C. H. Lin is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: chlin@princeton.edu). W. Wolf is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: chlin@princeton.edu). Y. Xie is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, GA 30332 USA. Digital Object Identifier 10.1109/TVLSI.2007.904097
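The self-generating-table idea can be illustrated with a toy LZW-style compressor: the coding table starts from single symbols, grows as the input is scanned, and is cleared at every branch target so that decompression can restart there. This is only a minimal sketch under assumed simplifications (a byte alphabet, positions given as reset points, unbounded table size), not the paper's exact encoding.

```python
def lzw_compress(data, reset_points=()):
    """Toy LZW-style compression with a self-generating table.

    The table starts with all single-byte symbols and grows on the fly;
    at each reset point (modeling a branch target) the current phrase is
    flushed and the table is cleared, so a decompressor can start fresh
    at that position.
    """
    resets = set(reset_points)

    def fresh_table():
        # Initial table: every single byte maps to its own code 0..255.
        return {bytes([i]): i for i in range(256)}

    table = fresh_table()
    out, phrase = [], b""
    for pos, byte in enumerate(data):
        if pos in resets:                  # branch target: flush and clear
            if phrase:
                out.append(table[phrase])
                phrase = b""
            table = fresh_table()
        candidate = phrase + bytes([byte])
        if candidate in table:
            phrase = candidate             # keep extending the match
        else:
            out.append(table[phrase])
            table[candidate] = len(table)  # table generates itself on the fly
            phrase = bytes([byte])
    if phrase:
        out.append(table[phrase])
    return out

print(lzw_compress(b"ABABAB"))                  # repeated pattern reuses new codes
print(lzw_compress(b"ABAB", reset_points=[2]))  # reset forces a fresh table at pos 2
```

Note how the reset at position 2 prevents the second `AB` from using the code learned for the first one: correctness at branch targets is bought with some lost compression, which is the trade-off the paper's table-clearing rule makes.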
Since a decompression engine is needed to decompress code at runtime, the decompression overhead has to be tolerable. Unlike text compression, compressed programs have to support random access, since the execution flow may be altered by branch, jump, or call instructions. The compressed blocks may not be byte-aligned, so additional padding bits are needed after compressed blocks when bit-addressable memory is not available.

Previous code compression methods use small, equally sized blocks as basic compression units; each block can be decompressed independently, using at most a small amount of information from other blocks. When the execution flow changes, decompression can restart at the new position with little or no penalty. However, not all instructions are the destination of a jump or branch, and the possible targets are determined once the program is compiled. We define a branch block as the instructions between two consecutive possible branch targets and use branch blocks as our basic compression units. A branch block may contain several basic blocks in the control-flow-graph representation. Compiler methods can also be used to increase the distance between branch targets and thereby enlarge the branch blocks. Since these blocks are much larger than the blocks used in previous work, we have more freedom in choosing compression algorithms. The concept of using Lempel–Ziv–Welch (LZW) methods in code compression first appeared in our previous work [8]; in this paper, we refine the definition of branch blocks and extend the code compression algorithms.

This paper is organized as follows. Section II reviews previous related work. Section III describes the general concept of our code compression approaches using self-generating tables. We introduce table-based and LZW-based code compression in Sections IV and V, and selective code compression in Section VI. Experimental results are presented in Section VII and, finally, Section VIII concludes the paper.
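The branch-block partitioning described above can be sketched as a simple pass over the instruction stream: every possible branch target starts a new block, and everything between two consecutive targets forms one compression unit. The instruction mnemonics and target addresses below are illustrative assumptions, not taken from the paper.

```python
def branch_blocks(instructions, branch_targets):
    """Split a list of instruction words into branch blocks.

    A branch block is the run of instructions between two consecutive
    possible branch targets; each target begins a new block, so any
    control transfer always lands on a block boundary.
    """
    targets = set(branch_targets)
    blocks, current = [], []
    for addr, insn in enumerate(instructions):
        if addr in targets and current:
            blocks.append(current)  # a target closes the previous block
            current = []
        current.append(insn)
    if current:
        blocks.append(current)
    return blocks

# Hypothetical 7-instruction program with branch targets at addresses 0 and 4.
program = ["ADD", "MPY", "LDW", "B", "SUB", "NOP", "STW"]
print(branch_blocks(program, branch_targets=[0, 4]))
# Two blocks: one per branch target.
```

Because a branch block may span several basic blocks, it is typically much larger than the fixed 32- or 64-byte units of earlier schemes, which is what gives the dictionary room to grow before the next table reset.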
II. RELATED WORK

Wolfe and Chanin were the first to apply code compression to embedded systems [1]. Their compressed code RISC processor (CCRP) uses Huffman coding to compress MIPS programs. A