IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 20, NO. 4, APRIL 2001 477
Compact and Efficient Code Generation Through
Program Restructuring on Limited Memory
Embedded DSPs
Siddharth Rele, Vipin Jain, Santosh Pande, and J. Ramanujam
Abstract—Many embedded systems such as digital cameras,
digital radios, high-resolution printers, cellular phones, etc., in-
volve a heavy use of signal processing and are thus based on digital
signal processors (DSPs). DSPs such as the TMS320C2x and the
DSP5600x have irregular data paths that typically result from
application-specific needs (such as chaining multiply–accumulate
operations). Efficient code generation for such embedded DSP
processors is a challenging problem. Stringent requirements
such as tight memory constraints and fast response times create
the need for compact and efficient code. In this paper, we
address the problem of generating compact and efficient code
for embedded DSP processors. Most DSP instruction set
architectures (ISAs) feature intrainstruction parallelism (IIP),
enabling individual operations to be executed in parallel by
generating a complex instruction. A reduction in generated code
size and improved performance can be achieved by exploiting
this parallelism present in such ISAs. In this paper, we present
a code restructuring technique to fully exploit this parallelism
through maximal utilization of the complex instructions present
in the instruction set. We formulate this as a maximal benefit
code restructuring problem, which is to derive the arrangement
of statements to maximally exploit IIP without violating data
dependencies. This problem is equivalent to the
precedence-constrained Hamiltonian path problem for directed acyclic graphs
and the traveling salesman problem in general, both of which
are NP-hard. In this paper, we present an optimal algorithm to
solve the problem. We have implemented this optimal algorithm
in a compiler targeted to generate code for the TMS320C25
DSP. We tested our framework on a number of benchmarks and
found that the performance of the generated code (measured in
dynamic instruction cycle counts) improves by as much as 9.9%
with an average of 4%. The average code-size reduction over
code compiled without exploiting parallelism is 2.9%. We also
studied the effect of loop unrolling on the available IIP. An on-chip
instruction cache can be effectively utilized by unrolling loops
such that the generated code fully occupies the memory. The benefit is
a reduction in the dynamic instruction count due to the higher number
of complex instructions generated. We found that by unrolling
loops four to five times to fit the available on-chip instruction cache,
the dynamic instruction counts are reduced by as much as 9.9%.
Manuscript received November 1, 1999; revised June 13, 2000 and December
13, 2000. The work of S. Pande was supported in part by DARPA under
Contract ARMY DABT63-97-C-0029 and the National Science Foundation under
Grant CCR-0073512. The work of J. Ramanujam was supported in part by a
National Science Foundation Young Investigator Award CCR-9457768 and by
the National Science Foundation under Grant CCR-0073800. This paper was
recommended by Associate Editor R. Camposano.
S. Rele is with the Compiler Research Laboratory, Department of Electrical
and Computer Engineering and Computer Science, University of Cincinnati,
Cincinnati, OH 45221 USA (e-mail: srele@ececs.uc.edu).
V. Jain was with the Compiler Research Laboratory, Department of Electrical
and Computer Engineering and Computer Science, University of Cincinnati,
Cincinnati, OH 45221 USA. He is now with the Server Technology Division,
Oracle Corporation, San Jose, CA 95101 USA.
S. Pande was with the University of Cincinnati, Cincinnati, OH 45221 USA.
He is now with the College of Computing, Georgia Institute of Technology,
Atlanta, GA 30318 USA (e-mail: santosh@cc.gatech.edu).
J. Ramanujam is with the Department of Electrical and Computer Engineering,
Louisiana State University, Baton Rouge, LA 70803 USA (e-mail:
jxr@ee.lsu.edu).
Publisher Item Identifier S 0278-0070(01)01939-X.
Index Terms—Code compaction, complex instructing ISAs,
DSPs.
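To make the maximal benefit code restructuring problem stated in the abstract concrete, the following is a minimal illustrative sketch, not the optimal algorithm the paper presents. It enumerates every statement order, discards orders that violate a data dependence, and keeps the order with the largest total benefit. The function name `best_order`, the statement labels, and the benefit table are hypothetical, and the sketch assumes (as a simplification) that one unit of benefit is earned whenever two statements that can be merged into a complex instruction end up adjacent.

```python
from itertools import permutations

def best_order(statements, deps, benefit):
    """Brute-force search for a dependence-respecting statement order.

    statements: list of statement labels (hypothetical names).
    deps: list of (a, b) pairs meaning a must come before b.
    benefit: dict mapping an adjacent pair (x, y) to the assumed gain
             from merging x followed by y into one complex instruction.
    Returns (order, gain), or (None, -1) if no order satisfies deps.
    """
    best, best_gain = None, -1
    for order in permutations(statements):
        pos = {s: i for i, s in enumerate(order)}
        # Reject orders that violate any data dependence.
        if any(pos[a] > pos[b] for a, b in deps):
            continue
        # Sum the assumed benefit over each adjacent pair in this order.
        gain = sum(benefit.get(pair, 0) for pair in zip(order, order[1:]))
        if gain > best_gain:
            best, best_gain = list(order), gain
    return best, best_gain

# Hypothetical example: S1 must precede S3; three pairs can be merged.
order, gain = best_order(
    ["S1", "S2", "S3", "S4"],
    [("S1", "S3")],
    {("S1", "S2"): 1, ("S2", "S3"): 1, ("S4", "S1"): 1},
)
print(order, gain)
```

The search visits all n! permutations, so it is only usable on very small basic blocks. It is meant to show why the problem resembles a precedence-constrained Hamiltonian path search, where each order is a path through all statements and the adjacent-pair benefits act as edge weights.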
I. INTRODUCTION
EMBEDDED processors are widely used in a variety of
applications such as cellular phones, pagers, printers,
copiers, digital cameras, automobiles, flight navigation sys-
tems, etc. Unlike general purpose processors, embedded
processors are designed and optimized for specific (classes
of) applications [13]. Embedded systems are constrained by
limited on-chip program memory [2], real time performance
requirements [18], [27], and low power consumption demands.
The evaluation criteria for embedded processors are different
from those of general purpose processors. The following cri-
teria are typically used while comparing embedded processors
[34].
1) Performance: The cost-performance ratio of embedded
systems is measured in MIPS/dollar. It is one of the most important
criteria for judging embedded processors because of the
real-time constraints and low cost required of the system in which the
processor is embedded.
2) Code Size Versus Density: Code targeted toward
complex instruction set computing (CISC) architectures can be
smaller than code targeted toward reduced instruction
set computing (RISC) architectures due to the presence of complex
instructions. However, in the absence of aggressive compiler
optimizations, the code density of CISCs is poor compared
to RISCs. Compilers therefore play a major role in improving
code density by performing machine-dependent optimizations
specifically designed to reduce the size of the generated
code through code restructuring transformations.
Traditionally, embedded processors and systems are pro-
grammed using assembly language in order to meet the hard
performance constraints and limited program memory. How-
ever, programming large complex applications in assembly
language is tedious, error-prone, and time-consuming; in
addition, such programs are difficult to maintain. High-level
languages like C and C++ are replacing assembly language in
embedded programming. Programming in high-level language
0278–0070/01$10.00 © 2001 IEEE