An Executable Intermediate Representation for Retargetable Compilation and High-Level Code Optimization

Rainer Leupers, Oliver Wahlen, Manuel Hohenauer, Tim Kogel
Aachen University of Technology (RWTH)
Integrated Signal Processing Systems
Aachen, Germany
Email: leupers@iss.rwth-aachen.de

Peter Marwedel
University of Dortmund
Dept. of Computer Science 12
Dortmund, Germany
Email: marwedel@cs.uni-dortmund.de

Abstract: Due to fast time-to-market and IP reuse requirements, an increasing amount of the functionality of embedded HW/SW systems is implemented in software. As a consequence, software programming languages like C play an important role in system specification, design, and validation. Besides many other advantages, the C language offers executable specifications with clear semantics and high simulation speed. However, virtually any tool operating on C specifications has to convert the C sources into some intermediate representation (IR), and executability is normally lost in that conversion. To overcome this problem, this paper describes a novel IR format, called IR-C, for use in C-based design tools, which combines the simplicity of three address code with the executability of C. Besides the IR-C format and its generation from ANSI C, we also describe its applications in the areas of validation, retargetable compilation, and source-level code optimization.

I. INTRODUCTION

The growing importance of the C programming language and its derivatives (e.g. [1], [2], [3]) in embedded system design implies that a large number of tools for translating C specifications into other formats is required: for instance, compilers for translating C programs into assembly programs, and C-based hardware design tools for mapping C specifications into equivalent HDL specifications. To perform such translations, it is very common to use a frontend that, as a first step, translates the original C program into a machine-independent intermediate representation (IR).
The most widespread IR format is three address code [4]. This format consists of a sequence of simple statements, each of which references at most three variables: two arguments and one result. The main motivation for using three address code is its simple structure. Compared to the original C program, all complex arithmetic expressions, nested control flow constructs, and implicit address arithmetic for array or structure accesses are broken down into sequences of primitive assembly-like assignments and jumps. In turn, this strongly facilitates the implementation of tools for processing C programs, such as IR optimization passes, compiler backends, or HDL generators. A three address code IR can also be easily translated into data flow graphs (DFGs), which reflect potential parallelism and which are the usual input format for code generation and scheduling algorithms.

For the purpose of hardware synthesis from C, IR generation can be viewed as a specification refinement that lowers an initially high-level specification in order to get closer to the final implementation, while retaining the original semantics. We will explain later how this refinement step can be validated.

However, the executability of C, which is one of its major advantages, is usually lost after an IR has been generated. Executability means that a C specification can be compiled into a machine program for a host machine, which can then be executed on the host for validation or simulation purposes. Normally, this is no longer possible with the IR. Even though the notion of three address code is intuitively clear, there is no standard format for a three address code IR; the detailed implementation is typically tool-specific.

The purpose of this paper is to present a new IR format, called IR-C, that retains the executability of the C language while simultaneously offering the simplicity of three address code. The key idea in our approach is to represent the IR itself in C syntax.
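To make the idea of three address code written in C syntax concrete, the following minimal sketch lowers a small function by hand. It is only an illustration under stated assumptions, not the output of the IR-C generator described in this paper: the function names `ssd`, `ssd_ir`, and `validate_lowering`, the temporaries `t1` to `t5`, and the labels are all hypothetical. Pointer addition is left to C's own scaling rules here, whereas a real lowering might spell out the byte offsets explicitly.

```c
#include <assert.h>

/* Original high-level C: sum of squared differences of two arrays. */
int ssd(const int *a, const int *b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
}

/* Hand-lowered version in three address style (hypothetical naming).
 * Each statement has at most two arguments and one result, the loop
 * is expressed with labels and jumps, and array accesses are split
 * into an address computation followed by a load. It is still valid C. */
int ssd_ir(const int *a, const int *b, int n)
{
    int sum, i, t1, t2, t3, t4, t5;
    const int *p1, *p2;

    sum = 0;
    i = 0;
L_test:
    t1 = i < n;
    if (!t1) goto L_exit;
    p1 = a + i;        /* address of a[i] */
    t2 = *p1;          /* load a[i] */
    p2 = b + i;        /* address of b[i] */
    t3 = *p2;          /* load b[i] */
    t4 = t2 - t3;
    t5 = t4 * t4;
    sum = sum + t5;
    i = i + 1;
    goto L_test;
L_exit:
    return sum;
}

/* Because both versions are executable, the lowering can be checked
 * on the host by comparing results for the same input. */
void validate_lowering(void)
{
    int a[4] = {5, 2, 7, 1};
    int b[4] = {3, 2, 4, 6};
    assert(ssd(a, b, 4) == ssd_ir(a, b, 4));
}
```

Calling `validate_lowering()` from a host test program compares the two versions on one sample input; a real validation flow would presumably use the application's own test vectors.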
Representing the IR in C syntax is possible because the C language allows for an extremely low-level, assembly-like specification of programs. Therefore, IR-C can still be compiled and executed like the original C code from which it has been generated. Note that the executability of IR-C is not required for all types of applications. For instance, in a C compiler, IR-C can be used just like any other three address code format as a file exchange format, without the need to compile it for a host platform.

Applications of the proposed IR format include validation and source-level optimization (which exploit the executability of IR-C) as well as retargetable compilation, which is an enabling technology for architecture exploration for application specific processors (ASIPs). Even though IR-C is a quite general machine-independent format, it is primarily intended for use in embedded system design tools, for the following reasons: