CLooGVHDL and JCCI

Harald Devos, Wim Meeus, Dirk Stroobandt
Harald.Devos@UGent.be
ELIS-PARIS – Ghent University – Belgium
http://www.elis.ugent.be/

Abstract

CLooGVHDL and JCCI offer an extensible C-to-VHDL framework for developing high-level synthesis techniques for data-intensive applications on heterogeneous memory systems.

1. Introduction

Multimedia applications are an example of applications that are not only computation-intensive but also data-intensive, which means that they need a large amount of memory. Implementing data-intensive applications on FPGAs (Field Programmable Gate Arrays) requires off-chip memory, which has lower bandwidth and higher latency than on-chip memory and is therefore a potential bottleneck. A memory hierarchy should be constructed to decrease the number of off-chip transactions by reusing data stored in on-chip buffers. For this reuse to be effective, the different accesses to a data element should be close together in time, i.e., exhibit good temporal locality.

Loop transformations are a means to improve data locality by changing the execution order of computations and data accesses. This technique is commonly used for software optimization, in particular to optimize cache behavior. Current high-level synthesis environments for hardware design lack support for implementing data-intensive applications on heterogeneous memory systems. They focus on parallelism rather than on locality.

Loop transformations influence not only the data transfers but also the control complexity of an implementation. Their impact on hardware performance can typically only be quantified after refinement to a synthesizable level, which hinders exploration of the loop transformation space. Therefore, it would be beneficial to integrate loop transformations into high-level synthesis tools.

2. Tool Flow

First, the input C code is translated¹ into an abstract syntax tree (AST) and split into statement definitions and a polyhedral representation of the iteration domains and control structure (Fig. 2). In this polyhedral model a sequence of loop transformations can easily be applied. Only after the last transformation is the polyhedral representation transformed back into code. With the CLooG (Chunky Loop Generator) code generator [1], C code can be generated. We have written CLooGVHDL, which adds a VHDL generation back-end to CLooG. It generates a loop controller circuit composed of communicating automata that drive the hardware implementation of the statements (Fig. 1). Different trade-offs between area and clock speed can be investigated (Fig. 3) [4].

¹ The input file (.c, .macro) parsers were generated with ANTLR v3 [5].

As a test case, many variants of an inverse discrete wavelet transform have been generated (e.g., Fig. 4). The results outperform those of the commercial high-level synthesis tool Impulse C and are competitive with those of the Celoxica Handel-C compiler [2,4].

In a first version of our tool, application-specific scripts were needed to translate the software description of the data path (statements) into hardware. We now have JCCI, which reads in C code, creates an intermediate representation of data and control flow, and generates synthesizable VHDL for the data path. This tool will serve as a framework to develop optimization and exploration techniques.

Figure 1: Architecture of the generated hardware

Figure 2: CLooGVHDL and JCCI tool flow
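
The effect of a loop transformation on temporal locality, as discussed in the Introduction, can be illustrated with a small C sketch. It is not taken from CLooGVHDL or JCCI: the array sizes N and M, the one-word on-chip buffer model, and the helper names load_a, run_original and run_interchanged are all assumptions made for illustration. The sketch counts how often an element of the off-chip array a[] must be fetched before and after a loop interchange, assuming that w[] and acc[] reside on-chip.

```c
#include <assert.h>

#define N 4  /* trip count of the i loop (hypothetical size) */
#define M 8  /* number of elements in the off-chip array (hypothetical size) */

/* Toy model (an assumption, not the authors' architecture):
 * a[] lives off-chip behind a single-word on-chip buffer. */
static int a[M];
static int buf_index;       /* index currently buffered, -1 if empty */
static int buf_value;       /* buffered copy of a[buf_index] */
static long offchip_loads;  /* number of off-chip transactions so far */

static void reset_buffer(void) {
    buf_index = -1;
    offchip_loads = 0;
}

/* Read a[j] through the buffer; a miss costs one off-chip load. */
static int load_a(int j) {
    assert(j >= 0 && j < M);
    if (j != buf_index) {
        buf_value = a[j];
        buf_index = j;
        offchip_loads++;
    }
    return buf_value;
}

/* Original order: successive uses of a[j] are M iterations apart,
 * so every access misses the one-word buffer. */
static long run_original(const int w[N], int acc[N]) {
    reset_buffer();
    for (int i = 0; i < N; i++) {
        acc[i] = 0;
        for (int j = 0; j < M; j++)
            acc[i] += w[i] * load_a(j);
    }
    return offchip_loads;
}

/* After loop interchange: all N uses of a[j] are consecutive,
 * so each element is fetched from off-chip memory only once. */
static long run_interchanged(const int w[N], int acc[N]) {
    reset_buffer();
    for (int i = 0; i < N; i++)
        acc[i] = 0;
    for (int j = 0; j < M; j++)
        for (int i = 0; i < N; i++)
            acc[i] += w[i] * load_a(j);
    return offchip_loads;
}
```

Both functions execute the same set of statement instances and produce the same acc[] values; only the execution order differs. In this toy model the original order needs N*M off-chip loads and the interchanged order needs M. The interchanged version also changes the loop structure that a loop controller would have to implement, which is the kind of trade-off the tool flow is meant to explore.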