A design flow for speeding-up DSP applications in heterogeneous reconfigurable systems

Michalis D. Galanis *, Athanassios Milidonis, Athanassios P. Kakarountas, Costas E. Goutis

VLSI Design Laboratory, Electrical and Computer Engineering Department, University of Patras, 26500 Patras, Greece

Received 10 January 2005; received in revised form 2 August 2005; accepted 16 September 2005
Available online 9 December 2005

Abstract

In this paper, we propose a method for speeding up Digital Signal Processing applications by partitioning them between reconfigurable hardware blocks of different granularity and mapping the critical parts of the applications onto coarse-grain reconfigurable hardware. The reconfigurable hardware blocks are embedded in a heterogeneous reconfigurable system architecture. The fine-grain part is implemented by an embedded FPGA unit, while our previously developed high-performance coarse-grain data-path serves as the coarse-grain reconfigurable hardware. The design flow consists of three main steps: the analysis procedure, the mapping onto the coarse-grain blocks, and the mapping onto the fine-grain hardware. The methodology is validated using five real-life applications: an OFDM transmitter, a medical imaging technique, a wavelet-based image compressor, a video compression scheme and a JPEG encoder. The experimental results show that the speedup, relative to an all-FPGA solution, ranges from 1.55 to 4.17 for the considered applications.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Heterogeneous reconfigurable system; Partitioning; Coarse-grain reconfigurable hardware; Field programmable gate array; Performance; Design flow

1. Introduction

Reconfigurable architectures have been a topic of intensive research in recent years. Reconfigurable fabrics can combine the performance of ASICs with the flexibility offered by microprocessors [1].
In particular, heterogeneous (or hybrid) granularity reconfigurable systems [1–4] offer additional advantages in performance and flexibility for efficiently implementing digital signal processing (DSP) applications, which are characterized by mixed functionality (data and control). Such heterogeneous architectures usually consist of fine-grain reconfigurable units, typically implemented in field programmable gate array (FPGA) technology, coarse-grain reconfigurable units implemented in ASIC technology, microprocessor(s), and data and program memories.

Due to the special features of the hybrid reconfigurable units included in a heterogeneous system platform, certain parts of an application are better suited to the coarse-grain units and other parts to the fine-grain ones. Small bit-width operations can be executed efficiently on fine-grain reconfigurable hardware, as the granularity of the configurable logic blocks (CLBs) of modern FPGAs is typically four or five bits. Tasks with finite state machine functionality are also good candidates for the fine-grain reconfigurable hardware. The coarse-grain reconfigurable blocks are implemented in ASIC technology and efficiently execute word-level or sub-word-level operations [2–5]. These blocks can slightly modify their functionality according to the application requirements. Executing the computationally demanding parts of an application on coarse-grain reconfigurable units improves performance relative to executing them on fine-grain reconfigurable units [1,4,6]. A methodology for partitioning an application into two parts, one executed on the coarse-grain reconfigurable hardware and the other on the fine-grain hardware, is therefore required to improve performance in heterogeneous reconfigurable systems.
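The partitioning idea described above can be illustrated with a minimal sketch. It is not the authors' design flow: the `Block` type, the `partition` and `speedup` helpers, and all timing figures are hypothetical, and communication and reconfiguration overheads between the two hardware types are ignored. The sketch only shows how moving the most time-consuming blocks to coarse-grain hardware changes the total execution time relative to an all-FPGA mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A block of the application with hypothetical timing estimates."""
    name: str
    t_fine: float    # assumed execution time on the fine-grain (FPGA) unit
    t_coarse: float  # assumed execution time on the coarse-grain data-path


def partition(blocks, max_kernels):
    """Pick up to max_kernels of the slowest blocks as coarse-grain kernels.

    A block is moved only if the coarse-grain estimate is actually lower.
    Returns (kernels, remaining_blocks).
    """
    ranked = sorted(blocks, key=lambda b: b.t_fine, reverse=True)
    kernels = [b for b in ranked[:max_kernels] if b.t_coarse < b.t_fine]
    kernel_names = {b.name for b in kernels}
    rest = [b for b in blocks if b.name not in kernel_names]
    return kernels, rest


def speedup(blocks, kernels):
    """Ratio of all-FPGA time to partitioned time (overheads ignored)."""
    kernel_names = {b.name for b in kernels}
    t_all_fpga = sum(b.t_fine for b in blocks)
    t_hybrid = sum(
        b.t_coarse if b.name in kernel_names else b.t_fine for b in blocks
    )
    return t_all_fpga / t_hybrid


# Made-up profile: one compute-heavy block and two lighter ones.
blocks = [
    Block("transform", t_fine=600.0, t_coarse=150.0),
    Block("mapper", t_fine=100.0, t_coarse=90.0),
    Block("control_fsm", t_fine=300.0, t_coarse=300.0),
]
kernels, rest = partition(blocks, max_kernels=1)
ratio = speedup(blocks, kernels)
```

With these invented numbers only `transform` is selected as a kernel, and `ratio` is the all-FPGA time (1000) divided by the partitioned time (550). A realistic flow would also have to account for data transfers between the two hardware types, which this sketch leaves out.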
Microelectronics Journal 37 (2006) 554–564, www.elsevier.com/locate/mejo
0026-2692/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.mejo.2005.09.032
* Corresponding author. Tel.: +30 2610 997 324; fax: +30 2610 994 798. E-mail address: mgalanis@ee.upatras.gr (M.D. Galanis).

In this paper, an automated flow for partitioning an application between the fine-grain and coarse-grain reconfigurable logic of an embedded heterogeneous platform is introduced. This flow improves performance by accelerating the critical parts of the application, called kernels, on the coarse-grain reconfigurable hardware of the heterogeneous system. The main parts of the design flow are the analysis procedure for detecting kernels of the application and