A design flow for speeding-up DSP applications in heterogeneous reconfigurable systems

Michalis D. Galanis*, Athanassios Milidonis, Athanassios P. Kakarountas, Costas E. Goutis

VLSI Design Laboratory, Electrical and Computer Engineering Department, University of Patras, 26500 Patras, Greece

Received 10 January 2005; received in revised form 2 August 2005; accepted 16 September 2005. Available online 9 December 2005.

Abstract

In this paper, we propose a method for speeding-up Digital Signal Processing (DSP) applications. The method partitions them between reconfigurable hardware blocks of different granularity and maps critical parts of the applications onto coarse-grain reconfigurable hardware. The reconfigurable hardware blocks are embedded in a heterogeneous reconfigurable system architecture. The fine-grain part is implemented by an embedded FPGA unit, while our high-performance coarse-grain data-path serves as the coarse-grain reconfigurable hardware. The design flow consists of three main steps: the analysis procedure, the mapping onto the coarse-grain blocks, and the mapping onto the fine-grain hardware. The methodology is validated using five real-life applications: an OFDM transmitter, a medical imaging technique, a wavelet-based image compressor, a video compression scheme and a JPEG encoder. The experimental results show that the speedup relative to an all-FPGA solution ranges from 1.55 to 4.17 for the considered applications.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Heterogeneous reconfigurable system; Partitioning; Coarse-grain reconfigurable hardware; Field programmable gate array; Performance; Design flow

1. Introduction

Reconfigurable architectures have been the subject of intensive research in recent years. Reconfigurable fabrics can combine the performance of ASICs with the flexibility offered by microprocessors [1].
In particular, heterogeneous (or hybrid) granularity reconfigurable systems [1–4] offer additional performance advantages and considerable flexibility for efficiently implementing DSP applications, which are characterized by mixed functionality (data and control). Such heterogeneous architectures usually consist of fine-grain reconfigurable units, typically implemented in field programmable gate array (FPGA) technology, coarse-grain reconfigurable units implemented in ASIC technology, one or more microprocessors, and data and program memories. Because the reconfigurable units of a heterogeneous platform differ in granularity, certain parts of an application are better suited to execution on the coarse-grain units and other parts on the fine-grain ones. Small bit-width operations can be executed efficiently on fine-grain reconfigurable hardware, as the granularity of the configurable logic blocks (CLBs) of modern FPGAs is typically four or five bits. Tasks with finite-state-machine functionality are also good candidates for implementation on the fine-grain reconfigurable hardware. The coarse-grain reconfigurable blocks are implemented in ASIC technology and efficiently execute word-level or sub-word-level operations [2–5]. These blocks can slightly modify their functionality according to the application requirements. Executing the computationally demanding parts of an application on coarse-grain reconfigurable units improves performance relative to executing them on fine-grain reconfigurable units [1,4,6]. Therefore, improving performance in heterogeneous reconfigurable systems requires a methodology that partitions an application into two parts, one executed on the coarse-grain reconfigurable hardware and the other on the fine-grain hardware. In this paper, we introduce an automated partitioning flow between the fine- and coarse-grain reconfigurable logic of an embedded heterogeneous platform.
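The partitioning idea can be illustrated with a small model. The sketch below is a hypothetical illustration and not the authors' tool. It assumes that each basic block has been profiled for its execution frequency and operation count, and it ranks blocks by the weight freq × ops to select kernels. Only kernels that the coarse-grain data-path can execute are moved there. Blocks with small bit-width or FSM-like behavior are marked unsuitable (`coarse_ns=None`) and stay on the FPGA, which reflects the division of labor described above. The speedup over an all-FPGA mapping is then estimated while ignoring communication and reconfiguration overheads. The names `BasicBlock`, `detect_kernels` and `speedup_vs_all_fpga`, the 80% coverage threshold, and all timing values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BasicBlock:
    """Hypothetical profiled basic block of a DSP application."""
    name: str
    freq: int                   # execution count from (assumed) dynamic profiling
    ops: int                    # operations per execution
    fpga_ns: float              # time per execution on the fine-grain FPGA
    coarse_ns: Optional[float]  # time on the coarse-grain data-path; None = unsuitable


def detect_kernels(blocks: List[BasicBlock], coverage: float = 0.8) -> List[BasicBlock]:
    """Rank mappable blocks by weight = freq * ops and keep adding them until
    they cover `coverage` of the total application weight (or run out)."""
    total = sum(b.freq * b.ops for b in blocks)
    candidates = sorted(
        (b for b in blocks if b.coarse_ns is not None),
        key=lambda b: b.freq * b.ops,
        reverse=True,
    )
    kernels: List[BasicBlock] = []
    covered = 0
    for b in candidates:
        if covered >= coverage * total:
            break
        kernels.append(b)
        covered += b.freq * b.ops
    return kernels


def speedup_vs_all_fpga(blocks: List[BasicBlock], kernels: List[BasicBlock]) -> float:
    """All-FPGA time divided by hybrid time; communication and
    reconfiguration overheads are ignored (simplifying assumption)."""
    kernel_names = {k.name for k in kernels}
    t_fpga = sum(b.freq * b.fpga_ns for b in blocks)
    t_hybrid = sum(
        b.freq * (b.coarse_ns if b.name in kernel_names else b.fpga_ns)
        for b in blocks
    )
    return t_fpga / t_hybrid


# Illustrative numbers only, not measurements from the paper.
blocks = [
    BasicBlock("fft_butterfly", freq=4096, ops=10, fpga_ns=40.0, coarse_ns=12.0),
    BasicBlock("fir_mac", freq=2048, ops=8, fpga_ns=32.0, coarse_ns=10.0),
    BasicBlock("bit_scramble", freq=2048, ops=4, fpga_ns=8.0, coarse_ns=None),  # small bit-width
    BasicBlock("control_fsm", freq=256, ops=3, fpga_ns=6.0, coarse_ns=None),    # FSM-like control
]
kernels = detect_kernels(blocks)
speedup = speedup_vs_all_fpga(blocks, kernels)
print([k.name for k in kernels], f"{speedup:.2f}")  # prints: ['fft_butterfly', 'fir_mac'] 2.82
```

Ranking by freq × ops is only one plausible kernel metric. A practical flow would also account for data transfers between the FPGA and the coarse-grain data-path, which this sketch leaves out.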
Microelectronics Journal 37 (2006) 554–564, www.elsevier.com/locate/mejo
0026-2692/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.mejo.2005.09.032
* Corresponding author. Tel.: +30 2610 997 324; fax: +30 2610 994 798. E-mail address: mgalanis@ee.upatras.gr (M.D. Galanis).

This flow improves performance by accelerating critical parts of the application, called kernels, on the coarse-grain reconfigurable hardware of the heterogeneous system. The main parts of the design flow are the analysis procedure for detecting kernels of the application and