Mapping of nomadic multimedia applications on the ADRES reconfigurable array processor

Mladen Berekovic a,*, Andreas Kanstein b, Bingfeng Mei a, Bjorn De Sutter a

a IMEC, B-301 Leuven, Belgium
b Freescale Semiconductor, 31023 Toulouse Cedex, France

Article info
Article history: Available online 20 February 2009
Keywords: Coarse-grain reconfigurable arrays; ADRES; DRESC; Processor architecture; DSP; Multimedia; MPEG; H.264; Reconfigurable computing

Abstract

This paper introduces the mapping of MPEG video decoders on ADRES, IMEC’s new coarse-grain reconfigurable and fully C-programmable array processor that targets nomadic devices. ADRES is a flexible template that allows the instantiation of many different processor versions. An XML-based architecture description language allows a designer to easily generate different processor instances with full compiler support by specifying values for the communication topology, the number and size of the local register files and functional units, and the supported instruction set. ADRES supports a VLIW-like programming model with a pure VLIW mode for legacy code and a (coarse-grain reconfigurable) array mode with very high parallelism for processing compute-intensive loops. We demonstrate the mapping of two video decoders, MPEG-2 and AVC, and discuss the performance trade-offs for two critical kernels: the IDCT and the integer transform. As a result, an ADRES-based system can perform AVC decoding at CIF resolution at less than 50 MHz on a 4 × 4 array processor.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

A new class of programmable processor architectures is emerging for demanding DSP applications such as video coding: coarse-grained reconfigurable architectures (CGRAs). While many CGRAs have been proposed in recent years [4], none of them has yet been widely adopted, partly because of their difficult programming models and partly because of their much higher resource usage compared with other DSP processors.
Another typical problem is the difficult interfacing between the array and the host processor, on which the control-flow part of the application code runs. These issues are addressed by a novel CGRA called architecture for dynamically reconfigurable embedded systems (ADRES) and by its compiler technology, the dynamically reconfigurable embedded system compiler (DRESC) [9].

Firstly, the ADRES architecture tightly couples a very-long instruction word (VLIW) processor and a coarse-grained array by providing two functional views of the same physical resources. The VLIW part offers an easy path for mapping complex applications, which is absent in other published CGRA implementations, while the array part offers very high loop acceleration. Secondly, the DRESC compiler framework ensures that applications written in C can be easily mapped onto both the VLIW mode and the array mode. A central register file is shared between the two modes and also serves as storage for the live-in and live-out variables of loops executed in array mode. This sharing minimizes communication and mode-switching costs and enables the compiler to generate code for both modes seamlessly, including the data-transfer operations. Finally, ADRES is a template rather than a concrete processor architecture. With the retargetable compilation support of DRESC, architectural exploration becomes possible, either to discover better architectures or to design domain-specific ones.

We mapped two key video applications, namely MPEG-2 [5,14] and H.264 [6,13] decoding, on ADRES [11]. First, the applications were compiled for the VLIW view. Next, the IDCT kernel, which is also used in MPEG-4 [7], and the integer transform kernel were accelerated on the array part of ADRES. The results for these kernels are discussed in detail and compared with benchmarks for a state-of-the-art VLIW DSP processor, TI’s TMS320C64x [1].

This paper is organised as follows.
We first present the architecture of the ADRES reconfigurable array processor in Section 2 and the corresponding DRESC compiler in Section 3. Then, in Section 4, we illustrate our application mapping methodology, which is specific to this type of reconfigurable array, and in Section 5 we apply it to MPEG. The mapping results are presented in Section 6 and compared with other state-of-the-art processors in Section 7. Section 8 presents hardware implementation results, and finally, Section 9 discusses our conclusions.

0141-9331/$ - see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.micpro.2009.02.008
* Corresponding author. E-mail address: berekovic@ida.ing.tu-bs.de (M. Berekovic).
Microprocessors and Microsystems 33 (2009) 290–294
Contents lists available at ScienceDirect
Microprocessors and Microsystems
journal homepage: www.elsevier.com/locate/micpro

2. The ADRES CGRA

The ADRES architecture template, as shown in Fig. 1, consists of an array of basic components, including functional units (FUs), register files (RFs) and