Abstract

H.264/AVC, the most recent video coding standard, imposes far heavier computational requirements than its predecessors. Meeting real-time constraints therefore requires implementations that are efficient in both performance and flexibility, mapped onto a suitable architecture. For this purpose IMEC has developed ADRES (Architecture for Dynamically Reconfigurable Embedded Systems), a generic coarse-grained reconfigurable architecture adapted to these demands. This paper presents guidelines for architectural exploration on ADRES, which indicate that it is a good choice for mapping complete multimedia applications such as H.264/AVC decoders. The main goal of this paper is to present different ways of obtaining performance improvements that depend directly on architectural modifications in ADRES, maximizing the performance of a baseline profile H.264/AVC decoder. In this sense, this work demonstrates that it is possible to improve performance with respect to different parameters as well as to increase the efficiency with which ADRES resources are used.

I. INTRODUCTION

A desirable feature of modern multimedia devices is that they meet flexibility and performance requirements at the same time. Flexibility avoids having to create a new device each time an application is modified. In this sense, it is important to develop devices that support future improvements and modifications without changes to their structure. Performance requirements, on the other hand, involve parameters such as speed, power consumption and silicon area. Reconfigurable architectures try to combine on a single device a degree of flexibility similar to that of CPUs with performance levels close to those of ASICs. Because reconfigurable processors execute only specific tasks efficiently, coupling them to a (soft) processor is imperative, which introduces a communication overhead between the two.
The ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) architecture solves this problem with an ingenious solution: sharing resources between the reconfigurable array and the soft processor. The power efficiency, flexibility and high performance that reconfigurable architectures offer the next generation of embedded multimedia devices make ADRES, together with its compiler, an ideal architectural choice for this purpose.

T. Cervero, S. López and R. Sarmiento are with the Institute for Applied Microelectronics (IUMA) and the Department of Electronic Engineering and Control (DIEA), University of Las Palmas de Gran Canaria, Spain. E-mail: {tcervero, seblopez, roberto}@iuma.ulpgc.es. A. Kanstein is with Freescale, Inc., 134 Avenue du General Eisenhower, Toulouse, France. E-mail: A.Kanstein@freescale.com. J.-Y. Mignolet and B. De Sutter are with IMEC, Kapeldreef 75, Leuven, Belgium. E-mail: {i,mignolet@imec.be; bjorn.desutter@elis.ugent.be}

In addition, ADRES opens up a wide range of architectural combinations thanks to its generic template. However, these characteristics generate an enormous design space, which makes it difficult to find optimized architectures.

Among multimedia applications, video coding standards have improved and developed rapidly during the last decade. One of the most recent video standards is H.264/AVC [1], which offers several benefits over its predecessors. The standard gains flexibility in the coding/decoding process and, as a direct consequence, reduces transmission rates by 50% and 35% in comparison with MPEG-2 [2] and MPEG-4 [3], respectively. However, this improvement comes with an associated increase in implementation complexity. For this reason, the need to combine flexibility and performance on the same device is even more critical.
To cope with these challenging requirements, we believe that the most appropriate device onto which to map an application like the H.264/AVC decoder belongs to the family of coarse-grained architectures [4]. Within this extensive group of alternatives, IMEC's ADRES architecture stands out as a suitable candidate. So far, some architectural explorations [13], [14], [15] have been carried out on simple kernels (FFT and IDCT), but none has studied the performance impact of mapping a complete application, such as a baseline H.264/AVC video decoder, onto ADRES.

This paper presents the results of an architectural exploration study in which we mapped a baseline profile H.264/AVC decoder onto several ADRES architectures. Its main contribution is to provide further insight into the usefulness of different architectural features and parameters.

The remainder of this paper is organized as follows. Section II introduces the structure and functionality of the H.264/AVC decoder. Section III describes the main characteristics of ADRES and its associated programming tool, the DRESC compiler. Section IV explains the architectural exploration carried out to find a template suited to the needs of the H.264/AVC decoder. Finally, Section V presents a set of general conclusions about the results obtained and future lines of research.

Architectural exploration of the H.264/AVC decoder onto a coarse-grain reconfigurable architecture
T. Cervero, A. Kanstein, S. López, B. De Sutter, R. Sarmiento and J.-Y. Mignolet

II. THE H.264/AVC DECODER

As shown in Figure 1, the H.264/AVC decoder is divided into functional blocks, each with a specific task. The input bitstream is loaded into a memory buffer so that it can be parsed and decoded by the entropy decoder block. The syntax elements obtained after this