An Energy-Efficient FDCT/IDCT Configurable IP Core for Mobile Multimedia Platforms

Vinícius S. Livramento, Bruno G. Moraes, Brunno A. Machado, José Luís Güntzel
Embedded Computing Lab - Department of Informatics and Statistics/PPGCC
Federal University of Santa Catarina, Florianópolis, Brazil
{vini, brunogm, guntzel}@inf.ufsc.br, brunno@das.ufsc.br

ABSTRACT
The development of mobile multimedia devices follows the platform-based design methodology, in which IP cores are the building blocks. In mobile devices, battery lifetime is a major concern, which creates the need for energy-efficient IP cores. This paper presents an energy-efficient FDCT/IDCT configurable IP core. Synthesis for 90 nm resulted in a maximum frequency of 50 MHz and a total power of 1.66 mW, achieving a throughput of 188.2 Mpixels/s, which is enough to process two HDTV@1080p videos in real time. The IP core architecture is based on Massimino's algorithm, which was chosen for its accuracy and parallelism. Exploiting this parallelism resulted in a fully-combinational 1-D FDCT/IDCT configurable datapath. In addition, the IP core is IEEE-1180 compliant. Comparisons with related work in terms of energy efficiency (mJ/Mpixel) revealed that our architecture is at least 64% more efficient than other DCT architectures.

Categories and Subject Descriptors
D.3.3 [Integrated Circuits]: Types and Design Styles – algorithms implemented in hardware.

General Terms
Design.

Keywords
Discrete Cosine Transform (DCT), VLSI architecture, Energy efficiency.

1. INTRODUCTION
Nowadays, mobile multimedia devices are driving the consumer electronics market. Most applications that run on such devices deal with image and video. Image and video coding and decoding algorithms make intensive use of several types of transforms, among which the forward and inverse Discrete Cosine Transform (FDCT/IDCT) [1] is widely used by image and video standards such as JPEG [2] and MPEG-1/2/4 [3].
In order to cope with the ever-increasing system complexity while satisfying tight time-to-market constraints, the development of integrated systems for mobile multimedia devices follows the platform-based design methodology [4], whose building blocks are Intellectual Property (IP) cores. Due to their reusability, IP cores allow platforms to address different application domains. In this sense, configurable IPs, such as FDCT/IDCT, provide higher reusability thanks to the flexibility they offer.

Commercially available mobile multimedia platforms, such as Texas Instruments' OMAP [5] and Qualcomm's Snapdragon [6], are mainly intended for a niche of applications that requires energy efficiency, such as battery-operated devices. Moreover, these mobile platforms rely on energy-efficient IP cores to optimally perform specific tasks (e.g., DCT, FFT) in order to meet the system's energy budget. In the design of energy-efficient IP cores, the objective is to minimize the energy consumption while satisfying the performance requirement. Therefore, the prime target is to optimize the "energy per operation", rather than optimizing only the performance or only the total energy consumption.

The current generation of mobile multimedia devices codes and decodes color video up to a resolution of 1080p (1920 x 1080 pixels) at a frame rate of 30 fps (frames per second) in YCbCr [7] format with 4:2:0 sub-sampling. Under such format, a single frame in MPEG-2, for example, is represented by luma components (Y) and chroma components (Cb, Cr), each organized as 8x8 matrices of 8-bit pixels. To deal with such format in real time, a minimum throughput of 93.3 Mpixels/s (Mega pixels per second) is required. Such throughput also allows for coding or decoding any other video or still image format having lower resolution and/or higher sub-sampling.

Among the target applications of mobile multimedia platforms, the mobile internet devices (MIDs) already correspond to a significant part of the market.
A typical MID may spend most of its time decoding images for browsing, which shortens the battery lifetime. Therefore, in MIDs energy can be saved by using energy-efficient IP cores to offload the critical operations. In this context, the estimated 93.3 Mpixels/s is enough to allow browsing operations.

This work presents an energy-efficient FDCT/IDCT configurable IP core. The IP architecture is based on an algorithm that combines high accuracy and parallelism. Our main contributions come from: 1) the algorithmic decisions that save resources while preserving an acceptable accuracy; 2) the architectural decisions comprising a single fully-combinational 1-D FDCT/IDCT configurable block and a transpose buffer with simultaneous read/write capabilities. These decisions are the key to designing a VLSI architecture that achieves the previously estimated throughput with minimum energy consumption. To the best of the authors' knowledge, no other FDCT/IDCT configurable VLSI implementation using such algorithm is found in the literature.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SBCCI'11, August 30–September 2, 2011, João Pessoa, Brazil.
Copyright 2011 ACM 978-1-4503-0828-1/11/08...$10.00.
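The 93.3 Mpixels/s requirement quoted in the Introduction can be reproduced with a short calculation. The sketch below is illustrative only: the variable names are ours, and it assumes that 4:2:0 sub-sampling keeps one Cb and one Cr sample per 2x2 block of luma samples and that no padding to whole 8x8 blocks is counted. It also derives an energy-per-Mpixel figure from the power and throughput given in the abstract, assuming that the metric is simply total power divided by throughput, which the text does not state explicitly.

```python
# Worked check of the throughput estimate (illustrative sketch, not the authors' code).
WIDTH, HEIGHT = 1920, 1080   # 1080p frame size, from the text
FPS = 30                     # frame rate, from the text

# Assumption: 4:2:0 keeps one Cb and one Cr sample per 2x2 luma block,
# so the two chroma planes together hold half as many samples as the luma plane.
luma_samples = WIDTH * HEIGHT
chroma_samples = 2 * (WIDTH // 2) * (HEIGHT // 2)
samples_per_frame = luma_samples + chroma_samples

required_mpixels_s = samples_per_frame * FPS / 1e6
print(f"required throughput: {required_mpixels_s:.1f} Mpixels/s")

# Figures quoted in the abstract.
CORE_MPIXELS_S = 188.2       # reported throughput
CORE_POWER_MW = 1.66         # reported total power

# Number of 1080p streams the reported throughput could sustain.
streams = CORE_MPIXELS_S / required_mpixels_s

# Assumption: energy efficiency = power / throughput.
# Units: mW / (Mpixel/s) = (mJ/s) / (Mpixel/s) = mJ/Mpixel.
energy_mj_per_mpixel = CORE_POWER_MW / CORE_MPIXELS_S

print(f"1080p streams sustained: {streams:.2f}")
print(f"energy per Mpixel: {energy_mj_per_mpixel:.4f} mJ/Mpixel")
```

Under these assumptions the required throughput evaluates to 93.3 Mpixels/s, and the reported 188.2 Mpixels/s covers slightly more than two such streams, which is consistent with the claim in the abstract.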