Color Space Conversion for MPEG decoding on FPGA-augmented TriMedia Processor

Mihai Sima, Stamatis Vassiliadis, Sorin Cotofana, Jos T.J. van Eijndhoven

Delft University of Technology, Delft, The Netherlands
Philips Research Laboratories, Eindhoven, The Netherlands
M.Sima@et.tudelft.nl
http://ce.et.tudelft.nl/~mihai

Abstract

A case study on Color Space Conversion (CSC) for MPEG decoding, carried out on an FPGA-augmented TriMedia processor, is presented. That is, a transform from one color space to another is addressed. First, we outline the extension of the TriMedia architecture, consisting of FPGA-based Reconfigurable Functional Units (RFU) and associated instructions. Then we analyse a CSC (RFU-specific) instruction which can process four pixels per call, and propose a scheme to implement the CSC operation on RFU(s). When mapped on an ACEX EP1K100 FPGA, the proposed CSC exhibits a latency of 10 and a recovery of 2 TriMedia@200 MHz cycles, and occupies 57% of the device. By configuring the CSC facility on the RFU(s) at application load-time, color space conversion can be computed on FPGA-augmented TriMedia with 40% speed-up over the standard TriMedia.

Enhancing a general purpose processor with a reconfigurable core is a common approach explored by computer architects [8, 15, 2]. The basic idea is to exploit both the flexibility of the processor, which achieves medium performance for a large class of applications, and the capability of the FPGA to implement application-specific computations. An instance of such an enhanced processor is the TriMedia+FPGA hybrid [11], on which the user is given the freedom to define and use any computing facility, subject to the FPGA size and the TriMedia organization. Several applications implemented on this hybrid, e.g., Inverse Discrete Cosine Transform [10] and Entropy Decoding [12], showed promising results. In this paper, we address color space conversion, which is carried out at the end of the MPEG decoding process.
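To make the latency and recovery figures quoted in the abstract more concrete, the following is a minimal sketch of a timing model for a burst of CSC calls. It assumes, as a simplification not taken from the paper, that a new call can be issued every "recovery" cycles, that the result of a call is available "latency" cycles after issue, and that no other stalls occur. The function and constant names are hypothetical.

```c
#include <assert.h>

/* Figures quoted in the abstract (TriMedia@200 MHz cycles). */
enum { CSC_LATENCY = 10, CSC_RECOVERY = 2, CSC_PIXELS_PER_CALL = 4 };

/* Cycles until the last result of `calls` back-to-back CSC calls is
 * available, under the assumed model: the first call pays the full
 * latency, each further call adds one recovery interval. */
unsigned long csc_burst_cycles(unsigned long calls)
{
    if (calls == 0)
        return 0;
    return CSC_LATENCY + (calls - 1) * CSC_RECOVERY;
}

/* Average cycles per pixel for a burst of `calls` calls (calls > 0). */
double csc_cycles_per_pixel(unsigned long calls)
{
    assert(calls > 0);
    return (double)csc_burst_cycles(calls)
         / (double)(calls * CSC_PIXELS_PER_CALL);
}
```

Under this model a single isolated call costs 10 cycles for four pixels, whereas eight calls issued back to back cost 24 cycles for 32 pixels, so the pipeline latency is amortized as more calls are kept in flight.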
IEEE 14th Intl. Conf. on Application-specific Systems, Architectures, and Processors (ASAP 2003), The Hague, The Netherlands, June 24-26, 2003.

The last stage of MPEG decoding consists of a color space conversion, which is a linear transform from one color space to another. Since this transform exhibits a large degree of data-level and instruction-level parallelism, it can be implemented on TriMedia with very high efficiency. Obtaining improvements for a task whose computational pattern TriMedia has been optimised for is indeed challenging. In this paper we demonstrate that a significant speed-up for this color space conversion can be achieved on FPGA-enhanced TriMedia over standard TriMedia. The main idea is to configure a pipelined Color Space Converter (CSC) on the FPGA and to unroll the software loop issuing a CSC operation, such that the penalty associated with firing up and flushing the CSC pipeline is reduced. In particular, we provide configurable-hardware support for a CSC operation which can