GPU-Assisted Decoding of Video Samples Represented in the YCoCg-R Color Space

Wesley De Neve
Ghent University - IBBT, Multimedia Lab
Sint-Pietersnieuwstraat 41, B-9000 Ghent, Belgium
Wesley.DeNeve@UGent.be

Dieter Van Rijsselbergen
Ghent University - IBBT, Multimedia Lab
Sint-Pietersnieuwstraat 41, B-9000 Ghent, Belgium
Dieter.VanRijsselbergen@UGent.be

Charles Hollemeersch
Splash Damage Ltd.
25 Elmfield Road, Bromley, Kent BR1 1LZ, United Kingdom
chollemeersch@gmail.com

ABSTRACT
Although pixel shaders were designed for the creation of programmable rendering effects, they can also be used as generic processing units for vector data. In this paper, attention is paid to an implementation of the YCoCg-R to RGB color space transform, as defined in the H.264/AVC Fidelity Range Extensions, by making use of pixel shaders. Our results show that a significant speedup can be achieved by relying on the processing power of the GPU, relative to the CPU. To be more specific, high definition video (1080p), represented in the YCoCg-R color space, could be decoded to RGB at 30 Hz on a PC with an AMD Athlon XP 2800+ CPU, an AGP bus and an NVIDIA GeForce 6800 graphics card, an effort that could not be realized in real-time by the CPU.

Categories and Subject Descriptors
I.3.3 [Computing Methodologies]: Computer Graphics - picture/image generation

General Terms
Performance

Keywords
FRExt, GPU, H.264/AVC, pixel shaders, YCoCg, YCoCg-R.

1. INTRODUCTION
H.264/Advanced Video Coding (H.264/AVC) is a standardized specification for digital video coding, characterized by a design that targets efficiency, robustness, and usability. The first version of this standard primarily focused on entertainment-quality video, based on an eight bits per sample representation and a 4:2:0 chroma sampling format. In July, 2004, a new amendment was added to H.264/AVC, called Fidelity Range Extensions (FRExt, Amendment 1) [4].
The extensions in question make it possible to address the needs of more demanding applications, especially in the domain of studio editing and post-processing, content contribution, and content distribution. Among other coding tools, FRExt introduces a new color space called YCoCg (luma, chroma orange, and chroma green), as well as a variant that is known as YCoCg-R. The R refers to its lossless reversible RGB (red, green, and blue) to YCoCg mapping, an important property in the context of lossless video coding and its applications (e.g., medical imaging). The YCoCg-R color space is only available in FRExt’s High 4:4:4 Profile.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM’05, November 6–11, 2005, Singapore.
Copyright 2005 ACM 1-59593-044-2/05/0011 ...$5.00.

As shown by Shen et al. [2], color space conversion is one of the most computationally intensive parts of a typical video decoder architecture: a CPU load of up to 40% is possible for an ordinary color space conversion. This observation makes it highly desirable to handle this step efficiently. Although recent graphics hardware offers more and more support for common color spaces, such as those defined in Recommendations BT.601 and BT.709 of the ITU-R (International Telecommunication Union - Radiocommunication Sector), this is often not the case for new or proprietary color spaces because the hardware has built-in coefficients that cannot be changed. In this study, we investigate how the DirectX 9 programmable graphics pipeline can be exploited to assist the CPU in decoding YCoCg-R video samples to RGB for visualization purposes.
This makes it possible to repurpose existing consumer graphics hardware for accelerating new color space conversions.

The outline of the paper is as follows. In Section 2, we give a brief overview of the YCoCg color space and some of its variations. Section 3 discusses how shaders can be used to accelerate the decoding of YCoCg-R video data. Some experimental results are provided in Section 4, while Section 5 concludes.

2. THE YCoCg(-R) COLOR SPACE
As discussed by Malvar et al. [1], the YCoCg color space is not only characterized by a simple set of transformation equations relative to RGB, but also by an improved coding gain relative to both RGB and YCbCr (luma, chroma blue, chroma red). The primary motivation behind the development of this color space is to address some shortcomings of the different YCbCr color spaces, such as difficult-to-use floating-point coefficients and rounding errors [3]. When relying on the YCoCg color space, rounding errors can be eliminated if two additional bits of accuracy are used for representing luma and chroma samples. However, it is possible to devise a smarter approach that does not require adding precision to the luma samples and that only adds one bit of precision to the chroma samples. This scheme is known as the YCoCg-R color space and can be described by the following equations, relative to RGB and intended for execution with integer arithmetic: