PERCEPTION-BASED HIGH DYNAMIC RANGE VIDEO COMPRESSION WITH OPTIMAL BIT-DEPTH TRANSFORMATION

Yang Zhang, Erik Reinhard, David Bull

Department of Electrical and Electronic Engineering, University of Bristol, UK
Email: Yang.Zhang@bristol.ac.uk, Reinhard@cs.bris.ac.uk, Dave.Bull@bristol.ac.uk

ABSTRACT

High Dynamic Range (HDR) technology offers high levels of immersivity, with a dynamic range comparable to that of the Human Visual System (HVS). A primary drawback of HDR is that its memory and bandwidth requirements are significantly higher than those of conventional video. The challenge is thus to develop means of efficiently compressing the video to a manageable bitrate without compromising perceptual quality. In this paper, we propose an HDR compression method based on an optimized bit-depth transformation and HVS-model-based wavelet transform denoising. Experimental results indicate that the proposed method outperforms previous approaches and operates in accordance with the characteristics of the HVS, as assessed objectively using a Visible Difference Predictor (VDP).

Index Terms: High Dynamic Range, Human Visual System, Wavelet Transform, Bit-Depth Transform, Video Coding

1. INTRODUCTION

HDR video overcomes the dynamic range limitations of traditional imaging by performing operations at high bit-depth, and therefore with much higher precision. However, an uncompressed HDR video sequence demands very large storage space and occupies substantially more transmission bandwidth than Standard Dynamic Range (SDR) video. Consequently, there is an urgent need for an efficient HDR video compression algorithm that produces manageable bitrates without compromising perceptual quality.

For SDR imaging, a 24-bit per pixel encoding format achieves around 2 orders of magnitude of dynamic range. In contrast, for real-world luminance levels, the human visual system can adapt from scotopic (10⁻⁵ to 10 cd/m²) to photopic (10 to 10⁶ cd/m²) conditions [1] [2].
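
The gap between these figures can be made concrete with a short calculation. The sketch below is illustrative only: it assumes that dynamic range is measured as the base-10 logarithm of the ratio between the largest and smallest representable (or perceivable) luminance, and that the 24-bit SDR format uses 8 linearly coded bits per channel. The function name `orders_of_magnitude` is our own and does not come from the cited work.

```python
import math


def orders_of_magnitude(l_min, l_max):
    """Dynamic range in decades: log10 of the max/min luminance ratio."""
    return math.log10(l_max / l_min)


# 24-bit SDR: 8 bits per channel, so code values 1..255 (linear coding assumed).
sdr = orders_of_magnitude(1, 255)

# Adaptation ranges quoted in the text, in cd/m^2.
scotopic = orders_of_magnitude(1e-5, 10)
photopic = orders_of_magnitude(10, 1e6)
hvs_total = orders_of_magnitude(1e-5, 1e6)

print(f"SDR (8 bits/channel): {sdr:.1f} orders")
print(f"Scotopic range:       {scotopic:.1f} orders")
print(f"Photopic range:       {photopic:.1f} orders")
print(f"Full HVS adaptation:  {hvs_total:.1f} orders")
```

Under these assumptions the SDR format spans a little over 2 orders of magnitude, whereas the adaptation range quoted above spans roughly 11.
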
HDR prototype displays are now available that are capable of a contrast ratio of 1,000,000:1 with a peak luminance of 4000 cd/m². State-of-the-art HDR imaging methods can, however, cover a dynamic range from extremely dark (10⁻⁶ cd/m²) to bright sunshine (10⁸ cd/m²) by using higher bit-depths for the luminance channel. Most popular HDR image formats therefore employ floating-point representations, but these are not compatible with reduced bit rate transmission and storage.

Larson [3] proposed the LogLuv colour space, which quantizes pixel values using a 16-bit logarithmic representation for luminance and two 8-bit CIE (u', v') channels for chrominance. All pixel values are represented as integers after the colour space transformation, as integers support compression better than floating-point values. Furthermore, Motra and Thoma [4] have presented an adaptive LogLuv transformation for HDR, which can be tuned to store luminance at any bit-depth alongside 8-bit chroma channels. Moreover, the technique acts as a preprocessing step to a standard encoder, rather than as a replacement encoder [5]. The advantage of this colour space is that it spans a huge dynamic range, similar to the adaptation range of the HVS. However, the logarithmic colour space transformation produces round-off quantization noise.

In this paper, we consider an optimized bit-depth quantization, based on an optimized Lloyd-Max algorithm [6] [7], in order to minimize the transformation noise due to prediction. Additionally, an HVS-model-based Discrete Wavelet Transform (DWT) is employed to remove the invisible high-frequency noise introduced by the transformation. Contrast Sensitivity Function (CSF) weighting is applied in the wavelet sub-band domain. Significant improvements are reported in terms of rate-distortion performance and Visible Difference Predictor (VDP) measures.

2. LLOYD-MAX BASED BIT-DEPTH TRANSFORMATION

Motra and Thoma proposed an adaptive LogLuv transform [4], which can be used with an existing video encoder such as H.264. Their approach can represent the luminance channel at any specified bit-depth and uses 8 bits for each of the chrominance channels. The shortcoming of this algorithm is that the logarithmic colour space transformation generates quantization noise, which can degrade visual quality. In the LogLuv colour space [3], HDR images require 16 bits for the luminance channel, whereas existing video codecs are normally limited to 14 bits per channel.
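
The effect of bit-depth on this round-off noise can be illustrated with a minimal sketch. It is not the exact formulation of [4]: it simply assumes that luminance is quantized uniformly in the log2 domain between a minimum and a maximum luminance, at a chosen bit-depth. The function names, the luminance range of 10⁻⁵ to 10⁶ cd/m², and the sampling scheme are all assumptions made for illustration.

```python
import math


def log_quantize(lum, l_min, l_max, bits):
    """Map a luminance to an integer code, uniform in log2 over [l_min, l_max]."""
    levels = (1 << bits) - 1
    scale = levels / (math.log2(l_max) - math.log2(l_min))
    code = round(scale * (math.log2(lum) - math.log2(l_min)))
    return min(max(code, 0), levels)  # clamp to the valid code range


def log_dequantize(code, l_min, l_max, bits):
    """Inverse mapping: integer code back to a luminance value."""
    levels = (1 << bits) - 1
    step = (math.log2(l_max) - math.log2(l_min)) / levels
    return 2.0 ** (code * step + math.log2(l_min))


def max_relative_error(l_min, l_max, bits, samples=2000):
    """Worst round-trip relative error over log-spaced test luminances."""
    worst = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        lum = l_min * (l_max / l_min) ** t  # uniform spacing in the log domain
        code = log_quantize(lum, l_min, l_max, bits)
        rec = log_dequantize(code, l_min, l_max, bits)
        worst = max(worst, abs(rec - lum) / lum)
    return worst


# Illustrative luminance range in cd/m^2 (an assumption, not taken from [4]).
L_MIN, L_MAX = 1e-5, 1e6

err_8 = max_relative_error(L_MIN, L_MAX, 8)
err_14 = max_relative_error(L_MIN, L_MAX, 14)
print(f"8-bit  worst relative error: {err_8:.4%}")
print(f"14-bit worst relative error: {err_14:.4%}")
```

With uniform log-domain steps, the relative round-off error is bounded by half a quantization step and shrinks as the bit-depth grows. A non-uniform quantizer of the Lloyd-Max type instead places its levels according to the distribution of the data, which is the motivation for the approach taken in this section.
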