RATE-DISTORTION MODEL FOR MOTION PREDICTION EFFICIENCY IN SCALABLE WAVELET VIDEO CODING

Chia-Yang Tsai 1 and Hsueh-Ming Hang 1,2
1 Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan
2 Dept. of Computer Sci. and Inform. Technology, National Taipei University of Technology, Taipei, Taiwan
Email: cytsai.ee94g@nctu.edu.tw, hmhang@mail.nctu.edu.tw

ABSTRACT

This paper proposes a rate-distortion model for motion prediction efficiency in scalable wavelet video coding. The Lagrangian multiplier is widely used to solve rate-distortion optimization problems in video coding, especially in mode decision and rate-constrained motion estimation. Unlike non-scalable video coding, scalable wavelet video coding must operate under multiple bitrate conditions, and it has an open-loop structure. The conventional rate-distortion optimization technique is therefore not suitable for the scalable wavelet case. By analyzing the rate-distortion trade-off that results from allocating different numbers of bits to motion information, we propose a motion prediction gain (MPG) metric to measure motion coding efficiency. Based on the MPG metric, we then propose a new cost function for mode decision. Compared with the conventional Lagrangian multiplier optimization method, our experiments show that the new mode decision procedure generally improves PSNR performance, particularly for combined SNR and temporal scalability.＊

Index Terms— Scalable wavelet video, motion prediction efficiency, motion prediction gain, MPG

1. INTRODUCTION

Over the past few years, multimedia delivery has become an important class of wireless and wired internet applications, for example, mobile video and digital TV broadcasting. To deal with the constraints of transmission bandwidth and receiver capability, scalable coding techniques have been adopted by recent video codecs.
There are currently two major approaches to scalable video coding: the DCT-based and the wavelet-based coding schemes. These two schemes share many coding concepts, especially in removing temporal redundancy. The Scalable Video Coding (SVC) extension of H.264/AVC is a representative scheme of the DCT-based approach and was accepted as an ITU/MPEG standard in 2008 [1]. On the other hand, the wavelet-based coding scheme is a relatively new structure and has its own potential and advantages [2], as shown during the competition process for standardization.

＊ This work was supported in part by the NSC, Taiwan under Grant 96-2221-E-009-063.

The discrete wavelet transform (DWT) has been successfully applied to still image compression. By exploiting intersubband or intrasubband correlation, the DWT-transformed image signal can be efficiently compressed by a context-based entropy coder, such as EZW [3], SPIHT [4], or EBCOT [5]. Unlike DCT-based JPEG image coding, the multiresolution property of the wavelet transform provides a natural way of producing scalable bitstreams. It enables the spatial and SNR scalability features of the well-known JPEG2000 image coding standard [6].

In addition to spatial decomposition, the DWT can also be applied along the temporal axis, where it decomposes video frames into temporal subband signals and thus provides temporal scalability for video. Over the past 15 years, temporal wavelet decomposition has been refined by adopting the motion compensated temporal filtering (MCTF) technique. These schemes were proposed and improved by Ohm [7], Hsiang and Woods [8], Secker and Taubman [9], and Xu et al. [10]. MCTF can efficiently decompose video frames along the motion trajectories. After MCTF and the spatial 2-D DWT, the original video frames are transformed into spatio-temporal subband signals and compressed by a context-based entropy coder [9], [11].
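As a rough illustration of how a temporal wavelet decomposition yields temporal scalability, the following minimal Python sketch applies one level of a Haar transform along the temporal axis using lifting steps. It is a simplification under stated assumptions and not the coder discussed in this paper: motion compensation is omitted entirely (MCTF would filter along motion trajectories instead of at co-located pixels), frames are represented as flat lists of pixel values, and the function names are hypothetical.

```python
from typing import List, Tuple

# Assumption: a frame is a flat list of pixel values (hypothetical representation).
Frame = List[float]


def haar_temporal_analysis(frames: List[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """One level of temporal Haar decomposition via lifting.

    No motion compensation is applied: each pixel is filtered with the
    co-located pixel in the next frame. Real MCTF would first align the
    frame pair along the motion trajectories.
    """
    if len(frames) % 2 != 0:
        raise ValueError("an even number of frames is required")
    low: List[Frame] = []
    high: List[Frame] = []
    for a, b in zip(frames[0::2], frames[1::2]):
        # Predict step: the high-pass frame is the prediction residual B - A.
        h = [y - x for x, y in zip(a, b)]
        # Update step: the low-pass frame is A + H/2, i.e. the average of A and B.
        l = [x + 0.5 * d for x, d in zip(a, h)]
        low.append(l)
        high.append(h)
    return low, high


def haar_temporal_synthesis(low: List[Frame], high: List[Frame]) -> List[Frame]:
    """Invert haar_temporal_analysis by undoing the lifting steps in reverse order."""
    frames: List[Frame] = []
    for l, h in zip(low, high):
        a = [x - 0.5 * d for x, d in zip(l, h)]  # undo the update step
        b = [x + d for x, d in zip(a, h)]        # undo the predict step
        frames.append(a)
        frames.append(b)
    return frames


if __name__ == "__main__":
    video = [[10.0, 20.0], [12.0, 18.0], [30.0, 40.0], [34.0, 44.0]]
    low, high = haar_temporal_analysis(video)
    # Decoding only `low` gives a half-frame-rate version of the sequence;
    # adding `high` restores the full frame rate.
    print("low-pass frames: ", low)
    print("high-pass frames:", high)
    print("reconstructed:   ", haar_temporal_synthesis(low, high))
```

In this sketch, a decoder that receives only the low-pass frames obtains the sequence at half the frame rate, while the high-pass frames carry the detail needed to recover the full rate. Applying the analysis step again to the low-pass frames would give further temporal levels.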
This interframe wavelet video coding scheme can achieve temporal, spatial, and SNR scalability simultaneously. Depending on the transform order in the spatio-temporal domain, scalable wavelet coding methods can be classified into "t+2D" and "2D+t" structures [12]. This paper focuses on the t+2D structure.

The rate-distortion (R-D) analysis of a scalable wavelet video coder is very different from that of a DCT-based coder, owing to two issues: inter-scale hybrid coding and the open-loop coding structure. Although the DCT-based video coders, such as MPEG-2 or H.264, also use the