IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 3, MARCH 2009 323

On Rate-Distortion Modeling and Extraction of H.264/SVC Fine-Granular Scalable Video

Jun Sun, Wen Gao, Fellow, IEEE, Debin Zhao, and Weiping Li, Fellow, IEEE

Abstract—Fine-granular scalable (FGS) technologies in H.264/AVC-based scalable video coding (SVC) provide a flexible foundation to accommodate different network capacities. To support efficient quality extraction, it is important to obtain the rate-distortion (R-D) or distortion-rate (D-R) function of each individual picture or group of pictures (GOP). In this paper, first, the R-D function of SVC FGS pictures is analyzed with a generalized Gaussian model, and the D-R curve is proved to be a concave function overall. Considering the current sub-bitplane technology, the D-R function is revisited and inferred to be linear under the MSE criterion within an FGS level, which also explains why the observed D-R curve under the PSNR criterion is a piece-wise convex function. Second, the drift issue of SVC is analyzed, and a simple and effective distortion model is proposed to estimate the reconstruction distortion with drift error. Third, with the above analysis and models, a virtual GOP concept is introduced, and a new priority setting algorithm is designed to achieve the optimal R-D performance in a virtual GOP. The D-R slope of each FGS packet and the D-R function of each virtual GOP are also obtained during the process. Finally, the D-R slopes of FGS levels are used in quality layer assignment to achieve coding efficiency equivalent to that of the SVC test model but with significantly reduced complexity. The D-R functions of virtual GOPs are utilized to design a practical method for smooth quality reconstruction. Compared to the prior methods, the smoothed video quality is improved not only objectively but also subjectively.
Index Terms—Drift propagation, fine-granular scalable (FGS), rate-distortion (R-D) theory, scalable video coding (SVC).

Manuscript received July 16, 2007; revised November 10, 2007 and February 19, 2008. First published February 13, 2009; current version published April 01, 2009. This work was supported by the National Key Technology R&D Program under Contract 2006BAH02A10 and 60833013, and by the National Science Foundation of China and Microsoft Research Asia under Contract 60736043. This paper was recommended by Associate Editor T. Wiegand.
J. Sun and W. Gao are with the Institute of Computer Science and Technology and the Institute of Digital Media, respectively, Peking University, Beijing 100871, China (e-mail: jsun@pku.edu.cn; wgao@pku.edu.cn).
D. Zhao is with the Department of Computer Science, Harbin Institute of Technology, Harbin 150001, China (e-mail: dbzhao@jdl.ac.cn).
W. Li is with Amity Systems, Inc., Santa Clara, CA 95054 USA (e-mail: wli@amity-systems.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2009.2013494

I. INTRODUCTION

A. Internet Video Streaming and FGS Video Coding

THE Internet is experiencing explosive growth of video streaming. Since the Internet is a shared environment, the bandwidth available to a video stream typically fluctuates over a broad range [1]. Small time-scale bandwidth fluctuations can be addressed by maintaining a receiver buffer, where a few video frames can be downloaded before they are decoded and displayed. However, it is difficult to accommodate large time-scale bandwidth fluctuations in this way because of the constraints on playback delay and receiver buffer size. Typically, large time-scale bandwidth fluctuations can be accommodated by using fine-granular scalable (FGS) video, where a server can perfectly match the video rate to the available network bandwidth.
The fine-granular scalable coding of MPEG-4 Visual [2] is achieved by bit-plane coding of DCT coefficients in the enhancement layer (EL). The new scalable video coding (SVC) [3] is a scalable amendment of H.264/AVC and is almost finished now. By reusing the key features of H.264/AVC [4], SVC significantly improves the efficiency of scalable coding. It includes three "quality scalability modes": (a) coarse-grain quality scalable coding (CGS), (b) medium-grain quality scalable coding (MGS), and (c) FGS coding. The FGS mode is realized through sub-bitplane-based progressive refinement of the EL. Note that the FGS mode has been removed from the final SVC amendment, and a phase-2 SVC project has been started, which may include FGS coding [5].

Typically, regardless of the techniques that are used to encode FGS refinement signals, the prediction loop of FGS coding should be carefully designed, since it determines the trade-off between coding efficiency and drift in the scalable EL [6]. The drift is defined here as the encoder-decoder mismatch of prediction reference pictures. For MPEG-4 FGS coding, the prediction loop utilizes only the base layer reconstruction, and thus any truncation of the FGS EL has no impact on the motion compensation. That is, no drift distortion is introduced in MPEG-4 FGS coding. However, since the EL is not employed for encoding the following pictures, this prediction structure suffers a significant loss of coding efficiency. For SVC FGS coding, except for the key pictures of the coarsest temporal layer, the highest available quality is employed for motion prediction. The key pictures of the coarsest temporal layer can use the base layer reconstruction for motion prediction to control the propagation of prediction drift. Since the coding-efficiency gap between the SVC FGS scheme and single-layer coding is quite small, the scheme is of great research interest.
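The drift definition above can be illustrated with a scalar toy model. The sketch below is not the SVC or MPEG-4 prediction loop: it assumes one number per picture, uniform quantizers with hypothetical step sizes `base_step` and `el_step`, and a decoder that receives the base layer only. The function names and the test signal are illustrative assumptions. It compares a loop that predicts from the base layer reconstruction with a loop that predicts from the base-plus-EL reconstruction, and records the encoder-decoder reference mismatch for each picture.

```python
def quantize(value, step):
    """Uniform scalar quantizer: snap value to the nearest multiple of step."""
    return step * round(value / step)


def reference_mismatch(signal, predict_from_el, base_step=8.0, el_step=1.0):
    """Return |encoder reference - decoder reference| per picture when the
    decoder receives the base layer only (the EL is fully truncated).

    Toy assumption: each picture is a single scalar sample.
    """
    enc_ref = 0.0  # reference held by the encoder
    dec_ref = 0.0  # reference held by the decoder
    mismatch = []
    for sample in signal:
        residual = sample - enc_ref                      # prediction residual
        base = quantize(residual, base_step)             # coarse base layer
        refinement = quantize(residual - base, el_step)  # EL refinement
        if predict_from_el:
            # Encoder reference includes the refinement the decoder never gets.
            enc_ref = enc_ref + base + refinement
        else:
            # Encoder reference is built from the base layer only.
            enc_ref = enc_ref + base
        dec_ref = dec_ref + base  # decoder can only add the base layer
        mismatch.append(abs(enc_ref - dec_ref))
    return mismatch


signal = [10.0, 13.0, 21.0, 18.0, 30.0]  # hypothetical picture values
base_loop = reference_mismatch(signal, predict_from_el=False)
el_loop = reference_mismatch(signal, predict_from_el=True)
```

In this toy setting the base-layer loop keeps both references identical regardless of EL truncation, while the EL-based loop lets the mismatch carry over from picture to picture, which is the trade-off the text attributes to the choice of prediction loop.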
To best utilize SVC FGS video, a bit-stream extraction (rate allocation) algorithm should be employed to translate the target bit rate into the rate assigned to each FGS picture. Typically, there are two optimization goals. The first goal is optimal extraction in the rate-distortion sense, which minimizes the average distortion subject to the rate constraint. The Lagrange multiplier technique and dynamic programming are the most common approaches to finding the solution. The second goal is smooth quality extraction, which aims to achieve constant