IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 3, MARCH 2009, p. 323

On Rate-Distortion Modeling and Extraction of H.264/SVC Fine-Granular Scalable Video

Jun Sun, Wen Gao, Fellow, IEEE, Debin Zhao, and Weiping Li, Fellow, IEEE

Abstract—Fine-granular scalable (FGS) technologies in H.264/AVC-based scalable video coding (SVC) provide a flexible foundation to accommodate different network capacities. To support efficient quality extraction, it is important to obtain the rate-distortion (R-D) or distortion-rate (D-R) function of each individual picture or group of pictures (GOP). In this paper, firstly, the R-D function of SVC FGS pictures is analyzed with a generalized Gaussian model, and the D-R curve is proved to be a concave function overall. Considering the current sub-bitplane technology, the D-R function is revisited and inferred to be linear under the MSE criterion within an FGS level, which also explains why the observed D-R curve under the PSNR criterion is a piece-wise convex function. Secondly, the drift issue of SVC is analyzed, and a simple and effective distortion model is proposed to estimate the reconstruction distortion with drift error. Thirdly, with the above analysis and models, a virtual GOP concept is introduced, and a new priority setting algorithm is designed to achieve the optimal R-D performance in a virtual GOP. The D-R slope of each FGS packet and the D-R function of each virtual GOP are also obtained during the process. Finally, the D-R slopes of FGS levels are used in quality layer assignment to achieve coding efficiency equivalent to that of the SVC test model but with significantly reduced complexity. The D-R functions of virtual GOPs are utilized to design a practical method for smooth quality reconstruction. Compared to prior methods, the smoothed video quality is improved not only objectively but also subjectively.
Index Terms—Drift propagation, fine-granular scalable (FGS), rate-distortion (R-D) theory, scalable video coding (SVC).

Manuscript received July 16, 2007; revised November 10, 2007 and February 19, 2008. First published February 13, 2009; current version published April 01, 2009. This work was supported by the National Key Technology R&D Program under Contract 2006BAH02A10 and 60833013, and by the National Science Foundation of China and Microsoft Research Asia under Contract 60736043. This paper was recommended by Associate Editor T. Wiegand.
J. Sun and W. Gao are with the Institute of Computer Science and Technology and the Institute of Digital Media, respectively, Peking University, Beijing 100871, China (e-mail: jsun@pku.edu.cn; wgao@pku.edu.cn).
D. Zhao is with the Department of Computer Science, Harbin Institute of Technology, Harbin 150001, China (e-mail: dbzhao@jdl.ac.cn).
W. Li is with Amity Systems, Inc., Santa Clara, CA 95054 USA (e-mail: wli@amity-systems.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2009.2013494

I. INTRODUCTION

A. Internet Video Streaming and FGS Video Coding

THE Internet is experiencing explosive growth of video streaming. Since the Internet is a shared environment, the bandwidth available for video streaming typically fluctuates over a broad range [1]. Small time-scale bandwidth fluctuations can be addressed by maintaining a receiver buffer, where a few video frames can be downloaded before they are decoded and displayed. However, it is difficult to accommodate large time-scale bandwidth fluctuations because of the constraints on playback delay and receiver buffer size. Typically, large time-scale bandwidth fluctuations can be accommodated by using fine-granular scalable (FGS) video, where a server can match the video rate to the available network bandwidth.
The fine-granular scalable coding of MPEG-4 Visual [2] is achieved by bit-plane coding of DCT coefficients in the enhancement layer (EL). The new scalable video coding (SVC) [3] is a scalable amendment of H.264/AVC and is nearly finalized. By reusing the key features of H.264/AVC [4], SVC significantly improves the efficiency of scalable coding. It includes three “quality scalability modes”: (a) coarse-grain quality scalable coding (CGS), (b) medium-grain quality scalable coding (MGS), and (c) FGS coding. The FGS mode is realized through sub-bitplane-based progressive refinement of the EL. Note that the FGS mode has been removed from the final SVC amendment, and a phase-2 SVC project has been started, which may include FGS coding [5].

Typically, regardless of the techniques that are used to encode FGS refinement signals, the prediction loop of FGS coding should be carefully designed, since it determines the trade-off between coding efficiency and drift in the scalable EL [6]. Drift is defined here as the encoder-decoder mismatch of prediction reference pictures. For MPEG-4 FGS coding, the prediction loop utilizes only the base layer reconstruction, and thus any truncation of the FGS EL has no impact on motion compensation. That is, no drift distortion is introduced in MPEG-4 FGS coding. However, since the EL is not employed for encoding the following pictures, this prediction structure incurs a significant loss of coding efficiency. For SVC FGS coding, except for the key pictures of the coarsest temporal layer, the highest available quality is employed for motion prediction. The key pictures of the coarsest temporal layer can use the base layer reconstruction for motion prediction to control the propagation of prediction drift. Since the coding-efficiency gap between the SVC FGS scheme and single-layer coding is quite small, the scheme is of great research interest.
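The difference between the two prediction loops can be illustrated with a toy simulation. The sketch below is not the codec's actual prediction structure: it assumes scalar "pictures", a simple low-delay chain (each picture predicted from the previous one) instead of hierarchical prediction, uniform quantizers for the base and enhancement layers, and a decoder that discards all EL data. The names `quantize` and `reference_mismatch` and all parameter values are hypothetical. It reports the drift as defined above, that is, the absolute encoder-decoder mismatch of the prediction reference for each picture.

```python
def quantize(value, step):
    """Uniform quantizer: snap value to the nearest multiple of step."""
    return step * round(value / step)


def reference_mismatch(signal, predict_from_el, key_interval=4,
                       q_base=8.0, q_enh=1.0):
    """Per-picture |encoder reference - decoder reference| when the
    decoder receives only the base layer (all EL data truncated).

    predict_from_el=False: every picture is predicted from the previous
        base layer reconstruction (MPEG-4 FGS style loop).
    predict_from_el=True: key pictures are predicted from the previous
        key picture's base layer reconstruction; all other pictures use
        the previous highest-quality reconstruction (SVC FGS style loop).
    """
    enc_prev_base = enc_prev_full = 0.0  # encoder reconstructions of previous picture
    enc_key_base = 0.0                   # base reconstruction of the last key picture
    dec_prev = 0.0                       # decoder reconstruction (EL discarded)
    mismatch = []
    for t, x in enumerate(signal):
        is_key = t % key_interval == 0
        if not predict_from_el:
            enc_ref, dec_ref = enc_prev_base, dec_prev
        elif is_key:
            # The key-picture chain depends on base layer data only,
            # so the decoder holds exactly the same reference.
            enc_ref, dec_ref = enc_key_base, enc_key_base
        else:
            enc_ref, dec_ref = enc_prev_full, dec_prev
        mismatch.append(abs(enc_ref - dec_ref))

        residual = x - enc_ref
        base_part = quantize(residual, q_base)             # base layer residual
        enh_part = quantize(residual - base_part, q_enh)   # EL refinement

        enc_prev_base = enc_ref + base_part
        enc_prev_full = enc_prev_base + enh_part
        dec_prev = dec_ref + base_part   # decoder never sees enh_part
        if predict_from_el and is_key:
            enc_key_base = enc_prev_base
    return mismatch


# Illustrative input values (arbitrary, chosen only for the demonstration).
signal = [13.0, 16.0, 21.0, 27.0, 30.0, 38.0, 41.0, 47.0]
base_only_drift = reference_mismatch(signal, predict_from_el=False)
el_loop_drift = reference_mismatch(signal, predict_from_el=True)
print("base-layer-only loop:", base_only_drift)
print("highest-quality loop:", el_loop_drift)
```

In this toy model the base-layer-only loop never produces a reference mismatch, while the highest-quality loop accumulates the dropped refinements between key pictures and returns to zero mismatch at each key picture, which mirrors the role the text assigns to key pictures in limiting drift propagation.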
1051-8215/$25.00 © 2009 IEEE

To best utilize SVC FGS video, a bit-stream extraction (rate allocation) algorithm should be employed to translate the target bit rate into the rate assigned to each FGS picture. Typically, there are two optimization goals. The first goal is optimal extraction in the rate-distortion sense, which minimizes the average distortion subject to the rate constraint. The Lagrange multiplier technique and dynamic programming are the most common approaches to finding the solution. The second goal is smooth quality extraction, which aims to achieve constant