IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 3, MARCH 2009 323
On Rate-Distortion Modeling and Extraction of
H.264/SVC Fine-Granular Scalable Video
Jun Sun, Wen Gao, Fellow, IEEE, Debin Zhao, and Weiping Li, Fellow, IEEE
Abstract—Fine-granular scalable (FGS) technologies in H.264/
AVC-based scalable video coding (SVC) provide a flexible founda-
tion to accommodate different network capacities. To support effi-
cient quality extraction, it is important to obtain the rate-distortion
(R-D) or Distortion-Rate (D-R) function of each individual picture
or a group of pictures (GOP). In this paper, firstly, the R-D function
of SVC FGS pictures is analyzed with a generalized Gaussian
model, and the D-R curve is proved to be a concave function overall.
Considering the current sub-bitplane technology, the D-R function
is revisited and inferred to be linear under the MSE criterion within
an FGS level, which also explains why the observed D-R curve
under the PSNR criterion is a piecewise convex function. Secondly, the
drift issue of SVC is analyzed, and a simple and effective distortion
model is proposed to estimate the reconstruction distortion with
drift error. Thirdly, with the above analysis and models, a virtual
GOP concept is introduced, and a new priority setting algorithm is
designed to achieve the optimal R-D performance in a virtual GOP.
The D-R slope of each FGS packet and the D-R function of each
virtual GOP are also obtained during the process. Finally, the D-R
slopes of FGS levels are used in quality layer assignment to achieve
equivalent coding efficiency to the SVC test model but with signifi-
cantly reduced complexity. The D-R functions of virtual GOPs are
utilized to design a practical method for smooth quality reconstruc-
tion. Compared to the prior methods, the smoothed video quality
is improved not only objectively but also subjectively.
Index Terms—Drift propagation, fine-granular scalable (FGS),
rate-distortion (R-D) theory, scalable video coding (SVC).
I. INTRODUCTION
A. Internet Video Streaming and FGS Video Coding
THE Internet is experiencing explosive growth of video
streaming. Since the Internet is a shared environment, the
available bandwidth of video streaming typically fluctuates over
a broad range [1]. Small time-scale bandwidth fluctuations can
Manuscript received July 16, 2007; revised November 10, 2007 and February
19, 2008. First published February 13, 2009; current version published April
01, 2009. This work was supported by the National Key Technology R&D Program
under Contracts 2006BAH02A10 and 60833013, and by the National Science Foundation
of China and Microsoft Research Asia under Contract 60736043. This
paper was recommended by Associate Editor T. Wiegand.
J. Sun and W. Gao are with the Institute of Computer Science and Technology
and the Institute of Digital Media, respectively, Peking University,
Beijing 100871, China (e-mail: jsun@pku.edu.cn; wgao@pku.edu.cn).
D. Zhao is with the Department of Computer Science, Harbin Institute of
Technology, Harbin 150001, China (e-mail: dbzhao@jdl.ac.cn).
W. Li is with Amity Systems, Inc., Santa Clara, CA 95054 USA (e-mail:
wli@amity-systems.com).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2009.2013494
be addressed by maintaining a receiver buffer, in which a few video
frames can be downloaded before they are decoded and displayed.
However, it is difficult to accommodate large time-scale
bandwidth fluctuations this way because of the constraints on playback delay
and receiver buffer size. Typically, large time-scale bandwidth
fluctuations can be accommodated by using fine-granular scalable
(FGS) video, with which a server can perfectly match the video
rate to the available network bandwidth.
The fine-granular scalable coding of MPEG-4 Visual [2]
is achieved by bit-plane coding of DCT coefficients in the
enhancement layer (EL). The new scalable video coding (SVC)
standard [3] is a scalable amendment of H.264/AVC and is nearly
finalized. By reusing the key features of H.264/AVC [4], SVC
significantly improves the efficiency of scalable coding. It
provides three “quality scalability modes”: (a) coarse-grain
quality scalable coding (CGS), (b) medium-grain quality scalable
coding (MGS), and (c) FGS coding. The FGS mode is
realized through sub-bitplane-based progressive refinement of
the EL. Note that the FGS mode has been removed from the final
SVC amendment, and a phase-2 SVC project, which may include
FGS coding, has been started [5].
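As a rough illustration of the bit-plane idea behind MPEG-4 FGS, the following sketch splits a handful of enhancement-layer coefficient magnitudes into bit-planes, most significant first, and reconstructs them from a truncated prefix of planes. It is a simplified sketch under stated assumptions, not the standard's entropy-coded syntax: the coefficient values, the function names, and the four-plane depth are hypothetical, and sign coding and the sub-bitplane scanning of SVC FGS are not modeled.

```python
def bitplanes(coeffs, num_planes):
    """Split non-negative integer magnitudes into bit-planes, MSB first."""
    return [[(c >> p) & 1 for c in coeffs]
            for p in range(num_planes - 1, -1, -1)]


def reconstruct(planes, num_planes, received):
    """Rebuild magnitudes from the first `received` planes; missing bits are 0."""
    values = [0] * len(planes[0])
    for i in range(received):
        shift = num_planes - 1 - i          # weight of plane i
        for j, bit in enumerate(planes[i]):
            values[j] |= bit << shift
    return values


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical EL coefficient magnitudes, all below 2**NUM_PLANES.
residual = [13, 7, 4, 2, 1, 0, 3, 9]
NUM_PLANES = 4

planes = bitplanes(residual, NUM_PLANES)

# Distortion after receiving 0, 1, ..., NUM_PLANES bit-planes.
distortions = [mse(residual, reconstruct(planes, NUM_PLANES, k))
               for k in range(NUM_PLANES + 1)]

for k, d in enumerate(distortions):
    print(f"planes received: {k}  MSE: {d:.3f}")
```

Because each additional plane can only shrink the truncation error of every coefficient, the distortion is non-increasing in the number of planes received, which is what lets a server cut the enhancement layer at any point.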
Typically, regardless of the techniques that are used to en-
code FGS refinement signals, the prediction loop of FGS coding
should be carefully designed since it determines the trade-off
between coding efficiency and drift in the scalable EL [6]. The
drift is defined here as the encoder-decoder mismatch of pre-
diction reference pictures. For the MPEG-4 FGS coding, the
prediction loop only utilizes the base layer reconstruction, and
thus any truncation of FGS EL has no impact on the motion
compensation. That is, no drift distortion is introduced in the
MPEG-4 FGS coding. However, since the EL is not employed
for encoding the following pictures, this prediction structure suffers
a significant loss of coding efficiency. For SVC FGS coding,
except for the key pictures of the coarsest temporal layer, the
highest available quality is employed for motion prediction. The
key pictures of the coarsest temporal layer can use the base layer
reconstruction for motion prediction to control the propagation
of prediction drift. Since the coding-efficiency gap between the SVC FGS
scheme and single-layer coding is quite small, the scheme is of great
research interest.
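The difference between the two prediction loops can be illustrated with a toy scalar model in which each "picture" is a single number. This is only a sketch under stated assumptions, not the SVC or MPEG-4 codec: the uniform quantizer, the `keep` fraction of each EL refinement that reaches the decoder, the source signal, and all function names are hypothetical, and key pictures and hierarchical prediction are not modeled.

```python
def quantize(value, step):
    """Uniform quantizer standing in for the coarse base layer."""
    return step * round(value / step)


def simulate(source, step, keep, use_el_reference):
    """Return the per-picture decoder error for a toy scalar 'video'.

    keep: fraction (0..1) of each EL refinement received by the decoder.
    use_el_reference:
        True  -> predict from the highest-quality reconstruction
        False -> predict from the base-layer reconstruction only
    """
    enc_ref = dec_ref = 0.0
    errors = []
    for x in source:
        residual = x - enc_ref               # encoder-side prediction residual
        base = quantize(residual, step)      # base-layer part
        refinement = residual - base         # EL part (lossless at full rate)

        enc_base = enc_ref + base
        enc_full = enc_base + refinement     # encoder's full reconstruction
        dec_base = dec_ref + base
        dec_out = dec_base + keep * refinement   # truncated EL at the decoder

        errors.append(abs(x - dec_out))
        if use_el_reference:
            enc_ref, dec_ref = enc_full, dec_out     # references may diverge
        else:
            enc_ref, dec_ref = enc_base, dec_base    # references stay equal
    return errors


# Hypothetical source: a steadily increasing signal.
source = [1.3 * (t + 1) for t in range(8)]

el_errors = simulate(source, step=1.0, keep=0.0, use_el_reference=True)
bl_errors = simulate(source, step=1.0, keep=0.0, use_el_reference=False)

print("EL-reference loop errors:", [round(e, 2) for e in el_errors])
print("BL-reference loop errors:", [round(e, 2) for e in bl_errors])
```

With the EL dropped entirely, the EL-reference loop accumulates the encoder-decoder reference mismatch from picture to picture, whereas the base-layer-only loop keeps the error bounded by half a quantizer step but has to code larger residuals.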
To best utilize SVC FGS video, a bit-stream extraction
(rate allocation) algorithm should be employed to translate the
target bit rate into the rate assigned to each FGS picture. Typically,
there are two optimization goals. The first goal is optimal
extraction in the rate-distortion sense, which minimizes
the average distortion subject to the rate constraint. The Lagrange
multiplier technique and dynamic programming are the
most common approaches to finding the solution. The second goal
is smooth quality extraction, which aims to achieve constant
1051-8215/$25.00 © 2009 IEEE