OPTIMIZATION OF SPATIAL, TEMPORAL AND AMPLITUDE RESOLUTION FOR RATE-CONSTRAINED VIDEO CODING AND SCALABLE VIDEO ADAPTATION

Hao Hu‡, Zhan Ma†‡, Yao Wang‡

‡Department of ECE, Polytechnic Institute of NYU, Brooklyn, NY 11201
†Samsung Telecommunications America, Richardson, TX 75082

ABSTRACT

This paper considers how to choose the frame size, frame rate, and quantization stepsize to optimize the perceptual quality for a given rate constraint. The proposed solution leverages previously developed quality and rate models that explicitly consider the impact of spatial, temporal, and amplitude resolution (STAR) on the quality and rate. Using these models, we further propose algorithms for ordering the STAR layers to form a rate-quality optimized stream, which can greatly facilitate scalable video adaptation.

Index Terms— Rate model, quality model, layer ordering, scalable video adaptation

1. INTRODUCTION

A fundamental and challenging problem in video encoding is, given a target bit rate, how to determine the spatial resolution (i.e., frame size [FS]), temporal resolution (i.e., frame rate [FR]), and amplitude resolution (usually controlled by the quantization stepsize (QS) or, equivalently, the quantization parameter (QP)) at which to code the video. One may code the video at a high FR and a large FS but with a large QS, yielding noticeable coding artifacts in each coded frame. Alternatively, one may use a low FR and a small FS with a small QS, producing high-quality frames. These and other combinations can lead to very different perceptual quality. In traditional rate-control algorithms, the spatial and temporal resolutions are fixed in advance based on empirical rules, and the encoder varies the QS to reach the target bit rate. The QS is typically selected using a model of rate versus QS. When varying the QS alone cannot meet the target bit rate, frame skipping and/or frame-size reduction is necessary.
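To make the conventional procedure concrete, the following Python sketch fixes FS and FR and picks the QS from a rate-versus-QS model. The power-law form of `rate_model` and its parameters `r_max`, `q_min` and `b` are hypothetical placeholders chosen for illustration, not a model from this paper; the QP-to-QS mapping approximates the H.264/AVC convention that the stepsize doubles every 6 QP, with QS = 1 at QP = 4.

```python
from typing import Optional


def qstep_from_qp(qp: int) -> float:
    """Approximate H.264/AVC QP-to-QS mapping: QS doubles every 6 QP, QS = 1 at QP = 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def rate_model(q: float, r_max: float = 2000.0, q_min: float = 0.625, b: float = 1.2) -> float:
    """Hypothetical rate-vs-QS model in kbps for a fixed FS and FR.

    The power-law form and the parameters r_max, q_min and b are illustrative
    placeholders, not fitted values.
    """
    return r_max * (q / q_min) ** (-b)


def select_qp(target_kbps: float, qp_min: int = 0, qp_max: int = 51) -> Optional[int]:
    """Return the smallest QP (finest QS) whose modeled rate meets the target.

    Returns None when no QP in range meets the target, i.e. when varying the
    QS alone is not enough and frame skipping or frame-size reduction is needed.
    """
    for qp in range(qp_min, qp_max + 1):  # modeled rate decreases as QP grows
        if rate_model(qstep_from_qp(qp)) <= target_kbps:
            return qp
    return None


print(select_qp(500.0))  # hypothetical 500 kbps target
```

A `None` result marks the point at which a conventional encoder, having exhausted the QS range, would fall back to frame skipping or frame-size reduction; the joint STAR selection discussed next avoids fixing FS and FR up front.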
Ideally, the encoder should choose the spatial, temporal, and amplitude resolution (STAR) that leads to the best perceptual quality while meeting the target bit rate. In [1], the joint selection of QS and frame skipping was considered, using the mean square error (MSE) as the quality measure. However, using MSE to compare videos at different spatial and/or temporal resolutions has inherent problems. A common approach is to interpolate the reduced-resolution videos and compute the MSE at the highest resolution. Based on our subjective testing of videos at different spatial and temporal resolutions, this approach often over-penalizes videos at lower resolutions. In our prior work [3], we developed a quality model that explicitly considers the impact of STAR on perceptual quality, derived from subjective ratings of videos coded at different STARs. We also developed a rate model that explicitly considers the impact of STAR on the bit rate [2]. In this paper, we consider how to use both the quality and rate models to optimize STAR under a given rate constraint.

In video streaming, a video may be coded into a scalable stream that can be decoded at different STARs to accommodate heterogeneous bandwidth conditions. Given a particular user's sustainable rate, either the sender or a proxy node needs to extract from the original bitstream the layers that meet the rate constraint. This problem is generally known as scalable video adaptation. The challenge is to determine which layers to extract, each selection corresponding to a particular STAR, in order to maximize the perceptual quality.

Although video encoding and adaptation are quite different applications, the essence of both problems is to maximize the video quality under a bit rate constraint, i.e.,

$$\max_{s,\,t,\,q} \; Q(q, s, t) \quad \text{s.t.} \quad R(q, s, t) \le R_0, \tag{1}$$

where $Q(q, s, t)$ and $R(q, s, t)$ represent the perceptual quality and rate at FS $s$, FR $t$, and QS $q$, and $R_0$ is the bit rate constraint.
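For a small discrete set of candidate STARs, problem (1) can be solved by exhaustive search: evaluate the rate of every (q, s, t) combination, discard those that exceed $R_0$, and keep the one with the highest predicted quality. The Python sketch below assumes this discrete setting; `toy_quality` and `toy_rate` are hypothetical monotone stand-ins for the quality model of [3] and the rate model of [2], whose actual forms are not reproduced here, and the candidate stepsizes are likewise illustrative.

```python
import itertools
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical reference STAR: 4CIF frame size (pixels), 30 Hz, finest QS.
S_MAX, T_MAX, Q_MIN = 704 * 576, 30.0, 8.0

Model = Callable[[float, float, float], float]  # f(q, s, t)


def toy_quality(q: float, s: float, t: float) -> float:
    """Hypothetical normalized quality in (0, 1]; a stand-in, NOT the model of [3]."""
    return (s / S_MAX) ** 0.3 * (t / T_MAX) ** 0.4 * (Q_MIN / q) ** 0.5


def toy_rate(q: float, s: float, t: float) -> float:
    """Hypothetical bit rate in kbps; a stand-in, NOT the model of [2]."""
    return 4000.0 * (s / S_MAX) * (t / T_MAX) ** 0.7 * (q / Q_MIN) ** -1.1


def optimize_star(
    frame_sizes: Sequence[float],
    frame_rates: Sequence[float],
    q_steps: Sequence[float],
    quality: Model,
    rate: Model,
    r0: float,
) -> Optional[Tuple[float, float, float, float]]:
    """Solve max Q(q, s, t) s.t. R(q, s, t) <= r0 by exhaustive search.

    Returns (quality, s, t, q) for the best feasible STAR, or None if no
    candidate meets the rate constraint.
    """
    best = None
    for s, t, q in itertools.product(frame_sizes, frame_rates, q_steps):
        if rate(q, s, t) > r0:
            continue  # infeasible: violates R(q, s, t) <= R0
        value = quality(q, s, t)
        if best is None or value > best[0]:
            best = (value, s, t, q)
    return best


FS = [176 * 144, 352 * 288, 704 * 576]  # QCIF, CIF, 4CIF (pixels per frame)
FR = [3.75, 7.5, 15.0, 30.0]            # frames per second
QS = [8.0, 16.0, 32.0, 64.0]            # hypothetical stepsizes
print(optimize_star(FS, FR, QS, toy_quality, toy_rate, r0=500.0))
```

With 3 frame sizes, 4 frame rates and 4 stepsizes there are only 48 candidates, so brute force is cheap. Sweeping `r0` over a range of budgets and recording the returned quality traces the maximum achievable quality as a function of rate under the optimal STAR.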
Both optimal rate control and rate adaptation require accurate prediction of rate and perceptual quality at any STAR indicated by (q, s, t). In this paper, we consider how to optimize the STAR for given rate constraints, applicable both to rate control and to scalable video adaptation. We further propose an analytical rate-quality model that accurately relates the maximum achievable quality to a given rate, attained by using the optimal STAR. Such an analytical rate-quality model is very helpful for solving rate allocation among multiple competing video streams within a utility maximization framework. In addition, we consider how to order the SVC layers into a rate-quality optimized layered stream, so that each additional layer yields the maximum ratio of quality gain to rate increment. Our prior work [4, 5] studied similar problems but considered only the optimization of FR and QS. In this paper, we extend these studies to consider the adaptation of FS, FR and QS jointly. The proposed layer preordering scheme is