PERCEPTUAL QUALITY OF VIDEO WITH QUANTIZATION VARIATION: A SUBJECTIVE STUDY AND ANALYTICAL MODELING

Yen-Fu Ou, Huiqi Zeng, Yao Wang

Department of Electrical and Computer Engineering
Polytechnic Institute of NYU, Brooklyn, NY 11201, U.S.A.
Email: {you01, hzeng01}@students.poly.edu, yao@poly.edu

ABSTRACT

This work investigates the impact of temporal variation of the quantization stepsize (QS) on perceptual video quality. Among the many dimensions of QS variation, as a first step we focus on videos in which two QSs alternate over fixed intervals. We present subjective test results and analyze the influence of several factors (including the QS difference, the QS ratio, the changing interval, and the video content). Based on these observations and the data analysis, we propose analytical models that relate the perceived quality to the two QSs. Such quality assessment and modeling are essential for making video adaptation decisions when delivering video over dynamically changing wireless links.

Index Terms: perceptual video quality, temporal variation, frame rate, QS, jitter, quality metrics.

1. INTRODUCTION

In wireless video streaming, the sustainable bandwidth of a wireless link often fluctuates in time, calling for adaptation of the video coding parameters so that the video rate stays below the sustainable bandwidth. In practical applications, a video coder can vary the frame size (FS), frame rate (FR), and/or quantization stepsize (QS) to control the video rate. We will refer to the frame size, frame rate, and QS collectively as the spatial, temporal and amplitude resolution (STAR). One naive approach would be to find the STAR that optimizes the perceptual quality over each short time duration based on the instantaneous bandwidth. Alternatively, one may pre-code the video into a scalable stream that has multiple spatial, temporal and amplitude layers, and deliver the layers corresponding to the optimal STAR. In either case, the resulting video may have a rapidly fluctuating STAR, which may be annoying to the viewer. It is important to understand how the variation of the STAR components, individually and collectively, affects the perceived quality. Such understanding would enable us to impose proper constraints on the variation of the STAR when adapting it to the time-varying bandwidth.

Take for example a hypothetical case where the available bandwidth alternates between R_l and R_h, and the QSs that lead to the best perceived quality for constant-rate video at R_l and R_h are q_h and q_l, respectively. In this situation, is it better to code the video with the QS alternating between q_l and q_h, or would it be better to stay at q_h? More generally, one may want to vary not only the QS, but also the FR and FS to meet the instantaneous rate constraints.

This work is supported in part by the National Science Foundation under Grant No. 0933985.

We have conducted a subjective test on FR variation when the QS and spatial resolution are fixed [1]. In this work, we explore the impact of QS variation while keeping the FR and FS fixed. Among the many possible temporal variation patterns, we consider the simple case where the QS alternates between q_l and q_h, with each QS held over a constant interval Fz. We conducted subjective tests in which viewers were asked to rate the quality of videos with varying q_l, q_h and Fz. We study the effect of q_h and q_l as well as Fz on the perceived quality. We include a variety of videos to further assess the influence of the video content. This study directly addresses the questions raised by the hypothetical example given earlier. It also sheds light on more complicated cases where the QS may vary among more than two levels and the variation may not follow a periodic pattern.

This paper is organized as follows. Section 2 describes our subjective test configurations.
Section 3 presents the subjective test results, the observations, and the proposed models that relate the perceived quality to the QS variations. Section 5 concludes the paper.

Table 1. Testing configuration for QS variation

FR (Hz)   Fz          QS_b   QS_v
30        1/2/3 sec   16     16/25/40/64/102
                      40     25/40/64/102
          3 sec       102    25/64/102

2. SUBJECTIVE QUALITY ASSESSMENT

2.1. Testing Material

Our experiment is conducted using five video source sequences, Akiyo, Foreman, Football, Ice, and Waterfall, all in CIF (352 × 288) resolution and at a frame rate of 30 fps, chosen from the JVT (Joint Video Team) test sequence pool [2]. All these sequences are coded using the JVT scalable video model (JSVM912) [3]. For each sequence, one bitstream is generated with five temporal layers, with corresponding FRs of 1.875, 3.75, 7.5, 15, and 30 Hz. Each temporal layer in turn has five quality layers created with QP equal to 28, 32, 36, 40 and 44 (corresponding to QS = 16, 25, 40, 64, and 102, respectively), using coarse grain scalability (CGS) without QP cascading. For the study reported here, the test videos are obtained by decoding all temporal layers (i.e., FR = 30 Hz) but different numbers of quality layers, corresponding to the desired QP variation. In Table 1, QS_b indicates the beginning QS, while QS_v denotes the QS to which the video deviates, which can be either higher or lower than QS_b. There are a total of 130 processed video sequences (PVSs).
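
As a quick sanity check on the configuration above, the following minimal Python sketch reproduces the QP-to-QS correspondence, builds a per-frame QS pattern for one alternating clip, and counts the test conditions. It rests on several assumptions that are not stated in the text: the QS values are taken to follow the common H.264 approximation QS = 2^((QP-4)/6); Table 1 is read as three rows (QS_b = 16 and 40 tested at Fz = 1, 2 and 3 sec, QS_b = 102 at Fz = 3 sec only); and a constant-QS clip (QS_v = QS_b) is counted once regardless of Fz. The function names and the clip duration used in the usage lines are hypothetical.

```python
from itertools import product

FRAME_RATE = 30  # fps, as listed in Table 1


def qp_to_qs(qp):
    """Assumed H.264 approximation: QS doubles every 6 QP and equals 1 at QP = 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def qs_schedule(qs_b, qs_v, fz_sec, duration_sec, frame_rate=FRAME_RATE):
    """Per-frame QS for a clip that starts at qs_b and switches every fz_sec seconds."""
    frames_per_interval = round(fz_sec * frame_rate)
    n_frames = round(duration_sec * frame_rate)
    return [qs_b if (i // frames_per_interval) % 2 == 0 else qs_v
            for i in range(n_frames)]


# Table 1 as read here (an interpretation of the flattened table):
# (Fz values in seconds, QS_b, QS_v values)
TABLE_1 = [
    ((1, 2, 3), 16, (16, 25, 40, 64, 102)),
    ((1, 2, 3), 40, (25, 40, 64, 102)),
    ((3,), 102, (25, 64, 102)),
]


def conditions_per_sequence(table=TABLE_1):
    """Distinct (QS_b, QS_v, Fz) conditions for one source sequence."""
    conditions = set()
    for fz_values, qs_b, qs_v_values in table:
        for fz, qs_v in product(fz_values, qs_v_values):
            # Assumption: a constant-QS clip has no switching interval,
            # so it is stored once with Fz = None.
            conditions.add((qs_b, qs_v, None if qs_v == qs_b else fz))
    return conditions


if __name__ == "__main__":
    print([round(qp_to_qs(qp)) for qp in (28, 32, 36, 40, 44)])
    # -> [16, 25, 40, 64, 102]
    print(5 * len(conditions_per_sequence()))  # five source sequences
    # -> 130
    pattern = qs_schedule(qs_b=16, qs_v=64, fz_sec=1, duration_sec=4)
    print(pattern[0], pattern[30], pattern[60])
    # -> 16 64 16
```

Under this reading, each source sequence contributes 26 conditions, which is consistent with the reported total of 130 PVSs for five sequences. The 4-second duration in the last usage lines is only an illustrative value.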