IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 3, MARCH 2011

Perceptual Quality Assessment of Video Considering Both Frame Rate and Quantization Artifacts

Yen-Fu Ou, Zhan Ma, Tao Liu, Member, IEEE, and Yao Wang, Fellow, IEEE

Abstract—In this paper, we explore the impact of frame rate and quantization on the perceptual quality of a video. We propose to use the product of a spatial quality factor, which assesses the quality of decoded frames without considering the frame rate effect, and a temporal correction factor, which reduces the quality assigned by the first factor according to the actual frame rate. We find that the temporal correction factor closely follows an inverted falling exponential function, whereas the quantization effect on the coded frames can be captured accurately by a sigmoid function of the peak signal-to-noise ratio (PSNR). The proposed model is analytically simple, with each function requiring only a single content-dependent parameter. The proposed overall metric has been validated using both our subjective test scores and those reported by others. For all seven data sets examined, our model yields a high Pearson correlation (higher than 0.9) with the measured mean opinion score (MOS). We further investigate how to predict the parameters of our proposed model using content features derived from the original videos. Using parameters predicted from content features, our model still fits the measured MOS with high correlation.

Index Terms—Content features, frame rate, scalable video, video quality model.

I. Introduction

Development of objective quality metrics that can automatically and accurately measure perceptual video quality is becoming more and more important as video applications become pervasive. Prior work in video quality assessment is mainly concerned with applications where the frame rate of the video is fixed.
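The product form summarized in the abstract can be sketched as follows. This is a minimal illustration, not the exact equations of the paper: the abstract only states that the temporal factor is an inverted falling exponential of frame rate and that the spatial factor is a sigmoid of PSNR, each with one content-dependent parameter. The normalization by `max_frame_rate`, the fixed sigmoid slope `slope`, the scaling constant `q_max`, and all function and parameter names are assumptions introduced here for illustration.

```python
import math


def temporal_correction(frame_rate, max_frame_rate, b):
    """Inverted falling exponential of frame rate (assumed normalization).

    Returns 1.0 at max_frame_rate and decays toward 0 as the frame rate
    drops. `b` (> 0) plays the role of the single content-dependent
    parameter; its name and the normalization are assumptions.
    """
    if b <= 0 or max_frame_rate <= 0:
        raise ValueError("b and max_frame_rate must be positive")
    return (1.0 - math.exp(-b * frame_rate / max_frame_rate)) / (1.0 - math.exp(-b))


def spatial_quality(psnr_db, s, slope=0.3):
    """Sigmoid of PSNR in dB, with values in (0, 1).

    `s` is the assumed content-dependent midpoint (the PSNR at which the
    factor equals 0.5). `slope` is a hypothetical fixed constant chosen
    only for illustration.
    """
    return 1.0 / (1.0 + math.exp(-slope * (psnr_db - s)))


def predicted_quality(psnr_db, frame_rate, max_frame_rate, b, s, q_max=1.0):
    """Product of the spatial quality factor and the temporal correction factor."""
    return (
        q_max
        * spatial_quality(psnr_db, s)
        * temporal_correction(frame_rate, max_frame_rate, b)
    )
```

Under these assumptions the temporal factor leaves the spatial score unchanged at the highest frame rate and only lowers it as the frame rate is reduced, which matches the role of a correction factor described in the abstract.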
The objective quality metric compares each pair of corresponding frames in deriving a similarity score or distortion between two videos with the same frame rate. In many emerging applications targeting heterogeneous users with different display devices and/or different communication links, the same video content may be accessed with varying frame rate, frame size, or quantization [assuming the video is coded into a scalable stream with spatial/temporal/signal-to-noise ratio (SNR) scalability]. In applications permitting only very low bit rate video, one often has to determine whether to code an original high frame-rate video at the same frame rate but with significant quantization, or to code it at a lower frame rate with less quantization. In all the preceding scenarios, as well as many others, it is important to be able to objectively quantify the perceptual quality of a video that has been subjected to both quantization and frame rate reduction.

Manuscript received July 10, 2009; revised December 1, 2009 and April 2, 2010; accepted May 10, 2010. Date of publication October 18, 2010; date of current version March 23, 2011. This work was supported by the National Science Foundation, under Grant 0430145. The work of Y. Wang was supported in part by the Ministry of Education of China as a Yangtze River Lecture Scholar. This paper was recommended by Associate Editor S.-Y. Chien.
Y.-F. Ou, Z. Ma, and Y. Wang are with the Polytechnic Institute of New York University, Brooklyn, NY 11201 USA (e-mail: you01@students.poly.edu; zma03@students.poly.edu; yao@poly.edu).
T. Liu is with Dialogic Research, Inc., Eatontown, NJ 07724 USA (e-mail: taoliu.bit@gmail.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2010.2087833

There have been several works studying the impact of frame rate artifacts on perceptual video quality.
In a recent review of the frame rate effect on human perception of video [1], it was found that a frame rate of around 15 Hz seems to be a threshold of viewers' satisfaction, but the exact acceptable frame rate varies depending on the video content, the underlying application, and the viewers. In addition, the authors of [2] proposed that the preferred frame rate decreases as video bandwidth decreases, and two switching bandwidths corresponding to the preferred frame rates were derived. The work in [3] investigated the preferred frame rate for different types of video. In [4], a particular high-motion type of coded video sequence (sports games) was explored. It was found that high spatial quality is preferable to high frame rate for small screens. However, no specific quality metric that can predict the perceived video quality was derived in these works [1]–[4].

The work in [5]–[7] proposed quality metrics that consider the effect of frame rate. The work in [5] used a logarithmic function of the frame rate to model the negative impact of frame rate dropping on perceptual video quality in the absence of compression artifacts. The model was shown to correlate well with subjective ratings for both common intermediate format (CIF) and quarter common intermediate format (QCIF) videos. However, this model requires two content-dependent parameters, which may limit its applicability in practice. The metric proposed in [6] explored the impact of regular and irregular frame drops. The quality of each video scene is determined by weighting and normalizing a logarithmic function of the temporal fluctuation and the frame dropping severity. Finally, the overall quality of the entire video is the average of the quality indices over all video scene segments. The work in [7] also considered the impact of both regular and irregular frame drops and examined the jerkiness and jitter effects caused by different levels of strength, duration, and distribution of the temporal impairment.
However, [6] did not provide a single equation, which can predict the perceptual quality of regular