PREPRINT
Proceedings of SPIE Conference on Multimedia Systems and Applications II, SPIE Vol. 3845, 20-22 September 1999, Boston, MA.

Simulation of Graded Video Impairment by Weighted Summation: Validation of the Methodology

John M. Libert, Charles P. Fenimore, and Peter Roitman
National Institute of Standards and Technology*, 100 Bureau Drive, Stop 8114, Gaithersburg, MD 20899-8114

ABSTRACT

The investigation examines two methodologies by which to control the impairment level of digital video test materials. Such continuous fine-tuning of video impairments is required for psychophysical measurements of human visual sensitivity to picture impairments induced by MPEG-2 compression. Because the visual sensitivity data will be used to calibrate objective and subjective video quality models and scales, the stimuli must contain realistic representations of actual encoder-induced video impairments. That is, both the visual and objective spatio-temporal response to the stimuli must be similar to the response to impairments induced directly by an encoder. The first method builds a regression model of the Peak Signal To Noise Ratio (PSNR) of the output sequence as a function of the bit rate specification used to encode a given video clip. The experiments find that for any source sequence, a polynomial function can be defined by which to predict the encoder bit rate that will yield a sequence having any targeted PSNR level. In a second method, MPEG-2-processed sequences are linearly combined with their unprocessed video sources. Linear regression is used to relate PSNR to the weighting factors used in combining the source and processed sequences. Then the "synthetically" adjusted impairments are compared to those created via an encoder. Visual comparison is made between corresponding I-, B-, and P-frames of the synthetically generated sequences and those processed by the codec.
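The weighted summation of the second method can be illustrated with a minimal sketch. This is not the authors' implementation: the weight convention (w applied to the processed sequence, 1 - w to the source), the 8-bit peak value, the function names `mix` and `psnr`, and the synthetic sample data are all assumptions made for illustration. Under these assumptions, and ignoring the rounding and clipping that real 8-bit video would add, the error of the mixed signal is w times the codec error, so its PSNR exceeds the PSNR of the processed sequence by -20 log10(w) dB.

```python
import math
import random

PEAK = 255.0  # assumed 8-bit sample range


def mix(source, processed, w):
    """Weighted sum, sample by sample: w * processed + (1 - w) * source."""
    return [w * p + (1.0 - w) * s for s, p in zip(source, processed)]


def psnr(reference, test, peak=PEAK):
    """PSNR in dB of `test` measured against `reference`."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical stand-ins for one frame of source video and codec output.
rng = random.Random(0)
source = [rng.uniform(16, 235) for _ in range(4096)]
processed = [s + rng.gauss(0, 4) for s in source]

psnr_full = psnr(source, processed)  # PSNR of the unmixed codec output
for w in (1.0, 0.5, 0.25):
    mixed = mix(source, processed, w)
    predicted = psnr_full - 20.0 * math.log10(w)
    print(f"w={w:.2f}  measured={psnr(source, mixed):.3f} dB  "
          f"predicted={predicted:.3f} dB")
```

In this idealized setting, halving the weight raises PSNR by about 6.02 dB, which is why a small set of weights can target finely spaced impairment levels. Quantized video samples would deviate slightly from the predicted values.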
Also, PSNR comparisons are made between various combinations of source sequence, the MPEG-2 sequence used for mixing, the mixed sequence, and the codec-processed sequence. Both methods are found to support precision adjustment of impairment level adequate for visual threshold measurement. The authors caution that some realism may be lost when using the weighted summation method with highly compression-impaired video.

Keywords: video quality, video impairment, MPEG-2, video compression, PSNR, simulated impairment

1. INTRODUCTION

Because bandwidth is likely to remain at a premium, it is also likely that digital video will continue to be compressed as much as possible, limited only by the tolerance of the human viewer for a degraded picture. Accordingly, the digital video industry recognizes a need for objective quality metrics that have been calibrated to human subjective quality assessments. The need to support objective computational methods with human visual data has spurred several major research efforts, including those described in [1] and [2]. While both of these projects support improvement of video quality computational models, they address the quality issue at quite different levels of abstraction.

The study organized and executed by the Video Quality Experts Group (VQEG), described in [1], uses an approach of the television industry for subjective picture quality assessment. The methods, detailed in [3], generally involve assigning values of a numerical category scale, discrete or continuous, to video sequences based on each viewer's personal opinion of their quality relative to either an explicit or implicit reference. Generally, training is provided in an effort to "calibrate" the internal scales of the viewers. But the precise nature of the scale by which each viewer assigns ratings is not directly observable. Also unknown is the relative importance each viewer gives to each spatio-temporal distortion.
Moreover, such an ordering may vary among viewers, or may even vary for a single viewer over the duration of the testing period. However, such procedures have the advantage that they are efficient. They integrate a number of disparate quality elements, spatial and temporal, explicit and implicit, into a single value. Furthermore, the quality rating procedures tend to use test material in which picture distortions and their context are identical or close to those that the viewers will see on television screens or multimedia monitors. Such stimuli retain an element of realism that may be lacking in simpler, highly-controlled

* Electricity Division, NIST, Technology Administration, U.S. Department of Commerce. This contribution is from the U.S. Government and is not subject to copyright.