VIDEO QUALITY ASSESSMENT BY DECOUPLING ADDITIVE IMPAIRMENTS AND DETAIL LOSSES

Songnan Li, Lin Ma, King Ngi Ngan
Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR

ABSTRACT

In this paper, a review of existing methods for extending image quality metrics to video quality metrics is given. It is found that three processing steps are usually involved: temporal channel decomposition, temporal masking, and error pooling. These steps are utilized to extend our previously proposed image quality metric, which separately evaluates additive impairments and detail losses, to a video quality metric. The resulting algorithm is tested on the LIVE subjective video database and shows good performance in matching subjective ratings.

Index Terms— video quality assessment, distortion decoupling, human visual system, visual masking

1. INTRODUCTION

Since the human visual system (HVS) is the ultimate receiver of the video service, subjective viewing tests are considered the most reliable way to evaluate visual quality. However, subjective viewing tests are expensive and not feasible for on-line manipulation, which makes them impractical for system design, quality monitoring, etc. Therefore, an accurate objective VQA algorithm, or video quality metric (VQM), is of fundamental importance to future multimedia applications. It is customary to classify VQMs into three categories according to reference availability: full-reference (FR), reduced-reference (RR), and no-reference (NR) metrics. In FR metrics, the reference is fully available and is assumed to have maximum quality. They can be applied in applications where the reference is fully available, such as image/video coding, watermarking, etc. RR metrics extract features from the reference video and transmit them to the receiver side for comparison against the corresponding features extracted from the distorted video. The design of RR metrics mainly targets quality monitoring.
These features should be carefully selected to achieve both effectiveness and efficiency, i.e., predicting quality with great accuracy and a small overhead for feature representation. NR metrics require no reference and are therefore the most broadly applicable. For many applications, such as video signal acquisition, enhancement, etc., an NR metric is the only choice for on-line quality assessment. Not surprisingly, NR metric design is difficult, facing the challenge of limited input information. Therefore, to ensure acceptable prediction performance, many NR metrics are designed to cope with specific artifacts, such as blocking, blurring, ringing, and jitter/jerky motion, sacrificing versatility for prediction accuracy. For a comprehensive overview of NR metrics, please refer to [1].

In this paper, we propose an FR video quality metric. It is an extension of our previously proposed image quality metric (IQM) [2], which separately evaluates detail losses and additive impairments for visual quality assessment. In Section 2, we briefly review our IQM and then discuss how to extend it to a VQM. Section 3 elaborates on the implementation details. Section 4 shows the performance of the proposed VQM in matching subjective ratings. Section 5 provides the concluding remarks.

(This work was partially supported by a grant from The Chinese University of Hong Kong under the Focused Investment Scheme (Project 1903003).)

2. BACKGROUND

2.1. Spatial distortion measurement

Owing to the limited paper length, please refer to [3] for an overview of image quality assessment. In our VQM, we adopt our previous work [2] to measure the spatial distortions. Instead of treating the spatial distortions indistinguishably, they are decomposed into detail losses and additive impairments. As the name implies, detail losses refer to the loss of useful information which affects the content visibility.
Additive impairments, on the other hand, refer to redundant visual information which does not belong to the original image but appears in the distorted image. Their appearance distracts the viewer's attention from the useful picture contents, causing an unpleasant viewing experience. To assist understanding, an illustration is given in Fig. 1. In Fig. 1 (a), the distorted image is separated into the original image and the error image. Typically, HVS-model-based IQMs try to simulate low-level HVS responses to the error image, treating these distortions as homogeneous. As shown in Fig. 1 (b), the proposed method further separates the distortions into detail losses and additive impairments. For JPEG-compressed images, such as the one shown in Fig. 1, the

2011 Third International Workshop on Quality of Multimedia Experience — 978-1-4577-1334-7/11/$26.00 ©2011 IEEE
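To make the two-way decomposition concrete, the sketch below illustrates the general idea in code. It is a toy example, not the authors' actual algorithm from [2]: here a hypothetical "restored" image is estimated from the distorted image by a simple global least-squares fit toward the reference, so that the reference content the fit cannot recover counts as detail loss, and the signal the distorted image contains beyond the restored image counts as additive impairment. By construction, the error image then equals the additive impairments minus the detail losses.

```python
import numpy as np

def decouple_distortions(reference, distorted):
    """Toy sketch (assumed, simplified model) of decoupling the error image
    into detail losses and additive impairments.

    A hypothetical "restored" image is fitted from the distorted image by a
    global least-squares gain/offset toward the reference. Then:
      detail_loss = reference - restored   (useful content that was lost)
      additive    = distorted - restored   (spurious content not in the original)
    Note the identity: distorted - reference == additive - detail_loss.
    """
    d = distorted.astype(np.float64).ravel()
    r = reference.astype(np.float64).ravel()
    # Least-squares fit: restored = a*d + b minimizing ||restored - r||^2.
    a, b = np.polyfit(d, r, 1)
    restored = (a * d + b).reshape(reference.shape)
    detail_loss = reference.astype(np.float64) - restored
    additive = distorted.astype(np.float64) - restored
    return detail_loss, additive
```

A real metric along these lines would estimate the restored image locally (e.g., block-wise) rather than with a single global fit, and would weight the two distortion maps by HVS sensitivity before pooling; this sketch only shows how one error image can be split into two complementary components.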