VIDEO QUALITY ASSESSMENT BY DECOUPLING ADDITIVE IMPAIRMENTS AND
DETAIL LOSSES
Songnan Li, Lin Ma, King Ngi Ngan
Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR
ABSTRACT
In this paper, we review existing methods for extending an
image quality metric to a video quality metric. We find that
three processing steps are usually involved: temporal channel
decomposition, temporal masking, and error pooling. These
steps are utilized to extend our previously proposed image
quality metric, which separately evaluates additive impairments
and detail losses, into a video quality metric. The resultant
algorithm is tested on the LIVE subjective video database and
shows good performance in matching subjective ratings.
Index Terms— video quality assessment, distortion de-
coupling, human visual system, visual masking
1. INTRODUCTION
Since the human visual system (HVS) is the ultimate receiver
of video services, subjective viewing tests are considered the
most reliable way to evaluate visual quality. However, subjective
viewing tests are expensive and infeasible for on-line use, which
makes them impractical for system design, quality monitoring,
etc. Therefore, an accurate objective VQA algorithm, i.e., a
video quality metric (VQM), is of fundamental importance to
future multimedia applications.
It is customary to classify VQMs into three categories
according to reference availability: full-reference (FR),
reduced-reference (RR), and no-reference (NR) metrics. FR
metrics assume that the reference is fully available and has
maximum quality. They can be applied where the reference is
fully accessible, such as in image/video coding, watermarking,
etc. RR metrics extract features from the reference video and
transmit them to the receiver side, where they are compared
against the corresponding features extracted from the distorted
video. RR metric design mainly targets quality monitoring,
so the features should be carefully selected to achieve both
effectiveness and efficiency, i.e., predicting quality with great
accuracy and small overhead for feature representation. NR
metrics require no reference,
and are therefore the most broadly applicable. For many
no-reference applications, such as video signal acquisition
and enhancement, NR metrics are the only choice for on-line
quality assessment. Not surprisingly, NR metric design is
difficult, facing the challenge of limited input information.
Therefore, to ensure acceptable prediction performance, many
NR metrics are designed to cope with specific artifacts, such
as blocking, blurring, ringing, and jitter/jerky motion,
sacrificing versatility for prediction accuracy. For a
comprehensive overview of NR metrics, please refer to [1].
This work was partially supported by a grant from the Chinese
University of Hong Kong under the Focused Investment Scheme
(Project 1903003).
In this paper, we propose an FR video quality metric. It
is an extension of our previously proposed image quality
metric (IQM) [2], which separately evaluates detail losses
and additive impairments for visual quality assessment. In
Section 2, we briefly review our IQM and then discuss how to
extend it to a VQM. Section 3 elaborates the implementation
details. Section 4 shows the performance of the proposed VQM
in matching subjective ratings. Section 5 provides the
concluding remarks.
2. BACKGROUND
2.1. Spatial distortion measurement
Due to the paper length limit, please refer to [3] for an
overview of image quality assessment. In our VQM, we adopt
our previous work [2] to measure the spatial distortions.
Instead of treating the spatial distortions indiscriminately,
they are decomposed into detail losses and additive
impairments. As the name implies, detail losses refer to the
loss of useful information, which affects the content
visibility. Additive impairments, on the other hand, refer to
redundant visual information which does not belong to the
original image but appears in the distorted image. Its
appearance distracts viewers' attention from the useful
picture contents, causing an unpleasant viewing experience.
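To make the two distortion types concrete, the following toy sketch decouples a distorted signal into a restored part, detail losses, and additive impairments. Note that this is an illustrative assumption only: the actual metric in [2] performs the decoupling on wavelet subband coefficients rather than raw pixels, and the per-sample clamping rule below is a simplification for demonstration.

```python
import numpy as np

def decouple_distortions(original, distorted):
    """Toy decoupling of a distorted signal (per sample).

    Restored part r: the portion of the distorted signal "explained"
    by the original (same sign, magnitude not exceeding the original's).
    Detail loss:          useful information missing from the distorted signal.
    Additive impairment:  redundant information added to the distorted signal.

    Satisfies: distorted = original - detail_loss + additive_impairment.
    """
    o = np.asarray(original, dtype=np.float64)
    t = np.asarray(distorted, dtype=np.float64)
    same_sign = (o * t) >= 0
    r = np.where(same_sign,
                 np.sign(t) * np.minimum(np.abs(t), np.abs(o)),
                 0.0)
    detail_loss = o - r
    additive_impairment = t - r
    return r, detail_loss, additive_impairment
```

By construction, the restored part plus the additive impairment reproduces the distorted signal, and the restored part plus the detail loss reproduces the original, so the two distortion components can then be evaluated separately.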
To assist understanding, an illustration is given in Fig. 1.
In Fig. 1 (a), the distorted image is separated into the
original image and the error image. Typically, HVS-model-based
IQMs simulate low-level HVS responses to the error image,
treating these distortions as homogeneous. As shown in Fig.
1 (b), the proposed method further separates the distortions
into detail losses and additive impairments. For JPEG
compressed images, such as the one shown in Fig. 1, the
2011 Third International Workshop on Quality of Multimedia Experience
978-1-4577-1334-7/11/$26.00 ©2011 IEEE 90