XXX SIMPÓSIO BRASILEIRO DE TELECOMUNICAÇÕES - SBrT'12, 13-16 DE SETEMBRO DE 2012, BRASÍLIA, DF

Video Quality Assessment Based on the Effect of the Estimation of the Spatial Perceptual Information

Carlos D. M. Regis, José V. de M. Cardoso and Marcelo S. Alencar

Abstract: Objective video quality assessment is a quick and low-cost alternative to subjective evaluation. However, objective evaluation is less reliable, because its results do not always agree with the quality perceived by the human visual system. This paper presents a new metric for objective video quality assessment, called PW-SSIM. It is based on an investigation of how the spatial perceptual information can be used to estimate the visual attention drawn to a particular region of the video, and it weights the quality of each region according to its spatial perceptual information value. PW-SSIM presents higher correlation coefficients than popular models (PSNR and SSIM) in a subjective evaluation with 40 participants, considering videos degraded by salt-and-pepper noise, blurring and blocking, and with 24 participants, considering videos subject to Gaussian noise. This suggests that PW-SSIM is better able to predict the video quality perceived by a group of viewers.

Keywords: Video Quality Assessment, Structural Similarity, Visual Attention, Spatial Perceptual Information, Human Visual System.

I. INTRODUCTION

Video quality assessment methods are subdivided into two categories: objective and subjective.
Objective methods are computational models that estimate a quality score from statistical characteristics of the video. They are classified according to the availability of the original signal: full reference, in which the original video is compared with the test video; reduced reference, in which only some characteristics of the original video are available for comparison with the test video; and no reference, in which only the test video is used to assess the video quality. Subjective methodologies, in contrast, assess the video quality via psychophysical experiments with human observers. The observer watches the video sequences and evaluates the video according to a personal concept of quality. The subjective approach is the natural way to assess video quality [1]. Nevertheless, subjective experiments are complex and time-consuming. Objective metrics are faster and cheaper than subjective methods, because they can be applied automatically to video systems, including to detect degradations that are imperceptible to the human eye. Objective video quality assessment is important for video services and processing systems, such as surveillance systems [2], video on demand [3], spatial transcoding systems [4] and video conferencing [5]. However, the classical objective metrics, such as MSE (Mean Squared Error) and PSNR (Peak Signal-to-Noise Ratio), present an unsatisfactory correlation with the results of subjective evaluation, which compromises the reliability of their measures [6]. Currently, the objective metrics that show better correlation with subjective tests are based on the structural similarity approach proposed by Wang et al. [7]. In an attempt to improve this approach, many researchers investigate how to introduce characteristics of the human visual system (HVS) into the objective metrics, in order to raise the correlation with the subjective results.
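As a point of reference for the classical full-reference metrics mentioned above, the following minimal sketch computes the MSE and the PSNR for a single pair of frames. It assumes 8-bit samples (peak value 255) and frames stored as lists of pixel rows; the function names and the sample values are illustrative and are not taken from the paper.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized frames (lists of pixel rows)."""
    total = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    return total / count


def psnr(reference, test, peak=255.0):
    """PSNR in dB. peak=255 is an assumption (8-bit samples).

    Returns infinity when the two frames are identical (MSE equal to zero).
    """
    error = mse(reference, test)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)


# Hypothetical 2x2 luminance frames, for illustration only.
original = [[52, 55], [61, 59]]
degraded = [[54, 55], [61, 57]]

print(mse(original, degraded))
print(psnr(original, degraded))
```

Both measures depend only on pixel-wise differences, with every pixel contributing equally regardless of where it lies in the frame.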
One of the key research areas being investigated to obtain this improvement is the visual attention of the HVS. Experiments indicate that human visual attention is not equally distributed throughout the image space, but concentrates in a few regions [8]. Including methods that identify the visual attention of a scene, i.e., that assign a weight to the visual importance of regions of the image, is expected to improve the measures provided by the objective metrics. Akamine and Farias [9] investigated the computational modeling of visual attention by means of saliency maps, which were incorporated into objective metrics (PSNR and SSIM). This technique presents good results, mainly for saliency maps generated from eye-tracking, called subjective saliency maps. You et al. [10] also investigated visual attention, modeled by the saliency map, the saliency attention map and the GAFFE map [11], as an important factor in objective image quality assessment. Oprea et al. [12] included elements that attract attention (color contrast, size, orientation and eccentricity) in image quality assessment.

The authors propose a new objective metric for full-reference video quality assessment, derived from the structural similarity index (SSIM), which includes a visual attention model based on weighting by the spatial perceptual information (SI) of each region. It is called Structural Similarity Index with Perceptual Weighting (PW-SSIM). The proposed metric was compared with MSE, PSNR and SSIM by means of the Pearson Correlation Coefficient (CC) and the Spearman Rank-order Correlation Coefficient (SROCC).

This paper is organized as follows. Section II describes the Structural Similarity Index approach. Section III describes the proposed approach to objective video quality assessment. Section IV presents the subjective evaluation experiments. Section V shows the simulation results and Section VI presents the conclusion.

II. SSIM: STRUCTURAL SIMILARITY INDEX

The Structural SIMilarity Index (SSIM) is a model proposed by Wang et al. [13], based on the structural information of the image. Let $f = \{f_i \mid i = 1, 2, 3, \ldots, P\}$ be the original video