COMPARATIVE EVALUATION OF VISUAL SALIENCY MODELS FOR QUALITY ASSESSMENT TASK

Milind S. Gide and Lina J. Karam
School of Electrical, Computer and Energy Engineering
Arizona State University, Tempe, AZ 85287-5706
mgide@asu.edu, karam@asu.edu

ABSTRACT

One important application of computational saliency models is to aid objective image quality assessment. Given this, it is necessary to evaluate the performance of existing state-of-the-art visual attention models by comparing them with eye-tracking data recorded during a quality assessment task. Existing comparative studies compare saliency models with psychophysical data collected for a free-viewing task on non-distorted images and hence do not accurately measure how well a saliency model can improve quality assessment pooling. This study evaluates five different visual attention models on a database that includes human eye-tracking data recorded under a quality assessment task. The proposed study is helpful in understanding which features improve the performance of computational visual saliency models specific to quality assessment, as well as in providing a generic performance assessment framework for comparing different competing models of visual saliency.

1. INTRODUCTION

Computational visual saliency has been used to improve the performance of objective quality assessment metrics [1], [2]. Thus, there is a need to evaluate visual saliency models in the context of quality assessment and for different types of distortions. Previous attempts [3], [4] have used eye-tracking data to evaluate existing computational saliency models. However, these comparative studies have not considered quality. Recent work [5] has focused on improving a computational visual saliency model for quality by training a classifier on eye-tracking data recorded for images that were compressed with a single distortion type, JPEG compression. Thus, there is a need for a comprehensive evaluation of computational visual saliency models against psychophysical data specifically recorded during a quality assessment task.

In this paper, the performance of five popular visual attention models is evaluated in a quality assessment context. For this purpose, an eye-tracking database [6] that consists of fixation points recorded during a quality assessment task for different distortion types is used. The performance of the visual saliency models is evaluated over the different distortion types using several performance measures, including Receiver Operating Characteristic (ROC) curves, the Area Under the Curve (AUC), and other numerical measures such as the correlation coefficient (CC) and the Kullback-Leibler Divergence (KLD).

The paper is organized as follows. Section 2 describes the framework used to evaluate the visual attention models. Section 3 describes the different performance metrics that are used to evaluate the models. Section 4 presents the results of the evaluation, and Section 5 concludes the paper.

2. EVALUATION FRAMEWORK

2.1. Eye Tracking Database

The TU Delft Interactions Database [6] includes human eye movements recorded for 14 subjects while looking at 54 distorted stimuli. The distortion types used are Gaussian blur, white noise and JPEG compression, with each type applied at three levels of distortion (high, medium and low). The database provides mean opinion scores (MOS) from the quality assessment task in addition to the saliency maps obtained from the recorded fixation points.
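The database's human saliency maps are derived from the pooled fixation points. The exact procedure used in [6] is not reproduced here, but a common way to build such a map is to accumulate the fixations of all subjects on a stimulus and smooth them with a Gaussian whose width approximates the fovea. The sketch below illustrates this approach; the smoothing width sigma is an assumed value and the function name is chosen for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency_map(fixations, height, width, sigma=30.0):
    """Turn pooled fixation points into a continuous saliency map.

    fixations : iterable of (row, col) pixel coordinates pooled over
                all subjects for one stimulus.
    sigma     : Gaussian std. dev. in pixels, roughly one degree of
                visual angle; an assumed value, not the one from [6].
    """
    fixation_map = np.zeros((height, width), dtype=np.float64)
    for r, c in fixations:
        if 0 <= r < height and 0 <= c < width:
            fixation_map[int(r), int(c)] += 1.0
    # Smooth the accumulated fixations and normalize to [0, 1] so
    # that maps are comparable across stimuli.
    saliency_map = gaussian_filter(fixation_map, sigma=sigma)
    if saliency_map.max() > 0:
        saliency_map /= saliency_map.max()
    return saliency_map
```

Normalizing by the maximum rather than the sum is a design choice; measures that treat the maps as probability distributions renormalize them anyway.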
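Once a human saliency map is available for each stimulus, a model's map can be scored against it using the measures named in the introduction. The sketch below gives one common formulation of each scalar measure (CC, KLD, and a fixation-based AUC); the exact variants adopted in this study are described in Section 3, so the choices here (the KLD direction, and using all pixels as AUC negatives) should be read as assumptions.

```python
import numpy as np

def correlation_coefficient(model_map, human_map):
    """Pearson linear correlation between two saliency maps."""
    m = model_map.ravel() - model_map.mean()
    h = human_map.ravel() - human_map.mean()
    return float(np.sum(m * h) / np.sqrt(np.sum(m**2) * np.sum(h**2)))

def kl_divergence(model_map, human_map, eps=1e-12):
    """KLD between the maps viewed as probability distributions.
    The direction KL(human || model) is an assumption; conventions vary."""
    p = human_map.ravel().astype(np.float64) + eps
    q = model_map.ravel().astype(np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def auc_fixations(model_map, fixations):
    """Fixation-based AUC (Mann-Whitney formulation): the probability
    that the model scores a fixated pixel above a randomly chosen
    pixel. Treating all pixels as negatives is one common variant."""
    pos = np.array([model_map[int(r), int(c)] for r, c in fixations],
                   dtype=np.float64)
    neg = np.sort(model_map.ravel().astype(np.float64))
    below = np.searchsorted(neg, pos, side='left')
    ties = np.searchsorted(neg, pos, side='right') - below
    return float(np.mean((below + 0.5 * ties) / neg.size))
```

Under this formulation, an AUC of 0.5 corresponds to chance and 1.0 to a model that ranks every fixated pixel above all others, while higher CC and lower KLD indicate closer agreement with the human map.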
Figure 1 shows the source images, taken from the LIVE database [7], that are used in the eye-tracking database. Figure 2 shows a distorted image and the corresponding average saliency map provided by the database.

2.2. Computational Visual Saliency Models

In this work, five popular visual saliency models are evaluated: the Gaze Attentive Fixation Finding Engine (GAFFE) [8], Itti's bottom-up saliency model (ITTI) [9], Attention by Information Maximization (AIM) [10], Frequency-Tuned Saliency (FTS) [11], and a Bayesian framework for saliency using natural statistics (SUN) [12]. While AIM [10], FTS [11] and SUN [12] generate a gray-scale