Gaze based quality assessment of visual media understanding

Jean Martinet*, Adel Lablack*, Stanislas Lew*, Chabane Djeraba*
* LIFL - University of Lille, France

Abstract: Visual media are among the most widely used media in our societies. With the increasing demand for digital image and video technologies in applications such as communication, advertising, or entertainment, there is a growing need for assessment tools to evaluate the quality of visual media understanding. It is necessary to quantify how well an audience's perception of a visual medium matches the original message or idea that the media creator intended to transmit. The aim of our work is to build a framework for measuring the quality of a visual medium, that is to say its ability to transmit the original idea of the creator, and possibly to give recommendations to the creator about how to better design the medium. Based on the recorded gaze data of people viewing the medium, we design several qualitative indicators that help to assess how the medium is perceived by a target audience.

I. INTRODUCTION

With the increasing demand for digital image and video technologies in applications such as communication, advertising, or entertainment, there is a growing need for assessment tools to evaluate the quality of visual media understanding. It is necessary to quantify how well the audience's understanding of a visual medium matches the original message or idea that the media creator intended to transmit. Indeed, when advertising agencies and film makers produce a visual medium (an image or a movie), they carefully choose their subject, scene, and settings with the objective of transmitting a precise message to the viewer. How can one be sure that the intended message is correctly received by the audience? And how can one verify that the most important items displayed in the material are well perceived by the audience?
Because of the physiology of the human eye and the human vision system, only a restricted area of the scene, the part that falls on the fovea, can be perceived at a time. For applications and products that target human consumers, it is desirable to have metrics that predict the perceived visual quality as measured with human subjects. Quality assessment of visual media understanding aims at quantifying, by means of quality metrics, how well the audience understands visual media, including still pictures and image sequences. Providing such an evaluation tool is crucial for controlling the audience's perception of the media in existing and emerging multimedia systems. Such tools are especially important in constrained environments, for instance when the medium is an advertisement (still image or video) meant to be viewed in a place that people pass through. Such a tool also has the potential to impact next-generation systems by providing objective metrics to be used during the design and testing stages, thereby reducing the need for extensive evaluation with human subjects. With such a tool, media producers could potentially save time by designing suitable media that follow the recommendations of the system. The tool could for instance state that a given item is not correctly seen in a given shot, and would be better seen if placed at a given location at a given moment.

This paper presents a first step towards a quality assessment of visual media understanding based on gaze. This step consists in recording and clustering gaze points from people viewing the visual media, and then elaborating several estimators for analyzing this data. Promising results from ongoing experiments are presented. We briefly review in the following section some related research about gaze analysis and its applications. Then we describe in Section III our modeling of the problem in terms of basic quality descriptors that are combined into a global estimator.
Section IV describes experiments that we carried out to demonstrate the usefulness of the proposed approach.

II. RELATED WORK

With the recent development of low-cost gaze tracker devices, the possibility of taking advantage of the information conveyed in gaze has opened many research directions, notably in image compression (where users' gaze is used to set variable compression ratios at different places in an image), in marketing (for detecting products of interest for customers), in civil security (for detecting drowsiness or lack of concentration of persons operating machinery such as motor vehicles or air traffic control systems), and in human-computer interaction. In the latter for instance, the user's gaze is used as a complementary input device to traditional ones such as a mouse and a keyboard, notably for disabled users.

A. Gaze analysis

The analysis of gaze has been studied for over a century in several disciplines, including physiology, psychology, psychoanalysis, and cognitive sciences. The objective is to analyze the eye saccades and fixations of persons watching a given scene, in order to extract several kinds of information. During visual perception, human eyes move and successively fixate on the most informative parts of the image [1]. Attention is the cognitive process of selectively concentrating on one aspect of the environment while ignoring other things. For images and video, visual attention is at the core of visual perception, because it drives the gaze to salient points in the scene.
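To make the distinction between fixations and saccades concrete, the following is a minimal sketch of a dispersion-threshold fixation detector (commonly known as I-DT) applied to raw gaze samples. It is given for illustration only and is not the processing used in this paper; the function names, the pixel threshold `max_dispersion`, and the minimum window length `min_samples` are assumptions chosen for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) gaze sample in pixels (assumed unit)


def dispersion(window: List[Point]) -> float:
    """Spatial spread of a window: (max_x - min_x) + (max_y - min_y)."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(
    samples: List[Point],
    max_dispersion: float = 30.0,  # hypothetical threshold, in pixels
    min_samples: int = 5,          # hypothetical minimum fixation length
) -> List[Tuple[int, int, float, float]]:
    """Return fixations as (first_index, last_index, centroid_x, centroid_y).

    Samples that do not belong to any fixation are treated as saccades.
    """
    fixations = []
    start = 0
    n = len(samples)
    while start + min_samples <= n:
        end = start + min_samples  # exclusive end of the candidate window
        if dispersion(samples[start:end]) <= max_dispersion:
            # Grow the window while the samples stay close together.
            while end < n and dispersion(samples[start:end + 1]) <= max_dispersion:
                end += 1
            window = samples[start:end]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((start, end - 1, cx, cy))
            start = end
        else:
            # The first sample is likely part of a saccade: slide forward.
            start += 1
    return fixations


if __name__ == "__main__":
    gaze = [
        (100, 100), (101, 99), (99, 101), (100, 102), (102, 100), (101, 101),
        (200, 180), (300, 250),  # fast movement between two regions
        (400, 300), (401, 299), (399, 301), (400, 302), (402, 300), (401, 301),
    ]
    for first, last, cx, cy in detect_fixations(gaze, max_dispersion=10.0):
        print(f"fixation over samples {first}..{last} around ({cx:.1f}, {cy:.1f})")
```

The fixation centroids produced this way are the kind of gaze points that can then be grouped per region of the image; suitable threshold values depend on the tracker's sampling rate and accuracy.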