What Are We Looking For: Towards Statistical Modeling of Saccadic Eye Movements and Visual Saliency

Xiaoshuai Sun, Hongxun Yao, Rongrong Ji
Dept. of Computer Science, Harbin Institute of Technology, Heilongjiang, China
{xiaoshuaisun, H.yao, rrji}@hit.edu.cn

Abstract

In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. These new observations inspired us to model saccadic behavior and visual saliency based on Super Gaussian Component (SGC) analysis. The model sequentially obtains SGCs using projection pursuit and generates eye movements by selecting the location with the maximum SGC response. Besides simulating human saccadic behavior, we also demonstrate superior effectiveness and robustness over state-of-the-art methods through extensive experiments on psychological patterns and human eye fixation benchmarks. These results also show the promising potential of statistical approaches for human behavior research.

1. Introduction

Attention-guided saccadic eye movement is one of the most important mechanisms in biological vision systems; through it, the viewer is able to actively explore the environment with high-resolution foveal sensors. Benefiting from this unique behavior, human beings, as well as most primates, are able to efficiently process information from complex environments. Over the last four decades, extensive research has been carried out by means of theoretical reasoning and computational modeling, trying to uncover the principles that underlie the deployment of gaze.
Compared with theoretical hypotheses, computational models of visual attention and saccadic eye movement not only help us better understand the mechanisms of human cognitive behavior but also provide powerful tools for solving various vision-related problems such as video compression [1], scene understanding [2], and object detection and recognition [3].

Figure 1. What are we looking for when viewing a scene? Our studies suggest that the answer to this question could be revealed via a statistical analysis of human eye fixations. One possible answer, named the Super Gaussian Component, is investigated in this paper.

978-1-4673-1228-8/12/$31.00 ©2012 IEEE

In this paper, our goal is to establish a statistical framework for both saccadic behavior simulation and visual saliency analysis. Different from previous works that drew inspiration from existing neurobiological knowledge or mathematical theories, we make assumptions directly based on statistical analysis of ground-truth human eye fixations. By means of statistical analysis, we try to find out “what components in visual images draw fixations”, a question similar to, but more tractable than, the traditional question of “what properties draw attention”. The analysis is conducted on eye fixation data captured from human observers with an eye-tracking device during task-independent free viewing of natural images. In this bottom-up scenario, we have found an interesting phenomenon, which might further be proved to be a general principle: stimuli with a super-Gaussian distribution are more likely to attract human gaze. Based on this finding, human saccadic behavior can be modeled as a process of active information pursuit that targets statistical components with desired properties such as super-Gaussianity.

In our framework (Figure 2), visual data is represented as an ensemble of small image patches. Kurtosis maximization is adopted to search for the Super Gaussian Component