VISUAL CORRELATES OF FIXATION SELECTION: A LOOK AT THE SPATIAL FREQUENCY DOMAIN

Neil D. B. Bruce, Daniel P. Loach, John K. Tsotsos

York University
Department of Computer Science and Centre for Vision Research
4700 Keele Street, Toronto, Ontario, Canada M3J 1P3

ABSTRACT

A representation for observing local image content is proposed for the purpose of considering the distinguishing characteristics of visual content that tends to draw a human observer's gaze. Within this representation, the spectral profile distinguishing fixated from non-fixated locations is considered. Finally, the possibility of designing saliency operators based on the proposed local magnitude spectrum representation is explored, revealing a promising domain for predicting human gaze patterns.

Index Terms: spatial frequency, Fourier transform, magnitude spectrum, attention, fixation

1. INTRODUCTION

The primate visual system is foveated and thus samples visual content at the center of fixation at a much higher resolution than in the periphery. Head movements and eye movements are made in such a manner that some regions of a scene receive intense scrutiny while others are relegated to only very low resolution sampling. In recent years, several attempts have been made at furthering the understanding of this selection process for its utility as a precursor to various operations of interest in image processing, such as perceptually motivated compression or quality assessment.

It is undeniable that fixation selection is influenced appreciably by at least two factors: the properties of the surrounding environment, and the goals of the observer. For example, one might be far more likely to fixate faces in a crowd while looking for a friend, but would almost certainly be distracted by a bright flash of light or vivid colors while doing so. In the literature, these two distinct components of the selection process are frequently referred to as top-down and bottom-up components respectively.
In this paper we consider the latter of these categories in order to address the following question: To the extent that selection of fixation points is stimulus driven, what sort of stimulus properties draw a human observer's gaze? Consideration of this problem has been the focus of some recent research efforts [1, 2]. Generally, the approach taken in addressing this problem is that of considering some basic feature measures on the image (e.g. contrast, edges, etc.) and observing the extent to which such features are able to predict fixations.

We gratefully acknowledge NSERC for funding this research project.

One limitation of this sort of study is that features are typically considered in isolation. That is, the extent to which edges, contrast, colors and other features are predictive of fixation points is typically considered for each feature independently. In reality, it is likely that some combination of these various features determines the criterion for fixation selection. It is this observation that forms the basis for the work presented here. It is expected that considering local image content in a manner that simultaneously captures a rich array of the orientation and spatial frequency content present in a local neighborhood of the scene may elucidate the nature of stimuli that attract an observer's gaze and, as a by-product, afford a system for predicting fixation points. Some previous efforts that characterize saliency based on some combination of features have shown success [3, 4]. The distinctions made in this work are as follows: i. Analysis is based on a raw representation of spatial frequency and orientation content rather than a combination of features, eliminating dependence on a specific feature set. ii. Because of consideration i, the features that give rise to fixation selection, or that distinguish those points that are fixated from those that are not, are more directly observable. iii.
When viewed as a saliency operator, the operator proposed here is qualitatively different from any previous effort, offering the possibility of improved performance or, at a minimum, a deeper understanding of what sort of model and/or stimuli are important in characterizing human gaze.

III - 289    1-4244-1437-7/07/$20.00 ©2007 IEEE    ICIP 2007

2. LOCAL MAGNITUDE SPECTRA

As described in the introduction, we seek a representation that allows direct observation of orientation and spatial frequency content within the image in question. The most obvious representation fitting this criterion, with a long history of use in signal processing, is the magnitude spectrum. It has been demonstrated that such spectra are able to adequately char-