VISUAL CORRELATES OF FIXATION SELECTION:
A LOOK AT THE SPATIAL FREQUENCY DOMAIN
Neil D. B. Bruce, Daniel P. Loach, John K. Tsotsos
York University
Department of Computer Science and Centre for Vision Research
4700 Keele Street, Toronto, Ontario, Canada M3J 1P3
ABSTRACT
A representation for observing local image content is
proposed for the purpose of examining the distinguishing
characteristics of visual content that tends to draw a human
observer's gaze. Within this representation, the spectral profile
distinguishing fixated from non-fixated locations is consid-
ered. Finally, the possibility of designing saliency operators
based on the proposed local magnitude spectrum representa-
tion is explored, revealing a promising domain for predicting
human gaze patterns.
Index Terms— spatial frequency, Fourier transform,
magnitude spectrum, attention, fixation
1. INTRODUCTION
The primate visual system is foveated and thus samples visual
content at the center of fixation at a much higher resolution
than in the periphery. Head movements and eye movements
are made in such a manner that some regions of a scene re-
ceive intense scrutiny while others are relegated to only very
low resolution sampling. In recent years, several attempts
have been made at furthering the understanding of this selec-
tion process for its utility as a precursor to various operations
of interest in image processing such as perceptually motivated
compression or quality assessment.
It is undeniable that fixation selection is influenced
appreciably by at least two factors: the goals of the observer
and the properties of the surrounding environment. For example,
one might be far more likely to fixate faces in a crowd while
looking for a friend, but would almost certainly be distracted
by a bright flash of light or vivid colors while doing so. In
the literature, these two distinct components of the selection
process are frequently referred to as top-down and bottom-up
components, respectively.
In this paper we consider the latter of these categories in
order to address the following question: to the extent that
selection of fixation points is stimulus driven, what sort of
stimulus properties draw a human observer's gaze? Consideration
of this problem has been the focus of some recent research
efforts [1, 2]. Generally, the approach taken in addressing
this problem is to compute some basic feature measures on the
image (e.g. contrast, edges, etc.) and observe the extent to
which such features are able to predict fixations.
We gratefully acknowledge NSERC for funding this research project.
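The feature-based analysis described above can be illustrated with a minimal sketch. It is not taken from the cited studies: the choice of RMS contrast as the feature, the window size, the synthetic image, and the fixation and control coordinates are all assumptions made purely for illustration.

```python
import numpy as np

def local_rms_contrast(image, row, col, half=8):
    """Standard deviation of intensity in a square window centred on (row, col).

    The window is clipped at the image border.
    """
    r0, r1 = max(row - half, 0), min(row + half + 1, image.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, image.shape[1])
    return float(image[r0:r1, c0:c1].std())

def mean_feature_at(image, points, half=8):
    """Average the feature over a list of (row, col) locations."""
    return float(np.mean([local_rms_contrast(image, r, c, half) for r, c in points]))

# Hypothetical stand-in data: a flat gray image with one textured region.
rng = np.random.default_rng(0)
image = np.full((128, 128), 0.5)
image[32:64, 32:64] += 0.2 * rng.standard_normal((32, 32))

# Hypothetical fixated locations (inside the texture) and control locations.
fixated = [(40, 40), (48, 52), (56, 44)]
control = [(100, 100), (90, 20), (16, 110)]

fix_score = mean_feature_at(image, fixated)
ctl_score = mean_feature_at(image, control)
print(f"mean contrast at fixated points: {fix_score:.3f}")
print(f"mean contrast at control points: {ctl_score:.3f}")
```

With real data, the image and the two point lists would come from an eye-tracking experiment, and a larger feature value at fixated than at control locations would suggest that the feature carries some predictive value.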
One limitation of this sort of study is that features are
typically considered in isolation. That is, the extent to which
edges, contrast, colors and other features are predictive of
fixation points is assessed for each feature independently.
In reality, it is likely that some combination of these
various features determines the criterion for fixation selec-
tion. It is this observation that forms the basis for the work
presented here. It is expected that considering local image
content in a manner that simultaneously captures a rich array
of orientation and spatial frequency content present in a local
neighborhood of the scene may elucidate the nature
of stimuli that attract an observer's gaze and, as a by-product,
afford a system for predicting fixation points. Some previous
efforts that characterize saliency based on some combination
of features have shown success [3, 4]. The distinctions made
in this work are as follows:
i. Analysis is based on a raw representation of spatial
frequency and orientation content rather than a combination of
features, eliminating dependence on a specific feature set.
ii. Because of point i, the features that give rise to
fixation selection, or that distinguish those points that are
fixated from those that are not, are more directly observable.
iii. When viewed as a saliency operator, the operator proposed
here is qualitatively different from any previous effort,
offering the possibility of improved performance or, at a
minimum, a deeper understanding of what sort of model and/or
stimuli are important in characterizing human gaze.
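As a rough illustration of what a raw representation of local spatial frequency and orientation content might look like, the following sketch computes the magnitude of the 2-D discrete Fourier transform of a square patch. It is a sketch under stated assumptions, not the authors' implementation: the function name, the 32-pixel patch size, the mean removal, and the Hann window are all choices made here for illustration.

```python
import numpy as np

def local_magnitude_spectrum(image, row, col, size=32):
    """Magnitude spectrum of the size x size patch centred near (row, col).

    Assumptions (not from the paper): the patch mean is removed and a
    separable Hann window is applied to reduce edge leakage.
    """
    half = size // 2
    r0, c0 = row - half, col - half
    if r0 < 0 or c0 < 0 or r0 + size > image.shape[0] or c0 + size > image.shape[1]:
        raise ValueError("patch extends beyond the image border")
    patch = image[r0:r0 + size, c0:c0 + size].astype(float)
    patch = patch - patch.mean()
    window = np.outer(np.hanning(size), np.hanning(size))
    # fftshift moves the zero-frequency bin to index (size // 2, size // 2).
    return np.abs(np.fft.fftshift(np.fft.fft2(patch * window)))

# Hypothetical test image: a grating varying along the column axis,
# with 4 cycles per 32 pixels.
cols = np.arange(128)
image = np.tile(np.sin(2 * np.pi * 4 * cols / 32), (128, 1))

mag = local_magnitude_spectrum(image, 64, 64, size=32)
peak = np.unravel_index(np.argmax(mag), mag.shape)
print("spectrum shape:", mag.shape, "peak bin:", peak)
```

In the shifted spectrum, the distance of a peak from the centre bin reflects spatial frequency and its direction reflects orientation, so a single array captures both kinds of content for the patch.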
2. LOCAL MAGNITUDE SPECTRA
As described in the introduction, we seek a representation that
allows direct observation of orientation and spatial frequency
content within the image in question. The most obvious rep-
resentation fitting this criterion, with a long history of use in
signal processing, is the magnitude spectrum. It has been
demonstrated that such spectra are able to adequately char-
III - 289 1-4244-1437-7/07/$20.00 ©2007 IEEE ICIP 2007