Spatiotemporal Saliency: Towards a Hierarchical Representation of Visual Saliency

Neil D.B. Bruce and John K. Tsotsos
Department of Computer Science and Engineering and Centre for Vision Research
York University, Toronto, ON, Canada
{neil,tsotsos}@cse.yorku.ca
http://www.cse.yorku.ca/~neil

Abstract. In prior work, we put forth a model of visual saliency motivated by information theoretic considerations [1]. In this effort we consider how this proposal extends to explain saliency in the spatiotemporal domain and, further, propose a distributed representation for visual saliency comprised of localized hierarchical saliency computation. Evidence for the efficacy of the proposal in capturing aspects of human behavior is achieved via comparison with eye tracking data, and a discussion of the role of neural coding in the determination of saliency suggests avenues for future research.

Keywords: Attention, Saliency, Spatiotemporal, Information Theory, Fixation, Hierarchical.

1 Introduction

Certain visual search experiments demonstrate in dramatic fashion the immediate and automatic deployment of attention to unique stimulus elements in a display. This phenomenon no doubt factors appreciably into visual sampling in general, influencing fixational eye movements and our visual experience as a whole. Some success has been had in emulating these mechanisms [2], reproducing certain behavioral observations related to visual search, but the precise nature of the principles underlying such behaviors remains unknown. One recent proposal, deemed Attention by Information Maximization (AIM), is grounded in a principled definition of what constitutes visually salient content derived from information theory, and has had some success in explaining certain aspects of behavior, including the deployment of eye movements [1] and other visual search behaviors [3]. In this paper we further explore support for this proposal through consideration of spatiotemporal visual stimuli.
This includes a comparison of the proposal against the state of the art in this domain. The following discussion reveals the efficacy of the proposal put forth in AIM in explaining eye movements for spatiotemporal data and also describes how the model

L. Paletta and J.K. Tsotsos (Eds.): WAPCV 2008, LNAI 5395, pp. 98–111, 2009.
© Springer-Verlag Berlin Heidelberg 2009
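The information-theoretic definition underlying AIM equates the saliency of local visual content with its Shannon self-information, -log p(x): the less likely a local observation is under the statistics of its surround, the more salient it is. The toy sketch below illustrates only this core idea on raw pixel intensities; it is not AIM's actual implementation, which estimates likelihoods over learned (ICA-derived) feature responses rather than intensity bins, and the function and parameter names here are illustrative assumptions.

```python
import numpy as np

def self_information_saliency(image, bins=16):
    """Toy saliency map: each pixel scored by -log p(v), where p is
    estimated from the image's own intensity histogram. A simplified
    stand-in for AIM's likelihoods over learned feature responses."""
    # Quantize intensities in [0, 1) into discrete bins.
    quantized = np.clip((image * bins).astype(int), 0, bins - 1)
    # Estimate the probability of each bin from its frequency.
    counts = np.bincount(quantized.ravel(), minlength=bins)
    p = counts / counts.sum()
    # Rare values carry more information, hence score as more salient.
    return -np.log(p[quantized] + 1e-12)

rng = np.random.default_rng(0)
img = rng.random((8, 8)) * 0.2   # mostly dark background...
img[4, 4] = 0.95                 # ...with one bright outlier
sal = self_information_saliency(img)
print(np.unravel_index(sal.argmax(), sal.shape))  # outlier is most salient
```

The unique bright pixel falls in an intensity bin occupied by no other pixel, so its estimated probability is lowest and its self-information highest, mirroring the pop-out of a unique element in a visual search display.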