ON THE ROLE OF CONTEXT IN PROBABILISTIC MODELS OF VISUAL SALIENCY

Neil D. B. Bruce, Pierre Kornprobst

INRIA, 2004 Route des Lucioles, B.P. 93, 06902 Sophia Antipolis Cedex, France

ABSTRACT

In recent years, many principled probabilistic definitions for the determination of visual saliency have been proposed. Moreover, there has been increased focus on the role of context in the determination of visual salience. Prior efforts have shed some light on how context may help in predicting the location of an object, or the presence of features associated with it, in the setting of detection or recognition. Nevertheless, there remain a variety of ways in which context may be exploited to provide better judgements of salient content. In this light, we investigate the role of context in the probabilistic determination of salience while presenting a number of potential avenues for future research.

Index Terms— saliency, context, image statistics, attention

The complexity of visual search demands strategies for focusing high-level processing on some subset of the incoming stream of visual input at the expense of detailed processing of other visual input [1]. This focal processing may take the form of biased processing towards certain locations or features in the scene. At the same time, outside of the demands of a particular visual task definition, it is important to be alerted to content that may be of interest in its own right, for example a predator suddenly appearing while an animal is searching for food. These two elements constitute, respectively, the task-driven top-down side of attention, which serves to instigate bias towards task-relevant content, and the bottom-up side, which may be viewed as a stimulus-driven component that results in the deployment of attention towards conspicuous visual patterns. Recently, a variety of models of saliency and attentional bias have emerged that take as their basis a probabilistic definition of content of interest.
A number of elements in the computation performed by these models differ from one model to another, and these differences matter because they affect the behavior of the models. Moreover, there are a variety of issues relating to the notion of context in probabilistic saliency computation that deserve further consideration. This is in essence the subject matter of this paper. While the material put forth demonstrates the efficacy or importance of context in certain aspects of saliency computation, it serves equally as a road map indicating a variety of promising avenues for further research and, in addition, strategies that may be exploited depending on the nature of the task under consideration.

The structure of the paper is as follows. We begin with an overview of recent models of visual saliency computation that have at their core a probabilistic determination of saliency. This includes some discussion of the differences between these proposals and additionally highlights areas where contextual information has been successfully exploited to improve the explanatory power of the models in question. Following this, we consider a few important issues pertaining to the determination of visual salience. These are, respectively, the role of location in saliency computation and the role of environmental statistics in the determination of saliency.

The research leading to these results has received funding from the European Community’s Seventh Framework Programme under grant agreement no. 215866, project SEARISE.

1. BACKGROUND

In recent years, a variety of proposals for the computation of visual saliency have emerged which form judgments of saliency on the basis of a probabilistic determination. In this section, we provide an overview of these proposals and highlight areas in which contextual information is currently employed for the purposes of saliency computation.

1.1. An Information Theoretic Approach

In [2, 3], the authors propose a strategy for visual saliency computation based on an information theoretic approach, termed attention based on information maximization (AIM), that is analogous to Shannon’s work on the transmission of English words [4]. In short, the salience of a local neighborhood $x$ of the scene is given by its self-information $-\log p(x \mid C)$, where $C$ is the context on which this estimate is based. In [2] it is suggested that this context may be a local neighborhood surrounding the local observation $x$, but in practice the estimate is computed with $C$ constituting the entire scene, for computational parsimony. The likelihood estimate in this case is achieved through the use of a set of filters learned through Independent Component Analysis (ICA). This results in a set of feature maps that may be assumed statistically independent, and follows the proposal made in [5]. Because the joint likelihood of independent coefficients factorizes into a product of marginals, this operation reduces the likelihood estimate from one in a $3N^2$-dimensional space (with $N$ the width of a local patch in RGB space) to $3N^2$ one-dimensional density estimates. This is an important contribution, as it places the likelihood estimate of a local patch within a form that is computationally tractable.

1.2. A Discriminant Approach

In [6], saliency is formulated within the context of a discriminant definition. This amounts to considering the power of some set of features to discriminate between observations drawn from a central region and those drawn from a surrounding region. Specifically, consider some set of features $X = X_1, \ldots, X_d$, a location $l$ and a class label $Y$, with $Y_l = 0$ corresponding to samples drawn from the surround region and $Y_l = 1$ corresponding to samples drawn from a smaller central region centered at $l$. The judgement of saliency then corresponds to a measure of mutual information, computed as