arXiv:physics/0603025v2 [physics.med-ph] 4 Jul 2007

Visual Saliency and Attention as Random Walks on Complex Networks

Luciano da Fontoura Costa
Instituto de Física de São Carlos, Universidade de São Paulo, São Carlos, SP, PO Box 369, 13560-970, phone +55 16 3373 9858, FAX +55 16 3371 3616, Brazil, luciano@if.sc.usp.br
(Dated: 2nd Feb 2007)

The current article shows how concepts from the areas of random walks, Markov chains, complex networks and image analysis can be naturally combined in order to provide a unified and biologically plausible model relating saliency and visual attention. Two types of models are proposed: (i) images are converted into complex networks by considering pixels as nodes while connections are established in terms of fields of influence defined by visual features such as tangent fields induced by gray-level contrasts and distance; and (ii) image pixels exhibiting particularly distinctive values of visual properties such as gray-level intensity, contrast, size of objects, orientation and texture are mapped into nodes and the weights of links are defined in order to favor transitions between regions with similar or different visual features, also taking the distance between the nodes into account. Preferential random walks are performed on such networks in order to emulate attentional shifts and eye movements, and the saliency of each region is obtained in terms of the frequency of visits to each node at equilibrium. In the case of the first model, there is a definite tendency to emphasize not only high curvature points but also convergences of the tangent field. The frequency of visits is found to be strongly correlated with the node degrees (strengths) for this model. Different results have been obtained for the second model as a consequence of the directed and asymmetric nature of the respectively obtained networks.
PACS numbers: 87.19.Dd, 87.57.Ce, 89.75.Hc

The ability to focus attention on important things is a defining characteristic of intelligence. (R. J. Shiller)

I. INTRODUCTION

Vision [1] is the ability, given a specific scene, to recognize the existing objects and their respective properties (e.g. position, rotation and size). Although vision is natural to animals, achieving maximum flexibility in primates, all attempts by science and technology to emulate this ability have to a large extent failed: full-fledged vision is simply too complex. Artifacts (e.g. shadows and occlusion), 2D projections, and the noise always present in images imply a degenerate mapping from real scenes to the biological visual representation, so that the effective recognition of objects ultimately demands high levels of intelligence and comprehensive models of the visual features of our world.

Actually, even the natural solutions to vision have been achieved at great cost and difficulty. Though nearly 50% of the human cortex is dedicated, to varying degrees, to visual analysis and integration, only a very small region of the visual space, the area falling onto the fovea, can be carefully analyzed at higher resolution by such a formidable parallel computing system at any time. Even so, the several remaining limitations of vision are attested by a myriad of optical illusions.

The serious limitations of the cortical hardware in processing vision ultimately required the retina to perform effective pre-processing in order to filter out redundancies (luminance correlations) before forwarding the visual information to the brain, via the lateral geniculate nucleus [2]. This is to a great extent achieved through detection of the borders of the objects in images, which tend to be associated with luminance contrasts.
Because only the fovea, an area of the retina accounting for just about one degree of the visual field, is engaged in high-resolution image analysis, an effective means is needed for moving this small window across time and space. This is performed through saccadic eye movements [2], which integrate the most important portions (saliencies) of the image into a sensible whole. Extensive experimental investigations have shown that points exhibiting high contrast (e.g. [3, 4]) and/or curvature (e.g. [5]) tend to play a decisive role in saliency definition and detection. Other important experimental evidence includes the presence in the primary visual cortex of many orientation-sensitive neurons, exhibiting the so-called simple and complex receptive fields, which are capable of estimating the tangent field along the retinotopic representation of the scene [2]. Because of the decreasing resolution along the retina as one moves from its center to the periphery, it is reasonable to assume the saliency of local portions of the image to be inversely related to the distance from those portions to the center of the fovea (or attention). In addition to gaze shifts driven by saliencies, more subtle visual mechanisms operate on the peripheral visual field in order to decide where to look next. The shifts of attention and their relation to saliencies, whether or not they involve eye movements, are the main subject of this article.

In spite of the intense and extensive experimental and theoretical research in visual perception, relatively few physics-based approaches have been proposed relating saliency detection and selective attention. In addition to