Neural Networks 21 (2008) 1420–1430
2008 Special Issue
doi:10.1016/j.neunet.2008.10.002

Stereo saliency map considering affective factors and selective motion analysis in a dynamic environment

Sungmoon Jeong a, Sang-Woo Ban b, Minho Lee a,*

a School of Electrical Engineering and Computer Science, Kyungpook National University, 1370 Sankyuk-Dong, Puk-Gu, Taegu 702-701, Republic of Korea
b Department of Information and Communication Engineering, Dongguk University, 707 Seokjang-Dong, Gyeongju, Gyeongbuk 780-714, Republic of Korea

Article history: Received 30 April 2008; received in revised form 9 October 2008; accepted 14 October 2008

Keywords: Integrated saliency map; Stereo saliency map; Affective attention; Bottom-up attention; Selective motion analysis

Abstract

We propose a new integrated saliency map model and a selective motion analysis model, both partly inspired by the biological visual attention mechanism. The proposed models consider both the static and dynamic features of an input scene. Based on the single eye alignment hypothesis, they also use binocular stereopsis to identify the final attention area, so that, as in human binocular vision, the system focuses on the closer area. Moreover, the proposed saliency map model includes an affective computing process that skips unwanted areas and attends to desired ones, reflecting human preference and refusal in subsequent visual search. In addition, we show the effectiveness of a symmetry feature determined by a neural network and of an independent component analysis (ICA) filter, both of which help to construct an object-preferable attention model. We also propose a selective motion analysis model that integrates the proposed saliency map with a neural network for motion analysis.
The neural network for motion analysis responds selectively to rotation, expansion, contraction and planar motion of the optical flow in a selected area. Experiments show that the proposed model can generate plausible scan paths and selective motion analysis results for natural input scenes.

© 2008 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +82 53 950 6436; fax: +82 53 950 5505. E-mail address: mholee@knu.ac.kr (M. Lee).

1. Introduction

The human visual system can effortlessly detect an interesting area or object within natural or cluttered scenes through its selective attention mechanism. This mechanism allows the human visual system to process complex visual scenes more effectively. Based on it, the human visual system sequentially interprets not only static and dynamic input scenes but also stereo scenes, while taking affective factors into account. Itti, Koch, and Niebur (1998) introduced a brain-like model to generate a saliency map (SM). Koike and Saiki (2002) proposed that a stochastic winner-take-all (WTA) process allows the search efficiency of a saliency-based search model to vary with relative saliency, owing to stochastic shifts of attention. Kadir and Brady (2001) proposed an attention model that integrates saliency, scale selection and content description, in contrast with many other approaches. Ramström and Christensen (2002) calculated saliency with respect to a given task by using a multi-scale pyramid and multiple cues; their saliency computations were based on game-theoretic concepts. Carmi and Itti (2006) proposed an attention model that considers seven dynamic features in MTV-style video clips, and Navalpakkam and Itti (2006) proposed an integrated attention scheme that detects an object by combining a bottom-up SM with top-down attention based on the signal-to-noise ratio.
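The saliency-based selection step shared by the SM models above, which turns a saliency map into a sequence of attended locations (a scan path), can be illustrated with a short sketch. It is a generic illustration, not the method of this paper or of any work cited here, and it rests on several assumptions: the saliency map is already computed as a 2-D array; the deterministic WTA simply attends to the current maximum; the stochastic WTA is modeled as sampling with probability proportional to exp(saliency / temperature), which is our stand-in rather than Koike and Saiki's formulation; and inhibition of return is a square window set to minus infinity. The function `scan_path` and its parameters `ior_radius` and `temperature` are hypothetical names.

```python
import numpy as np


def scan_path(saliency, n_fixations=3, ior_radius=1, temperature=None, rng=None):
    """Generate a scan path (list of (row, col) fixations) from a 2-D saliency map.

    temperature=None: deterministic WTA, always attend to the current maximum.
    temperature > 0:  assumed softmax form of a stochastic WTA, sampling a location
                      with probability proportional to exp(saliency / temperature).
    After each fixation, a (2*ior_radius+1) x (2*ior_radius+1) window around it is
    set to -inf (inhibition of return) so it cannot win again.
    """
    sm = np.asarray(saliency, dtype=float).copy()  # never modify the caller's map
    rng = np.random.default_rng() if rng is None else rng
    path = []
    for _ in range(n_fixations):
        finite = np.isfinite(sm)
        if not finite.any():  # every location has already been inhibited
            break
        if temperature is None:
            idx = int(np.argmax(sm))
        else:
            logits = np.where(finite, sm / temperature, -np.inf)
            logits -= logits[finite].max()  # shift for numerical stability
            p = np.exp(logits).ravel()
            p /= p.sum()
            idx = int(rng.choice(p.size, p=p))
        r, c = np.unravel_index(idx, sm.shape)
        r, c = int(r), int(c)
        path.append((r, c))
        # Inhibition of return: suppress the neighbourhood of the attended location.
        sm[max(r - ior_radius, 0):r + ior_radius + 1,
           max(c - ior_radius, 0):c + ior_radius + 1] = -np.inf
    return path


saliency = np.zeros((5, 5))
saliency[1, 1] = 9.0  # most salient location
saliency[3, 4] = 5.0  # second most salient location
print(scan_path(saliency, n_fixations=2))  # [(1, 1), (3, 4)]
```

With a very low temperature the stochastic variant behaves almost like the deterministic argmax; a higher temperature lets less salient locations win occasionally, so scan paths differ from run to run.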
In addition, Walther, Rutishauser, Koch, and Perona (2005) proposed an object-preferred attention scheme that uses the bottom-up SM results as bias weights for top-down object perception. Fernández-Caballero, López, and Saiz-Valverde (2008) developed a dynamic stereoscopic selective visual attention model that integrates motion and depth to choose the attention area. Maki, Nordlund, and Eklundh (2000) proposed an attention model that integrates image flow, stereo disparity and motion for attentional scene segmentation. Ouerhani and Hügli (2000) proposed a visual attention model that considers depth as well as static features. Belardinelli and Pirri (2006) developed a biologically plausible robot attention model that also takes depth into account. Frintrop, Rome, Nüchter, and Surmann (2005) proposed a bimodal laser-based attention system that considers both static features, including color, and depth to generate appropriate attention. These models agree that depth and motion, in addition to static features, play important roles in visual attention, and they incorporate these cues accordingly. However, the key mechanisms by which different features are integrated remain unclear. Choi, Jung, Ban, Niitsuma, and Lee (2006) and Park, An, and Lee (2002) have also proposed a bottom-up SM model by using symmetry with an ICA filter and implemented a