Neural Networks 21 (2008) 1420–1430
Contents lists available at ScienceDirect
Neural Networks
journal homepage: www.elsevier.com/locate/neunet
2008 Special Issue
Stereo saliency map considering affective factors and selective motion analysis in a dynamic environment
Sungmoon Jeong (a), Sang-Woo Ban (b), Minho Lee (a, *)
(a) School of Electrical Engineering and Computer Science, Kyungpook National University, 1370 Sankyuk-Dong, Puk-Gu, Taegu 702-701, Republic of Korea
(b) Department of Information and Communication Engineering, Dongguk University, 707 Seokjang-Dong, Gyeongju, Gyeongbuk 780-714, Republic of Korea
article info
Article history:
Received 30 April 2008
Received in revised form 9 October 2008
Accepted 14 October 2008
Keywords:
Integrated saliency map
Stereo saliency map
Affective attention
Bottom-up attention
Selective motion analysis
abstract
We propose a new integrated saliency map model and a selective motion analysis model, both partly inspired by
the biological visual attention mechanism. The proposed models consider both the static and dynamic features
of an input scene as well as binocular stereopsis. Based on the single eye alignment hypothesis, stereopsis is
used to identify the final attention area so that, as in human binocular vision, the system focuses on closer
areas. Moreover, the proposed saliency map model includes an affective computing process that skips unwanted
areas and attends to desired areas, reflecting human preference and refusal in subsequent visual search
processes. In addition, we show the effectiveness of the symmetry feature, determined by a neural network, and
of an independent component analysis (ICA) filter, both of which help to construct an object-preferred
attention model. We also propose a selective motion analysis model that integrates the proposed saliency map
with a neural network for motion analysis. This network responds selectively to rotation, expansion,
contraction and planar motion of the optical flow in a selected area. Experiments show that the proposed model
can generate plausible scan paths and selective motion analysis results for natural input scenes.
© 2008 Elsevier Ltd. All rights reserved.
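The motion-analysis network described in the abstract is selective to four optical-flow patterns. As a rough, non-neural illustration of what distinguishes them, the Python sketch below summarizes a dense flow patch by its mean divergence (expansion or contraction), mean curl (rotation) and mean translation (planar motion), and reports the largest score. The function names flow_components and classify_motion, the threshold eps, and the use of finite-difference derivatives in place of a trained neural network are all assumptions for illustration; this is not the authors' model.

```python
import numpy as np


def flow_components(u, v):
    """Summarize a dense optical-flow patch by divergence, curl and translation.

    u, v: 2-D arrays of horizontal and vertical flow sampled on a unit grid
    (axis 1 = x, axis 0 = y). Derivatives use central finite differences.
    """
    du_dx = np.gradient(u, axis=1)
    du_dy = np.gradient(u, axis=0)
    dv_dx = np.gradient(v, axis=1)
    dv_dy = np.gradient(v, axis=0)
    divergence = float(np.mean(du_dx + dv_dy))  # > 0 expansion, < 0 contraction
    curl = float(np.mean(dv_dx - du_dy))        # sign gives rotation direction
    translation = (float(np.mean(u)), float(np.mean(v)))  # planar motion
    return divergence, curl, translation


def classify_motion(u, v, eps=1e-3):
    """Return the dominant motion type of a flow patch (hypothetical rule)."""
    div, curl, (tx, ty) = flow_components(u, v)
    scores = {
        "expansion": max(div, 0.0),
        "contraction": max(-div, 0.0),
        "rotation": abs(curl),
        "planar": float(np.hypot(tx, ty)),
    }
    best = max(scores, key=scores.get)
    # Below eps, treat the patch as having no dominant motion.
    return best if scores[best] > eps else "static"
```

For example, on a centered grid built with np.meshgrid, the field u = -0.1*Y, v = 0.1*X has zero divergence and zero mean translation but nonzero curl, so it is labeled "rotation". Because divergence and curl are per-pixel rates while translation is measured in pixels per frame, the four scores are not on a common scale; a practical system would need to normalize them or learn the decision from data.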
1. Introduction
The human visual system can effortlessly detect an interesting
area or object within natural or cluttered scenes through the
selective attention mechanism. This mechanism allows the human
visual system to process complex visual scenes more effectively.
Based on selective attention, the human visual system sequentially
interprets not only static and dynamic input scenes but also
stereo scenes, taking affective factors into account.
Itti, Koch, and Niebur (1998) introduced a brain-like model
to generate a saliency map (SM). Koike and Saiki (2002) proposed
that a stochastic winner-take-all (WTA) mechanism enables a
saliency-based search model to change its search efficiency as the
relative saliency varies, owing to stochastic shifts of attention.
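The stochastic WTA idea attributed to Koike and Saiki (2002) can be illustrated with a minimal Python sketch. It assumes a softmax over saliency values controlled by a temperature parameter, plus square inhibition-of-return regions so that attention moves on after each shift; the function stochastic_wta_scanpath, its parameters and the softmax rule are hypothetical choices, not Koike and Saiki's exact formulation. As the temperature approaches zero, the sampling reduces to a deterministic WTA that always selects the most salient remaining location.

```python
import numpy as np


def stochastic_wta_scanpath(saliency, n_fixations, temperature=0.1,
                            ior_radius=1, seed=None):
    """Sample a scan path from a saliency map with a stochastic WTA.

    Each fixation is drawn with probability proportional to
    exp(saliency / temperature) instead of always taking the global maximum.
    After each shift, a square of half-width ior_radius around the winner
    (clipped at the border) is suppressed (inhibition of return).
    """
    rng = np.random.default_rng(seed)
    sal = np.asarray(saliency, dtype=float).copy()
    width = sal.shape[1]
    path = []
    for _ in range(n_fixations):
        valid = np.isfinite(sal)
        if not valid.any():
            break  # every location has been inhibited
        logits = np.where(valid, sal / temperature, -np.inf)
        logits -= logits[valid].max()  # subtract max for numerical stability
        probs = np.exp(logits)         # inhibited cells get probability 0
        probs /= probs.sum()
        idx = int(rng.choice(sal.size, p=probs.ravel()))
        r, c = divmod(idx, width)
        path.append((r, c))
        sal[max(r - ior_radius, 0):r + ior_radius + 1,
            max(c - ior_radius, 0):c + ior_radius + 1] = -np.inf
    return path
```

With a low temperature the scan path visits salient peaks in order of saliency; raising the temperature lets less salient locations win occasionally, which is the kind of stochastic attention shift the cited model uses to explain changes in search efficiency.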
Kadir and Brady (2001) proposed an attention model that integrates
saliency, scale selection and content description, in contrast to
many other approaches. Ramström and Christensen (2002) computed
task-dependent saliency using a multi-scale pyramid and multiple
cues; their saliency computations were based on game-theoretic
concepts. Carmi and Itti (2006) proposed an attention model that
considers seven dynamic features in MTV-style video clips, and
Navalpakkam and Itti (2006) proposed an integrated attention scheme
that detects objects by combining a bottom-up SM with top-down
attention based on the signal-to-noise ratio. Walther, Rutishauser,
Koch, and Perona (2005) proposed an object-preferred attention
scheme that uses bottom-up SM results as bias weights for top-down
object perception. Fernández-Caballero, López, and Saiz-Valverde
(2008) developed a dynamic stereoscopic selective visual attention
model that integrates motion and depth to choose the attention
area. Maki, Nordlund, and Eklundh (2000) proposed an attention
model that integrates image flow, stereo disparity and motion for
attentional scene segmentation. Ouerhani and Hügli (2000) proposed
a visual attention model that considers depth as well as static
features. Belardinelli and Pirri (2006) developed a biologically
plausible robot attention model that also takes depth into
account. Frintrop, Rome, Nüchter, and Surmann (2005) proposed a
bimodal laser-based attention system that considers both depth and
static features, including color, to generate proper attention.
These models agree that depth and motion, in addition to static
features, play important roles in visual attention, and they
consider depth and motion as well as static features.

* Corresponding author. Tel.: +82 53 950 6436; fax: +82 53 950 5505.
E-mail address: mholee@knu.ac.kr (M. Lee).
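The combination of static features with motion and depth discussed in these studies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each map is min-max normalized, static and motion maps are linearly combined, larger disparity is taken to mean a closer surface, and nearer regions are favoured through a multiplicative weight. The functions normalize and integrate_saliency and the weights w_static, w_motion and depth_gain are hypothetical, and the scheme is not taken from any of the cited models.

```python
import numpy as np


def normalize(m):
    """Min-max rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)


def integrate_saliency(static_maps, motion_map, disparity,
                       w_static=1.0, w_motion=1.0, depth_gain=1.0):
    """Fuse static and motion conspicuity maps, then favour nearer regions.

    Larger disparity is assumed to mean a closer surface, so the fused map is
    multiplied by (1 + depth_gain * normalized disparity).
    Returns the weighted map and the (row, col) of its maximum.
    """
    # Average the normalized static maps (e.g. intensity, color, orientation).
    static = sum(normalize(m) for m in static_maps) / len(static_maps)
    fused = w_static * static + w_motion * normalize(motion_map)
    weighted = fused * (1.0 + depth_gain * normalize(disparity))
    row, col = np.unravel_index(np.argmax(weighted), weighted.shape)
    return weighted, (int(row), int(col))
```

Setting depth_gain to zero removes the preference for closer regions, which makes it easy to compare the depth-weighted winner with the purely feature-based one; the fixed linear weights are exactly the kind of ad hoc integration choice whose biological basis remains open.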
However, the key mechanisms for integrating different features
remain unclear. Choi, Jung, Ban, Niitsuma, and Lee (2006) and
Park, An, and Lee (2002) have also proposed a bottom-up SM model
that uses symmetry with an ICA filter and implemented a
0893-6080/$ – see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.neunet.2008.10.002