Modulating Vision with Motor Plans: A Biologically-inspired Efficient Allocation of Visual Resources
Luka Lukic¹,², Aude Billard² and José Santos-Victor¹
Abstract—This paper presents a novel, biologically-inspired approach to the efficient management of computational resources for visual processing. In particular, we modulate a visual “attentional landscape” with the motor plans of a robot. The attentional landscape is a more recent, more general and more complex model of the deployment of spatial attention than the simple “attentional spotlight” or “zoom-lens” models of attention. Higher priority for visual processing is given to manipulation-relevant parts of the visual field than to manipulation-irrelevant parts. Hence, in our model visual attention is not defined exclusively in terms of visual saliency in color, texture or intensity cues; rather, it is modulated by motor (manipulation) programs. This computational model is supported by recent experimental findings in visual neuroscience and physiology. We show how this approach can be used to efficiently distribute the limited computational resources devoted to visual processing, which is very often the computational bottleneck in a robot system. The model offers a view on the well-known concept of visual saliency that has not been explored so far, and can thus open interesting alternative prospects not only for robotics, but also for computer vision, physiology and neuroscience. The proposed model is validated in a series of experiments conducted with the iCub robot, both in simulation and on the real robot.
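As a rough formalization of the distinction drawn above (the notation is ours, not taken from the paper): a spotlight or zoom-lens model gates processing with a binary window over a region $R$ of the visual field, whereas an attentional landscape assigns a graded weight to every location $\mathbf{x}$, which in our setting is shaped by the motor plan:

$A_{\mathrm{spot}}(\mathbf{x}) = \mathbb{1}[\mathbf{x} \in R], \qquad A_{\mathrm{land}}(\mathbf{x}) = S(\mathbf{x})\,M(\mathbf{x}) \in [0, 1],$

where $S(\mathbf{x})$ is a bottom-up saliency term (color, texture, intensity cues) and $M(\mathbf{x})$ is a motor-plan-dependent prior concentrated on manipulation-relevant regions.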
I. INTRODUCTION
Vision is one of the most computationally demanding modules in a robot system, and it very often represents a bottleneck for manipulation applications. Most approaches in robot vision are based on standard image processing techniques, ignoring most, if not all, of the task-relevant dynamic information. As a consequence, the visual system and the arm-hand system are usually treated as two independent modules that communicate only in the direction from vision to manipulation, so that valuable information from the manipulation system is completely ignored during visual processing. In this work we show that coupling visual processing with manipulation plans can drastically improve visual performance, in particular the speed of visual computation.
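To make this coupling concrete, the following minimal Python/NumPy sketch (ours, not from the paper; the Gaussian form of the motor prior, its width sigma, and all names are illustrative assumptions) weights a bottom-up saliency map by a spatial prior centered on the image projection of the planned grasp point, so that expensive visual processing can be spent on the manipulation-relevant region first:

import numpy as np

def motor_modulated_saliency(saliency, target_px, sigma=40.0):
    # saliency  : 2-D array of bottom-up saliency (color/texture/intensity cues)
    # target_px : (row, col) of the planned grasp point projected into the image
    # sigma     : width in pixels of the motor prior (a free parameter here)
    rows, cols = np.indices(saliency.shape)
    d2 = (rows - target_px[0]) ** 2 + (cols - target_px[1]) ** 2
    prior = np.exp(-d2 / (2.0 * sigma ** 2))    # motor-plan prior M(x)
    landscape = saliency * prior                # attentional landscape A(x)
    return landscape / (landscape.max() + 1e-12)

# Example: restrict costly processing to the high-attention region.
sal = np.random.rand(240, 320)                  # stand-in saliency map
roi_mask = motor_modulated_saliency(sal, (120, 200)) > 0.5

The point of the sketch is only the multiplicative modulation; any shape of prior derived from the manipulation plan could replace the Gaussian.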
To put this in a real-world context, let us imagine a robot bartender, equipped with an active stereo camera system, whose task is to grasp a glass, fill it with a beverage of choice, and serve it to a guest. In visually-aided manipulation based on a standard vision processing approach,
¹ VISLAB/ISR, Instituto Superior Técnico, Lisbon, Portugal: luka.lukic@epfl, jasv@isr.ist.utl.pt.
² LASA, EPFL, Lausanne, Switzerland: aude.billard@epfl.
Figure 1. Experimental setup with a natural task. The subject is instructed to pour tea into two cups and one bowl that are placed close to the horizontal midline of the table. Four pictures of various objects are placed close to the border of the table and two pictures are placed on the wall facing the subject. These pictures play the role of visually salient distractors: they share the same visual features as the objects, but remain completely irrelevant for manipulation throughout the entire task. Overt attention, i.e., gaze movements, together with the scene as viewed from the subject’s standpoint, is recorded using the WearCam system [1]. The order of the images from top to bottom corresponds to the progress of the task. The cross superimposed on the video corresponds to the estimated gaze position. It can be seen that the gaze is tightly bound to the object that is relevant to the spatio-temporal requirements of the task. In spite of the presence of salient distractors, the gaze remains tightly locked on the current object of interest. This behavior cannot be predicted by feature-based saliency maps, even with top-down extensions, because in manipulation tasks perceptual processing is biased towards manipulation-relevant regions of the visual field, not towards the most textured or most distinctively colored stimuli.