Modulating Vision with Motor Plans: A Biologically-inspired Efficient Allocation of Visual Resources
Luka Lukic¹,², Aude Billard² and José Santos-Victor¹
Abstract—This paper presents a novel, biologically-inspired approach to the efficient management of computational resources for visual processing. In particular, we modulate a visual “attentional landscape” with the motor plans of a robot. The attentional landscape is a more recent, more general and more complex model of the deployment of spatial attention than the simple “attentional spotlight” or “zoom-lens” models of attention. Higher priority for visual processing is given to manipulation-relevant parts of the visual field than to manipulation-irrelevant parts. Hence, in our model visual attention is not defined exclusively in terms of visual saliency in color, texture or intensity cues; rather, it is modulated by motor (manipulation) programs. This computational model is supported by recent experimental findings in visual neuroscience and physiology. We show how this approach can be used to efficiently distribute the limited computational resources devoted to visual processing, which is very often the computational bottleneck in a robot system. The model offers a view on the well-known concept of visual saliency that has not been explored so far, and can thus open interesting alternative prospects not only for robotics, but also for computer vision, physiology and neuroscience. The proposed model is validated in a series of experiments conducted with the iCub robot, both in simulation and on the real robot.
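As a rough formalization of the distinction drawn above (the notation is ours, not taken from the paper): a spotlight or zoom-lens model gates processing with a binary window over a region $R$ of the visual field, whereas an attentional landscape assigns a graded weight to every location $\mathbf{x}$, which in our setting is shaped by the motor plan:

$A_{\mathrm{spot}}(\mathbf{x}) = \mathbb{1}[\mathbf{x} \in R], \qquad A_{\mathrm{land}}(\mathbf{x}) = S(\mathbf{x})\,M(\mathbf{x}) \in [0, 1],$

where $S(\mathbf{x})$ is a bottom-up saliency term (color, texture, intensity cues) and $M(\mathbf{x})$ is a motor-plan-dependent prior concentrated on manipulation-relevant regions.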
I. INTRODUCTION
Vision is one of the most computationally demanding modules in a robot system, and it very often represents a bottleneck for manipulation applications. Most approaches in robot vision are based on standard image processing techniques, ignoring most, if not all, of the task-relevant dynamic information. As a consequence, the visual system and the arm-hand system are usually treated as two independent modules that communicate only in the direction from vision to manipulation, so that valuable information from the manipulation system is completely ignored during visual processing. In this work we show that coupling visual processing with manipulation plans can drastically improve visual performance, in particular the speed of visual computation.
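To make this coupling concrete, the following minimal Python/NumPy sketch (ours, not from the paper; the Gaussian form of the motor prior, its width sigma, and all names are illustrative assumptions) weights a bottom-up saliency map by a spatial prior centered on the image projection of the planned grasp point, so that expensive visual processing can be spent on the manipulation-relevant region first:

import numpy as np

def motor_modulated_saliency(saliency, target_px, sigma=40.0):
    # saliency  : 2-D array of bottom-up saliency (color/texture/intensity cues)
    # target_px : (row, col) of the planned grasp point projected into the image
    # sigma     : width in pixels of the motor prior (a free parameter here)
    rows, cols = np.indices(saliency.shape)
    d2 = (rows - target_px[0]) ** 2 + (cols - target_px[1]) ** 2
    prior = np.exp(-d2 / (2.0 * sigma ** 2))    # motor-plan prior M(x)
    landscape = saliency * prior                # attentional landscape A(x)
    return landscape / (landscape.max() + 1e-12)

# Example: restrict costly processing to the high-attention region.
sal = np.random.rand(240, 320)                  # stand-in saliency map
roi_mask = motor_modulated_saliency(sal, (120, 200)) > 0.5

The point of the sketch is only the multiplicative modulation; any shape of prior derived from the manipulation plan could replace the Gaussian.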
To put this in a real-world context, let us imagine a robot bartender, equipped with an active stereo camera system, whose task is to grasp a glass, fill it with a beverage of choice, and serve it to a guest. In visually-aided manipulation based on a standard vision processing approach,
¹ VISLAB/ISR, Instituto Superior Técnico, Lisbon, Portugal: luka.lukic@epfl, jasv@isr.ist.utl.pt.
² LASA, EPFL, Lausanne, Switzerland: aude.billard@epfl.
Figure 1. Experimental setup with a natural task. The subject is instructed to pour tea into two cups and one bowl that are placed close to the horizontal midline of the table. Four pictures of various objects are placed close to the border of the table and two pictures are placed on the wall facing the subject. These pictures play the role of visually salient distractors: they share the same visual features as the objects, but remain completely irrelevant for manipulation throughout the entire task. Overt attention, i.e., gaze movements, together with the scene as viewed from the subject’s standpoint, is recorded using the WearCam system [1]. The order of the images from top to bottom corresponds to the progress of the task. The cross superimposed on the video corresponds to the estimated gaze position. It can be seen that the gaze is tightly bound to the object that is relevant to the spatio-temporal requirements of the task. In spite of the presence of salient distractors, the gaze remains tightly locked on the current object of interest. This behavior cannot be predicted by feature-based saliency maps, even with top-down extensions, because in manipulation tasks perceptual processing is biased towards manipulation-relevant regions of the visual field, not towards the most textured or most distinctively colored stimuli.