ORIGINAL ARTICLE

Learning to navigate in a virtual world using optic flow and stereo disparity signals

Florian Raudies · Schuyler Eldridge · Ajay Joshi · Massimiliano Versace

Received: 23 December 2013 / Accepted: 29 March 2014 / Published online: 20 August 2014
© ISAROB 2014
Artif Life Robotics (2014) 19:157–169. DOI 10.1007/s10015-014-0153-1

F. Raudies (✉) · M. Versace
Center for Computational Neuroscience and Neural Technology (CompNet), Boston University, 677 Beacon Street, Boston, MA 02215, USA
e-mail: fraudies@bu.edu

S. Eldridge · A. Joshi
Department of Electrical and Computer Engineering, Boston University, 8 St. Mary Street, Boston, MA 02215, USA

Abstract  Navigating in a complex world is challenging in that the rich, real environment provides a very large number of sensory states that can immediately precede a collision. Biological organisms such as rodents solve this problem effortlessly, navigating in closed spaces by encoding in neural representations the distance toward walls or obstacles for a given direction. This paper presents a method, usable by virtual (simulated) or robotic agents, that uses states similar to these neural representations to learn collision avoidance. Unlike other approaches, our reinforcement learning approach uses a small number of states defined by discretized distances along three constant directions. These distances are estimated either from optic flow or from binocular stereo information: parameterized templates for optic flow or disparity are compared against the input flow or input disparity to estimate the distances. Simulations in a virtual environment demonstrate the learning of collision avoidance. Our results show that learning with only stereo information is superior to learning with only optic flow information. Our work motivates the use of abstract state descriptions for the learning of visual navigation. Future work will focus on the fusion of optic flow and stereo information, and on transferring these models to robotic platforms.

Keywords  Learning of navigation · Optic flow · Stereo disparity · Virtual world

1 Introduction

Estimating the distance to perceived objects in the environment, whether targets or obstacles, and the ability to learn their positions and strategies to approach or avoid them, are crucial skills for humans, animals, and robots alike. Recently, memory structures that encode geometrical constraints of the environment have been discovered in rats [17, 29]. These cells encode the distance to walls for a given allocentric direction in their patterns of spatial firing, fostering the idea of using distances as the states of a reinforcement learner.

In general, various cues can be used to infer distance information, to detect the ground, or to determine traversability [14]. Relative depth cues learned from monocular images have been used to learn obstacle avoidance [19]. Other work has focused on the unique encoding and fast recall of views from omnidirectional cameras for visual navigation and self-localization [2, 11]. Another approach is the extraction of image primitives, e.g., edges or Gabor filter responses, to encode views [15, 22, 30]. Yue et al. [35] directly take the video input to simulate the lobula giant movement detector neuron of locusts, trained to signal collisions. Martinez-Marin and Duckett [18] use a color-based segmentation to learn a docking task based on the orientation of the robot, the orientation of the table, and the distance to the object, which is placed at the edge of the table. Gaskett et al. [12] infer drivable space by cross-correlation with a pre-loaded picture of the carpet texture, which is linked to actions of the robot by reinforcement learning.
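To make the template-based distance estimation described in the abstract concrete, the following minimal sketch compares an input disparity map against disparity templates parameterized by candidate distance and picks the best match. It assumes a pinhole stereo rig and a fronto-parallel surface, so that a template at distance Z has constant disparity f·b/Z; the focal length, baseline, and candidate distances below are illustrative choices, and the paper's actual templates (which also cover optic flow) need not take this form.

    import numpy as np

    # Assumed camera parameters (not from the paper).
    F_PX = 300.0    # focal length in pixels
    BASELINE = 0.1  # stereo baseline in meters

    def disparity_template(distance, shape):
        """Disparity map predicted for a fronto-parallel wall at `distance`."""
        return np.full(shape, F_PX * BASELINE / distance)

    def estimate_distance(disparity_map, candidates):
        """Return the candidate distance whose template best matches the input."""
        errors = [np.mean((disparity_map
                           - disparity_template(d, disparity_map.shape)) ** 2)
                  for d in candidates]
        return candidates[int(np.argmin(errors))]

    # Example: a noisy disparity map of a wall 2 m away is assigned
    # the nearest candidate distance.
    rng = np.random.default_rng(0)
    observed = disparity_template(2.0, (32, 32)) + rng.normal(0.0, 0.5, (32, 32))
    print(estimate_distance(observed, candidates=[0.5, 1.0, 2.0, 4.0, 8.0]))

Because the comparison is a simple squared-error match over a small set of candidates, the estimator degrades gracefully with noise, which is the property that makes the resulting coarse distance estimates usable as discrete learner states.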
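The abstract's compact state space can be illustrated in the same spirit. The sketch below bins three distance estimates (taken here, as an assumption, along left, straight-ahead, and right viewing directions) into one of a small number of discrete states, and pairs that index with a standard tabular Q-learning update. The bin edges, action set, and learning rule are illustrative; the paper does not specify these exact choices.

    import numpy as np

    BIN_EDGES = [1.0, 3.0]       # near / medium / far -> 3 bins per direction
    N_BINS = len(BIN_EDGES) + 1
    N_STATES = N_BINS ** 3       # 27 states for three directions
    ACTIONS = ["turn_left", "go_straight", "turn_right"]

    def state_index(d_left, d_center, d_right):
        """Map three estimated distances to a single index in [0, N_STATES)."""
        bins = [int(np.digitize(d, BIN_EDGES)) for d in (d_left, d_center, d_right)]
        return (bins[0] * N_BINS + bins[1]) * N_BINS + bins[2]

    # Tabular Q-function over the compact state space.
    Q = np.zeros((N_STATES, len(ACTIONS)))

    def q_update(s, a, reward, s_next, alpha=0.1, gamma=0.9):
        """One step of standard Q-learning (a common choice; the paper's
        exact learning rule is not reproduced here)."""
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

    # Example: near wall on the left, open space ahead and to the right.
    s = state_index(0.8, 4.0, 2.5)
    print(s, Q[s])

Collapsing the visual input to 27 states is what keeps the learning problem tractable: the table is small enough for every state-action pair to be visited many times during simulated navigation.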