Autonomous vision-based robotic exploration and mapping using hybrid maps and particle filters

Robert Sim a,*, James J. Little b

a Braintech, Inc., Consumer Robotics Lab, 102-930 West 1st Street, North Vancouver, BC, Canada V7P 3N4
b Department of Computer Science, University of British Columbia, 2366 Main Mall, Vancouver, BC, Canada V6T 1Z4

Article history: Received 2 April 2007; received in revised form 14 January 2008; accepted 10 April 2008

Keywords: Robotics; SLAM; Exploration; Mapping; RBPF; Stereo vision; Hybrid maps; SIFT

Abstract

This paper addresses the problem of exploring and mapping an unknown environment using a robot equipped with a stereo vision sensor. The main contribution of our work is a fully automatic mapping system that operates without the use of active range sensors (such as laser or sonic transducers), can operate on-line, and can consistently produce accurate maps of large-scale environments. Our approach implements a Rao-Blackwellised particle filter (RBPF) to solve the simultaneous localization and mapping problem and uses efficient data structures for real-time data association, mapping, and spatial reasoning. We employ a hybrid map representation that infers 3D point landmarks from image features to achieve precise localization, coupled with occupancy grids for safe navigation. We demonstrate two exploration approaches, one based on a greedy strategy and one based on an iteratively deepening strategy. This paper describes our framework and implementation, presents our exploration method, and reports experimental results illustrating the functionality of the system.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

For a robot to operate autonomously in its environment, it first requires an accurate representation that facilitates localization and navigation.
While the problem of automatically constructing such a representation is largely solved for robots equipped with active range-finding devices (and generally operating in planar worlds, e.g., [1]), the task remains challenging, for a variety of reasons, for robots equipped only with vision sensors. This paper presents a solution to the autonomous exploration and mapping problem using a robot equipped only with a stereo camera and an odometry sensor. In particular, we demonstrate a consistent, convergent simultaneous localization and mapping (SLAM) solution and the generation of a map representation that facilitates both accurate localization and collision-free navigation. Furthermore, the mapping is accomplished on-line under fully autonomous planning and control. In this light our work is unique among the extant vision-based mapping frameworks.

The last decade of robotics research has generated a multitude of approaches to the SLAM problem. Central to this problem is the probabilistic estimation of a map conditioned on a robot's noisy actions and observations. In general, it has been demonstrated that with a highly accurate sensor and some straightforward assumptions about the world (e.g., a planar pose space), a robot can successfully map a large indoor environment in real time. The main difficulty with these approaches is that they rely on active (energy-emitting) laser range-finding sensors, and they assume that all of the important obstacles in the world lie in the plane of the sensor. One can easily demonstrate that many, perhaps most, environments contain obstacles violating this assumption. Furthermore, the data returned by a laser tends to be impoverished, in that substantial travel may be required (integrating measurements along the way) in order to infer a robot's position. Finally, there are many potential scenarios in which an active sensor is undesirable.
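To make the estimation problem above concrete: an RBPF maintains a set of weighted particles, each carrying a pose hypothesis together with its own conditionally independent map estimate, and cycles through prediction, measurement update, and resampling. The following is a minimal illustrative sketch in Python, not the paper's implementation; the 1D range-measurement model, the per-landmark Kalman filters, and all names are our own assumptions.

```python
import math
import random

class Particle:
    """One pose hypothesis plus its own map (per-landmark Kalman filters)."""
    def __init__(self, pose=0.0):
        self.pose = pose
        self.landmarks = {}   # landmark id -> (mean, variance)
        self.weight = 1.0

def predict(particles, control, motion_noise=0.1):
    """Sample a new pose for each particle from a noisy motion model."""
    for p in particles:
        p.pose += control + random.gauss(0.0, motion_noise)

def update(particles, landmark_id, measured_range, meas_var=0.25):
    """Weight each particle by the observation likelihood, then update
    its landmark estimate with a 1D Kalman filter."""
    for p in particles:
        if landmark_id not in p.landmarks:
            # First sighting: initialize the landmark from the observation.
            p.landmarks[landmark_id] = (p.pose + measured_range, meas_var)
            continue
        mean, var = p.landmarks[landmark_id]
        innovation = measured_range - (mean - p.pose)
        s = var + meas_var                     # innovation variance
        p.weight *= math.exp(-0.5 * innovation ** 2 / s) / math.sqrt(2 * math.pi * s)
        k = var / s                            # Kalman gain
        p.landmarks[landmark_id] = (mean + k * innovation, (1 - k) * var)

def resample(particles):
    """Importance resampling: draw a new particle set proportional to weight."""
    total = sum(p.weight for p in particles)
    weights = [p.weight / total for p in particles]
    chosen = random.choices(particles, weights=weights, k=len(particles))
    out = []
    for c in chosen:
        q = Particle(c.pose)
        q.landmarks = dict(c.landmarks)   # copy the map, reset the weight
        out.append(q)
    return out
```

The key Rao-Blackwellisation step is that the map is estimated analytically (here, a tiny Kalman filter per landmark) conditioned on each sampled trajectory, so particles are only needed for the pose.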
The primary alternative to active range sensing is the passive approach afforded by stereo imagery. These approaches benefit in that the information contained in a single image can often provide substantial discriminative power for localization, and that a stereo sensor can provide a 3D (or 2.5D) representation of potential obstacles. The main drawback of stereo sensing, and image-based sensing in general, is that noise plays a substantial role in diluting a robot's inferential power, particularly as it applies to geometric reasoning. As a result, the application of successful range-based techniques has proven to be a challenge, particularly for occupancy grid-based SLAM. This fact is evident in that, of all of the vision-based SLAM solutions to date, the vast majority have computed only landmark (feature)-based representations and

* Corresponding author. Tel.: +1 604 834 5446. E-mail addresses: rsim@braintech.com (R. Sim), little@cs.ubc.ca (J.J. Little).

Image and Vision Computing 27 (2009) 167–177. doi:10.1016/j.imavis.2008.04.003