Int J Comput Vis manuscript No. (will be inserted by the editor)

RSLAM: A System for Large-Scale Mapping in Constant-Time using Stereo

Christopher Mei · Gabe Sibley · Mark Cummins · Paul Newman · Ian Reid

C. Mei, G. Sibley, M. Cummins, P. Newman and I. Reid
Robotics Research Group, Department of Engineering Science, Parks Road, Oxford, OX1 3PJ
E-mail: {cmei,gsibley,mjc,pnewman,ian}@robots.ox.ac.uk

Received: date / Accepted: date

Abstract Large-scale exploration of the environment requires a constant-time estimation engine. Bundle adjustment or pose relaxation do not fulfil these requirements as the number of parameters to solve grows with the size of the environment. We describe a relative simultaneous localisation and mapping system (RSLAM) for the constant-time estimation of structure and motion using a binocular stereo camera system as the sole sensor. Achieving robustness in the presence of difficult and changing lighting conditions and rapid motion requires careful engineering of the visual processing, and we describe a number of innovations which we show lead to high accuracy and robustness. In order to achieve real-time performance without placing severe limits on the size of the map that can be built, we use a topo-metric representation in terms of a sequence of relative locations. When combined with fast and reliable loop-closing, we mitigate the drift to obtain highly accurate global position estimates without any global minimisation. We discuss some of the issues that arise from using a relative representation, and evaluate our system on long sequences processed at a constant 30-45 Hz, obtaining precisions down to a few metres over distances of a few kilometres.

Keywords SLAM · Stereo · Tracking · Loop Closing · SIFT

1 Introduction

Building autonomous platforms using vision sensors has encouraged many developments in low-level image processing and in estimation techniques.
Recent improvements have led to real-time solutions on standard hardware. However, these often rely on global solutions that do not scale with the size of the environment. Furthermore, few systems integrate a loop closure or relocalisation mechanism, which is essential for working in non-controlled environments where tracking assumptions are often violated. In this work, we investigate how to efficiently combine relevant approaches to obtain a vision-based solution that provides high frame rate, constant-time exploration, resilience to motion blur, and a relocalisation and loop closure mechanism.

Vision-based systems can be classified as monocular or stereo solutions. The presence of a single camera on an increasing number of consumer goods (mobile phones, personal digital assistants, laptops, etc.) is a strong motivation for research in monocular vision. However, the use of a monocular system can lead to failure modes due to non-observability (e.g. with pure rotation) and to problems with scale propagation, and it requires extra computation to provide depth estimates. To avoid these issues, the current system uses a stereo pair, which paradoxically reduces the computation, as low-level processing can take advantage of scale (Section 6) and rely less on expensive joint depth and pose estimation.

The relative SLAM system presented in this paper combines a world representation enabling loop closure in real-time (Section 3) with carefully engineered low-level image processing adapted to stereo image pairs. The novelty in the world representation comes from the continuous relative formulation that avoids map merging and the transfer of statistics between sub-maps. In Section G, we describe a scheme for relative bundle adjustment (RBA) within this framework that leads to improved precision. In this article, we demonstrate the integration of three key components: (i) a represen-