David C. Wyld et al. (Eds): CST, ITCS, JSE, SIP, ARIA, DMS - 2014
pp. 339–351, 2014. © CS & IT-CSCP 2014. DOI: 10.5121/csit.2014.4131
Ludovico Russo, Stefano Rosa, Basilio Bona¹ and Matteo Matteucci²

¹ Dipartimento di Automatica e Informatica, Politecnico di Torino, Torino, Italy
{ludovico.russo, stefano.rosa, basilio.bona}@polito.it

² Dipartimento di Elettronica, Informatica e Bioingegneria, Politecnico di Milano, Milano, Italy
matteo.matteucci@polimi.it
ABSTRACT
Computer vision approaches are increasingly used in mobile robotic systems, since they make it possible to obtain a very good representation of the environment using low-power, inexpensive sensors. In particular, it has been shown that they can compete with standard solutions based on laser range scanners on the problem of simultaneous localization and mapping (SLAM), in which a robot must explore an unknown environment while building a map of it and localizing itself within that map. We present a package for simultaneous localization and mapping in ROS (Robot Operating System) using only a monocular camera. Experimental results in real scenarios, as well as on standard datasets, show that the algorithm is able to track the trajectory of the robot and build a consistent map of small environments, while running in near real-time on a standard PC.
KEYWORDS
SLAM, Mono-SLAM, Mapping, Mobile robotics
1. INTRODUCTION
In several application scenarios mobile robots are deployed in an unknown environment and are required to build a model (map) of their surroundings, as well as to localize themselves within it. Simultaneous localization and mapping (SLAM) applications now exist in a variety of domains, including indoor, outdoor, aerial and underwater settings, and use different types of sensors such as laser range finders, sonars and cameras [4]. Although the majority of these approaches still rely on classical laser range finders, vision sensors provide several unique advantages: they are usually inexpensive, low-power and compact, and are able to capture higher-level information than classical distance sensors. Moreover, human-like visual sensing and the potential availability of higher-level semantics in an image make them well suited for augmented reality applications.
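The SLAM problem described above amounts to jointly estimating the robot pose and a map from noisy motion and sensor data, and a common formulation is recursive Bayesian filtering. As a minimal, self-contained illustration of the predict/update cycle at the heart of such filters (this is a generic one-dimensional Kalman filter with hypothetical noise values, not the implementation of the package presented in this paper):

```python
# Minimal 1D Kalman filter: estimate a robot's position along a line from
# noisy odometry (prediction) and noisy position observations (update).
# All values are illustrative, not taken from the MonoSLAM package.

def predict(x, P, u, Q):
    """Motion step: apply commanded displacement u, inflate uncertainty by Q."""
    return x + u, P + Q

def update(x, P, z, R):
    """Measurement step: fuse observation z (noise variance R) into the estimate."""
    K = P / (P + R)          # Kalman gain
    x = x + K * (z - x)      # corrected state
    P = (1.0 - K) * P        # reduced covariance
    return x, P

# Start at the origin with large uncertainty, move 1 m per step,
# and observe the (noise-free, for simplicity) true position each step.
x, P = 0.0, 1.0
for step in range(1, 4):
    x, P = predict(x, P, u=1.0, Q=0.01)
    x, P = update(x, P, z=float(step), R=0.1)

assert abs(x - 3.0) < 1e-9   # estimate matches the true position 3.0
assert P < 0.1               # uncertainty shrinks after fusing measurements
```

Filter-based visual SLAM systems such as MonoSLAM extend this same predict/update structure to a high-dimensional state containing the camera pose and the 3D positions of the tracked landmarks.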
Visual SLAM approaches are usually divided into two main branches: smoothing approaches based on bundle adjustment, and filtering approaches based on probabilistic filters. The latter are further divided into three main classes: dense, sparse and semantic approaches. Dense approaches ([17], [14], [22]) are able to build dense maps of the environment, which makes the algorithms more robust but at the same time heavy in terms of computational requirements; indeed, most of these