David C. Wyld et al. (Eds): CST, ITCS, JSE, SIP, ARIA, DMS - 2014
pp. 339–351, 2014. © CS & IT-CSCP 2014. DOI: 10.5121/csit.2014.4131
Ludovico Russo, Stefano Rosa, Basilio Bona¹ and Matteo Matteucci²

¹ Dipartimento di Automatica e Informatica, Politecnico di Torino, Torino, Italy
{ludovico.russo, stefano.rosa, basilio.bona}@polito.it

² Dipartimento di Elettronica, Informatica e Bioingegneria, Politecnico di Milano, Milano, Italy
matteo.matteucci@polimi.it
ABSTRACT
Computer vision approaches are increasingly used in mobile robotic systems, since they make it possible to obtain a very good representation of the environment using low-power, inexpensive sensors. In particular, it has been shown that they can compete with standard solutions based on laser range scanners on the problem of simultaneous localization and mapping (SLAM), in which a robot must explore an unknown environment while building a map of it and localizing itself within that map. We present a package for simultaneous localization and mapping in ROS (Robot Operating System) using only a monocular camera. Experimental results in real scenarios, as well as on standard datasets, show that the algorithm is able to track the trajectory of the robot and build a consistent map of small environments, while running in near real-time on a standard PC.
KEYWORDS
SLAM, Mono-SLAM, Mapping, Mobile robotics
1. INTRODUCTION
In several application scenarios mobile robots are deployed in an unknown environment and are required to build a model (map) of their surroundings, as well as to localize themselves within it. Simultaneous localization and mapping (SLAM) applications now exist in a variety of domains, including indoor, outdoor, aerial and underwater settings, and use different types of sensors such as laser range finders, sonars and cameras [4]. Although the majority of these approaches still rely on classical laser range finders, vision sensors provide several unique advantages: they are usually inexpensive, low-power and compact, and are able to capture higher-level information than classical distance sensors. Moreover, human-like visual sensing and the potential availability of higher-level semantics in an image make them well suited for augmented reality applications.
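The SLAM problem described above amounts to jointly estimating the robot pose and a map from noisy motion and sensor data, and a common formulation is recursive Bayesian filtering. As a minimal, self-contained illustration of the predict/update cycle at the heart of such filters (this is a generic one-dimensional Kalman filter with hypothetical noise values, not the implementation of the package presented in this paper):

```python
# Minimal 1D Kalman filter: estimate a robot's position along a line from
# noisy odometry (prediction) and noisy position observations (update).
# All values are illustrative, not taken from the MonoSLAM package.

def predict(x, P, u, Q):
    """Motion step: apply commanded displacement u, inflate uncertainty by Q."""
    return x + u, P + Q

def update(x, P, z, R):
    """Measurement step: fuse observation z (noise variance R) into the estimate."""
    K = P / (P + R)          # Kalman gain
    x = x + K * (z - x)      # corrected state
    P = (1.0 - K) * P        # reduced covariance
    return x, P

# Start at the origin with large uncertainty, move 1 m per step,
# and observe the (noise-free, for simplicity) true position each step.
x, P = 0.0, 1.0
for step in range(1, 4):
    x, P = predict(x, P, u=1.0, Q=0.01)
    x, P = update(x, P, z=float(step), R=0.1)

assert abs(x - 3.0) < 1e-9   # estimate matches the true position 3.0
assert P < 0.1               # uncertainty shrinks after fusing measurements
```

Filter-based visual SLAM systems such as MonoSLAM extend this same predict/update structure to a high-dimensional state containing the camera pose and the 3D positions of the tracked landmarks.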
Visual SLAM approaches are usually divided into two main branches: smoothing approaches based on bundle adjustment, and filtering approaches based on probabilistic filters. The latter are further divided into three main classes: dense, sparse and semantic approaches. Dense approaches ([17], [14], [22]) are able to build dense maps of the environment, which makes the algorithms more robust but at the same time heavy in terms of computational requirements; indeed, most of these