Distributed multi-camera visual mapping using topological maps of planar regions

Eduardo Montijano*, Carlos Sagues
Departamento de Informática e Ingeniería de Sistemas, Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Spain

Article history: Received 19 April 2010; received in revised form 17 December 2010; accepted 25 December 2010; available online 8 January 2011.

Keywords: Computer vision; Distributed systems; Mapping

Abstract

This paper presents a multi-agent solution for cooperative visual mapping using planar regions. Each agent is assumed to be equipped with a conventional camera and has limited communication capabilities. Our approach starts by building topological maps from independent image sequences, where natural landmarks extracted from conventional images are grouped to create a graph of planes. With this approach, the features observed in several images belonging to the same planar region are stored only once, reducing the size of the individual maps. In a distributed scenario this is very important because smaller maps can be transmitted faster, which makes our approach well suited for cooperative mapping. The subsequent fusion of the individual maps is obtained via distributed consensus, without any initial information about the relations between the different maps. Experiments with real images in complex scenarios show the good performance of our proposal.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Advances in communication technologies and vision systems have made feasible the idea of sets of cameras performing different perception tasks, such as surveillance, tracking or mapping, in a cooperative way. In this paper we focus on the problem of map building using several cameras which communicate their perceived information to the others. A set of cameras is able to map the environment faster than a single one could. However, this adds new challenges that must be solved.
The problem of mapping the environment with a single camera has been studied in depth, especially in the robotics community. The importance of creating and maintaining a map for localization and navigation tasks is obvious, and much effort has been devoted to this research line. A common approach is to simultaneously localize the camera and map the environment (SLAM) [1,2]. In these approaches the map is usually represented by a set of 3D features, whose positions are updated every time they are observed in a new image. To make this process more robust, view-based maps [3] introduce geometric constraints between pairs of images into the SLAM algorithm. Computing the structure of the scene, usually defined by the positions of the features and the cameras, makes the errors and the drift grow with the size of the map. Our approach overcomes this limitation because we consider planar regions and no explicit metric information for the global map. A plane is defined as a set of features that belong to the same planar region in the scene. The errors in different planar regions are uncorrelated because the extraction of each one is independent of the others.

Topological maps [4], where no metric information is computed, are another widely used approach to organize the visual information. Topological visual maps can be built from conventional [5,6] or omnidirectional [7,8] images. The whole image can be stored, but usually only the extracted features are saved. The most recent and successful works use visual words [9] to represent the scene in a more compact way. Although topological maps based on images give good results, the space required to manage these maps is considerably large. If we store all the features, many of them will be seen in several images, so the map will contain a lot of repeated data for the same features. By using planar regions, features are stored only once, regardless of the number of images in which they are observed.
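The storage saving can be illustrated with a small data structure in which each plane node keeps its features exactly once and images only reference plane identifiers. This is a hypothetical sketch for illustration, not the authors' implementation; all class and method names are invented.

```python
# Hypothetical sketch of a plane-based topological map: features of a
# planar region are stored once in the plane node, no matter how many
# images observe that region.

class PlaneMap:
    def __init__(self):
        self.planes = {}        # plane_id -> set of feature descriptors
        self.edges = set()      # topological links between planes
        self.image_planes = {}  # image_id -> set of plane ids observed

    def add_observation(self, image_id, plane_id, features):
        """Register the features of one planar region seen in one image."""
        stored = self.planes.setdefault(plane_id, set())
        stored.update(features)  # features already known merge away
        self.image_planes.setdefault(image_id, set()).add(plane_id)

    def link_planes(self, a, b):
        """Add a topological edge between two planes of the map."""
        self.edges.add(frozenset((a, b)))

    def num_features(self):
        """Total map size in features, with no duplicates."""
        return sum(len(f) for f in self.planes.values())


m = PlaneMap()
# The same wall seen in two consecutive images: overlapping features
# are stored only once in the "wall" plane node.
m.add_observation("img0", "wall", {"f1", "f2", "f3"})
m.add_observation("img1", "wall", {"f2", "f3", "f4"})
m.add_observation("img1", "floor", {"f5"})
m.link_planes("wall", "floor")
```

Storing the two images separately would keep seven feature entries; the plane-based map keeps five, and the gap widens with longer image sequences over the same regions.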
Moreover, the complexity of the maps is also reduced. Graphs built from raw images are usually dense because of the number of connections among nearby images, whereas in the proposed maps the number of connections between planes is considerably smaller. Semantic meaning [10] in the maps is essential for human–robot interaction. Planes also provide semantic information meaningful to humans, making this choice more adequate than raw features or images. In addition to the previous advantages, it is well known that structure estimation is improved in terms of accuracy and stability when the scene is represented by planes [11]. Moreover, there are several works in the literature that assume

* Corresponding author. E-mail addresses: emonti@unizar.es (E. Montijano), csagues@unizar.es (C. Sagues).
Pattern Recognition 44 (2011) 1528–1539. doi:10.1016/j.patcog.2010.12.022