A geometrically constrained image similarity measure for visual mapping, localization and navigation

Ben Kröse, Olaf Booij, Zoran Zivkovic
Intelligent Systems Laboratory, University of Amsterdam, Amsterdam, the Netherlands
(krose,obooij,zivkovic)@science.uva.nl

Abstract— This paper presents a measure for image similarity based on local feature descriptions and geometric constraints. We show that on the basis of this similarity an appearance graph representation of the environment of a mobile robot can be made. This graph can be used for representing semantic information about the space, and can be used for visual navigation. The image similarity measure is robust to occlusions by people in the neighbourhood of the robot.

Index Terms— Visual map building, localization, navigation

I. INTRODUCTION

An internal representation of the environment is needed for optimal mobile robot navigation. Traditionally such a model is represented as a geometric model indicating admissible and non-admissible areas. The robot has to know its location within such a model and, most of the time, has to estimate the parameters of the model simultaneously (SLAM).

Now that cameras and processing power are becoming cheaper, visual information is used more often in environment modeling. For example, visual features are used to solve the loop closing problem in geometric SLAM. A step further are approaches that model the environment only in terms of appearance, in contrast to explicit geometric representations of space.

In this paper we present our recent work on appearance modeling of the environment. On the basis of a set of omnidirectional camera images an 'appearance graph' is constructed. This graph can be used for navigation and for categorization. A prerequisite for making the graph is a good similarity measure between images.

The paper first presents a brief overview of visual perception and space models. Then the work on appearance modeling in robotics is summarized.
Section IV presents the graph-based model. The similarity measure and the applications of the graph are presented in Sections V, VI and VII. The robustness to visual occlusions is presented in Section VIII.

II. SPATIAL REPRESENTATIONS AND VISUAL INFORMATION

Work on spatial representations has been carried out in various fields. In the field of behavioural psychology, the early studies of Tolman [24] using rats in various mazes showed that rats could learn a 'cognitive map' and reason with that representation. In the field of neuroscience, a cognitive map theory was presented by O'Keefe and Nadel in 1978 [2]. The theory focussed on hippocampal functioning and suggests that this brain structure is the core of an extensive neural system subserving the representation and use of information about the spatial environment. The authors describe how visual cues play an important role in map learning.

An intriguing debate took place at the end of the 1980s, when Kosslyn [9] presented his theory on mental imagery. In his research he studied to what extent images serve as data structures for human memory. As a part of that work spatial representations were considered. Experiments carried out earlier by Shepard (see figure 1) showed that, in order to judge whether two observations came from the same object, the subjects 'mentally rotated' one of the shapes and compared it to the other: the matching was done in the image domain instead of in the 3D shape domain.

Fig. 1. Shepard and Metzler's mental rotation task. Subjects were shown pairs of drawings of three-dimensional objects and asked whether the members of a pair were identical. The task can be solved for physical objects by rotating one of them until they can be viewed from the same perspective, but in this case the subjects had to perform the rotation "mentally".

Also in other fields, for example engineering, studies have been carried out on the representation of space.
In the field of city design, 'cognitive maps' describe how people perceive and understand the environment [13]. Lynch's stud-