Improving the Selection and Detection of Visual Landmarks Through Object Tracking

P. Espinace and A. Soto
Department of Computer Science, Pontificia Universidad Católica de Chile
Casilla 306, Santiago 22, Chile
[pespinac,asoto]@ing.puc.cl

Abstract

The unsupervised selection and posterior recognition of visual landmarks is a highly valuable perceptual capability for a mobile robot. Recently, in [6], we proposed a system that aims to achieve this capability by combining a bottom-up data-driven approach with top-down feedback provided by high-level semantic representations. The bottom-up approach is based on three main mechanisms: visual attention, area segmentation, and landmark characterization. The top-down feedback is based on two information sources: i) an estimation of the robot position that reduces the search scope for potential matches with previously selected landmarks, and ii) a set of weights that, according to the results of previous recognitions, controls the influence of different segmentation algorithms on the recognition of each landmark. In this paper we explore the benefits of extending our previous work by including a visual tracking step for each of the selected landmarks. Our intuition is that a tracking step can help to improve the model of each landmark by associating and selecting information from its most significant views. Furthermore, it can also help to avoid the selection of spurious landmarks. Our results confirm these intuitions by showing that the inclusion of the tracking step produces a significant increase in the recall rate for landmark recognition.

1. Introduction

Autonomous point-to-point navigation is a key requirement for most practical applications of mobile robots in natural environments.
In this context, the problems of automatically constructing maps of the environment and accurately estimating the position of the robot within a map, tasks known as mapping and localization, have been a long-standing aspiration of the Robotics community. Mapping and localization are particularly relevant for indoor environments, where globally accurate positioning systems, such as GPS, are not available.

At present, state-of-the-art solutions to the indoor mapping and localization problems are mainly based on 2D laser range finders and metric map representations, such as evidence grids [5]. Although this type of approach has shown a high degree of success when operating in real time in natural environments [18], it still suffers from some limitations. For example, the structural symmetries common in indoor buildings produce data association problems that are hard to solve with the 2D view of a laser range finder. Furthermore, problems such as modifications of the environment due to changes in the position of furniture, uncertainty about the state of doors, or partial occlusions caused by people walking around also diminish the robustness of solutions based on 2D laser range finders.

Recently, advances in the area of computer vision [21] [11] have increased the interest in including vision as one of the main sensor modalities to support the perceptual needs of autonomous navigation. In this respect, the robustness and flexibility exhibited by the navigation systems of most sighted beings is clear proof of the advantages of a suitable visual perception system. In the case of mobile robots, the unsupervised selection and posterior recognition of relevant visual landmarks is a highly valuable perceptual capability for successfully dealing with the complexity of an unstructured natural environment.
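To make the evidence-grid representation mentioned above concrete, the following minimal Python sketch shows the standard log-odds formulation of an occupancy grid; class and parameter names are illustrative only, not taken from [5] or from our system:

```python
import math


class EvidenceGrid:
    """Minimal 2D evidence (occupancy) grid storing log-odds per cell."""

    def __init__(self, width, height, prior=0.5):
        # Each cell starts at the log-odds of the prior occupancy probability.
        self.log_odds = [[math.log(prior / (1.0 - prior))] * width
                         for _ in range(height)]

    def update(self, x, y, p_occupied):
        """Fuse a new reading for cell (x, y).

        p_occupied is the (assumed) inverse sensor model's probability
        that the cell is occupied given the current range measurement.
        Log-odds updates reduce Bayesian fusion to a simple addition.
        """
        self.log_odds[y][x] += math.log(p_occupied / (1.0 - p_occupied))

    def probability(self, x, y):
        """Recover the occupancy probability of cell (x, y)."""
        return 1.0 - 1.0 / (1.0 + math.exp(self.log_odds[y][x]))
```

With a uniform prior, a cell starts at probability 0.5; repeated readings supporting occupancy accumulate additively in log-odds space, so confidence grows with consistent evidence.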
In this respect, in a previous approach [6], we developed an unsupervised method for the automatic selection and subsequent recognition of suitable visual landmarks using images acquired by a mobile robot. To achieve this goal, we combined bottom-up visual features based on color, intensity, and depth cues with top-down feedback given by spatial relations and memories of the most successful predicting features of previously recognized landmarks. In this way, the resulting system is able to select interesting, meaningful, and useful landmarks that a mobile robot can use to achieve indoor autonomous navigation.

The bottom-up approach for the selection of candidate