334 IEEE TRANSACTIONS ON ROBOTICS, VOL. 22, NO. 2, APRIL 2006
Landmark Selection for Vision-Based Navigation
Pablo Sala, Student Member, IEEE, Robert Sim, Member, IEEE, Ali Shokoufandeh, Member, IEEE, and
Sven Dickinson, Member, IEEE
Abstract—Recent work in the object recognition community has
yielded a class of interest-point-based features that are stable under
significant changes in scale, viewpoint, and illumination, making
them ideally suited to landmark-based navigation. Although many
such features may be visible in a given view of the robot’s environ-
ment, only a few such features are necessary to estimate the robot’s
position and orientation. In this paper, we address the problem of
automatically selecting, from the entire set of features visible in
the robot’s environment, the minimum (optimal) set by which the
robot can navigate its environment. Specifically, we decompose the
world into a small number of maximally sized regions, such that
at each position in a given region, the same small set of features
is visible. We introduce a novel graph theoretic formulation of the
problem, and prove that it is NP-complete. Next, we introduce a
number of approximation algorithms and evaluate them on both
synthetic and real data. Finally, we use the decompositions
computed from the real image data to compare localization
performance against that of the undecomposed map.
Index Terms—Feature selection, localization, machine vision,
mapping, mobile robots.
I. INTRODUCTION
IN THE domain of exemplar-based (as opposed to generic)
object recognition, the computer vision community has
recently adopted a class of interest-point-based features, e.g.,
[1]–[4]. Such features typically encode a description of image
appearance in the neighborhood of an interest point, such as
a detected corner or scale-space maximum. The appeal of
these features over their appearance-based predecessors is their
invariance to changes in illumination, scale, image transla-
tion, and rotation, and minor changes in viewpoint (rotation
in depth). These properties make them ideally suited to the
problem of landmark-based navigation. If we can define a set
of invariant features that uniquely defines a particular location
in the environment, these features can, in turn, define a visual
landmark.
To use these features, we could, for example, adopt a local-
ization approach proposed by Basri and Rivlin [5], and Wilkes
Manuscript received November 8, 2004; revised April 18, 2005. This paper
was recommended for publication by Associate Editor D. Sun and Editor
S. Hutchinson upon evaluation of the reviewers’ comments. This work was
supported in part by NSERC, in part by PREA, in part by CITO, in part by MD
Robotics, and in part by ONR. This paper was presented in part at the IEEE
International Conference on Intelligent Robots and Systems, Sendai, Japan,
September 2004.
P. Sala and S. Dickinson are with the Department of Computer Sci-
ence, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail:
psala@cs.toronto.edu; sven@cs.toronto.edu).
R. Sim is with the Department of Computer Science, University of British
Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: simra@cs.ubc.ca).
A. Shokoufandeh is with the Department of Computer Science, College
of Engineering, Drexel University, Philadelphia, PA 19104 USA (e-mail:
ashokouf@cs.drexel.edu).
Digital Object Identifier 10.1109/TRO.2005.861480
et al. [6], based on the linear combination of views (LC) tech-
nique. During a training phase, the robot is manually “shown”
two views of each landmark in the environment, by which the
robot is to later navigate. These views, along with the positions
at which they were acquired, form a database of landmark views.
At run-time, the robot takes an image of the environment and
attempts to match the visible features to the various landmark
views it has stored in its database. Given a match to some land-
mark view, the robot can compute its position and orientation in
the world.
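At its core, the run-time matching step above reduces to finding the stored landmark view that shares the most features with the current image. The following sketch illustrates that step with hypothetical data structures (feature IDs as set elements, views as dictionaries); a real system would match high-dimensional descriptors such as SIFT vectors via approximate nearest-neighbor search, and would verify matches geometrically before computing pose.

```python
# Sketch of run-time landmark matching (hypothetical data structures).
# Each stored landmark view holds the features seen in it and the pose
# at which it was acquired during training.

def best_landmark_match(visible_features, landmark_views):
    """Return the landmark view sharing the most features with the
    current image, or None if no view shares any feature."""
    best_view, best_count = None, 0
    for view in landmark_views:
        shared = len(set(visible_features) & set(view["features"]))
        if shared > best_count:
            best_view, best_count = view, shared
    return best_view

# Toy database of two landmark views (feature IDs are placeholders).
views = [
    {"name": "door", "features": {1, 2, 3, 4}, "pose": (0.0, 0.0, 0.0)},
    {"name": "desk", "features": {5, 6, 7, 8}, "pose": (3.0, 1.0, 90.0)},
]
match = best_landmark_match({2, 3, 9}, views)
assert match["name"] == "door"
```

Given the matched view, the robot would then use the correspondences (e.g., via the LC technique) to recover its position and orientation.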
There are two major challenges with this approach. First,
from any given viewpoint, there may be hundreds or even
thousands of such features. The union of all pairs of landmark
views may therefore yield an intractable number of distin-
guishable features that must be indexed in order to determine
which landmark the robot may be viewing.¹ Fortunately, only
a small number of features are required (in each model view)
to compute the robot’s pose. Therefore, of the hundreds of
features visible in a model view, which small subset should we
keep?
The second challenge is to automate this process, and let the
robot automatically decide on an optimal set of visual landmarks
for navigation. What constitutes a good landmark? A landmark
should be both distinguishable from other landmarks (a single
floor tile, for example, would constitute a bad landmark since it
is repeated elsewhere on the floor) and widely visible (a land-
mark visible only from a single location will rarely be encoun-
tered and, if so, will not be persistent). Therefore, our goal can
be formulated as partitioning the world into a minimum number
of maximally sized contiguous regions, such that the same set
of features is visible at all points within a given region.
There is an important connection between these two chal-
lenges. Specifically, given a region, inside of which all points see
the same set of features (our second challenge), what happens
when we reduce the set of features that must be visible at each
point (first challenge)? Since this represents a weaker constraint
on the region, the size of the region can only increase, yielding
a smaller number of larger regions covering the environment.
As mentioned earlier, there is a lower bound on the number of
features that can define a region, based on the pose-estimation
algorithm and the degree to which we want to overconstrain its
solution.
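The monotonic relationship described above can be made concrete: the set of features visible from every point in a region is the intersection of the per-point visibility sets, so enlarging a region can only shrink that common set, and, conversely, lowering the number of features required per region admits larger regions. The sketch below demonstrates this on a hypothetical one-dimensional world (poses on a line, visibility sets invented for illustration); it is not the decomposition algorithm developed later in the paper.

```python
# Illustrative sketch (hypothetical visibility data): growing a region
# can only shrink the set of features common to all of its poses, so
# requiring fewer common features (smaller k) yields larger regions.

def common_features(visibility, region):
    """Features visible from every pose in `region` (set intersection)."""
    out = set(visibility[region[0]])
    for pose in region[1:]:
        out &= visibility[pose]
    return out

def largest_prefix_region(visibility, poses, k):
    """Longest initial run of poses sharing at least k common features."""
    region = []
    for pose in poses:
        candidate = region + [pose]
        if len(common_features(visibility, candidate)) >= k:
            region = candidate
        else:
            break
    return region

# Toy 1-D world: feature IDs visible from each of four poses.
visibility = {0: {1, 2, 3}, 1: {2, 3, 4}, 2: {3, 4, 5}, 3: {9}}
poses = [0, 1, 2, 3]
assert largest_prefix_region(visibility, poses, 2) == [0, 1]
assert largest_prefix_region(visibility, poses, 1) == [0, 1, 2]
```

Here, relaxing the requirement from two common features to one grows the region from two poses to three, mirroring the tradeoff between region size and the number of features available for pose estimation.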
Combining these two challenges, we arrive at the main
problem addressed by this paper: from a set of views acquired
¹Worst-case indexing complexity would occur during the kidnapped localization task, in which the robot has no prior knowledge of where it is in the
world. Under normal circumstances, given the currently viewed landmark and
the current heading, the space of landmark views that must be searched can be
constrained. Still, even for a small set of landmark views, this may yield a large
search space of features.
1552-3098/$20.00 © 2006 IEEE