334 IEEE TRANSACTIONS ON ROBOTICS, VOL. 22, NO. 2, APRIL 2006
Landmark Selection for Vision-Based Navigation
Pablo Sala, Student Member, IEEE, Robert Sim, Member, IEEE, Ali Shokoufandeh, Member, IEEE, and
Sven Dickinson, Member, IEEE
Abstract—Recent work in the object recognition community has
yielded a class of interest-point-based features that are stable under
significant changes in scale, viewpoint, and illumination, making
them ideally suited to landmark-based navigation. Although many
such features may be visible in a given view of the robot’s environ-
ment, only a few such features are necessary to estimate the robot’s
position and orientation. In this paper, we address the problem of
automatically selecting, from the entire set of features visible in
the robot’s environment, the minimum (optimal) set by which the
robot can navigate its environment. Specifically, we decompose the
world into a small number of maximally sized regions, such that
at each position in a given region, the same small set of features
is visible. We introduce a novel graph theoretic formulation of the
problem, and prove that it is NP-complete. Next, we introduce a
number of approximation algorithms and evaluate them on both
synthetic and real data. Finally, we use the decompositions
computed from the real image data to compare localization
performance against that of the undecomposed map.
Index Terms—Feature selection, localization, machine vision,
mapping, mobile robots.
I. INTRODUCTION
IN THE domain of exemplar-based (as opposed to generic)
object recognition, the computer vision community has
recently adopted a class of interest-point-based features, e.g.,
[1]–[4]. Such features typically encode a description of image
appearance in the neighborhood of an interest point, such as
a detected corner or scale-space maximum. The appeal of
these features over their appearance-based predecessors is their
invariance to changes in illumination, scale, image transla-
tion, and rotation, and minor changes in viewpoint (rotation
in depth). These properties make them ideally suited to the
problem of landmark-based navigation. If we can define a set
of invariant features that uniquely defines a particular location
in the environment, these features can, in turn, define a visual
landmark.
To use these features, we could, for example, adopt a local-
ization approach proposed by Basri and Rivlin [5], and Wilkes
Manuscript received November 8, 2004; revised April 18, 2005. This paper
was recommended for publication by Associate Editor D. Sun and Editor
S. Hutchinson upon evaluation of the reviewers’ comments. This work was
supported in part by NSERC, in part by PREA, in part by CITO, in part by MD
Robotics, and in part by ONR. This paper was presented in part at the IEEE
International Conference on Intelligent Robots and Systems, Sendai, Japan,
September 2004.
P. Sala and S. Dickinson are with the Department of Computer Sci-
ence, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail:
psala@cs.toronto.edu; sven@cs.toronto.edu).
R. Sim is with the Department of Computer Science, University of British
Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: simra@cs.ubc.ca).
A. Shokoufandeh is with the Department of Computer Science, College
of Engineering, Drexel University, Philadelphia, PA 19104 USA (e-mail:
ashokouf@cs.drexel.edu).
Digital Object Identifier 10.1109/TRO.2005.861480
et al. [6], based on the linear combination of views (LC) tech-
nique. During a training phase, the robot is manually “shown”
two views of each landmark in the environment, by which the
robot is to later navigate. These views, along with the positions
at which they were acquired, form a database of landmark views.
At run-time, the robot takes an image of the environment and
attempts to match the visible features to the various landmark
views it has stored in its database. Given a match to some land-
mark view, the robot can compute its position and orientation in
the world.
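At its core, the run-time matching step above reduces to finding the stored landmark view that shares the most features with the current image. The following sketch illustrates that step with hypothetical data structures (feature IDs as set elements, views as dictionaries); a real system would match high-dimensional descriptors such as SIFT vectors via approximate nearest-neighbor search, and would verify matches geometrically before computing pose.

```python
# Sketch of run-time landmark matching (hypothetical data structures).
# Each stored landmark view holds the features seen in it and the pose
# at which it was acquired during training.

def best_landmark_match(visible_features, landmark_views):
    """Return the landmark view sharing the most features with the
    current image, or None if no view shares any feature."""
    best_view, best_count = None, 0
    for view in landmark_views:
        shared = len(set(visible_features) & set(view["features"]))
        if shared > best_count:
            best_view, best_count = view, shared
    return best_view

# Toy database of two landmark views (feature IDs are placeholders).
views = [
    {"name": "door", "features": {1, 2, 3, 4}, "pose": (0.0, 0.0, 0.0)},
    {"name": "desk", "features": {5, 6, 7, 8}, "pose": (3.0, 1.0, 90.0)},
]
match = best_landmark_match({2, 3, 9}, views)
assert match["name"] == "door"
```

Given the matched view, the robot would then use the correspondences (e.g., via the LC technique) to recover its position and orientation.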
There are two major challenges with this approach. First,
from any given viewpoint, there may be hundreds or even
thousands of such features. The union of all pairs of landmark
views may therefore yield an intractable number of distin-
guishable features that must be indexed in order to determine
which landmark the robot may be viewing.¹ Fortunately, only
a small number of features are required (in each model view)
to compute the robot’s pose. Therefore, of the hundreds of
features visible in a model view, which small subset should we
keep?
The second challenge is to automate this process, and let the
robot automatically decide on an optimal set of visual landmarks
for navigation. What constitutes a good landmark? A landmark
should be both distinguishable from other landmarks (a single
floor tile, for example, would constitute a bad landmark since it
is repeated elsewhere on the floor) and widely visible (a land-
mark visible only from a single location will rarely be encoun-
tered and, if so, will not be persistent). Therefore, our goal can
be formulated as partitioning the world into a minimum number
of maximally sized contiguous regions, such that the same set
of features is visible at all points within a given region.
There is an important connection between these two chal-
lenges. Specifically, given a region, inside of which all points see
the same set of features (our second challenge), what happens
when we reduce the set of features that must be visible at each
point (first challenge)? Since this represents a weaker constraint
on the region, the size of the region can only increase, yielding
a smaller number of larger regions covering the environment.
As mentioned earlier, there is a lower bound on the number of
features that can define a region, based on the pose-estimation
algorithm and the degree to which we want to overconstrain its
solution.
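The monotonic relationship described above can be made concrete: the set of features visible from every point in a region is the intersection of the per-point visibility sets, so enlarging a region can only shrink that common set, and, conversely, lowering the number of features required per region admits larger regions. The sketch below demonstrates this on a hypothetical one-dimensional world (poses on a line, visibility sets invented for illustration); it is not the decomposition algorithm developed later in the paper.

```python
# Illustrative sketch (hypothetical visibility data): growing a region
# can only shrink the set of features common to all of its poses, so
# requiring fewer common features (smaller k) yields larger regions.

def common_features(visibility, region):
    """Features visible from every pose in `region` (set intersection)."""
    out = set(visibility[region[0]])
    for pose in region[1:]:
        out &= visibility[pose]
    return out

def largest_prefix_region(visibility, poses, k):
    """Longest initial run of poses sharing at least k common features."""
    region = []
    for pose in poses:
        candidate = region + [pose]
        if len(common_features(visibility, candidate)) >= k:
            region = candidate
        else:
            break
    return region

# Toy 1-D world: feature IDs visible from each of four poses.
visibility = {0: {1, 2, 3}, 1: {2, 3, 4}, 2: {3, 4, 5}, 3: {9}}
poses = [0, 1, 2, 3]
assert largest_prefix_region(visibility, poses, 2) == [0, 1]
assert largest_prefix_region(visibility, poses, 1) == [0, 1, 2]
```

Here, relaxing the requirement from two common features to one grows the region from two poses to three, mirroring the tradeoff between region size and the number of features available for pose estimation.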
Combining these two challenges, we arrive at the main
problem addressed by this paper: from a set of views acquired
¹Worst-case indexing complexity would occur during the kidnapped localization task, in which the robot has no prior knowledge of where it is in the
world. Under normal circumstances, given the currently viewed landmark and
the current heading, the space of landmark views that must be searched can be
constrained. Still, even for a small set of landmark views, this may yield a large
search space of features.
1552-3098/$20.00 © 2006 IEEE