Nonlinear Mean Shift for Robust Pose Estimation
Raghav Subbarao†‡, Yakup Genc‡, Peter Meer†
†ECE Department, Rutgers University, Piscataway, NJ 08854
‡Real-time Vision and Modeling Department, Siemens Corporate Research, Princeton, NJ 08540
Abstract
We propose a new robust estimator for camera pose esti-
mation based on a recently developed nonlinear mean shift
algorithm. This allows us to treat pose estimation as a clus-
tering problem in the presence of outliers. We compare our
method to RANSAC, which is the standard robust estima-
tor for computer vision problems. We also show that under
fairly general assumptions our method is provably better
than RANSAC. Synthetic and real examples to support our
claims are provided.
1. Introduction
Real-time estimation of camera pose is an important
problem in computer vision. Pose estimation, along with
scene structure estimation, is known as the Structure-From-
Motion (SFM) problem, which is a central goal of computer vision.
It is widely accepted that once good estimates of the struc-
ture and motion are known, they can be improved using of-
fline methods like bundle adjustment [19]. However, to get
a starting point, a system needs to account for both noise
and gross errors which do not satisfy the geometric con-
straints being enforced. Such errors are known as outliers.
Pose estimation is also a part of other applications such
as augmented reality (AR). For AR only the pose of the
camera is needed, although some structure may also be esti-
mated. The pose is required in real time and offline methods
such as bundle adjustment are not applicable here.
Random Sample Consensus (RANSAC) and its varia-
tions, which follow a hypothesise-and-test procedure, are
the standard ways of handling outliers in SFM. In this paper
we propose a new robust estimator for camera pose estima-
tion. The estimator is based on the nonlinear mean shift al-
gorithm of [15, 20] applied to the Special Euclidean Group
which is the set of all rigid body motions in 3D and is equiv-
alent to the set of all camera poses. We show theoretically
and experimentally that our method requires fewer hypothe-
ses than any hypothesise-and-test algorithm for the same
level of performance.
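For reference, the Special Euclidean Group mentioned above is conventionally written in homogeneous matrix form as follows. The symbols $\mathbf{R}$ (rotation) and $\mathbf{t}$ (translation) are notation assumed here for illustration and are not taken from the text above.

```latex
\[
\mathrm{SE}(3) =
\left\{
\begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^{\top} & 1 \end{bmatrix}
: \mathbf{R} \in \mathrm{SO}(3),\ \mathbf{t} \in \mathbb{R}^{3}
\right\},
\qquad
\mathrm{SO}(3) =
\left\{
\mathbf{R} \in \mathbb{R}^{3 \times 3}
: \mathbf{R}^{\top}\mathbf{R} = \mathbf{I},\ \det \mathbf{R} = 1
\right\}.
\]
```

Each element has six degrees of freedom, three for the rotation and three for the translation, so a camera pose corresponds to a point on a six-dimensional manifold rather than in a vector space.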
We discuss some of the previous work related to our ap-
proach in Section 2. In Section 3 we introduce the nonlinear
mean shift algorithm. In Section 4 we develop a robust pose
estimator based on this algorithm and outline a proof of why
we expect the mean shift based estimator to be better than
RANSAC. Finally, in Section 5 we present the results of
experiments on synthetic and real data sets.
2. Previous Work
Classical methods reconstruct the scene using correspon-
dences across images and estimating the epipolar geome-
try between pairs of frames or the trifocal tensor for three
frames. These reconstructions are then stitched together
into a single frame [14]. The Euclidean equivalent of this
is the relative pose estimation problem given image corre-
spondences between two images [8]. Alternatively, the mo-
tion and structure can be estimated in a single coordinate
frame [12]. Such methods require absolute camera pose
estimation based on correspondences between 3D world
points and 2D image points [1, 6].
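As a rough illustration of the 3D-to-2D constraint used in absolute pose estimation, the sketch below projects a world point through a candidate pose and measures the reprojection error. It assumes a calibrated camera, so image points are in normalised coordinates, and the function names `rotation_z`, `project` and `reprojection_error` are hypothetical helpers introduced here, not part of any cited method.

```python
import math

def rotation_z(theta):
    """3x3 rotation about the z-axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(R, t, X):
    """Project world point X into normalised image coordinates for pose (R, t).

    Assumes a calibrated camera: the point is moved into the camera frame
    as Xc = R X + t and then divided by its depth.
    """
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    return (Xc[0] / Xc[2], Xc[1] / Xc[2])

def reprojection_error(R, t, X, x_obs):
    """Euclidean distance between the projected point and the observation."""
    u, v = project(R, t, X)
    return math.hypot(u - x_obs[0], v - x_obs[1])
```

A correspondence whose reprojection error stays large for the correct pose is an outlier in the sense used in this paper.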
An important aspect of these algorithms is that whenever
any geometrical constraint is being enforced, there will be
outliers which do not satisfy the constraint. These outliers
occur due to errors in lower level modules such as the image
feature tracker. When estimating the motion and structure it
is necessary to detect and remove these outliers.
The standard way of handling outliers in computer vi-
sion is the RANSAC algorithm [4]. In RANSAC, parameter
hypotheses are generated by randomly choosing a minimal
number of elements required to generate a hypothesis. The
hypotheses are scored based on their likelihood to have gen-
erated the observed data and the best hypothesis is retained.
Based on the noise model assumed, different scoring functions
have been proposed to develop variations of RANSAC
[17, 18].
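The hypothesise-and-test loop described above can be sketched on a toy problem. The example below fits a 2D line rather than a camera pose, and it scores hypotheses by inlier count, which is only one of the possible scoring functions. The names `fit_line`, `ransac_line` and `make_data`, and the parameter values, are assumptions chosen for illustration.

```python
import random

def fit_line(p, q):
    """Line a*x + b*y + c = 0 through p and q, with a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate sample: the two points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)

def ransac_line(points, n_hypotheses=200, inlier_tol=0.05, seed=0):
    """Keep the hypothesis supported by the largest number of inliers."""
    rng = random.Random(seed)
    best_model, best_score = None, -1
    for _ in range(n_hypotheses):
        p, q = rng.sample(points, 2)  # minimal sample: two points define a line
        model = fit_line(p, q)
        if model is None:
            continue
        a, b, c = model
        # Score: number of points within inlier_tol of the hypothesised line.
        score = sum(1 for (x, y) in points if abs(a * x + b * y + c) < inlier_tol)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score

def make_data(n_inliers=30, n_outliers=20, seed=1):
    """Noise-free inliers on y = 2x + 1 plus uniformly scattered outliers."""
    rng = random.Random(seed)
    inliers = [(i / (n_inliers - 1), 2.0 * i / (n_inliers - 1) + 1.0)
               for i in range(n_inliers)]
    outliers = [(rng.uniform(0.0, 1.0), rng.uniform(-3.0, 6.0))
                for _ in range(n_outliers)]
    return inliers + outliers
```

Calling `ransac_line(make_data())` returns the best line together with its inlier count. For pose estimation the minimal sample would instead be the smallest set of correspondences that determines a pose.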
Another important contribution has been the development
of preemptive forms of RANSAC [2, 10], which allow
RANSAC to be used in real-time SFM systems. In such
methods, not all the hypotheses are scored completely;
some are preemptively dropped. Unlike RANSAC, where
hypotheses are generated and scored one at a time while
only the most likely hypothesis is retained, preemptive
RANSAC [10] proceeds by generating all the hypotheses
at the beginning. The likelihood of the hypotheses
IEEE Workshop on Applications of Computer Vision (WACV'07)
0-7695-2794-9/07 $20.00 © 2007