RS-SLAM: RANSAC Sampling for Visual FastSLAM
Gim Hee Lee, Friedrich Fraundorfer, and Marc Pollefeys
Computer Vision and Geometry Laboratory, Department of Computer Science, ETH Zürich, Switzerland
glee@student.ethz.ch, fraundorfer@inf.ethz.ch, marc.pollefeys@inf.ethz.ch
Abstract—In this paper, we present our RS-SLAM algorithm for a monocular camera, where the proposal distribution is derived from the 5-point RANSAC algorithm and image feature measurement uncertainties instead of the easily violated constant velocity model. We propose to perform another RANSAC sampling within the inliers of the best-scoring consensus set to check for inlier misclassifications in the original correspondences, and to use all the hypotheses generated from these consensus sets in the proposal distribution. This mitigates data association errors (inlier misclassifications) that arise because, in practice, the consensus set with the highest RANSAC score might not contain all the true inliers due to noise in the feature measurements. Less probable hypotheses are eventually eliminated in the particle filter resampling process. We also show in this paper that our monocular approach can be easily extended to a stereo camera. Experimental results validate the potential of our approach.
I. INTRODUCTION
In recent years, we have seen much success in applying well-established Simultaneous Localization and Mapping (SLAM) algorithms [1], [2] on robots with a single camera as the sole sensor. Two of the most notable works are Davison's Extended Kalman Filter (EKF) monocular SLAM [3] and Eade's scalable monocular SLAM [4], which makes use of the particle filter. Both algorithms are Bayesian approaches that require a sufficiently accurate motion model to predict the robot pose before projecting the image features from the previous frame onto the current frame based on these predictions, enabling an efficient feature correspondence search in the measurement update. In the absence of other interoceptive sensors such as wheel encoders, the constant velocity model was chosen as the motion model, and this works very well as long as the camera motion stays within the bounds of the error covariance of the constant velocity model.
The constant velocity model, which has brought significant success to monocular SLAM, is however easily violated in many practical robotics applications, and this violation usually causes the SLAM algorithms to fail catastrophically [5]. This happens when the robot performs an erratic motion whose magnitude exceeds the prediction from the constant velocity model, or when the robot moves away from the predicted direction. Failure to predict the relative robot motion sufficiently accurately often leads to failure to find the correct feature correspondences in the measurement update, consequently causing huge errors in the pose and map estimates. Klein observed the same problem and therefore, in addition to the constant velocity model, introduced a relocalizer in his PTAM framework [6] which relocalizes the camera after localization failures.
In this paper, we present RS-SLAM, a new monocular SLAM framework that combines the 5-point RANSAC [7], [8] and FastSLAM [9] algorithms. We propose a visual FastSLAM-based framework that uses the 5-point RANSAC algorithm and image feature measurement uncertainties to form the proposal distribution for the particles during the prediction step, instead of the easily violated constant velocity model. It was observed in [10] that, in practice, the consensus set from RANSAC that yields the highest score might not contain all the true inliers (and is thus not the best solution) due to noise in the feature measurements. To make our algorithm less susceptible to inlier misclassifications, we propose to perform another RANSAC sampling within the inliers of the best-scoring consensus set to check for inlier misclassifications in the original correspondences, and to use all the hypotheses generated from these consensus sets in the proposal distribution. We prevent overconfident estimates by keeping all the hypotheses; inconsistent hypotheses are eventually eliminated by the particle filter resampling process.
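To illustrate the two-stage sampling idea, the following sketch uses 2D line fitting (a 2-point minimal sample) as a stand-in for the 5-point essential matrix solver; all function names, thresholds, and data are illustrative assumptions, not our actual implementation. Stage 1 is a standard RANSAC over all correspondences; stage 2 re-samples minimal sets within the best consensus set and keeps every resulting hypothesis:

```python
import random

def fit_line(p1, p2):
    # Line through two points as (a, b, c) with ax + by + c = 0, normalized.
    (x1, y1), (x2, y2) = p1, p2
    a, b = y2 - y1, x1 - x2
    c = -(a * x1 + b * y1)
    n = (a * a + b * b) ** 0.5 or 1.0
    return (a / n, b / n, c / n)

def consensus(model, pts, tol):
    # All points whose perpendicular distance to the line is below tol.
    a, b, c = model
    return [p for p in pts if abs(a * p[0] + b * p[1] + c) < tol]

def ransac(pts, iters, tol, rng):
    # Stage 1: standard RANSAC, keep the highest-scoring consensus set.
    best_model, best_set = None, []
    for _ in range(iters):
        model = fit_line(*rng.sample(pts, 2))
        cs = consensus(model, pts, tol)
        if len(cs) > len(best_set):
            best_model, best_set = model, cs
    return best_model, best_set

def rs_hypotheses(pts, iters, tol, n_hyp, rng):
    # Stage 2: re-sample minimal sets WITHIN the best consensus set,
    # keeping every hypothesis for use in the proposal distribution.
    _, best_set = ransac(pts, iters, tol, rng)
    hyps = []
    for _ in range(n_hyp):
        model = fit_line(*rng.sample(best_set, 2))
        hyps.append((model, consensus(model, pts, tol)))
    return hyps
```

In the actual algorithm each stage-2 hypothesis would be a relative camera pose; here the hypotheses are simply alternative line fits, each with its own consensus set.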
Image features such as SIFT [11] and SURF [12] can be used, with their descriptors serving to find correspondences. Our algorithm follows the FastSLAM framework. During the prediction step, particle samples are drawn from the proposal distribution generated from the 5-point RANSAC algorithm and image feature uncertainties. During the update step, the 3D map points are updated according to the sampled poses from the prediction step. We use a vocabulary tree [13], [14] to detect loop closure opportunities, and loop closures are done within the FastSLAM framework. An important advantage of our algorithm over many Structure from Motion techniques is that we do not have to apply a separate bundle adjustment [15] process for loop closures. We also show in this paper that our monocular RS-SLAM can be easily extended to a stereo camera by replacing the 5-point RANSAC with the 3-point RANSAC [16] algorithm.
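As an illustration of how less probable hypotheses are eliminated in the update step, here is a minimal sketch of low-variance (systematic) resampling, a common choice in FastSLAM implementations; the particle contents and weights below are placeholders, not our actual pose/map hypotheses:

```python
import random

def low_variance_resample(particles, weights, rng):
    """Systematic resampling: draw len(particles) samples with a single
    random offset, selecting each particle in proportion to its weight.
    Low-weight (less probable) hypotheses tend to receive no copies."""
    n = len(particles)
    step = sum(weights) / n
    u = rng.uniform(0.0, step)      # one random offset for all draws
    out, cumulative, i = [], weights[0], 0
    for _ in range(n):
        while u > cumulative:       # advance to the particle owning u
            i += 1
            cumulative += weights[i]
        out.append(particles[i])
        u += step
    return out
```

Because the draws are evenly spaced, a particle carrying most of the weight is duplicated many times, while inconsistent hypotheses are dropped, which is how the filter prunes the extra hypotheses generated by the second RANSAC sampling.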
In Section II, we briefly review the 5-point RANSAC, 3-point RANSAC, and FastSLAM algorithms, which are essential for understanding our RS-SLAM algorithm. In Section III, we describe our monocular RS-SLAM in detail. In Section IV, we show the extension of RS-SLAM to a stereo camera. Lastly, in Section V, we present results from experiments done to validate our algorithm.
2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, September 25-30, 2011, San Francisco, CA, USA