RS-SLAM: RANSAC Sampling for Visual FastSLAM

Gim Hee Lee, Friedrich Fraundorfer, and Marc Pollefeys
Computer Vision and Geometry Laboratory, Department of Computer Science, ETH Zürich, Switzerland
glee@student.ethz.ch, fraundorfer@inf.ethz.ch, marc.pollefeys@inf.ethz.ch

Abstract—In this paper, we present our RS-SLAM algorithm for a monocular camera, where the proposal distribution is derived from the 5-point RANSAC algorithm and the image feature measurement uncertainties instead of the easily violated constant velocity model. We propose to perform a second RANSAC sampling within the inliers of the best-scoring consensus set to check for inlier misclassifications in the original correspondences, and to use all the hypotheses generated from these consensus sets in the proposal distribution. This mitigates data association errors (inlier misclassifications) that arise because, in practice, the consensus set from RANSAC with the highest score might not contain all the true inliers due to noise on the feature measurements. Less probable hypotheses are eventually eliminated in the particle filter resampling process. We also show that our monocular approach can easily be extended to a stereo camera. Experimental results validate the potential of our approach.

I. INTRODUCTION

In recent years, we have seen much success in applying well-established Simultaneous Localization and Mapping (SLAM) algorithms [1], [2] on robots with a single camera as the sole sensor. Two of the most notable works are Davison's Extended Kalman Filter (EKF) monocular SLAM [3] and Eade's scalable monocular SLAM [4], which makes use of the particle filter.
Both algorithms are Bayesian approaches that require a sufficiently accurate motion model to predict the robot pose before projecting the image features from the previous frame onto the current frame, based on these predictions, for an efficient feature correspondence search in the measurement update. In the absence of other proprioceptive sensors such as wheel encoders, the constant velocity model is chosen as the motion model, and this works very well as long as the camera motion stays within the bounds of the error covariance of the constant velocity model. The constant velocity model, which has brought significant success to monocular SLAM, is however easily violated in many practical robotics applications, and this violation usually causes the SLAM algorithms to fail catastrophically [5]. This happens when the robot performs an erratic motion whose magnitude exceeds the prediction from the constant velocity model, or when the robot moves away from the predicted direction. Failure to predict the relative robot motion with sufficient accuracy often leads to failure to find the correct feature correspondences in the measurement update, which consequently causes large errors in the pose and map estimates. Klein observed the same problem and therefore, in addition to the constant velocity model, introduced a relocalizer in his PTAM framework [6] that relocalizes the camera upon localization failure.

In this paper, we present RS-SLAM, a new monocular SLAM framework that combines the 5-point RANSAC [7], [8] and FastSLAM [9] algorithms. We propose a visual FastSLAM-based framework that uses the 5-point RANSAC algorithm and the image feature measurement uncertainties as the proposal distribution for the particles during the prediction step, instead of the easily violated constant velocity model.
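To make the failure mode concrete, the constant velocity prediction step can be sketched in a few lines (our own minimal illustration in Python, not code from [3], [4], or [6]; the state layout and function names are assumptions). The state carries a linear velocity v and an angular velocity w, and the prediction simply propagates the pose over dt under the assumption that both stay constant, which is exactly the assumption that erratic motion violates:

```python
import math

def predict_constant_velocity(p, q, v, w, dt):
    """Propagate position p (3-vector) and unit quaternion q = (qw, qx, qy, qz)
    assuming linear velocity v and angular velocity w stay constant over dt."""
    # linear motion: p' = p + v * dt
    p_new = [pi + vi * dt for pi, vi in zip(p, v)]
    # rotation angle accumulated over dt
    angle = math.sqrt(sum(wi * wi for wi in w)) * dt
    if angle < 1e-12:
        return p_new, q  # negligible rotation
    axis = [wi * dt / angle for wi in w]
    s = math.sin(angle / 2.0)
    dq = (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)
    # Hamilton quaternion product q_new = q * dq
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = dq
    q_new = (w1*w2 - x1*x2 - y1*y2 - z1*z2,
             w1*x2 + x1*w2 + y1*z2 - z1*y2,
             w1*y2 - x1*z2 + y1*w2 + z1*x2,
             w1*z2 + x1*y2 - y1*x2 + z1*w2)
    return p_new, q_new
```

An EKF or particle filter built on this prediction must inflate its covariance around the predicted pose; once the true motion leaves that covariance bound, the feature correspondence search fails in the way described above.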
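The proposed two-stage sampling can be illustrated with a toy sketch: a first RANSAC finds the best-scoring consensus set, then a second round of sampling within that set generates several hypotheses, all of which are kept (in RS-SLAM each would contribute to the particle proposal distribution, and resampling later discards the inconsistent ones). Here a 2-D line model stands in for the 5-point relative pose solver, and every function name and threshold is our own illustrative choice:

```python
import random

def line_through(p1, p2):
    """Line y = a*x + b through two points (assumes distinct x-coordinates)."""
    a = (p2[1] - p1[1]) / (p2[0] - p1[0])
    return a, p1[1] - a * p1[0]

def consensus(points, model, thresh):
    """Inlier mask: points whose vertical residual to the model is small."""
    a, b = model
    return [abs(y - (a * x + b)) < thresh for x, y in points]

def ransac(points, n_iter, thresh, rng):
    """Standard RANSAC: keep only the single best-scoring model."""
    best_mask, best_model = [], None
    for _ in range(n_iter):
        p1, p2 = rng.sample(points, 2)
        if p1[0] == p2[0]:
            continue  # degenerate minimal sample
        model = line_through(p1, p2)
        mask = consensus(points, model, thresh)
        if sum(mask) > sum(best_mask):
            best_mask, best_model = mask, model
    return best_model, best_mask

def two_stage_hypotheses(points, n_outer, n_inner, thresh, seed=0):
    """Second RANSAC restricted to the best consensus set; every hypothesis
    is kept instead of only the top-scoring one."""
    rng = random.Random(seed)
    _, mask = ransac(points, n_outer, thresh, rng)
    inliers = [p for p, keep in zip(points, mask) if keep]
    hypotheses = []
    for _ in range(n_inner):
        p1, p2 = rng.sample(inliers, 2)
        if p1[0] == p2[0]:
            continue
        model = line_through(p1, p2)
        hypotheses.append((model, consensus(inliers, model, thresh)))
    return hypotheses
```

The point of the second stage is that each inner minimal sample can reclassify members of the first consensus set, so misclassified inliers do not corrupt every hypothesis; the filter, not a single RANSAC score, decides which hypothesis survives.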
It was observed in [10] that the consensus set from RANSAC that yields the highest score might not, in practice, contain all the true inliers (and is thus not necessarily the best solution) due to noise on the feature measurements. To make our algorithm less susceptible to such inlier misclassifications, we propose to perform a second RANSAC sampling within the inliers of the best-scoring consensus set to check for inlier misclassifications in the original correspondences, and to use all the hypotheses generated from these consensus sets in the proposal distribution. We prevent overconfident estimates by keeping all the hypotheses; inconsistent hypotheses are eventually eliminated by the particle filter resampling process. Image features such as SIFT [11] and SURF [12] can be used, with their descriptors serving to find correspondences.

Our algorithm follows the FastSLAM framework. During the prediction step, particle samples are drawn from the proposal distribution generated from the 5-point RANSAC algorithm and the image feature uncertainties. During the update step, the 3D map points are updated according to the poses sampled in the prediction step. We use a vocabulary tree [13], [14] to detect loop closure opportunities, and loop closures are handled within the FastSLAM framework. An important advantage of our algorithm over many Structure from Motion techniques is that we do not have to apply a separate bundle adjustment [15] process for loop closures. We also show in this paper that our monocular RS-SLAM can easily be extended to a stereo camera by replacing the 5-point RANSAC with the 3-point RANSAC [16] algorithm.

In Section II, we briefly explain the 5-point RANSAC, 3-point RANSAC and FastSLAM algorithms, which are essential for understanding our RS-SLAM algorithm. In Section III, we describe our monocular RS-SLAM in detail. In Section IV, we show the extension of RS-SLAM to a stereo camera.
Lastly, in Section V, we show results from the experiments done to validate our algorithm.

2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, September 25-30, 2011, San Francisco, CA, USA