Adapting a Real-Time Monocular Visual SLAM from Conventional to Omnidirectional Cameras

Daniel Gutierrez, Alejandro Rituerto, J.M.M. Montiel, J.J. Guerrero
Departamento de Informática e Ingeniería de Sistemas (DIIS) - Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Spain
dangu87@gmail.com, {arituerto, josemari, jguerrer}@unizar.es

Abstract

The SLAM (Simultaneous Localization and Mapping) problem is one of the essential challenges in current robotics. Our main objective in this work is to develop a real-time visual SLAM system using monocular omnidirectional vision. Our approach is based on the Extended Kalman Filter (EKF). We use the Spherical Camera Model to obtain geometric information from the images. This model is integrated into the EKF-based SLAM through the linearization of the direct and inverse projections. We introduce a new computation of the descriptor patch for catadioptric omnidirectional cameras which aims to achieve rotation and scale invariance. We perform experiments with omnidirectional images comparing this new approach with the conventional one. The experiments confirm that our approach works better with omnidirectional cameras, since features last longer and the constructed maps are bigger.

1. Introduction

The SLAM [25] problem consists of building a map of the surroundings of an autonomous robot and localizing the robot relative to this map, using only partial measurements of the environment. SLAM is usually formulated in a probabilistic way, i.e. the estimates of the robot position and the map are computed as a probability distribution. Two main approaches are used to compute this distribution: the extended Kalman filter (EKF) [25] and the particle filter [2].

In visual SLAM applications, image projections of relevant points, known as local features, are used as measurements. To extract and store the features of the image, an extractor and a descriptor are used.
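As a rough illustration of the EKF machinery mentioned above, the following sketch runs a single predict/update cycle on a toy 2D state. The linear motion and measurement models (and all names) are simplified placeholders of our own, not the system described in this paper; a real visual SLAM EKF would obtain the measurement Jacobian H by linearizing the camera projection.

```python
import numpy as np

# Toy EKF step: a 2D robot state x = [px, py] observed directly with noise.
def ekf_step(x, P, u, z, Q, R):
    # Predict: simple additive motion x' = x + u (motion Jacobian F = I).
    F = np.eye(2)
    x_pred = x + u
    P_pred = F @ P @ F.T + Q
    # Update: direct observation z = x + noise (measurement Jacobian H = I).
    H = np.eye(2)
    y = z - H @ x_pred                   # innovation
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

x = np.zeros(2)
P = np.eye(2)
x, P = ekf_step(x, P,
                u=np.array([1.0, 0.0]),
                z=np.array([1.1, -0.1]),
                Q=0.01 * np.eye(2),
                R=0.1 * np.eye(2))
```

After the update, the state has moved toward the measurement and the covariance has shrunk, which is the behaviour the filter relies on when fusing feature observations into the map.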
The feature extractor processes the image and detects key-points on it. This image processing is a highly time-consuming step, which is critical for a real-time application like SLAM. Rosten et al. [20] developed the feature extraction algorithm FAST (Features from Accelerated Segment Test). They benchmarked FAST against other widely used feature extractors, showing that FAST outperforms them in computational cost and in repeatability when viewing the scene from different positions. The descriptor provides an identifier for an extracted point so that it can be recognised in future measurements. The most basic descriptor is a patch of a certain size centered on the key-point, although other kinds of descriptors exist, such as SIFT [13], SURF [4], LBP [12], etc.

Since the seminal work of Davison [9], monocular SLAM has been a fertile research field. In this work we propose to combine state-of-the-art robust EKF SLAM [7] with an omnidirectional sensor. Visual SLAM using omnidirectional cameras has been proposed in [8], [15], [24] and [22].

Due to the 360° FOV of omnidirectional cameras, features last longer on the image than in the case of conventional cameras, especially under large camera rotations. The increased lifespan of the features on the image translates into a better estimation of the position of the features on the map, a lower need to initialise new features, and increased robustness.

However, omnidirectional images involve a more complex projection model, important image deformation, distortion, and variable scale across the image. Therefore, the feature descriptor should be modified for catadioptric cameras. Along this line, Svoboda and Pajdla [23] propose the use of patches with variable size and shape (active windows). Their experiments show that active windows provide better matching results than square windows.
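The basic patch descriptor mentioned above can be sketched in a few lines: cut a fixed-size square window around each key-point and compare candidate patches with normalized cross-correlation (NCC). The function names and patch size are illustrative assumptions, not this paper's implementation.

```python
import numpy as np

def patch_descriptor(image, kp, half=5):
    """Square patch of (2*half+1)^2 pixels centered on key-point (row, col)."""
    r, c = kp
    return image[r - half:r + half + 1, c - half:c + half + 1].astype(float)

def ncc(a, b):
    """Normalized cross-correlation between two patches (1.0 = identical)."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(100, 100))
ref = patch_descriptor(img, (50, 50))
same = patch_descriptor(img, (50, 50))
other = patch_descriptor(img, (20, 80))
# The patch correlates perfectly with itself and poorly with an unrelated one.
```

In a tracking loop, the feature is declared matched at the candidate location with the highest NCC score above a threshold; this is exactly the scheme that breaks down under the rotations and scale changes of catadioptric images, motivating the modified descriptors discussed next.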
Ieng et al. [5] propose computing patches of different angular apertures for the same feature to overcome the matching problems derived from the varying resolution of the camera. Scaramuzza et al. [21] take advantage of the fact that vertical lines of the world project as radial lines on the image. They propose a method to extract and match vertical lines with rotation-invariant descriptors and apply this method to an EKF-SLAM. In [1], Andreasson et al. propose a modified SIFT feature with no scale invariance. To obtain rotation invariance, they rotate each patch to the same global orientation. Lu and Zheng [14] combine the rotation-invariant patch by Andreasson with a FAST extractor and a CS-LBP descriptor, and compare it with the SIFT algorithm.
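The rotate-to-a-common-orientation idea can be sketched as follows: in a catadioptric image, the radial direction from the mirror center to the key-point provides a reference angle, and the patch is resampled so that this direction always points the same way. The nearest-neighbour resampling and all names here are illustrative assumptions, not the method of any of the cited works.

```python
import numpy as np

def radial_angle(kp, center):
    """Angle of the key-point's radial direction w.r.t. the mirror center."""
    return np.arctan2(kp[0] - center[0], kp[1] - center[1])

def rotated_patch(image, kp, angle, half=5):
    """Sample a (2*half+1)^2 patch around kp (row, col), rotated by -angle.

    Nearest-neighbour resampling, purely illustrative; a practical
    implementation would interpolate bilinearly.
    """
    size = 2 * half + 1
    out = np.zeros((size, size))
    ca, sa = np.cos(-angle), np.sin(-angle)
    for i in range(size):
        for j in range(size):
            dy, dx = i - half, j - half        # patch coords around kp
            r = kp[0] + ca * dy - sa * dx      # rotate back into image coords
            c = kp[1] + sa * dy + ca * dx
            out[i, j] = image[int(round(r)), int(round(c))]
    return out

img = np.arange(100 * 100, dtype=float).reshape(100, 100)
kp, center = (50, 50), (50, 50)
# Rotating by the radial angle gives every feature the same reference frame.
aligned = rotated_patch(img, kp, radial_angle((50, 60), center))
```

Because every patch is expressed in this radially aligned frame, two observations of the same world point taken at different camera orientations yield comparable descriptors, which is what extends feature lifespan on omnidirectional images.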