Summary of Visual SLAM Approaches

1st Azadeh Hadadi
UBFC, Centre universitaire Condorcet
VIBOT program: Master in Computer Vision
Le Creusot, France

Abstract—This summary paper presents SLAM briefly and focuses more specifically on visual SLAM (vSLAM). vSLAM approaches (specifically in dynamic environments), structure from motion, and the expected benefit of RGB-D cameras in SLAM are described. The Extended Kalman filter is the core process in SLAM, and it is detailed in [4].

Index Terms—SLAM, vSLAM, 3D mapping, RGB-D camera

I. DIFFERENT VSLAM APPROACHES

SLAM stands for Simultaneous Localization and Mapping, which is used to construct or update a navigation map while keeping track of a robot in an unknown environment. SLAM has been widely used in applications such as robotics, computer vision-based online 3D modeling, augmented reality (AR)-based visualization, and self-driving cars [1]. The heart of SLAM is an Extended Kalman filter using equation (1) [2].

p(x_t, m \mid z_{1:t}, u_{1:t}) = \int \cdots \int p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) \, dx_1 \cdots dx_{t-1}    (1)

where x_t, m, z_{1:t}, and u_{1:t} represent the state of the robot (its pose), the map of the landmarks, the odometry measurements, and the actuation commands up to time t, respectively. The main steps of SLAM are as follows:
1) State prediction (odometry)
2) Measurement prediction
3) Observation
4) Data association
5) Update
6) Integration of new landmarks

Recently, a new type of SLAM was developed which uses a camera as its sensor, referred to as visual SLAM (vSLAM) because it is based on visual information only [3] [5]. All vSLAM approaches can be classified into three categories: 1) feature-based (monocular cameras, tracking and mapping using feature points), 2) direct (using the whole image without feature extraction), and 3) RGB-D camera-based (monocular image with depth, such as the Kinect; RGB+IR cameras). There exist two main feature-based methods, MonoSLAM and PTAM.
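The prediction and update steps listed above can be illustrated with a minimal EKF sketch. This is a toy example, not the paper's implementation: the state layout (a planar robot pose plus a single landmark), the velocity motion model, the range-bearing measurement model, and all function names and noise matrices are assumptions made for illustration.

```python
import numpy as np

# Toy EKF-SLAM predict/update sketch (illustrative assumptions only).
# State mu = [x, y, theta, lx, ly]: robot pose plus one landmark.
# Control u = (v, w): linear and angular velocity.
# Measurement z = [range, bearing] to the landmark.

def predict(mu, Sigma, u, dt, R):
    """Step 1: state prediction from odometry."""
    x, y, theta = mu[0], mu[1], mu[2]
    v, w = u
    mu_bar = mu.copy()
    mu_bar[0] = x + v * dt * np.cos(theta)   # simple velocity motion model
    mu_bar[1] = y + v * dt * np.sin(theta)
    mu_bar[2] = theta + w * dt
    G = np.eye(5)                            # Jacobian of the motion model
    G[0, 2] = -v * dt * np.sin(theta)
    G[1, 2] = v * dt * np.cos(theta)
    Sigma_bar = G @ Sigma @ G.T + R          # R: process noise (assumed)
    return mu_bar, Sigma_bar

def update(mu, Sigma, z, Q):
    """Steps 2 and 5: measurement prediction and update."""
    dx, dy = mu[3] - mu[0], mu[4] - mu[1]
    q = dx**2 + dy**2
    r = np.sqrt(q)
    z_hat = np.array([r, np.arctan2(dy, dx) - mu[2]])  # predicted measurement
    H = np.array([                                     # measurement Jacobian
        [-dx / r, -dy / r,  0.0,  dx / r,  dy / r],
        [ dy / q, -dx / q, -1.0, -dy / q,  dx / q],
    ])
    S = H @ Sigma @ H.T + Q                  # Q: measurement noise (assumed)
    K = Sigma @ H.T @ np.linalg.inv(S)       # Kalman gain
    innov = z - z_hat
    innov[1] = (innov[1] + np.pi) % (2 * np.pi) - np.pi  # wrap bearing angle
    mu_new = mu + K @ innov
    Sigma_new = (np.eye(5) - K @ H) @ Sigma
    return mu_new, Sigma_new
```

A full EKF-SLAM system would additionally handle data association and landmark initialization (steps 4 and 6), which are omitted here.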
The difference between the two methods lies in their implementation (PTAM computes tracking and mapping in parallel). Several direct, or featureless, SLAM methods were developed, e.g., DTAM, LSD-SLAM, SVO, and DSO. All direct methods use some form of visual odometry (VO) to estimate the position of the sensor. The relation between VO and SLAM is as follows: vSLAM = VO + global map optimization. The most important benefit of the third vSLAM category is the direct measurement of depth, which is acquired by an IR sensor. By using RGB-D cameras, the 3D structure of the environment together with its texture information can be obtained directly. In addition, in contrast to monocular vSLAM algorithms, the scale of the coordinate system is known because the 3D structure is acquired in metric space. In depth (D)-based vSLAM, an iterative closest point (ICP) algorithm is widely used to estimate camera motion. Then, the 3D structure of the environment is reconstructed by combining multiple depth maps. ICP-based systems such as KinectFusion and SLAM++, as well as RGB-D VO and global map optimization, are employed to incorporate RGB into depth-based vSLAM. In dynamic environments, objects are moving (for instance, vehicles and pedestrians in the case of an unmanned car), so in feature-based approaches selecting features from a dynamic object might create errors during mapping. All of these approaches can be significantly improved when some kind of sensor fusion is done. For instance, performing semantic segmentation and fusing the sensor data with the segmentation result will improve the performance of vSLAM. Fusing mean-shift tracking data with the VO output will improve feature-based algorithms as well. For some applications, such as UAV robots, a full 3D map of the environment is required instead of landmarks given as coordinates. In such cases, a structure-from-motion algorithm is used to reconstruct the 3D environment and improve the 3D model by incorporating features from multiple-view images.
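The core of the ICP camera-motion estimation mentioned above is the rigid alignment of two point sets. The following is a minimal numpy sketch of that single alignment step (the SVD-based Kabsch solution); the function name and test geometry are illustrative assumptions, and a full ICP loop would alternate this step with nearest-neighbour correspondence search over the depth maps.

```python
import numpy as np

# One point-to-point alignment step as used inside ICP (illustrative sketch).
# Given two corresponding 3-D point sets, recover the rigid transform (R, t)
# with dst ≈ R @ src + t, via the SVD-based Kabsch method.

def best_fit_transform(src, dst):
    """Least-squares rigid transform between corresponding point sets (Nx3)."""
    c_src = src.mean(axis=0)                 # centroids
    c_dst = dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # guard against reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t
```

In a real RGB-D pipeline the correspondences come from projective data association between consecutive depth frames, and the step is iterated until the alignment error converges.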
Structure from motion is an optimization problem too, but it differs from vSLAM: in vSLAM, localization and mapping are done simultaneously and the Kalman filter performs the optimization. Structure from motion can be used to extract a 3D map within a SLAM framework.

REFERENCES

[1] Josep Aulinas, Yvan R. Petillot, Joaquim Salvi, and Xavier Lladó. The SLAM problem: a survey. CCIA, 184(1):363–371, 2008.
[2] Wolfram Burgard, Cyrill Stachniss, Kai Arras, and Maren Bennewitz. SLAM: Simultaneous localization and mapping. University of Freiburg, lecture slides: http://ais.informatik.uni-freiburg.de/teaching/ss12/robotics/slides/12-slam.pdf, 2018.
[3] Howie Choset. Localization, Mapping, SLAM and The Kalman Filter according to George, chapter 8, pages 1–64. Carnegie Mellon University Press, 2018.
[4] Azadeh Hadadi. Introduction to Kalman filter. University press, 2020.
[5] Takafumi Taketomi, Hideaki Uchiyama, and Sei Ikeda. Visual SLAM algorithms: a survey from 2010 to 2016. IPSJ Transactions on Computer Vision and Applications, 9(1):16, 2017.