Appearance-Based Place Recognition Using
Whole-Image BRISK for Collaborative Multi-
Robot Localization
Jung H. Oh, Gyuho Eoh, and Beom H. Lee
Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
Email: {bulley85, geni0620, bhlee}@snu.ac.kr
Abstract—This paper deals with the problem of recognizing
places based on their appearance for collaborative mobile
robot localization. A whole-image descriptor and BRISK are
combined in order to extract information from images
collected by multiple mobile robots. The bag-of-words
method is then adopted to compute similarity scores
between the collected images, which enables each robot to
recognize locations previously visited by other robots.
Such detections improve the accuracy of the pose
estimates and yield precise collaborative localization.
Experiments are performed to verify the effectiveness of
the proposed method in outdoor environments.
Index Terms—place recognition, loop closures, BRISK,
image descriptor, multi-robot, Simultaneous Localization
and Mapping (SLAM)
I. INTRODUCTION
Simultaneous Localization and Mapping (SLAM) is
one of the most widely researched areas in robotics.
Recently, vision-based SLAM has become an active field
as cameras have become more compact and accurate
while providing rich qualitative information about the
environment. One of the most significant requirements
for vision-based SLAM is robust place recognition that
provides correct data association to obtain correct robot
poses. In particular, finding a place that has already been
visited on an excursion of arbitrary length is
referred to as the loop-closure detection problem, which is
crucial for enhancing the robustness of localization and
mapping.
The bag-of-words method has been a popular way to
perform visual loop-closure detection [1]-[3]. Each image
is quantized into a set of visual words and represented as
a histogram, which can be compared efficiently using
standard histogram comparison methods. In [1], the
FAB-MAP framework, based on the bag-of-words method
with probabilistic reasoning, worked robustly over a long
trajectory. The bag-of-words method was extended to
incremental conditions in [2], which also relied on Bayesian
filtering to estimate the loop-closure probability. Both works
used SIFT [4] or SURF [5] to extract features from
images, as they are robust to lighting, scale, or rotation
changes.
Manuscript received April 10, 2015; revised July 12, 2015.
changes. In [3], a method for visual place recognition
using bag of words is proposed, using the FAST [6]
keypoint detector and BRIEF [7] features. In particular,
this work demonstrated the effectiveness of the binary
features such as BRIEF, BRISK [8] or FREAK [9], which
outperform the computation time of SIFT and SURF,
maintaining rotation and scale invariance. Instead of
using the locally extracted imaged descriptor, loop-
closure detection using the whole-image descriptor was
proposed in [10]. This descriptor uses whole information
of the image and does not require keypoint detection step.
In general, it is more susceptible to change in the
camera’s view than local descriptor methods. However, if
we assume that the camera motion is planar, it is more
robust to false positive errors and fast to compute the
similarity.
In this work, we propose a whole-image descriptor
combined with a binary descriptor, BRISK, which is
more invariant to scale and rotation than BRIEF. By
combining the whole-image descriptor and the binary
descriptor, we can exploit the advantages of both
descriptors, and significantly improve the efficiency and
performance of the place recognition. This allows each
robot to correct its position estimates, which improves the
accuracy of collaborative multi-robot localization.
II. PLACE RECOGNITION USING WHOLE-IMAGE
BRISK
A. Bag-of-Words Framework
Images can be represented as a set of visual words
generated from feature descriptors. Let I_k be an image
query obtained from robot k, and let Z_k be the
representation of this image in the feature space.
There are many descriptors available to represent images,
such as SIFT or SURF. In this paper, the whole-image
BRISK is used to describe the images.
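As an illustration of the whole-image description step, the following is a minimal numpy sketch, not the authors' implementation: instead of running a keypoint detector, binary descriptors are computed at a fixed grid of locations covering the entire image. A simplified random pairwise intensity-test pattern stands in for the actual BRISK sampling pattern, and the grid step, descriptor length, and patch radius are illustrative choices.

```python
import numpy as np

def grid_keypoints(h, w, step=32):
    """Fixed grid of (x, y) locations covering the whole image,
    replacing the usual keypoint detection step."""
    ys = np.arange(step // 2, h - step // 2, step)
    xs = np.arange(step // 2, w - step // 2, step)
    return [(x, y) for y in ys for x in xs]

def sampling_pattern(n_bits=64, radius=12, seed=0):
    """Fixed pattern of pairwise comparison offsets, shared by all
    images (a stand-in for the deterministic BRISK pattern)."""
    rng = np.random.default_rng(seed)
    return rng.integers(-radius, radius + 1, size=(n_bits, 4))

def binary_descriptor(img, x, y, pattern):
    """One bit per pairwise intensity test around (x, y)."""
    d = np.empty(len(pattern), dtype=np.uint8)
    for i, (dx1, dy1, dx2, dy2) in enumerate(pattern):
        d[i] = img[y + dy1, x + dx1] < img[y + dy2, x + dx2]
    return d

def describe_whole_image(img, pattern):
    """Stack of binary descriptors, one per grid location."""
    h, w = img.shape
    return np.array([binary_descriptor(img, x, y, pattern)
                     for x, y in grid_keypoints(h, w)])
```

In practice the descriptors would be computed with an actual BRISK implementation on the grid keypoints; the sketch only shows how the detection step is replaced by dense, fixed sampling.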
Then, the dictionary is built by clustering these visual
descriptors, and the representative descriptors are called
visual words. Given the dictionary, extracted features
from each image can be quantized to the nearest visual
words and can be represented by the histogram of visual
words in the dictionary. Finally, each image can be
International Journal of Mechanical Engineering and Robotics Research Vol. 4, No. 3, July 2015
264 © 2015 Int. J. Mech. Eng. Rob. Res.
doi: 10.18178/ijmerr.4.3.264-268
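The quantization step above can be sketched as follows. This is a hedged illustration, not the paper's implementation: the vocabulary is assumed to be given (in practice it is built offline, e.g. by k-means or k-medoids clustering of binary descriptors under the Hamming distance), and the score 1 - L1/2 on normalized histograms is one common choice of similarity, not necessarily the one used by the authors.

```python
import numpy as np

def hamming(a, B):
    """Hamming distance from one binary descriptor a to each row of B."""
    return np.count_nonzero(a != B, axis=1)

def to_histogram(descriptors, vocabulary):
    """Quantize each descriptor to its nearest visual word and
    return the L1-normalized word histogram of the image."""
    hist = np.zeros(len(vocabulary))
    for d in descriptors:
        hist[np.argmin(hamming(d, vocabulary))] += 1
    return hist / hist.sum()

def similarity(h1, h2):
    """Score in [0, 1]: 1 minus half the L1 distance between
    two normalized bag-of-words histograms."""
    return 1.0 - 0.5 * np.abs(h1 - h2).sum()
```

A similarity near 1 between a robot's current image histogram and a stored histogram from another robot then signals a candidate loop closure between the two trajectories.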