Appearance-Based Place Recognition Using
Whole-Image BRISK for Collaborative Multi-
Robot Localization
Jung H. Oh, Gyuho Eoh, and Beom H. Lee
Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
Email: {bulley85, geni0620, bhlee}@snu.ac.kr
Abstract—This paper deals with the problem of recognizing
places based on their appearance for collaborative mobile
robot localization. A whole-image descriptor and BRISK are
combined in order to extract information from images
collected by multiple mobile robots. The bag-of-words
method is then adopted to compute similarity scores
between the collected images, which enables each robot to
recognize locations previously visited by other robots.
Such detections improve the accuracy of the pose
estimates and yield precise collaborative localization.
Experiments are performed to verify the effectiveness of
the proposed method in outdoor environments.
Index Terms—place recognition, loop closures, BRISK,
image descriptor, multi-robot, Simultaneous Localization
and Mapping (SLAM)
I. INTRODUCTION
Simultaneous Localization and Mapping (SLAM) is
one of the most widely researched areas in robotics.
Recently, vision-based SLAM has become an active field
as cameras have become more compact and accurate
while providing rich qualitative information about the
environment. One of the most significant requirements
for vision-based SLAM is robust place recognition that
provides correct data association to obtain correct robot
poses. In particular, finding a place that has already been
visited on an excursion of arbitrary length is
referred to as the loop-closure detection problem, which is
crucial for enhancing the robustness of localization and
mapping.
The bag-of-words method has been a popular way to
perform visual loop-closure detection [1]-[3]. Each image
is quantized into a set of visual words and represented as
a histogram, which can be compared efficiently using
standard histogram comparison methods. In [1], the
FAB-MAP framework, based on the bag-of-words method
with probabilistic reasoning, worked robustly over a long
trajectory. The bag-of-words method was extended to
incremental conditions in [2], which also relied on Bayesian
filtering to estimate the loop-closure probability. Both works
used SIFT [4] or SURF [5] to extract features from
images, as they are robust to lighting, scale, or rotation
changes.
Manuscript received April 10, 2015; revised July 12, 2015.
changes. In [3], a method for visual place recognition
using bag of words is proposed, using the FAST [6]
keypoint detector and BRIEF [7] features. In particular,
this work demonstrated the effectiveness of the binary
features such as BRIEF, BRISK [8] or FREAK [9], which
outperform the computation time of SIFT and SURF,
maintaining rotation and scale invariance. Instead of
using the locally extracted imaged descriptor, loop-
closure detection using the whole-image descriptor was
proposed in [10]. This descriptor uses whole information
of the image and does not require keypoint detection step.
In general, it is more susceptible to change in the
camera’s view than local descriptor methods. However, if
we assume that the camera motion is planar, it is more
robust to false positive errors and fast to compute the
similarity.
In this work, we propose a whole-image descriptor
combined with a binary descriptor, BRISK, which is
more invariant to scale and rotation than BRIEF. By
combining the whole-image descriptor and the binary
descriptor, we can exploit the advantages of both
descriptors, and significantly improve the efficiency and
performance of the place recognition. This allows each
robot to correct its position estimates, which improves the
accuracy of collaborative multi-robot localization.
II. PLACE RECOGNITION USING WHOLE-IMAGE
BRISK
A. Bag-of-Words Framework
Images can be represented as a set of visual words
generated from feature descriptors. Let I_k be an image
query obtained from robot k, and let Z_k be the
representation of this image in the feature space.
There are many descriptors available to represent images,
such as SIFT or SURF. In this paper, the whole-image
BRISK is used to describe the images.
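As an illustration of the whole-image description step, the following is a minimal numpy sketch, not the authors' implementation: instead of running a keypoint detector, binary descriptors are computed at a fixed grid of locations covering the entire image. A simplified random pairwise intensity-test pattern stands in for the actual BRISK sampling pattern, and the grid step, descriptor length, and patch radius are illustrative choices.

```python
import numpy as np

def grid_keypoints(h, w, step=32):
    """Fixed grid of (x, y) locations covering the whole image,
    replacing the usual keypoint detection step."""
    ys = np.arange(step // 2, h - step // 2, step)
    xs = np.arange(step // 2, w - step // 2, step)
    return [(x, y) for y in ys for x in xs]

def sampling_pattern(n_bits=64, radius=12, seed=0):
    """Fixed pattern of pairwise comparison offsets, shared by all
    images (a stand-in for the deterministic BRISK pattern)."""
    rng = np.random.default_rng(seed)
    return rng.integers(-radius, radius + 1, size=(n_bits, 4))

def binary_descriptor(img, x, y, pattern):
    """One bit per pairwise intensity test around (x, y)."""
    d = np.empty(len(pattern), dtype=np.uint8)
    for i, (dx1, dy1, dx2, dy2) in enumerate(pattern):
        d[i] = img[y + dy1, x + dx1] < img[y + dy2, x + dx2]
    return d

def describe_whole_image(img, pattern):
    """Stack of binary descriptors, one per grid location."""
    h, w = img.shape
    return np.array([binary_descriptor(img, x, y, pattern)
                     for x, y in grid_keypoints(h, w)])
```

In practice the descriptors would be computed with an actual BRISK implementation on the grid keypoints; the sketch only shows how the detection step is replaced by dense, fixed sampling.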
Then, the dictionary is built by clustering these visual
descriptors, and the representative descriptors are called
visual words. Given the dictionary, extracted features
from each image can be quantized to the nearest visual
words and can be represented by the histogram of visual
words in the dictionary. Finally, each image can be
International Journal of Mechanical Engineering and Robotics Research Vol. 4, No. 3, July 2015
264 © 2015 Int. J. Mech. Eng. Rob. Res.
doi: 10.18178/ijmerr.4.3.264-268
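The quantization step above can be sketched as follows. This is a hedged illustration, not the paper's implementation: the vocabulary is assumed to be given (in practice it is built offline, e.g. by k-means or k-medoids clustering of binary descriptors under the Hamming distance), and the score 1 - L1/2 on normalized histograms is one common choice of similarity, not necessarily the one used by the authors.

```python
import numpy as np

def hamming(a, B):
    """Hamming distance from one binary descriptor a to each row of B."""
    return np.count_nonzero(a != B, axis=1)

def to_histogram(descriptors, vocabulary):
    """Quantize each descriptor to its nearest visual word and
    return the L1-normalized word histogram of the image."""
    hist = np.zeros(len(vocabulary))
    for d in descriptors:
        hist[np.argmin(hamming(d, vocabulary))] += 1
    return hist / hist.sum()

def similarity(h1, h2):
    """Score in [0, 1]: 1 minus half the L1 distance between
    two normalized bag-of-words histograms."""
    return 1.0 - 0.5 * np.abs(h1 - h2).sum()
```

A similarity near 1 between a robot's current image histogram and a stored histogram from another robot then signals a candidate loop closure between the two trajectories.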