Abstract - We present a method for transforming local image descriptors into a compact form of bit-sequences whose similarity is determined by Hamming distance. Following the Locality-Sensitive Hashing approach, the descriptors are projected onto a set of random directions that are learned from a set of non-matching data. The learned random projections result in high-entropy binary codes (HE²) that outperform codes based on standard random projections in match/non-match classification and nearest neighbor search. Despite the data compression and the granularity of Hamming space, the HE²-descriptor outperforms the original descriptor in the classification task. In the nearest neighbor search task, the performance of the HE²-descriptor is asymptotic to that of the original descriptor. As a supporting result, we obtain another descriptor, HE²+1, and demonstrate that the performance of the original descriptor can be improved by adding a few bits derived from the descriptor itself.
I. INTRODUCTION
Image descriptor matching addresses the following problem:
given a local descriptor extracted from one image, find a set of
similar descriptors extracted from other images. This problem
occurs in different applications, e.g. image registration for pano-
ramic stitching [1],[2], object recognition [3]-[5], and context-
based image and video retrieval [6]. As the typical descriptors
are high-dimensional, the issues of efficient storage and compu-
tations become important for processing of large image datasets.
In this paper we explore a compact representation of local de-
scriptors, in which each element of the descriptor vector is quan-
tized to one bit so that the Hamming distance between quantized
vectors is a measure of similarity between the original descrip-
tors. We store only the signs of the original descriptor projected
onto a set of learned random directions, thus performing an embedding from Euclidean space into Hamming space. This kind of descriptor representation has recently attracted attention in computer vision and multimedia applications ([16]-[19]).
The most relevant work [17] uses binary codes obtained from coarse quantization of random projections for rate-efficient broadcasting of image descriptors in a wireless camera network.
Thus, the authors of [17] propose to solve the visual correspondence task by moving completely from the original descriptor space to Hamming space. Torralba et al. [18] argue that random-projection-based codes perform poorly when the number of bits is
fixed and small, and performance increases when the number of
random projections grows. Therefore they propose different
learning-based methods, based on Boosting similarity sensitive
coding, restricted Boltzmann machines [18] and spectral hashing
[19] “…to learn a compact code rather than waiting for it to
emerge from random projections”. The descriptors [18], [19] are
extracted from an entire image by capturing rich global structure
of the image, and this ensures good retrieval performance.
In this paper, we work with local descriptors, extracted by
some existing method (e.g., SIFT [7]). Such descriptors capture
information only from small image patches and therefore their
matching is often ambiguous. We build our method on the original idea from [17] and show that a simple learning algorithm based on waiting for the right random projections to emerge can produce compact, high-entropy Hamming-embedded (HE²) descriptors that can outperform the original descriptors. Another
relevant work [16] proposes an alternative and similarly simple algorithm that learns the quantization threshold as the median of training descriptors projected onto orthogonal random directions.
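To illustrate the idea, the median-threshold quantization described in [16] can be sketched as follows. This is a minimal Python sketch under assumed data shapes (lists of equal-length descriptors), not the authors' implementation; the function names are our own.

```python
import random
import statistics

def learn_median_thresholds(training_descriptors, directions):
    """For each random direction, set the quantization threshold to the
    median of the training descriptors' projections onto that direction,
    so that each bit splits the training data roughly in half."""
    thresholds = []
    for r in directions:
        projections = [sum(ri * xi for ri, xi in zip(r, x))
                       for x in training_descriptors]
        thresholds.append(statistics.median(projections))
    return thresholds

def quantize(x, directions, thresholds):
    """One bit per direction: 1 if the projection reaches its threshold."""
    return [int(sum(ri * xi for ri, xi in zip(r, x)) >= t)
            for r, t in zip(directions, thresholds)]
```

By construction, each bit is set for about half of the training descriptors, which is exactly the balanced-bit (maximal entropy per bit) behavior discussed below.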
The main motivation to replace original descriptors by their
binary versions is based on a simple numerical experiment, in
which we compare the computation speed of 128-bit Hamming
distance and 128-dimensional squared Euclidean distance. The
results obtained on a mainstream 3GHz CPU are: 55 million
Hamming distances per second versus 1.8 million Euclidean
distances per second. We propose to exploit such gains in computation speed and memory usage in applications that involve local image descriptors. We show that this improvement in computational efficiency is achieved without significant loss in matching quality. Our main contributions are the following:
• We propose a simple method to learn random projections from
a training set of non-matching data. This method does not re-
quire prior labeling of matching and non-matching data subsets
as in other descriptor learning methods [9],[15],[18],[19]. In-
stead we use only a set of non-matching data, which is much
easier to obtain in practice (e.g. by applying some descriptor
extraction algorithm to a random collection of photos).
• We propose the following criterion of optimality for learning
random projections: the bit-sequences obtained from non-
matching data are required to be maximal entropy codes. Although Daugman [11] demonstrated this property for his IrisCodes, and the maximal entropy criterion is satisfied by construction in the method of [16], we are not aware of any works that use this criterion in a local descriptor learning framework.
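The speed advantage reported above comes from the fact that a 128-bit Hamming distance reduces to a XOR followed by a population count, versus 128 multiply-accumulate operations for a squared Euclidean distance. A minimal Python sketch (the function names are ours; a production implementation would use hardware POPCNT on 64-bit words):

```python
def hamming_distance_128(a, b):
    """Hamming distance between two 128-bit codes stored as Python ints:
    XOR marks the differing bits, then we count the set bits."""
    return bin(a ^ b).count("1")

def squared_euclidean(x, y):
    """Reference squared Euclidean distance between two real vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))
```

On modern CPUs the XOR-and-popcount loop touches only two machine words per 128-bit code, which is consistent with the order-of-magnitude speedup measured in the experiment.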
II. HAMMING EMBEDDING USING RANDOM PROJECTIONS
Projection-based Hamming embedding can be expressed as

    z = h_s(Rx),                                        (1)

where x = (x_1, ..., x_D)^T is the original descriptor, z = (z_1, ..., z_d)^T its bit representation, R is a d×D projection matrix, and h_s(y) is a simple hash function that computes the sign of the vector elements:

    h_s(y_i) = 1 if y_i ≥ 0,  0 if y_i < 0,   i = 1, ..., d.        (2)
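Equations (1)-(2) can be sketched directly in a few lines. The sketch below uses plain Python with a random Gaussian R, as in standard LSH; the dimensions D and d are illustrative, not prescribed by the method.

```python
import random

def hamming_embed(x, R):
    """Compute z = h_s(Rx): project descriptor x onto each row of R and
    keep only the sign of the projection, giving a d-bit code."""
    return [int(sum(r_ij * x_j for r_ij, x_j in zip(row, x)) >= 0)
            for row in R]

# Illustrative use: embed a D=128-dimensional descriptor into d=32 bits.
random.seed(1)
D, d = 128, 32
R = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]
x = [random.gauss(0.0, 1.0) for _ in range(D)]
z = hamming_embed(x, R)
```

Note that the code depends only on the signs of the projections, so it is invariant to positive scaling of the descriptor.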
High-Entropy Hamming Embedding of Local
Image Descriptors Using Random Projections
Alexander Sibiryakov
Mitsubishi Electric Research Centre Europe, Guildford, UK
a.sibiryakov@uk.merce.mee.com
MMSP’09, October 5-7, 2009, Rio de Janeiro, Brazil.
978-1-4244-4464-9/09/$25.00 ©2009 IEEE.