Abstract - We present a method for transforming local image descriptors into a compact form of bit-sequences whose similarity is determined by Hamming distance. Following the Locality-Sensitive Hashing approach, the descriptors are projected onto a set of random directions that are learned from a set of non-matching data. The learned random projections result in high-entropy binary codes (HE²) that outperform codes based on standard random projections in match/non-match classification and nearest neighbor search. Despite the data compression and the granularity of Hamming space, the HE²-descriptor outperforms the original descriptor in the classification task. In the nearest neighbor search task, the performance of the HE²-descriptor is asymptotic to that of the original descriptor. As a supporting result, we obtain another descriptor, HE²+1, and demonstrate that the performance of the original descriptor can be improved by adding a few bits derived from the descriptor itself.
I. INTRODUCTION
Image descriptor matching addresses the following problem:
given a local descriptor extracted from one image, find a set of
similar descriptors extracted from other images. This problem
occurs in different applications, e.g. image registration for pano-
ramic stitching [1],[2], object recognition [3]-[5], and context-
based image and video retrieval [6]. As the typical descriptors
are high-dimensional, the issues of efficient storage and compu-
tations become important for processing of large image datasets.
In this paper we explore a compact representation of local de-
scriptors, in which each element of the descriptor vector is quan-
tized to one bit so that the Hamming distance between quantized
vectors is a measure of similarity between the original descrip-
tors. We store only the signs of the original descriptor projected
onto a set of learned random directions, thus performing an embedding from Euclidean space into Hamming space. This kind of descriptor representation has recently attracted attention in computer vision and multimedia applications ([16]-[19]).
The most relevant work [17] uses binary codes obtained from coarse quantization of random projections for rate-efficient broadcasting of image descriptors in a wireless camera network.
Thus, the authors of [17] propose to solve the visual correspondence task by moving completely from the original descriptor space to Hamming space. Torralba et al. [18] argue that random-projection-based codes perform poorly when the number of bits is
fixed and small, and performance increases when the number of
random projections grows. Therefore they propose different
learning-based methods, based on Boosting similarity sensitive
coding, restricted Boltzmann machines [18] and spectral hashing
[19] “…to learn a compact code rather than waiting for it to
emerge from random projections”. The descriptors [18], [19] are
extracted from an entire image by capturing rich global structure
of the image, and this ensures good retrieval performance.
In this paper, we work with local descriptors, extracted by
some existing method (e.g., SIFT [7]). Such descriptors capture
information only from small image patches and therefore their
matching is often ambiguous. We build our method on the original idea from [17] and show that a simple learning algorithm based on waiting for the right random projections to emerge can produce compact, high-entropy Hamming-embedded (HE²) descriptors that can outperform the original descriptors. Another
relevant work [16] proposes an alternative and similarly simple algorithm that learns the quantization threshold as the median of training descriptors projected onto orthogonal random directions.
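To illustrate the idea, the median-threshold quantization described in [16] can be sketched as follows. This is a minimal Python sketch under assumed data shapes (lists of equal-length descriptors), not the authors' implementation; the function names are our own.

```python
import random
import statistics

def learn_median_thresholds(training_descriptors, directions):
    """For each random direction, set the quantization threshold to the
    median of the training descriptors' projections onto that direction,
    so that each bit splits the training data roughly in half."""
    thresholds = []
    for r in directions:
        projections = [sum(ri * xi for ri, xi in zip(r, x))
                       for x in training_descriptors]
        thresholds.append(statistics.median(projections))
    return thresholds

def quantize(x, directions, thresholds):
    """One bit per direction: 1 if the projection reaches its threshold."""
    return [int(sum(ri * xi for ri, xi in zip(r, x)) >= t)
            for r, t in zip(directions, thresholds)]
```

By construction, each bit is set for about half of the training descriptors, which is exactly the balanced-bit (maximal entropy per bit) behavior discussed below.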
The main motivation to replace original descriptors by their
binary versions is based on a simple numerical experiment, in
which we compare the computation speed of 128-bit Hamming
distance and 128-dimensional squared Euclidean distance. The
results obtained on a mainstream 3GHz CPU are: 55 million
Hamming distances per second versus 1.8 million Euclidean
distances per second. We propose to exploit such gains in computation speed and memory usage in applications that involve local image descriptors. We show that this improvement in computational efficiency is achieved without significant loss in matching quality. Our main contributions are the following:
• We propose a simple method to learn random projections from
a training set of non-matching data. This method does not re-
quire prior labeling of matching and non-matching data subsets
as in other descriptor learning methods [9],[15],[18],[19]. In-
stead we use only a set of non-matching data, which is much
easier to obtain in practice (e.g. by applying some descriptor
extraction algorithm to a random collection of photos).
• We propose the following criterion of optimality for learning
random projections: the bit-sequences obtained from non-
matching data are required to be maximal entropy codes. Although Daugman [11] demonstrated this property for his IrisCodes, and the maximal entropy criterion is satisfied by construction in the method of [16], we are not aware of any works that use this criterion in a local descriptor learning framework.
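The speed advantage reported above comes from the fact that a 128-bit Hamming distance reduces to a XOR followed by a population count, versus 128 multiply-accumulate operations for a squared Euclidean distance. A minimal Python sketch (the function names are ours; a production implementation would use hardware POPCNT on 64-bit words):

```python
def hamming_distance_128(a, b):
    """Hamming distance between two 128-bit codes stored as Python ints:
    XOR marks the differing bits, then we count the set bits."""
    return bin(a ^ b).count("1")

def squared_euclidean(x, y):
    """Reference squared Euclidean distance between two real vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))
```

On modern CPUs the XOR-and-popcount loop touches only two machine words per 128-bit code, which is consistent with the order-of-magnitude speedup measured in the experiment.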
II. HAMMING EMBEDDING USING RANDOM PROJECTIONS
Projection-based Hamming embedding can be expressed as

    z = h_s(Rx),                                        (1)

where x = (x_1, ..., x_D)^T is the original descriptor, z = (z_1, ..., z_d)^T its bit representation, R is a d×D projection matrix, and h_s(y) is a simple hash function that computes the sign of the vector elements:

    h_s(y_i) = 1 if y_i ≥ 0,  0 if y_i < 0,   i = 1, ..., d.        (2)
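Equations (1)-(2) can be sketched directly in a few lines. The sketch below uses plain Python with a random Gaussian R, as in standard LSH; the dimensions D and d are illustrative, not prescribed by the method.

```python
import random

def hamming_embed(x, R):
    """Compute z = h_s(Rx): project descriptor x onto each row of R and
    keep only the sign of the projection, giving a d-bit code."""
    return [int(sum(r_ij * x_j for r_ij, x_j in zip(row, x)) >= 0)
            for row in R]

# Illustrative use: embed a D=128-dimensional descriptor into d=32 bits.
random.seed(1)
D, d = 128, 32
R = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]
x = [random.gauss(0.0, 1.0) for _ in range(D)]
z = hamming_embed(x, R)
```

Note that the code depends only on the signs of the projections, so it is invariant to positive scaling of the descriptor.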
High-Entropy Hamming Embedding of Local
Image Descriptors Using Random Projections
Alexander Sibiryakov
Mitsubishi Electric Research Centre Europe, Guildford, UK
a.sibiryakov@uk.merce.mee.com
MMSP’09, October 5-7, 2009, Rio de Janeiro, Brazil.
978-1-4244-4464-9/09/$25.00 ©2009 IEEE.