Revision to University of Minnesota CS dept. Tech. Report, available at http://www.cs.umn.edu/tech reports upload/tr2011/11-028.pdf

Efficient Similarity Search via Sparse Coding

Anoop Cherian    Vassilios Morellas    Nikolaos Papanikolopoulos
{cherian, morellas, npapas}@cs.umn.edu

Abstract

This work presents a new indexing method that uses sparse coding for fast approximate Nearest Neighbor (NN) search on high dimensional image data. First, we sparse code the data using a learned basis dictionary, and an index of the dictionary's support set is then used to generate one compact identifier for each data point. Since the number of basis combinations grows exponentially with the size of the support set, each data point is likely to receive a unique identifier that can be used to index a hash table for fast NN operations. When dealing with real-world data, however, the identifiers of a query point and of its true nearest neighbors in the database seldom match exactly (due to image noise, distortion, etc.). To accommodate such near matches, we propose a novel extension of the framework that utilizes the regularization path of the LASSO formulation to create robust hash codes. Experiments on large datasets demonstrate that our algorithm rivals state-of-the-art NN techniques in search time, accuracy, and memory usage.

1 Introduction

Most computer vision tasks today require efficient strategies to search for the Nearest Neighbors (NN) of a query point in a database. Examples include, but are not limited to, face recognition, object tracking, multiview 3D reconstruction [46], and video search engines [45]. Image data are generally high dimensional; Scale Invariant Feature Transform (SIFT) descriptors [32], generalized image descriptors [48], shape contexts [40], and Histograms of Oriented Gradients (HOG) [9] are a few examples. When the data is high dimensional, sophisticated data structures must be designed to make the computation of NNs efficient.
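The core indexing idea outlined in the abstract, hashing each point by the indices of its active dictionary atoms, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names are ours, and the sparse codes are assumed to have already been computed by any LASSO solver against the learned dictionary.

```python
import numpy as np

def sct_key(sparse_code, tol=1e-6):
    """Identifier for a data point: the (sorted) indices of the
    dictionary atoms active in its sparse code (the support set)."""
    support = np.flatnonzero(np.abs(sparse_code) > tol)
    return tuple(int(i) for i in support)

def build_index(codes):
    """Hash table mapping each support-set identifier to the list
    of database item ids that share it."""
    table = {}
    for item_id, code in enumerate(codes):
        table.setdefault(sct_key(code), []).append(item_id)
    return table

# Toy sparse codes over a 4-atom dictionary (precomputed elsewhere).
codes = np.array([
    [0.0, 0.9, 0.0, -0.4],   # support {1, 3}
    [0.7, 0.0, 0.0,  0.2],   # support {0, 3}
    [0.0, 0.8, 0.0, -0.1],   # support {1, 3} -- same key as item 0
])
index = build_index(codes)

# Exact-match lookup: a query whose code activates atoms {1, 3}
# retrieves items 0 and 2 in O(1) expected time.
query_code = np.array([0.0, 0.5, 0.0, -0.3])
candidates = index.get(sct_key(query_code), [])
```

Because the number of possible supports grows combinatorially in the dictionary size, collisions between distinct points are rare, which is what makes a plain hash table viable here.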
The problem is not restricted to computer vision; it manifests itself in many other domains, such as document search, content-based information retrieval [11], multimedia databases [3], etc. In this paper, we develop algorithms for fast NN operations. Our approach is motivated by recent advances in the theories of sparse signal processing and compressive sensing. The latter deals with sampling and reconstructing signals that are sparse in a specific overcomplete basis, such that perfect reconstruction can be achieved from only very few samples (compared to the number prescribed by Shannon's sampling theorem) [12]. For general data descriptors (e.g. SIFT descriptors), finding this overcomplete basis dictionary is non-trivial. To meet this challenge, dictionary learning techniques have been suggested [41, 16], whereby a basis dictionary is learned from the data itself by solving an L1-regularized least-squares problem (namely, LASSO). This idea of dictionary learning has been leveraged for very successful applications such as image denoising, object classification, and recognition [35]. A natural next question, however, is whether sparsity itself can be used to achieve fast NN retrieval on high dimensional data. This paper examines this possibility in detail and proposes algorithms applicable to the visual data domain. Before proceeding further, we briefly list the primary contributions of this paper:

- We propose a novel tuple representation, the Subspace Combination Tuple (SCT), for high dimensional data using a learned dictionary, which assists in indexing the data vectors through a hash table.

- Utilizing the regularization path of the LARS algorithm for solving the LASSO formulation, we propose a simple and effective method for generating robust hash codes on the above SCTs for fast and accurate NN retrieval.
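The second contribution, collecting supports along the regularization path so that a noisy query still shares at least one hash key with its neighbor, can be sketched as follows. This is a coarse stand-in, not the paper's method: the paper uses the LARS path, whereas here we simply re-solve the LASSO at a few lambda values with ISTA, and the orthonormal (identity) dictionary is chosen only so the example is exactly solvable; all names are illustrative.

```python
import numpy as np

def lasso_ista(D, x, lam, n_iter=200):
    """Solve min_c 0.5*||x - D c||^2 + lam*||c||_1 by iterative
    soft-thresholding (ISTA); stands in for a LARS solver here."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2          # 1 / Lipschitz const.
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        c = c - step * (D.T @ (D @ c - x))          # gradient step
        c = np.sign(c) * np.maximum(np.abs(c) - lam * step, 0.0)  # shrink
    return c

def path_keys(D, x, lams, tol=1e-4):
    """Hash keys (support-set tuples) collected along a coarse
    regularization path: one LASSO solve per lambda value."""
    keys = set()
    for lam in lams:
        c = lasso_ista(D, x, lam)
        keys.add(tuple(int(i) for i in np.flatnonzero(np.abs(c) > tol)))
    return keys

# For D = I the LASSO solution is plain soft-thresholding of x,
# so the supports along the path are easy to verify by hand.
D = np.eye(4)
x = np.array([1.0, 0.0, 0.3, 0.0])
keys = path_keys(D, x, lams=[0.5, 0.1])    # {(0,), (0, 2)}

# A perturbed query whose weak coefficient is shrunk away at every
# lambda still shares the coarse key (0,) with x along the path,
# even though its finest support differs.
q = np.array([0.95, 0.0, 0.05, 0.0])
shared = keys & path_keys(D, q, lams=[0.5, 0.1])
```

Probing the hash table with every key along the path trades a constant factor in lookups for robustness to the support-set perturbations that exact single-code matching cannot tolerate.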
2 Related Work

For low dimensional data, several data-structure-based schemes exist for efficient ANN retrieval, such as k-d trees and R-trees ([28, 8]). Unfortunately, when the data dimensionality grows as high as 20, the efficiency of such schemes starts to degrade.