Low Latency Image Retrieval with Progressive Transmission of CHoG Descriptors

Vijay Chandrasekhar, Stanford University, CA, vijayc@stanford.edu
Sam S. Tsai, Stanford University, CA, sstsai@stanford.edu
Gabriel Takacs, Stanford University, CA, gtakacs@stanford.edu
David M. Chen, Stanford University, CA, dmchen@stanford.edu
Ngai-Man Cheung, Stanford University, CA, nmcheung@stanford.edu
Ramakrishna Vedantham, Nokia Research Center, CA, ramakrishna.vedantham@nokia.com
Yuriy Reznik, Qualcomm Inc., CA, yreznik@qualcomm.com
Radek Grzeszczuk, Nokia Research Center, CA, radek.grzeszczuk@nokia.com
Bernd Girod, Stanford University, CA, bgirod@stanford.edu

ABSTRACT
To reduce network latency for mobile visual search, we propose schemes for progressive transmission of Compressed Histogram of Gradients (CHoG) descriptors. Progressive transmission reduces the amount of transmitted data and enables early termination on the server, thus reducing end-to-end system latency. With progressive transmission of CHoG descriptors, we are able to reduce network latency to ∼1 second in a 3G network. We report a 4× decrease in end-to-end system latency compared to transmitting uncompressed SIFT descriptors or JPEG images.

Categories and Subject Descriptors
C.5.0 [Computer Systems Organization]: Computer Systems Implementation—General

General Terms
Algorithms, Design

Keywords
mobile visual search, CHoG, content-based image retrieval

1. INTRODUCTION
Mobile phones have evolved into powerful image and video processing devices, equipped with high-resolution cameras, color displays, and hardware-accelerated graphics. They are also equipped with location sensors and GPS receivers, and are connected to broadband wireless networks, allowing fast transmission of information. This enables a class of applications which use the camera phone to initiate search queries about objects in visual proximity to the user.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MCMC’10, October 29, 2010, Firenze, Italy. Copyright 2010 ACM 978-1-4503-0168-8/10/10 ...$10.00.

Figure 1: A mobile CD cover recognition system where the server is located at a remote location. Feature descriptors are extracted on the mobile phone and query feature data is sent over the network. Once the CD cover is recognized on the server, identification data is sent back to the mobile phone.

Such applications can be used for identifying products, comparison shopping, and finding information about movies, CDs, real estate or products of the visual arts. Google Goggles [1] and Nokia Point and Find [2] are examples of recently developed commercial applications. For these applications, a query photo is taken by a mobile device and compared against database photos on a remote server. A set of image feature descriptors is used to assess the similarity between the query photo and each database photo. In designing such systems, it is important to ensure fast and accurate retrieval of the results.

The system latency can be broken down into three components: (a) processing time on the mobile client, (b) network transmission latency, and (c) processing time on the server. In [16, 15], we show that processing on the server and on the client takes approximately 1 second each, while network transmission is typically the bottleneck in a 3G system. Hence, the size of the data sent over the network needs to be as small as possible to reduce latency and improve user interaction.
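The three-component breakdown above can be written down as a small latency budget. The following is a minimal sketch rather than the model used in this work: it assumes network latency is simply payload size divided by uplink throughput, ignoring round-trip delay, protocol overhead and retransmissions. The names `LatencyBudget`, `transmission_time_s` and `latency_budget` are hypothetical, and the payload sizes and uplink rate in the usage loop are made-up placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    """End-to-end latency split into the components (a), (b) and (c)."""

    client_s: float   # (a) processing time on the mobile client
    network_s: float  # (b) network transmission latency
    server_s: float   # (c) processing time on the server

    @property
    def total_s(self) -> float:
        return self.client_s + self.network_s + self.server_s


def transmission_time_s(payload_bytes: int, uplink_bits_per_s: float) -> float:
    """Simplified assumption: time to push the payload through the uplink.

    Ignores round-trip delay, protocol overhead and retransmissions.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    if uplink_bits_per_s <= 0:
        raise ValueError("uplink_bits_per_s must be positive")
    return payload_bytes * 8 / uplink_bits_per_s


def latency_budget(
    payload_bytes: int,
    uplink_bits_per_s: float,
    client_s: float = 1.0,
    server_s: float = 1.0,
) -> LatencyBudget:
    # The 1.0 s defaults mirror the "approximately 1 second each" figure
    # quoted in the text for client and server processing.
    network_s = transmission_time_s(payload_bytes, uplink_bits_per_s)
    return LatencyBudget(client_s, network_s, server_s)


if __name__ == "__main__":
    # Hypothetical placeholder values, chosen only to show the trend.
    ASSUMED_UPLINK_BPS = 100_000
    for label, size_bytes in (("larger query", 50_000), ("smaller query", 5_000)):
        budget = latency_budget(size_bytes, ASSUMED_UPLINK_BPS)
        print(f"{label}: network {budget.network_s:.2f} s, total {budget.total_s:.2f} s")
```

Under this simplified model, client and server time are fixed, so the total latency falls linearly with the number of bytes sent, which is why shrinking the query payload is the natural lever.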
To reduce network latency, we extract feature descriptors on the phone, compress the descriptors and transmit them over the network as illustrated in Figure 1. In this work, we focus on how system latency can be minimized using progressive transmission of query data.

1.1 Prior Work
In [15], we present a state-of-the-art mobile product recognition system using a camera phone. The product is recognized through an image-based retrieval system located on a