GPU Acceleration of Eff² Descriptors using CUDA

Kristleifur Daðason, Herwig Lejsek, Ársæll Þ. Jóhannsson
Videntifier Technologies ehf.
Ofanleiti 2, 101 Reykjavík, Iceland
{kristleifur, herwig, arsaell}@videntifier.com

Björn Þór Jónsson
Reykjavík University, School of Computer Science
101 Reykjavík, Iceland
bjorn@ru.is

Laurent Amsaleg
CNRS-IRISA, Campus de Beaulieu
35042 Rennes, France
laurent.amsaleg@irisa.fr

ABSTRACT

Video analysis using local descriptors requires a high-throughput descriptor creation process. This speed can be obtained from modern GPUs. In this paper, we adapt the computation of the Eff² descriptors, a SIFT variant, to the GPU. We compare our GPU-Eff² descriptors to SiftGPU and show that while both variants yield similar results, the GPU-Eff² descriptors require significantly less processing time.

Categories and Subject Descriptors

I.4.7 [Image Processing and Computer Vision]: Feature Measurement; D.1.3 [Programming Techniques]: Concurrent Programming—Parallel Programming

General Terms

Algorithms, Experimentation, Measurement, Performance

Keywords

CUDA, SIFT, GPU, GPGPU, local image descriptors, image retrieval

1. INTRODUCTION

Video analysis is a central component in many applications, such as video surveillance, news analysis, and video copyright protection. Recent methods for such analysis are typically based on computing many local descriptors per frame, which are then merged to form the video description. As many video analysis applications require real-time performance, high demands are made on the efficient computation of the local descriptors.

1.1 Scaling Descriptor Creation

The traditional method for achieving high throughput is using a computer cluster and large-grain parallelism, where the data collection is split into independent parts. The advantage is that the standard non-parallel description code can be used. However, there are two major disadvantages.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM'10, October 25–29, 2010, Firenze, Italy. Copyright 2010 ACM 978-1-60558-933-6/10/10 ...$10.00.

First, computing clusters are relatively expensive. Second, it is difficult in practice to deliver cluster-based software products to end users.

An alternative method, solving both problems, is to use powerful graphics processing units (GPUs). The advent of highly scalable and parallel yet inexpensive GPUs has been a minor revolution in the computer industry; many projects have therefore evaluated GPUs for a variety of tasks, such as feature tracking [8] and local descriptor computation [9]. However, the disadvantage of GPUs is that the description code does not work unchanged. Data and computations have to be adapted to meet constraints on the access patterns and operations available on-GPU. As a result, some computational processes remain incompletely adapted, forcing, for instance, data loadback to the host CPU for completion (e.g., see [9]). Fortunately, GPUs have now become much easier to utilize due to the recently released CUDA programming environment from NVIDIA [3]. The CUDA model relaxes memory access patterns and supports a large set of computing primitives.

1.2 Contributions

In the past, large-scale performance studies of descriptor creation have been next to impossible, due to the computing power required for creating the local descriptors. When varying parameters, the descriptors must be created over and over, making the whole process time-consuming. As a result, most such studies have been performed using small collections (e.g., see [4]).
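For reference, the cluster-style, large-grain parallelism discussed in Section 1.1 amounts to mapping the unchanged serial descriptor code over independent parts of the collection and merging the results. The following minimal sketch illustrates the pattern; extract_descriptors is a hypothetical placeholder for the serial per-frame code, not the actual Eff² implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_descriptors(frame):
    # Hypothetical stand-in for the serial, per-frame local-descriptor
    # code; here each frame simply yields three dummy descriptors.
    return [(frame, i) for i in range(3)]

def describe_collection(frames, workers=4):
    # Large-grain parallelism: workers process independent frames in
    # parallel, and the per-frame results are merged (in frame order)
    # to form the video description.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_frame = pool.map(extract_descriptors, frames)
    return [d for descs in per_frame for d in descs]

if __name__ == "__main__":
    print(len(describe_collection(range(8))))  # 8 frames x 3 descriptors = 24
```

The appeal of this scheme is exactly what the text notes: the worker function is the unmodified serial code. The GPU approach instead rewrites that inner function as fine-grained data-parallel kernels.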
With GPU processing, however, large-scale studies easily become feasible. This paper presents initial steps on the path to a large-scale study of all GPU-based variants.

The SIFT descriptors, proposed by Lowe in 2004 [7], have been considered the "gold standard" of local description. Since then, several variants have been proposed, such as PCA-SIFT [4] and the Eff² descriptors [5]. Previous work showed the Eff² descriptors to outperform many of the SIFT variants in the context of very-large-scale descriptor databases, where small differences in descriptor schemes can have a large effect on retrieval [5]. We therefore adapt the computation of the Eff² descriptors to the GPU through the CUDA environment. We compare the GPU-Eff² descriptors to SiftGPU [9], another GPU-based variant of SIFT, and show that while both GPU-based variants yield similar results (better than SIFT, and comparable to Eff²), the GPU-Eff² descriptors require significantly less processing time.

Note that since performing this comparison we have become aware of the more recent SURF descriptors [1], which are an even faster variant of SIFT, and a GPU version