DRAFT April 5, 2011 - 19:46

GPU Metaprogramming: A Case Study in Biologically-Inspired Machine Vision

Nicolas Pinto *    David D. Cox

* Massachusetts Institute of Technology, Cambridge, MA 02139
The Rowland Institute at Harvard, Harvard University, Cambridge, MA 02142

April 5, 2011

In this chapter, we present a tutorial on ways that metaprogramming techniques (dynamically generating specialized code at runtime and compiling it just-in-time) can be used to greatly accelerate an application. We use filter-bank convolution, a key component of the biologically-inspired machine vision systems that form the core of our research program, as a case study to illustrate these techniques. We present an overview of several key themes in template metaprogramming, and culminate in a full example of GPU auto-tuning in which an instrumented GPU kernel template is built and the space of all possible instantiations of this kernel is automatically grid-searched to find the best implementation on various hardware/software platforms. We show that this method can, in concert with traditional hand-tuning techniques, achieve significant speed-ups, particularly when a kernel will be run on a variety of hardware platforms.

1 Introduction, Problem Statement, and Context

In recent years, digital cameras have become increasingly inexpensive and ubiquitous, and cameras are now embedded in a wide array of devices, from cellphones to cars. This explosion in imaging technology has led to enormous opportunity in the field of computer vision, as the need grows for algorithms that can automatically analyze, organize, and react to the new torrent of digital imagery.

While traditional machine vision algorithms achieve modest success in certain tasks (e.g., detecting the presence of a face in an image), many other visual tasks