Accelerating Brain Circuit Simulations of Object Recognition with CELL Processors

Andrew Felch, Jayram Moorkanikara Nageswaran[1], Ashok Chandrashekar, Jeff Furlong[1], Nikil Dutt[1], Richard Granger, Alex Nicolau[1], Alex Veidenbaum[1]

Neukom Institute, Dartmouth College, Hanover, NH 03755, USA
E-mail: {andrew.felch, ashok.chandrashekar, richard.granger}@dartmouth.edu

[1] Center for Embedded Computer Systems, University of California, Irvine, Irvine, CA 92697, USA
E-mail: {jmoorkan, jfurlong, dutt, nicolau, alexv}@ics.uci.edu

Abstract

Humans outperform computers on many natural tasks, including vision. Given the human ability to recognize objects rapidly and almost effortlessly, it is pragmatically sensible to study, and attempt to imitate, the algorithms used by the brain. Analysis of the anatomical structure and physiological operation of brain circuits has led to the derivation of novel algorithms that, in initial studies, have successfully addressed known difficulties in visual processing. These algorithms are slow on uniprocessor systems, thwarting attempts to drive real-time robots for behavioral study; but, as might be expected of algorithms designed for highly parallel brain architectures, they are intrinsically parallel and lend themselves to efficient implementation across multiple processors. This paper presents an implementation of these parallel algorithms on a CELL processor and further extends it to a low-cost cluster built from Sony PlayStation 3 (PS3) consoles. The paper describes the modeled brain circuitry, the derived algorithms, the implementation on the PS3, and an initial performance evaluation with respect to both speed and visual object recognition efficacy. The results show that the parallel implementation achieves a 140x performance improvement on a cluster of 3 PS3s, attaining real-time processing delays. More importantly, we show that the improvements scale linearly, or nearly so, in practice.
These initial findings, while highly promising in their own right, also provide a new platform for extended investigation of large-scale brain circuit models. Early prototyping of such large-scale models has yielded evidence of their efficacy in recognizing time-varying, partially occluded objects, invariant to scale, in arbitrary scenes.

1. Introduction

Processors have experienced tremendous progress (Moore's Law), and computer chips now have a million times more building blocks than they did 40 years ago. Historically, manufacturers such as Intel have used those resources (transistors) to increase the speed of already-existing programs by (1) supporting higher instruction throughput (using pipelines, caches, branch prediction, etc.) and (2) finding and executing multiple instructions simultaneously. After many years, both of these techniques are facing severely diminishing returns, and in an extreme divergence from tradition, the newest chips yielded by Moore's Law no longer speed up old programs. Instead, the additional transistors are used to fabricate multiple CPUs on a single chip. The unfortunate drawback is that few applications contain the parallelism necessary to benefit significantly from the additional CPUs.

In contrast, the mammalian brain has evolved circuits that lack any central processor or main memory, but instead comprise billions of low-precision processing units (neurons) with distributed memory (synapses) stored within their interconnections. With such a simple computing fabric, how can humans still outperform computers at natural tasks such as visual object recognition? We propose that these brain circuit components are organized into specific brain circuit architectures that perform atypical but quite understandable algorithms, conferring unexpectedly powerful functions on the resulting composed circuits.
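The scale of this parallelism can be illustrated with a back-of-envelope calculation. The specific numbers below (a 0.5 s recognition budget, ~5 ms per neuron-to-neuron step, ~10^10 participating neurons) are illustrative assumptions, not measurements; the point is only that a millisecond-scale communication latency leaves room for very few serial steps, so throughput must come from width rather than depth:

```python
# Illustrative back-of-envelope estimate (assumed numbers, not measurements):
# with millisecond-scale neuron-to-neuron communication and sub-second
# recognition, only on the order of a hundred serial steps fit in the
# time budget, so the computation must be massively parallel.

recognition_time_s = 0.5   # assumed: recognition completes in ~0.5 s
step_latency_s = 5e-3      # assumed: ~5 ms per neuron-to-neuron step
num_neurons = 1e10         # assumed: ~10 billion neurons engaged

serial_depth = recognition_time_s / step_latency_s
print(f"feasible serial depth: ~{serial_depth:.0f} steps")  # ~100 steps

# With so few serial steps available, the work per step must be wide:
# every neuron can update concurrently within each step.
print(f"parallel operations per step: up to ~{num_neurons:.0e}")
```

A conventional uniprocessor inverts this trade-off, executing billions of serial steps with little width, which is why these brain-derived algorithms map poorly onto it but naturally onto multi-core hardware.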
As an example, humans recognize visual objects in less than a second, during which billions of neurons receive input from the visual scene; yet because neuron-to-neuron communication takes milliseconds, only a few tens of serial operations can be performed in that time. Algorithms derived from the anatomical structure and physiological operation of these circuits similarly lack serial dependencies and are inherently parallel, and are thus poised to take advantage of parallel hardware such as multi-core processors. In this paper we first present the components of the visual brain circuit architecture, and an overview of visual object