Low power high-performance smart camera system based on SCAMP vision sensor

Stephen J. Carey, David R.W. Barr, Piotr Dudek ⇑

School of Electrical and Electronic Engineering, The University of Manchester, Manchester M13 9PL, United Kingdom

Article info

Article history: Available online 17 April 2013

Keywords: SIMD; Low power; Vision chip; Cellular arrays; Parallel processing; Image processing

Abstract

Vision sensors based upon pixel-parallel cellular processor arrays offer the unique opportunity to realise high-performance, flexible, low power image processing systems. By virtue of processing on the focal-plane, the energy-demanding requirement to digitize a captured frame's raw pixel data is reduced, with returned data constituting only that which is salient. We describe a stand-alone vision system incorporating a SCAMP-3 vision chip, an FPGA and an ARM Cortex-M3 microcontroller. SCAMP integrated circuits operate as SIMD computers; each pixel incorporating a compact but powerful analogue processor and local memory, with all operations occurring in parallel over the 128 × 128 array. Algorithms are developed to operate natively upon the focal-plane as far as possible, with additional serial and higher-level operations occurring on the microcontroller. The power consumption of the system is algorithm-dependent. An algorithm developed for loiterer detection at 8 fps has been shown to consume an average power of 5.5 mW, with a more complex object tracking and counting system consuming 29 mW.

© 2013 Published by Elsevier B.V.

1. Introduction

In designing a vision system to meet the requirements of long battery life, the conventional approach is to take off-the-shelf components that are individually low power, and assemble them into a system, as shown in Fig. 1a. This conventional image processing system consists of an image sensor, ADC (perhaps on-chip), a microprocessor and memory system.
Energy is expended in digitising the image, moving the data to memory, and in further processing the data (perhaps relative to past frames or to neighbouring pixels) to establish, for example, whether objects of interest have moved into the frame. A system designed for low power operation can realise significant performance gains over systems designed for environments without power constraints. However, image digitisation is still required before it is possible to discern that the vision sensor has seen nothing of interest. For example, in a surveillance application, this would be due to consecutive frames being nearly identical. In a more complex scenario, this would be due to no objects (except those already identified and deemed uninteresting) exhibiting any significant motion or pattern of motion. Hence, there is considerable scope to reduce power consumption by determining, as early as possible in the signal processing chain, that a new image is without a salient event. In Fig. 1b, a processor array for each column of the image sensor [1] is implemented; this allows a facility to perform some image processing on the same die as the image capture. Extending this approach to a vision sensor with a processor per pixel, as in Fig. 1c, allows low level image processing at the first point of image capture. We refer to these devices as 'vision chips'.

Using a vision chip allows some of the pre-processing, conventionally performed by the microprocessor, to be conducted upon the focal plane. The in-pixel processors can operate directly on analogue data. A massively parallel architecture, with a very large number of processing circuits concurrently operating on local data, results in high performance and low power consumption. Processed data can be inspected and read from the vision chip if necessary.
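As a rough illustration of the early-rejection idea discussed above (consecutive frames being nearly identical), the following Python sketch decides whether a frame differs enough from its predecessor to be worth reading out. It is not SCAMP-3 code and not the authors' algorithm: the names `changed_pixel_count` and `has_salient_event`, the list-of-lists frame representation, and both threshold values are assumptions made purely for illustration. On a vision chip the per-pixel comparison would run in parallel on the focal plane; here it is emulated serially.

```python
from typing import List

# Hypothetical frame representation: rows of integer grey levels.
Frame = List[List[int]]

SIZE = 128  # array dimension quoted in the abstract (128 x 128)


def changed_pixel_count(prev: Frame, curr: Frame, pixel_threshold: int) -> int:
    """Count pixels whose grey level changed by more than pixel_threshold.

    Serial emulation of a comparison that each pixel processor could make
    locally against its own stored previous value.
    """
    return sum(
        1
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
        if abs(c - p) > pixel_threshold
    )


def has_salient_event(
    prev: Frame,
    curr: Frame,
    pixel_threshold: int = 16,   # assumed per-pixel change threshold
    count_threshold: int = 4,    # assumed minimum number of changed pixels
) -> bool:
    """Return True if enough pixels changed to justify reading the frame out."""
    return changed_pixel_count(prev, curr, pixel_threshold) >= count_threshold


def blank_frame(level: int = 100) -> Frame:
    """Build a uniform SIZE x SIZE frame at the given grey level."""
    return [[level] * SIZE for _ in range(SIZE)]


# Example frames: a static scene, the same scene with one small fluctuation,
# and the same scene with a 5 x 5 bright object entering the view.
static = blank_frame()

noisy = blank_frame()
noisy[10][10] = 103  # change of 3, below the per-pixel threshold

moved = blank_frame()
for y in range(40, 45):
    for x in range(60, 65):
        moved[y][x] = 200
```

With these assumed thresholds, the frame containing a single small fluctuation is rejected without read-out, while the frame containing the 5 × 5 bright block is flagged as salient.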
Critically for low power operation, the no-salient-event scenario is detected prior to the data being read-out from the vision sensor chip to the external microprocessor, removing the nugatory processing activity.

The development of vision chips has accelerated over the past decade, with on focal-plane processing capability steadily increasing [2–7]. Coupled with low power system design, some of these devices have been incorporated into a wider solution to perform image processing [8,9]. Utilising a vision chip can realise complete systems with power consumptions from 2 to 100 mW, significantly lower than the 16 mW for 1 fps processing [10] to several 100 mW [11] needed by systems making use of conventional image sensor arrays. Such vision chip based systems can be the conduits toward vision solutions running in perpetuity on scavenged power from compact solar, wind and other renewable energy sources.

⇑ Corresponding author. E-mail address: p.dudek@manchester.ac.uk (P. Dudek).
Journal of Systems Architecture 59 (2013) 889–899, ISSN 1383-7621. http://dx.doi.org/10.1016/j.sysarc.2013.03.016

Importantly, where there is a requirement for a sub-10 mW imaging system, it does not necessarily imply that a vision chip is a necessity. A PIR-sensor triggered conventional imaging system can be made to operate on <10 mW on average – the outcome of