A Design Methodology for the Next Generation Real-Time Vision Processors

Jones Yudi Mori 1,2(B), André Werner 1, Arij Shallufa 1, Florian Fricke 1, and Michael Hübner 1

1 ESIT - Embedded Systems for Information Technology, Ruhr-University Bochum, Bochum, Germany
{Jones.MoriAlvesDaSilva,Andre.Werner-w2m,Arij.Shallufa,Florian.Fricke,Michael.Huebner}@rub.de
2 Department of Mechanical Engineering, University of Brasília, Brasília, Brazil

Abstract. In this work we present a methodology to design the next generation of real-time vision processors. These processors are expected to achieve high throughput with complex applications, under real-time embedded constraints (time, fault tolerance, silicon area, and power consumption). To achieve these goals, we propose the fusion of two key concepts: Focal-Plane Image Processing (FPIP) and many-core architectures. We show the concepts and ideas to build up a methodology able to offer both design space exploration and a customized programming toolchain for the final architecture. We present implementation details and results for the working parts of the framework, as well as partial results and general comments on the work in progress.

Keywords: ASIP · Image processing · Processor architecture · Real-time

1 Introduction

Smart Cameras are special cameras which not only acquire, compress, and transmit images, but are also capable of processing them to extract useful information. Complete IP/CV (Image Processing and Computer Vision) applications should be executable in modern Smart Cameras. With the growth of the Internet of Things (IoT) and Cyber-Physical Systems (CPS), a single device will be expected to run several complex applications simultaneously. A real-time IP/CV system is composed of two main parts: acquisition and processing. The acquisition part is in general a standard CMOS sensor array which provides a pixel stream and some synchronization signals.
The main problem in standard acquisition systems is the bottleneck in the pixel stream, since the pixels are transmitted one by one [6]. The hardware architectures commonly used in the processing part (DSP concepts, VLIW, SIMD operations) are not able to meet the constraints on throughput, fault tolerance, silicon area, and power consumption [12].

© Springer International Publishing Switzerland 2016
V. Bonato et al. (Eds.): ARC 2016, LNCS 9625, pp. 14–25, 2016.
DOI: 10.1007/978-3-319-30481-6_2
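To see why a one-pixel-at-a-time interface becomes a bottleneck, a back-of-the-envelope calculation is enough: the serial link must sustain width × height × frame-rate pixel transfers per second. The resolutions and frame rates below are hypothetical examples for illustration, not figures from the paper.

```python
# Sketch of the serial pixel-stream bandwidth requirement: with pixels
# transmitted one by one, the interface clock must keep up with the full
# pixel rate of the sensor.

def required_pixel_rate(width, height, fps):
    """Pixels per second a one-pixel-at-a-time interface must sustain."""
    return width * height * fps

# Example (hypothetical) video formats:
for w, h, fps in [(640, 480, 60), (1920, 1080, 60), (3840, 2160, 120)]:
    rate = required_pixel_rate(w, h, fps)
    print(f"{w}x{h} @ {fps} fps -> {rate / 1e6:.1f} Mpixel/s")
```

Even VGA at 60 fps already demands over 18 Mpixel/s on a single serial stream; full-HD and 4K formats push this into the hundreds of Mpixel/s, which motivates processing pixels in parallel close to the sensor, as FPIP does.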