Microprocessors and Microsystems 52 (2017) 2–22

Contents lists available at ScienceDirect
Microprocessors and Microsystems
journal homepage: www.elsevier.com/locate/micpro

System-level design space identification for Many-Core Vision Processors

Jones Yudi a,b,∗, Carlos Humberto Llanos b, Michael Huebner a
a Embedded Systems for Information Technology, Ruhr-University Bochum, Germany
b Automation and Control Group, University of Brasilia, Brazil

Article history: Received 22 January 2017; Revised 19 May 2017; Accepted 24 May 2017; Available online 26 May 2017

Abstract

The current main trends in the embedded systems area, Cyber-Physical Systems (CPS) and the Internet of Things (IoT), are driving the development of complex, distributed, low-power, and high-performance embedded systems. An important feature needed in this new era is embedded intelligence: the ability to process data locally and act on the environment without relying on a remote central processing server. In this context, Smart Cameras emerged: devices able to acquire images and apply sophisticated algorithms for different Image Processing and Computer Vision (IP/CV) applications. Both technology convergence and the evolution of embedded systems toward multi/many-core architectures allow us to envision future cameras as many-core systems able to efficiently exploit the natural parallelism of IP/CV to meet embedded applications' constraints, e.g. real-time operation, power consumption, silicon area, temperature management, and fault tolerance, among others. In this work, we present the development of a Many-Core Vision Processor architecture suitable for future Smart Cameras. In our design methodology, we analyze several aspects involved, from high-level application analysis down to fine-grained operations and physical aspects (e.g. geometry and spatial distribution). The main analysis is performed using a SystemC/TLM2.0 simulator developed specifically for this project.
Silicon area, power consumption, and timing estimations are also provided as results of an early Design-Space Exploration (DSE). Using these results, we propose a first complete architecture, which is implemented in an FPGA. Details about the hardware implementation are provided, as well as synthesis results. Compared to other works from the literature, the implemented architecture shows the potential of the project developed in this work.

© 2017 Elsevier B.V. All rights reserved.

∗ Corresponding author.
E-mail addresses: jones.morialvesdasilva@rub.de, jonesyudi@unb.br, jones.morialvesdasilva@ruhr-uni-bochum.de (J. Yudi), llanos@unb.br (C. Humberto Llanos), michael.huebner@rub.de (M. Huebner).

1. Introduction

The evolution of microelectronics technology has enabled the industry to integrate billions of transistors on a single chip, following the well-known Moore's Law prediction. This evolution has enabled the development of new ideas, products, and whole new markets based on embedded processing systems. Following this trend, several new concepts emerged in recent years: Cyber-Physical Systems (CPS), Ubiquitous Computing (UC), and the Internet of Things (IoT), which are expected to change the interaction between humans and their surrounding environment. The computing capabilities embedded in different devices, daily-life objects, buildings, and so on will be distributed, ubiquitous, and transparent to users [1].

Distributed embedded devices and intelligent sensors will collect and process huge amounts of data, enabling the pervasiveness of the computational environment [2]. These sensors, also known as Smart Sensors, are devices able to perform not only pre-processing algorithms but also more complex applications with embedded, stand-alone intelligence. Smart Cameras, a particular type of Smart Sensor, will be essential as devices able to capture and interpret human behavior in an environment, among other events/objects.
These future cameras must be able to perform complex applications simultaneously while coping with real-time constraints. Other important requirements must also be fulfilled: energy consumption, chip temperature control, reliability, Quality of Service (QoS), data security, privacy, power management, cost (silicon area), and so on [3].

Image Processing and Computer Vision (IP/CV) applications are computationally costly, mainly due to the huge amount of data to be processed. New technologies are enabling the use of increasing resolutions, and new applications are demanding high performance with tight deadlines. In this scenario, there are cam-

http://dx.doi.org/10.1016/j.micpro.2017.05.013