Journal of Real-Time Image Processing
https://doi.org/10.1007/s11554-018-0808-6

SPECIAL ISSUE PAPER

A templated programmable architecture for highly constrained embedded HD video processing

Mathieu Thevenin 1 · Michel Paindavoine 2 · Renaud Schmit 1 · Barthelemy Heyrman 2 · Laurent Letellier 1

Received: 11 December 2017 / Accepted: 18 July 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract
The implementation of a video reconstruction pipeline is required to improve the quality of images delivered by highly constrained devices. The algorithms involved require high computing capacities: several dozen GOPs for real-time HD 1080p video streams. Today's embedded design constraints impose limitations in terms of both silicon budget and power consumption, usually 2 mm² for half a watt. This paper presents the eISP architecture that is able to reach 188 MOPs/mW with 94 GOPs/mm² and 378 GOPs/mW using TSMC 65-nm integration technology. This fully programmable and modular architecture is based on an analysis of video-processing algorithms. Synthesizable VHDL is generated from a set of parameters, which simplifies the sizing and characterization of the architecture.

Keywords SIMD · VLIW · Programmable · Low silicon footprint · Low-power

1 Introduction
The imagers encountered in mobile multimedia applications are currently developed as modules that integrate an optical system, a sensor, and processing electronics, at a cost not exceeding a few dollars [15]. Since current sensors have a 1/10 inch size, the complete module does not exceed a few millimeters per side. Such low-cost modules are achieved by using extremely simplified optics and by reducing silicon area to the minimum required (generally limited to 2 mm² for 500 mW of maximum electrical power consumption). Currently, the processing electronics are based on dedicated components that support only a limited range of algorithms.
This choice limits product differentiation between integrators, since they all use the same modules. An alternative approach, which we develop here, is to use programmable processing electronics. Furthermore, this approach allows the processing chain to be modified without changing the hardware.

Image reconstruction can be very computationally intensive. Usually, it can be divided into four steps, as illustrated in Fig. 1. The first step is noise correction. The second step is contrast and tone correction. The third step is color reconstruction, which provides a full-resolution three-color-plane image from a raw image. It includes white balancing, based on the overall scene illumination. Finally, image enhancement is performed, which consists of color and contour adjustment according to the intended use. The resulting image can then be saved, compressed, transmitted over a network or used by an application. The computing resources required for an HD 1080p real-time processing chain are estimated at 77 GOPs. This estimate was obtained by means of a tool we developed [35], and the approach is explained further in Sect. 3.

The objective of the study presented in this paper is to develop a programmable hardware architecture that can execute a real-time image-processing and enhancement chain for HD 1080p video, that is, 1920 × 1080 24-bit true-color pixels at 25 fps (52 Mpixels/s). Consequently, the proposed architecture:
– is fully programmable;
– reaches the required computing capacity to execute the targeted algorithms;
– copes with integration constraints (2–3 mm² silicon surface, power consumption under half a watt);
– operates transparently and without external control on the pixel stream.

* Mathieu Thevenin, Mathieu.Thevenin@cea.fr
1 CEA, LIST, CEA Saclay, Saclay, France
2 University of Burgundy, Burgundy, France
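
The throughput and budget figures quoted in the introduction can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, "GOPs" is assumed to mean giga-operations per second, and the 2 mm² / 500 mW values are taken as the budget that the 77 GOPs chain must fit within.

```python
# Figures quoted in the text; names are illustrative, not from the paper.
WIDTH, HEIGHT, FPS = 1920, 1080, 25   # HD 1080p at 25 frames per second
REQUIRED_GOPS = 77.0                  # estimated load of the processing chain
POWER_BUDGET_MW = 500.0               # half a watt
AREA_BUDGET_MM2 = 2.0                 # lower end of the 2-3 mm^2 silicon budget


def pixel_rate_mpixels(width, height, fps):
    """Pixel throughput in Mpixels/s."""
    return width * height * fps / 1e6


def ops_per_pixel(gops, mpixels_per_s):
    """Average number of operations available per pixel."""
    return (gops * 1e9) / (mpixels_per_s * 1e6)


def required_mops_per_mw(gops, power_mw):
    """Energy efficiency needed to fit the power budget, in MOPs/mW."""
    return gops * 1e3 / power_mw


def required_gops_per_mm2(gops, area_mm2):
    """Computing density needed to fit the silicon budget, in GOPs/mm^2."""
    return gops / area_mm2


rate = pixel_rate_mpixels(WIDTH, HEIGHT, FPS)
print(f"pixel rate: {rate:.2f} Mpixels/s")
print(f"operations per pixel: {ops_per_pixel(REQUIRED_GOPS, rate):.0f}")
print(f"required efficiency: {required_mops_per_mw(REQUIRED_GOPS, POWER_BUDGET_MW):.0f} MOPs/mW")
print(f"required density: {required_gops_per_mm2(REQUIRED_GOPS, AREA_BUDGET_MM2):.1f} GOPs/mm^2")
```

Under these assumptions, the pixel rate is 51.84 Mpixels/s (rounded to 52 in the text), which leaves roughly 1485 operations per pixel. Fitting 77 GOPs into 500 mW and 2 mm² would call for about 154 MOPs/mW and 38.5 GOPs/mm², both below the 188 MOPs/mW and 94 GOPs/mm² reported in the abstract.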