Journal of Signal Processing Systems
https://doi.org/10.1007/s11265-019-01490-5

A High-Performance Dense Optical Flow Architecture Based on Red-Black SOR Solver

Bibin Johnson 1 · Sachin Thomas 1 · Rani J. Sheeba 1

Received: 18 December 2018 / Revised: 31 August 2019 / Accepted: 9 October 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract
Optical flow (OF) is an integral part of many vision systems, especially in embedded and mobile applications, which face ever-increasing demands for higher speed, minimal resource usage, and lower power consumption. This work introduces a Dense High Throughput Optical Flow (DHTOF) architecture based on a novel fast-converging Red-Black Successive Over-Relaxation (RBSOR) solver architecture for computing dense and accurate OF with the Horn and Schunck Optical Flow (HSOF) algorithm from Full High Definition (FHD) frames in real time. The DHTOF architecture can capture dense OF from Ultra High Definition (UHD) frames at 48 Frames Per Second (FPS) with a throughput of 406 Megapixels/sec, achieving a Throughput Per Watt (TPW) of 43 Giga Operations Per Second Per Watt (GOPS/Watt). The superscalar and deeply pipelined DHTOF architecture achieves the same or lower Average Angular Error (AAE) with 4× fewer RBSOR solver iterations than prior HSOF implementations based on the Jacobi solver. It consumes 12.5× fewer resources and 29.3% less power at FHD resolution than prior architectures. The proposed DHTOF architecture achieves the highest area-delay normalized speedup (by at least 28.2×) among state-of-the-art HSOF architectures. The proposed architecture is successfully evaluated as a real-time OF sensor on a Xilinx Virtex-VC707 Field Programmable Gate Array (FPGA) evaluation board.
Keywords Optical flow · Horn and Schunck · Red Black SOR · FPGA · Real-time

Bibin Johnson
bibinjohnson.13@iist.ac.in
Sachin Thomas
sachinthomas1995@gmail.com
Rani J. Sheeba
sheeba@iist.ac.in
1 Department of Avionics, Indian Institute of Space Science and Technology, Trivandrum, India

1 Introduction
Motion estimation plays an important role in scene understanding and in pursuing higher-level cognitive tasks. The HSOF algorithm [1] retrieves the apparent motion of pixels from video and image sequences. It computes OF as a global minimization of a cost functional using the calculus of variations [2]. The cost functional is formulated as a weighted average of the OF constraint and the global smoothness constraint. Most of the dense and highly accurate OF algorithms in the literature are based on the HSOF algorithm [3].

The HSOF algorithm finds applications ranging from vision-aided robots to unmanned aerial vehicles. Realizing such systems requires high-speed computation of dense and accurate flow vectors with deterministic latency and low power consumption. However, the sequential nature of the iterative solver in the HSOF algorithm leads to long processing times, because the sparse system of equations must be evaluated repeatedly until the required accuracy is obtained.

There is no existing literature on a real-time implementation of the HSOF algorithm on a single-core Central Processing Unit (CPU) for computing dense and accurate OF from UHD frames. A few works, however, demonstrate real-time implementations of the HSOF algorithm on General Purpose Graphics Processing Units (GPGPUs) for lower-resolution images. Christopher et al. [4] proposed a modified HSOF algorithm on NVIDIA GeForce 7800 GS and GeForce Go 7900 GTX GPGPUs to compute dense OF from the Yosemite (316 × 252) image sequence at about