Journal of Signal Processing Systems
https://doi.org/10.1007/s11265-019-01490-5
A High-Performance Dense Optical Flow Architecture Based
on Red-Black SOR Solver
Bibin Johnson¹ · Sachin Thomas¹ · Rani J. Sheeba¹
Received: 18 December 2018 / Revised: 31 August 2019 / Accepted: 9 October 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract
Optical flow (OF) is an integral part of many vision systems, especially in embedded and mobile applications,
which face ever-increasing demands for higher speed, minimal resource usage, and lower power consumption. This work
introduces a Dense High Throughput Optical Flow (DHTOF) architecture based on a novel fast-converging Red-Black
Successive Over Relaxation (RBSOR) solver architecture for computing dense and accurate OF with the Horn and Schunck
Optical Flow (HSOF) algorithm from Full High Definition (FHD) frames in real time. The DHTOF architecture can capture
dense OF from Ultra High Definition (UHD) frames at 48 Frames Per Second (FPS) with a throughput of 406 Megapixels/sec,
achieving a Throughput Per Watt (TPW) of 43 Giga Operations Per Second per Watt (GOPS/Watt). The superscalar
and deeply pipelined DHTOF architecture achieves the same or lower Average Angular Error (AAE) with ≈ 4× fewer
RBSOR solver iterations than prior HSOF implementations based on the Jacobi solver. It consumes
12.5× fewer resources and 29.3% less power at FHD resolution than prior architectures. The proposed
DHTOF architecture achieves the highest area-delay normalized speedup (by at least 28.2×) among state-of-the-art HSOF
architectures. The proposed architecture is successfully evaluated as a real-time OF sensor on a Xilinx
Virtex-7 VC707 Field Programmable Gate Array (FPGA) evaluation board.
Keywords Optical flow · Horn and Schunck · Red Black SOR · FPGA · Real-time
Bibin Johnson
bibinjohnson.13@iist.ac.in
Sachin Thomas
sachinthomas1995@gmail.com
Rani J. Sheeba
sheeba@iist.ac.in
¹ Department of Avionics, Indian Institute of Space Science
and Technology, Trivandrum, India
1 Introduction
Motion estimation plays an important role in scene
understanding and in pursuing higher-level cognitive tasks.
The HSOF algorithm [1] retrieves the
apparent motion of pixels from video and image sequences.
It computes OF by globally minimizing a cost
functional using the calculus of variations [2]. The cost
functional is formulated as a weighted sum of the OF
constraint and a global smoothness constraint. Most of the
dense and highly accurate OF algorithms in the literature
are based on the HSOF algorithm [3]. The HSOF algorithm
finds many applications, ranging from vision-aided robots to
unmanned aerial vehicles. Realizing such systems
requires high-speed computation of dense and accurate
flow vectors with deterministic latency and low power
consumption. However, the sequential nature of the iterative
solver in the HSOF algorithm leads to long processing times,
because the sparse system of equations must be iterated until
the required accuracy is obtained.
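To make the role of the iterative solver concrete, the following is a minimal software sketch of a red-black SOR sweep applied to the Horn and Schunck equations. It is not the DHTOF hardware architecture described in this paper. It assumes a 4-neighbour flow average (so that pixels of one colour depend only on pixels of the other colour), mirrored image borders, and precomputed derivative images; the names `hs_rbsor` and `neighbor_mean` and the default values of `alpha`, `omega`, and `iterations` are hypothetical choices for illustration.

```python
import numpy as np

def neighbor_mean(f):
    """4-neighbour average with mirrored borders (an assumed boundary rule)."""
    p = np.pad(f, 1, mode="reflect")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def hs_rbsor(Ix, Iy, It, alpha=1.0, omega=1.5, iterations=50):
    """Horn-Schunck flow via red-black SOR (illustrative sketch only).

    Ix, Iy, It: spatial and temporal image derivatives (2-D arrays).
    alpha: smoothness weight; omega: relaxation factor, 0 < omega < 2.
    """
    u = np.zeros_like(Ix, dtype=np.float64)
    v = np.zeros_like(Ix, dtype=np.float64)
    denom = alpha**2 + Ix**2 + Iy**2
    rows, cols = np.indices(Ix.shape)
    # Checkerboard colouring: "red" pixels have even (row + col), "black" odd.
    colours = [(rows + cols) % 2 == 0, (rows + cols) % 2 == 1]
    for _ in range(iterations):
        for mask in colours:
            # Neighbours of the current colour all belong to the other colour,
            # so every pixel of this colour can be updated independently.
            u_bar = neighbor_mean(u)
            v_bar = neighbor_mean(v)
            t = (Ix * u_bar + Iy * v_bar + It) / denom
            u_gs = u_bar - Ix * t  # Gauss-Seidel value from the 2x2 pixel system
            v_gs = v_bar - Iy * t
            # Over-relaxation: blend the old value with the Gauss-Seidel value.
            u[mask] = (1 - omega) * u[mask] + omega * u_gs[mask]
            v[mask] = (1 - omega) * v[mask] + omega * v_gs[mask]
    return u, v
```

As a sanity check, a horizontal intensity ramp translated by one pixel per frame gives Ix = 1, Iy = 0, It = −1 everywhere, and the sketch converges to u = 1, v = 0. Because all pixels of one colour are mutually independent, each half-sweep can be evaluated in parallel, which is what makes this ordering attractive compared with a strictly sequential sweep.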
No existing literature reports a real-time implementation
of the HSOF algorithm on a single-core Central
Processing Unit (CPU) for computing dense and accurate
OF from UHD frames. A few works, however,
demonstrate real-time implementations of the HSOF algorithm
on General Purpose Graphics Processing Units (GPGPUs) for
lower-resolution images. Christopher et al. [4] proposed a
modified HSOF algorithm on NVIDIA GeForce 7800 GS
and GeForce Go 7900 GTX GPGPUs to compute dense
OF from the Yosemite image sequence (316 × 252) at about