Low-cost FPGA stereo vision system for real time disparity maps calculation

Paolo Zicari, Stefania Perri, Pasquale Corsonello ⇑, Giuseppe Cocorullo

Department of Electronics Computer Science and Systems, University of Calabria, Rende, Italy

Article info
Article history: Available online 27 February 2012
Keywords: Stereo vision; FPGA; Low-cost architecture; VLSI implementation

Abstract
Several applications demand efficient hardware implementations of stereo vision systems in order to furnish real time three-dimensional measurements. This paper proposes a complete, fast, low-cost stereo vision system that performs stereo image rectification with tangential and radial distortion removal, computes dense disparity maps using the Sum of Absolute Differences as the dissimilarity metric, and, finally, exploits a novel injective consistency check purpose-designed for eliminating unreliable disparity values. The proposed system has been realized and hardware tested for several image resolutions and disparity ranges. When 1280 × 720 grayscale images are processed with a disparity range equal to 30, the system reaches a frame rate of up to 97 fps at 89 MHz. It has been realized on a single low-cost Xilinx Virtex-4 XC4VLX60 FPGA chip and it occupies 63 DSPs, 128 BRAMs and 15728 slices.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Real time stereo vision is a very challenging research area involving different application fields such as autonomous navigation systems, surveillance systems, and people and object tracking. Inspired by human vision, a stereo vision system measures the distance of a generic point from the stereo camera by calculating the disparity between the projections of that point in the left and right captured images. Several kinds of software and hardware implementations have been proposed using, respectively, general purpose processors and parallel pipelined circuits [1–11].
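As a point of reference for the disparity computation described above, the following is a minimal software sketch of SAD-based block matching with a winner-takes-all choice of disparity. It is only an illustration under stated assumptions and does not reproduce the hardware architecture proposed in this paper: the function names `sad` and `disparity_map`, the 3 × 3 default window, the inclusive search range `0..max_disp`, and the convention that a left-image pixel at column `c` matches the right-image pixel at column `c - d` are all hypothetical choices made for the example.

```python
def sad(left, right, row, col, d, half):
    """Sum of Absolute Differences between the window centred at (row, col)
    in the left image and the same window shifted d pixels to the left
    in the right image. Images are lists of rows of grey levels."""
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            total += abs(left[row + dr][col + dc] - right[row + dr][col + dc - d])
    return total


def disparity_map(left, right, max_disp, half=1):
    """Winner-takes-all disparity for every pixel whose matching window and
    whole search range fit inside the image; other pixels are left at 0.
    The range 0..max_disp (inclusive) is an assumption of this sketch."""
    height, width = len(left), len(left[0])
    disp = [[0] * width for _ in range(height)]
    for row in range(half, height - half):
        for col in range(half + max_disp, width - half):
            costs = [sad(left, right, row, col, d, half)
                     for d in range(max_disp + 1)]
            # Keep the candidate with the smallest dissimilarity
            # (the first one in case of ties).
            disp[row][col] = costs.index(min(costs))
    return disp
```

In this sketch every candidate disparity is evaluated sequentially for each pixel; a hardware implementation would typically evaluate the candidates in parallel, which is where the pipelined circuits mentioned above gain their speed.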
Recent examples of efficient implementations of complete stereo vision systems are provided in [3–6,11]. The system proposed in [3] processes 640 × 480 images and achieves a 230 fps frame rate using the Hamming distance of Census transformed images to compute the disparity in a range of 64 pixels.

In [4], disparity maps are computed through a local block-based matching technique, implemented on the Cell Broadband Engine (CBE). The CBE, by Sony, Toshiba and IBM, is the processor used in the PlayStation 3 console. The Sum of Absolute Differences (SAD) metric was chosen for that implementation because it provides a good balance between accuracy and speed, performing better than the sum of squared differences and having a smaller computational complexity than normalized correlation metrics.

The vision machine presented in [5] operates at more than 20 Hz using a hybrid architecture consisting of one dual-GPU card and one quad-core CPU. A flexible use of GPUs as well as multiple CPU cores, according to the actual structure of the algorithms of the different submodules, has proved to be an efficient way of lowering processing time and latency with only moderate implementation effort.

In [6], a real-time disparity map computation module is realized exploiting a parallel-pipelined fuzzy inference system. The disparity is computed by SAD and the false correspondences are eliminated by an original fuzzy system. The overall design has been realized on an Altera Stratix III EP3SL340H1152C3.

Finally, in [11] a fast and accurate stereo vision algorithm is proposed for hardware-based systems. The disparity is computed in a range of 60 pixels by merging the gradient-based census transform (GCT), performed over 5 × 5 windows of pixels, and the SAD, computed over windows of pixels with sizes ranging from 5 × 5 to 19 × 19.
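For readers less familiar with the Census-based cost mentioned for [3], the sketch below shows the plain Census transform of a single window and the Hamming distance between two signatures. It is a generic textbook formulation given as an assumption-laden illustration: it is not the implementation of [3], nor the gradient-based variant of [11], and the names `census` and `hamming`, the 3 × 3 default window, and the "neighbour darker than centre" bit convention are hypothetical choices.

```python
def census(img, row, col, half=1):
    """Census signature of the window centred at (row, col): one bit per
    neighbour, set when the neighbour is darker than the centre pixel.
    Neighbours are scanned row by row; the centre itself is skipped."""
    centre = img[row][col]
    bits = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            if dr == 0 and dc == 0:
                continue
            bits = (bits << 1) | (1 if img[row + dr][col + dc] < centre else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two Census signatures differ."""
    return bin(a ^ b).count("1")
```

Because the signature only records comparisons with the centre pixel, adding the same brightness offset to every pixel leaves it unchanged, whereas a SAD cost computed on the raw grey levels would change. The matching cost between two candidate windows is then `hamming(census(left, r, c), census(right, r, c - d))` for a candidate disparity `d`.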
The hardware implementation of [11], carried out on a Stratix EP1S60 FPGA device for the smallest window sizes, processes 750 × 400 pixel images at a frame rate of 60 fps.

An analysis of the stereo vision systems cited above shows that they were mainly driven by the pursuit of ever-higher speed, often disregarding cost. In contrast, this paper considers reducing cost as important as achieving high speed, and therefore proposes a low-cost and fast hardware implementation of a complete stereo vision system.

The stereo vision system presented here is implemented in a single Xilinx Virtex-4 XC4VLX60 FPGA chip and reaches very high performance thanks to a careful hardware design effort, which combines the potential of the versatile dedicated hardware platform with efficient algorithmic and implementation choices. It is worth noting that the current commercial price ratio between the XC4VLX200 used in [3], or the Altera Stratix III used in [6], and the XC4VLX60 chip used here is about 9:1.

⇑ Corresponding author. E-mail address: p.corsonello@unical.it (P. Corsonello).

Microprocessors and Microsystems 36 (2012) 281–288. doi:10.1016/j.micpro.2012.02.014