FAST VARIABLE CENTER-BIASED WINDOWING FOR
HIGH-SPEED STEREO ON PROGRAMMABLE GRAPHICS HARDWARE
Jiangbo Lu∗,†, Gauthier Lafruit†, and Francky Catthoor∗,†
∗ Department of Electrical Engineering, University of Leuven, Belgium
† Multimedia Group, IMEC, Kapeldreef 75, B-3001, Leuven, Belgium
ABSTRACT
We present a high-speed dense stereo algorithm that achieves both
good quality results and very high disparity estimation throughput on
the graphics processing unit (GPU). The key idea is a variable center-
biased windowing approach, enabling an adaptive selection of the
most suitable support patterns with varying sizes and shapes. As the
fundamental construct for variable windows, a truncated separable
Laplacian kernel approximation is proposed for the efficient pixel-
wise weighted cost aggregation. We also present a number of critical
optimization schemes to boost the real-time speed on GPUs. Our
method outperforms previous GPU-based local stereo methods and
even some methods using global optimization on the Middlebury
stereo database. Our optimized implementation, running entirely
on an Nvidia GeForce 7900 graphics card, achieves over 605 million
disparity estimations per second (Mde/s) including all overhead,
about 2.1 to 12.1 times faster than existing GPU-based solutions.
Index Terms— Stereo vision, real-time dense stereo, GPGPU
1. INTRODUCTION
Depth from stereo is an important computer vision topic that has
attracted intensive research interest for decades. A substantial amount
of work has been done on stereo, which is systematically surveyed
and evaluated by Scharstein and Szeliski [1]. In general, casting
a stereo problem as a global optimization problem leads to
high-quality disparity estimates, but most of these global
techniques are too computationally expensive for online processing.
Real-time stereo applications today still largely rely on some local
methods together with a winner-takes-all (WTA) decision strategy.
Typically, local window-based approaches choose to aggregate
the matching cost over a given support window to increase the ro-
bustness to noise and texture variation. However, to obtain accurate
results at depth discontinuities as well as on homogeneous regions,
an appropriate support window for each pixel should be decided
adaptively. To this end, several local methods have been proposed.
For instance, Fusiello et al. [2] performed the correlation with nine
windows anchored at different points and retained the disparity with
the smallest matching cost. However, this method and its
generalization, shiftable windows [1], usually require a relatively
large number of candidate support windows to achieve good
estimation results. Moreover, their box filters cannot adequately
differentiate the impact of support pixels at different spatial
locations. Recently, Yoon and Kweon [3] proposed a state-of-the-art
local window method, albeit at a very demanding computational cost,
in which pixel-wise support weights are defined using a Laplacian
kernel to model the grouping strength of each support pixel.
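As a rough illustration of Laplacian-kernel support weights in the spirit of [3], the sketch below computes per-pixel weights for one window. It is a simplification under stated assumptions: it uses grayscale intensity differences rather than a color-space distance, and the function name `support_weights` and the parameters `gamma_c` and `gamma_s` (with their default values) are hypothetical choices made for this example, not values taken from the paper.

```python
import numpy as np

def support_weights(patch, gamma_c=7.0, gamma_s=17.5):
    """Laplacian-kernel support weights for a square window with odd side.

    Each weight decays exponentially with the intensity difference and
    the spatial distance between a support pixel and the window center.
    gamma_c and gamma_s are illustrative values (assumptions).
    """
    r = patch.shape[0] // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    spatial = np.sqrt(xx ** 2 + yy ** 2)            # distance to the center
    color = np.abs(patch.astype(np.float32) - float(patch[r, r]))
    return np.exp(-(color / gamma_c + spatial / gamma_s))
```

Pixels that resemble the center pixel and lie close to it receive weights near 1, while pixels across an intensity edge are strongly suppressed. Evaluating such weights for every pixel and every window is what makes this family of methods expensive.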
Nonetheless, solely resorting to local methods is not a cure-all
for achieving dense stereo at high video rates. In fact, software-only
real-time stereo systems have begun to emerge only recently; they
exploit assembly-level instruction optimization using Intel's MMX
extension, but leave few CPU cycles for other tasks, including
high-level interpretation of the stereo results. Harnessing some
powerful built-in features of the modern graphics processing unit (GPU),
Yang et al. first proposed a pyramid-shaped correlation kernel [4]
and small-scale adaptive support windows [5]. Though very impres-
sive disparity estimation throughput is obtained on GPUs, these tech-
niques cannot strike an optimal quality balance between homoge-
neous and heterogeneous regions. Later on, Gong and Yang [6] pro-
posed an image-gradient-guided correlation method with improved
accuracy, while still maintaining real-time speed on GPUs. Inspired
by [3], Wang et al. [7] recently introduced an adaptive aggregation
step in a dynamic-programming stereo framework. Their high-quality
results are obtained through a complicated cost aggregation and global
optimization strategy, and real-time speed is enabled by utilizing
the unique processing capabilities of both the CPU and the GPU.
This paper presents a novel stereo algorithm that is specially
designed to achieve competitive disparity quality and high-speed
execution on GPUs. At the heart of the proposed algorithm is
a variable center-biased windowing approach, enabling an adaptive
selection of the most suitable support patterns for different regions.
Our method is similar in spirit to the variable window approach [8],
but is much faster because it avoids the costly dynamic programming.
Concerning real-time speed, the proposed method is by far
the fastest among all these GPU-based approaches. The major
contributing factors are threefold: 1) our highly efficient core stereo
processing, 2) a number of special implementation optimizations
on the GPU, and 3) upgrading to more advanced graphics hardware.
Running entirely on an Nvidia GeForce 7900 graphics card, our
optimized implementation achieves over 605 million disparity
estimations per second (Mde/s), compared to a maximum speed of 289
Mde/s in [5], 117 Mde/s in [6], and 50 Mde/s on CPU+GPU in [7].
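The throughput comparison can be checked with a few lines of arithmetic. The sketch below assumes the usual definition of Mde/s as image width times height times number of disparity levels times frames per second, divided by one million; the helper name `mde_per_second` is hypothetical. Because the proposed method is reported as "over 605" Mde/s, the resulting ratios should be read as approximate.

```python
def mde_per_second(width, height, disparity_levels, fps):
    """Million disparity estimations per second (assumed definition)."""
    return width * height * disparity_levels * fps / 1e6

# Throughput figures quoted in the text, in Mde/s.
ours = 605.0
reported = {"[5]": 289.0, "[6]": 117.0, "[7] (CPU+GPU)": 50.0}

# Speedup of the proposed method over each earlier system.
speedups = {ref: ours / mde for ref, mde in reported.items()}
```

The smallest and largest ratios, against [5] and [7], correspond to the range of about 2.1 to 12.1 times quoted in the abstract.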
2. THE PROPOSED STEREO MATCHING ALGORITHM
Following the taxonomy in [1], our stereo algorithm contains three
major steps: matching cost computation, cost aggregation, and fi-
nally disparity selection. In the first step, a matching cost for ev-
ery possible disparity value of each pixel is computed. To suppress
the influence of mismatches during the subsequent cost aggregation
step, we adopt the truncated absolute difference (TAD) as the
matching cost measure. Similar to most local approaches, the proposed
algorithm places a key emphasis on the cost aggregation step to reduce
the ambiguity in matching, and we will therefore focus on this core
part for the remainder of this section. In the last disparity selection
step, a local WTA optimization is performed at each pixel, simply
choosing the disparity associated with the minimum cost value. The
entire framework of our stereo algorithm is illustrated in Fig. 1.
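The three steps can be sketched on the CPU as follows. This is a minimal illustration, not the GPU implementation described in this paper: the aggregation step here is a plain box filter used only as a stand-in for the variable center-biased windows, and the function names, the truncation threshold `trunc`, and the window `radius` are assumptions made for the example.

```python
import numpy as np

def tad_cost_volume(left, right, max_disp, trunc):
    """Step 1: truncated absolute difference for disparities 0..max_disp-1."""
    h, w = left.shape
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    # Pixels with no valid match at disparity d keep the truncation value.
    cost = np.full((max_disp, h, w), float(trunc), dtype=np.float32)
    for d in range(max_disp):
        # Left pixel (x, y) is compared with right pixel (x - d, y).
        diff = np.abs(left[:, d:] - right[:, :w - d])
        cost[d, :, d:] = np.minimum(diff, trunc)
    return cost

def box_aggregate(cost, radius):
    """Step 2 (stand-in): sum costs over a square window per disparity."""
    _, h, w = cost.shape
    padded = np.pad(cost, ((0, 0), (radius, radius), (radius, radius)),
                    mode="edge")
    out = np.zeros_like(cost)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out

def wta(cost):
    """Step 3: winner-takes-all, the disparity with minimum cost per pixel."""
    return np.argmin(cost, axis=0)
```

For a rectified grayscale pair, `wta(box_aggregate(tad_cost_volume(left, right, 16, 20.0), 2))` returns an integer disparity map of the same size as the input images. Truncating the per-pixel cost limits how much a single mismatched pixel can contribute to the aggregated sum.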
The proposed cost aggregation step is composed of two parts: 1)
VI - 568 1-4244-1437-7/07/$20.00 ©2007 IEEE ICIP 2007