Local Stereo Matching with 3D Adaptive Cost Aggregation for Slanted
Surface Modeling and Sub-pixel Accuracy
Yilei Zhang
University of Alberta
yilei@cs.ualberta.ca
Minglun Gong
Memorial Univ. of Newfoundland
gong@cs.mun.ca
Yee-Hong Yang
University of Alberta
yang@cs.ualberta.ca
Abstract
This paper presents a new local binocular stereo
algorithm which takes into consideration plane fitting
at the per-pixel level. Two disparity calculation passes
are used. The first pass assumes that surfaces in the
scene are fronto-parallel and generates an initial
disparity map, from which the disparity plane
orientations of all pixels are extracted and refined. In
the second pass, the cost aggregation for each pixel is
conducted along the estimated disparity plane
orientations, rather than the fronto-parallel ones.
Large window size with adaptive support weights is
used to ensure the effectiveness of the slanted surface
modeling. The disparity search space is also quantized
at sub-pixel level to improve the accuracy of the
disparity results. Experimental results demonstrate
the validity of the proposed approach.
1 Introduction
The binocular stereo matching problem has been
extensively studied in the past few decades because of
its many applications. The evaluation method
popularized by Scharstein and Szeliski [3] has also
contributed to the growing attention to this problem.
Optimization techniques used in stereo matching
algorithms can be classified into global and local
optimization. Although global optimization methods
generally give better results than local ones, the speed
and parallelism of local techniques keep research on
them thriving.
Among all the local stereo algorithms, the ones
based on adaptive-weight cost aggregation [5, 7] give
the best performance. Conventional adaptive-window
cost aggregation techniques focus on varying the size,
shape, and position of the support window, whereas the
adaptive-weight method [7] uses a large fixed-size
support window and assigns a support weight to each
pixel in the window. The weight is calculated based on
Gestalt principles, which state that the grouping of
pixels should be based on spatial proximity and
chromatic similarity. The segment-based adaptive-
weight method [5] improves upon the original
adaptive-weight approach by first applying color
segmentation, then assigning full support weights to
pixels in the same segment as the pixel of interest.
These adaptive-weight techniques are computationally
intensive, since the window must be big enough for the
aggregation to be effective. Nevertheless, because they
are highly parallel, these methods can be sped up when
ported to programmable graphics hardware [1, 8].
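The adaptive-weight idea described above can be illustrated with a minimal sketch. It assumes the exponential weight form commonly associated with adaptive support weights, in which the weight decays with both color difference and spatial distance. The function names, the parameters gamma_c and gamma_p, and their default values are illustrative assumptions, not values taken from [7]. For brevity the sketch computes weights in one view only, which may differ from the cited method.

```python
import math

def support_weight(color_p, color_q, pos_p, pos_q, gamma_c=10.0, gamma_p=10.0):
    """Support weight of neighbor q for the pixel of interest p.

    gamma_c and gamma_p are hypothetical tuning constants controlling how
    quickly the weight decays with color difference and spatial distance.
    """
    delta_c = math.dist(color_p, color_q)  # chromatic difference
    delta_g = math.dist(pos_p, pos_q)      # spatial distance
    return math.exp(-(delta_c / gamma_c + delta_g / gamma_p))

def aggregate_cost(raw_cost, colors, center, radius, gamma_c=10.0, gamma_p=10.0):
    """Weighted average of raw matching costs in a square window.

    raw_cost and colors are 2D lists indexed [y][x]; raw_cost holds the
    per-pixel matching cost for one disparity hypothesis.
    """
    cy, cx = center
    h, w = len(raw_cost), len(raw_cost[0])
    num = den = 0.0
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            wgt = support_weight(colors[cy][cx], colors[y][x],
                                 (cy, cx), (y, x), gamma_c, gamma_p)
            num += wgt * raw_cost[y][x]
            den += wgt
    return num / den  # den > 0 because the center pixel has weight 1
```

Because pixels whose color differs strongly from the center receive near-zero weight, costs from across a color edge contribute little to the aggregate, which is why a large fixed-size window can be used.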
According to the Middlebury stereo evaluation site
[9], the best among all stereo algorithms are based on
disparity plane fitting [2, 6]. These approaches first
over-segment the image into small homogeneously
colored regions, then apply a plane-fitting technique to
find candidate disparity planes for each segment. The
optimal disparity plane assignment is determined using
either local [4] or global [2, 6] optimization. Since the
fitted disparity planes naturally provide sub-pixel
disparity values, the scene can be reconstructed at a
much finer level.
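To make concrete how a fitted plane yields sub-pixel disparities, the following minimal sketch fits d = a*x + b*y + c to the (x, y, d) samples of one segment by ordinary least squares. This is an illustrative assumption: the cited methods [2, 6] may use more robust fitting procedures, and the function names here are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_disparity_plane(samples):
    """Least-squares fit of d = a*x + b*y + c to (x, y, d) samples."""
    sxx = sxy = sx = syy = sy = sxd = syd = sd = 0.0
    for x, y, d in samples:
        sxx += x * x; sxy += x * y; sx += x
        syy += y * y; sy += y
        sxd += x * d; syd += y * d; sd += d
    n = float(len(samples))
    # Normal equations (A^T A) [a, b, c]^T = A^T d.
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxd, syd, sd]
    det = det3(m)
    if abs(det) < 1e-12:
        raise ValueError("degenerate sample layout (e.g. collinear pixels)")
    params = []
    for col in range(3):  # Cramer's rule, one unknown per column
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        params.append(det3(mc) / det)
    return tuple(params)

def plane_disparity(plane, x, y):
    """Evaluate the fitted plane; the result is generally non-integer."""
    a, b, c = plane
    return a * x + b * y + c
```

Even when the input disparities are integers, evaluating the fitted plane at a pixel returns a real-valued disparity, which is the source of the sub-pixel precision mentioned above.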
Inspired by both categories of algorithms, we
propose a new local stereo approach that introduces
per-pixel non-fronto-parallel disparity plane modeling
and performs adaptive-weight cost aggregation in the
3D cost volume along slanted planes.
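One plausible reading of aggregation along a slanted plane is sketched below. It is an illustration under stated assumptions, not the authors' implementation: the cost volume layout cost[y][x][k], the sub-pixel quantization step, the nearest-level sampling, and the function name are all hypothetical, and uniform weights replace the adaptive support weights for brevity.

```python
def slanted_aggregate(cost, center, d, grad, radius, step=0.5):
    """Average raw costs over a window while following a slanted plane.

    cost[y][x][k] is the raw cost of pixel (x, y) at disparity k * step.
    grad = (gx, gy) is the assumed disparity gradient at the center pixel;
    grad = (0, 0) reduces to ordinary fronto-parallel aggregation.
    """
    cy, cx = center
    gx, gy = grad
    h, w, levels = len(cost), len(cost[0]), len(cost[0][0])
    total, count = 0.0, 0
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            # Disparity predicted for this neighbor by the plane through
            # (cx, cy, d) with gradient (gx, gy).
            dq = d + gx * (x - cx) + gy * (y - cy)
            k = int(round(dq / step))  # nearest quantized disparity level
            if 0 <= k < levels:
                total += cost[y][x][k]
                count += 1
    return total / count if count else float("inf")
```

On a surface whose disparity changes by half a pixel per column, the slanted window reads the matching level at every neighbor, whereas a fronto-parallel window reads the correct level only at the center, so its aggregated cost is inflated even for the correct hypothesis.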
2 The proposed algorithm
The workflow of the proposed algorithm is
described in Figure 1. In the first pass, the algorithm
computes an initial disparity map using a GPU-based
adaptive-weight stereo matcher [1]. Then, a disparity
plane orientation (DPO) image which encodes the
gradient of the disparity plane at each pixel is extracted
using a simple least squares fitting approach. With
estimated per-pixel DPO information, a new 3D
adaptive cost aggregation approach is used in the
second pass for generating disparity results at sub-pixel
accuracy. Finally, to refine the result, the disparity
maps obtained for the two views are cross-checked to
remove inconsistent disparity values, which are then
filled in using a DPO-based hole-filling approach.
Due to space limitations, we refer readers to [1] for the
details of the first step. The remaining steps are
discussed in the rest of this section. Experimental
results are presented and discussed in Section 3, and
Section 4 concludes the paper.
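As a rough illustration of the cross-checking step in the workflow, the sketch below marks a left-view disparity as invalid when the right-view disparity at the matched position disagrees with it. The convention that left pixel x matches right pixel x - d, the tolerance tol, and the use of None for removed values are assumptions made for this example only. The DPO-based hole filling is not sketched here.

```python
def cross_check(disp_left, disp_right, tol=1.0):
    """Left-right consistency check on two disparity maps.

    disp_left and disp_right are 2D lists indexed [y][x]. Assumes that a
    left pixel at column x with disparity d matches the right pixel at
    column x - d. Inconsistent pixels are returned as None.
    """
    h, w = len(disp_left), len(disp_left[0])
    checked = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = disp_left[y][x]
            xr = int(round(x - d))  # nearest matched column in the right view
            if 0 <= xr < w and abs(d - disp_right[y][xr]) <= tol:
                checked[y][x] = d
    return checked
```

Pixels whose match falls outside the right image, or whose two disparity estimates differ by more than tol, are left as holes for a later filling step.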
978-1-4244-2175-6/08/$25.00 ©2008 IEEE