Local Stereo Matching with 3D Adaptive Cost Aggregation for Slanted Surface Modeling and Sub-pixel Accuracy

Yilei Zhang, University of Alberta, yilei@cs.ualberta.ca
Minglun Gong, Memorial Univ. of Newfoundland, gong@cs.mun.ca
Yee-Hong Yang, University of Alberta, yang@cs.ualberta.ca

Abstract

This paper presents a new local binocular stereo algorithm that performs plane fitting at the per-pixel level. Two disparity calculation passes are used. The first pass assumes that surfaces in the scene are fronto-parallel and generates an initial disparity map, from which the disparity plane orientations of all pixels are extracted and refined. In the second pass, the cost aggregation for each pixel is conducted along the estimated disparity plane orientations, rather than the fronto-parallel ones. A large window with adaptive support weights is used to ensure the effectiveness of the slanted surface modeling. The disparity search space is also quantized at the sub-pixel level to improve the accuracy of the disparity results. The experimental results demonstrate the validity of the presented approach.

1 Introduction

The binocular stereo matching problem has been extensively studied in the past few decades because of its many applications. The evaluation method popularized by Scharstein and Szeliski [3] has also contributed to the increased attention to this problem. Optimization techniques used in stereo matching algorithms can be classified into global and local optimization. Although global optimization methods in general give better results than local ones, the speed and parallelism advantages of local techniques keep research on them thriving. Among all the local stereo algorithms, the ones based on adaptive-weight cost aggregation [5, 7] give the best performance.
Conventional adaptive-window cost aggregation techniques focus on varying the size, shape, and position of the support window, whereas the adaptive-weight method [7] uses a large fixed-size support window and assigns a support weight to each pixel in the window. The weight is calculated based on Gestalt principles, which state that the grouping of pixels should be based on spatial proximity and chromatic similarity. The segment-based adaptive-weight method [5] improves upon the original adaptive-weight approach by first applying color segmentation, then assigning full support weights to pixels in the same segment as the pixel of interest. These adaptive-weight techniques are computationally intensive, since the window must be big enough for the aggregation to be effective. Nevertheless, because they are highly parallel, these methods can be sped up if ported to programmable graphics hardware [1, 8].

According to the Middlebury stereo evaluation site [9], the best-performing stereo algorithms are based on disparity plane fitting [2, 6]. These approaches first over-segment the image into small homogeneously-colored regions, then apply a plane-fitting technique to find candidate disparity planes for each segment. The optimal disparity plane assignment is determined using either local [4] or global [2, 6] optimization. Since the fitted disparity planes naturally provide sub-pixel disparity values, the scene can be reconstructed at a much finer level.

Inspired by both categories of algorithms, we propose a new local stereo approach, which introduces per-pixel non-fronto-parallel disparity plane modeling and performs adaptive-weight cost aggregation in a 3D cost volume along slanted planes.

2 The proposed algorithm

The workflow of the proposed algorithm is shown in Figure 1. In the first pass, the algorithm computes an initial disparity map using a GPU-based adaptive-weight stereo matcher [1].
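The adaptive support weights used in such a first pass can be illustrated with a minimal sketch. It assumes the commonly used exponential weighting of color difference and spatial distance; the function names, the `gamma_c` and `gamma_p` values, and the use of weights from the reference view only are assumptions made for illustration, not details taken from this paper or from [1].

```python
import math

def support_weight(color_p, color_q, pos_p, pos_q, gamma_c=10.0, gamma_p=17.5):
    """Support weight of neighbor q for the window centered on p.

    Assumed form: exp(-(color_difference / gamma_c + distance / gamma_p)).
    The gamma values are illustrative defaults, not values from the paper.
    """
    delta_c = math.dist(color_p, color_q)  # chromatic similarity term
    delta_g = math.dist(pos_p, pos_q)      # spatial proximity term
    return math.exp(-(delta_c / gamma_c + delta_g / gamma_p))

def aggregate_cost(raw_cost, colors, center, radius):
    """Weighted average of raw matching costs for one disparity hypothesis.

    raw_cost and colors are 2D lists indexed as [y][x]; center is (y, x).
    The window is clipped at the image border.
    """
    cy, cx = center
    height, width = len(raw_cost), len(raw_cost[0])
    weighted_sum = 0.0
    weight_total = 0.0
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            w = support_weight(colors[cy][cx], colors[y][x], (cy, cx), (y, x))
            weighted_sum += w * raw_cost[y][x]
            weight_total += w
    return weighted_sum / weight_total
```

In this sketch, a neighbor whose color differs strongly from the center pixel contributes almost nothing to the aggregated cost, which is what allows a large fixed-size window to be used without blurring across color edges.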
Then, a disparity plane orientation (DPO) image, which encodes the gradient of the disparity plane at each pixel, is extracted using a simple least squares fitting approach. With the estimated per-pixel DPO information, a new 3D adaptive cost aggregation approach is used in the second pass to generate disparity results at sub-pixel accuracy. Finally, to refine the result, the disparity maps obtained for the two views are cross-checked to remove inconsistent disparity values, which are later filled in using a DPO-based hole-filling approach.

Due to space limitations, we refer the reader to [1] for the details of the first step. The remaining steps are discussed in the rest of this section. The experimental results are presented and discussed in Section 3, and Section 4 concludes the paper.

978-1-4244-2175-6/08/$25.00 ©2008 IEEE
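As a rough illustration of the DPO extraction step, the sketch below fits a plane d = a*x + b*y + c to the disparities in a square window by ordinary least squares and returns the gradient (a, b). The window shape, the `radius` parameter, the unweighted fit, and the function names are assumptions made for illustration; the paper's own fitting and refinement details are given in its later sections.

```python
def fit_disparity_plane(samples):
    """Least squares fit of d = a*x + b*y + c to a list of (x, y, d) samples."""
    n = len(samples)
    mean_x = sum(s[0] for s in samples) / n
    mean_y = sum(s[1] for s in samples) / n
    mean_d = sum(s[2] for s in samples) / n
    # Second-order moments about the centroid (centered normal equations).
    sxx = sum((x - mean_x) ** 2 for x, _, _ in samples)
    syy = sum((y - mean_y) ** 2 for _, y, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y, _ in samples)
    sxd = sum((x - mean_x) * (d - mean_d) for x, _, d in samples)
    syd = sum((y - mean_y) * (d - mean_d) for _, y, d in samples)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("samples are collinear; the plane is not determined")
    a = (sxd * syy - syd * sxy) / det
    b = (syd * sxx - sxd * sxy) / det
    c = mean_d - a * mean_x - b * mean_y
    return a, b, c

def dpo_at(disparity, cy, cx, radius=2):
    """Disparity plane gradient (a, b) at pixel (cy, cx).

    disparity is a 2D list indexed as [y][x]; the window is clipped at the
    image border.
    """
    height, width = len(disparity), len(disparity[0])
    samples = [
        (x, y, disparity[y][x])
        for y in range(max(0, cy - radius), min(height, cy + radius + 1))
        for x in range(max(0, cx - radius), min(width, cx + radius + 1))
    ]
    a, b, _ = fit_disparity_plane(samples)
    return a, b
```

For a fronto-parallel surface the fitted gradient is (0, 0), so aggregating along the estimated orientation reduces to the conventional fronto-parallel case.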