International Journal of Computer Vision, vol. 47, no. 1/2/3, pp. 99-117, May 2002.

Fast Stereo Matching Using Rectangular Subregioning and 3D Maximum-Surface Techniques

Changming Sun
CSIRO Mathematical and Information Sciences
Locked Bag 17, North Ryde, NSW 1670, Australia
changming.sun@csiro.au

Abstract

This paper presents a fast and reliable stereo matching algorithm which produces a dense disparity map by using fast cross correlation, rectangular subregioning (RSR) and 3D maximum-surface techniques in a coarse-to-fine scheme. Fast correlation is achieved by using the box-filtering technique whose speed is invariant to the size of the correlation window and by segmenting the stereo images into rectangular subimages at different levels of the pyramid. By working with rectangular subimages, not only can the speed of the correlation be further increased, the intermediate memory storage requirement can also be reduced. The disparity map for the stereo images is found in the 3D correlation coefficient volume by obtaining the global 3D maximum-surface rather than simply choosing the position that gives the local maximum correlation coefficient value for each pixel. The 3D maximum-surface is obtained using our new two-stage dynamic programming (TSDP) technique. There are two original contributions in this paper: (1) development of the RSR technique for fast similarity measure; and (2) development of the TSDP technique for efficiently obtaining 3D maximum-surface in a 3D volume. Typical running time of our algorithm implemented in the C language on a 512×512 image is in the order of a few seconds on a 500MHz PC. A variety of synthetic and real images have been tested, and good results have been obtained.

Keywords: Rectangular subregioning (RSR), Fast cross-correlation, Similarity measure, Stereo matching, Coarse-to-fine, Pyramid, 3D Maximum-Surface, Two-stage dynamic programming (TSDP), Sub-pixel accuracy.
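The claim that box filtering runs at a speed independent of the correlation window size can be illustrated with a small running-sum sketch. The code below is not the paper's C implementation; it is a minimal Python illustration, and the names `box_sum_1d`, `box_sum_2d` and `brute_force_box_sum` are hypothetical. It assumes a square window of side `2 * radius + 1` and returns sums only at positions where the window fits entirely inside the image. Each output value is obtained from its neighbour by adding the element that enters the window and subtracting the one that leaves, so the per-pixel cost does not grow with the window.

```python
def box_sum_1d(values, radius):
    """Sliding-window sums of width 2*radius+1 over a 1D sequence.

    Only positions where the window fits fully are returned.
    After the first window, each sum costs one add and one subtract,
    whatever the window width.
    """
    width = 2 * radius + 1
    if len(values) < width:
        return []
    running = sum(values[:width])          # first window, summed directly
    out = [running]
    for i in range(width, len(values)):
        running += values[i] - values[i - width]  # add entering, drop leaving
        out.append(running)
    return out


def box_sum_2d(image, radius):
    """Square-window sums over a 2D list of lists, done separably.

    Rows are filtered first, then the columns of the row result.
    """
    rows = [box_sum_1d(row, radius) for row in image]
    if not rows or not rows[0]:
        return []
    cols = [box_sum_1d(list(col), radius) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]   # transpose back to row-major


def brute_force_box_sum(image, radius):
    """Reference version whose cost grows with the window area."""
    width = 2 * radius + 1
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[y + dy][x + dx]
                for dy in range(width) for dx in range(width))
            for x in range(w - width + 1)
        ]
        for y in range(h - width + 1)
    ]


if __name__ == "__main__":
    img = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
    print(box_sum_2d(img, 1))  # 3x3 window sums
```

In a correlation-based matcher, window sums of this kind would be applied to the pixel products and squared intensities needed for the correlation coefficient; that use is an assumption of this sketch rather than a detail stated in the abstract.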
1 Introduction

The correspondence problem in stereo vision and photogrammetry concerns the matching of points or other kinds of primitives such as edges and regions in two or more images (in this paper, we use just two images) such that the matched image points are the projections of the same point in the scene. The disparity map obtained from the matching stage may then be used to compute the 3D positions of the scene points given the imaging geometry.

Because of factors such as noise, lighting variation, occlusion and perspective distortion, the appearances of the corresponding points will differ in the two images. For a particular feature or a local window in one image, there are usually several matching candidates in the other image. It is usually necessary to use additional information or constraints to assist in obtaining the correct match. Some of the commonly used constraints are:

(1) Epipolar constraint: The matching points must lie on the corresponding epipolar lines of the two images. For epipolar rectified images, the matching points lie on the same image scanlines of a stereo pair.
(2) Uniqueness constraint: Matching should be unique between the two images.
(3) Smoothness constraint: Local regions of the disparity map should be relatively smooth apart from regions with occlusion or disparity discontinuity.
(4) Ordering constraint or monotonicity constraint: For points along the epipolar line in one image of the image pair, the corresponding points have to occur in the same order on the corresponding epipolar line in the other image.

In this paper, we assume that we work on epipolar rectified stereo images, so the epipolar constraint is effectively already in use. The other constraints mentioned will be used in the dynamic programming stage when obtaining the disparity map from the correlation coefficient volume.

Matching techniques can be divided broadly into