Global Matching Criterion and Color Segmentation Based Stereo

Hai Tao and Harpreet S. Sawhney
Sarnoff Corporation, 201 Washington Rd., Princeton, NJ 08543
{htao, hsawhney}@sarnoff.com

Abstract

In this paper, we present a new analysis-by-synthesis computational framework for stereo vision. It is designed to achieve the following goals: (1) enforcing global visibility constraints, (2) obtaining reliable depth for depth boundaries and thin structures, (3) obtaining correct depth for textureless regions, and (4) hypothesizing correct depth for unmatched regions. In contrast with approaches that rely on local matching measures and relaxation, the framework employs depth- and visibility-based rendering within a global matching criterion to compute depth. A color-segmentation-based depth representation guarantees smoothness in textureless regions. Hypothesizing depth from neighboring segments enables propagation of correct depth and produces reasonable depth values for unmatched regions. A practical algorithm that integrates all these aspects is presented in this paper. Comparative experimental results are shown for real images. Results on new-view rendering based on a single stereo pair are also demonstrated.

1 Introduction

This paper deals with the problem of estimating dense scene structure using a generalized stereo configuration of a pair of cameras. As is the norm in stereo vision, it is assumed that the intrinsic camera parameters and the exterior pose information are provided. In general, this information can also be derived from the images, but the focus of this work is on dense 3D extraction. Extracting dense 3D structure involves establishing correspondence between the pair of images. A variety of methods that rely on image matching under various constraints have been developed in stereo vision. An excellent review of early stereo vision work can be found in [Dhond89].
Stereo matching has to deal with the problems of matching ambiguity, image deformations due to variations in scene structure, delineation of sharp surface boundaries, and unmatched regions due to occlusions/deocclusions between the two images. Typically, in order to handle ambiguities in matching, window operations are performed to integrate information over regions larger than a pixel. This leads to the classical trade-off between matching disambiguation and depth accuracy. In areas with sufficient detail, small windows may provide enough matching information, but matching over a larger range of depth variations (disparities) may not be possible due to ambiguous matches. In textureless areas, small windows are inherently ambiguous. A common strategy for combating this problem is to enforce depth smoothness using techniques such as cooperative/competitive algorithms [Marr79, Zitnick99], multi-resolution schemes [Hanna93], graph methods [Roy98, Boykov99], and surface model fitting [Hoff89]. However, some of these techniques may introduce excessive smoothness that blurs depth discontinuities and fails to capture details of the scene such as thin structures. Most of these algorithms face the classic trade-off between depth smoothness and accuracy in one way or another. Using adaptive windows [Kanade94], more than two views [Okutomi93, Nakamura96], and non-linear diffusion [Scharstein96] may alleviate this problem to some extent.

Figure 1. Invalid local matching in an occlusion region. (Diagram: a background point A near a foreground object is visible in the reference view but occluded in the second view; it may be spuriously matched as foreground or as background.)

Invalid matches occur in unmatched regions due to occlusions/deocclusions. As shown in Figure 1, since point A is occluded in the other image, the correspondence induced by the correct depth will have a low matching score because the background is matched against the foreground. A better match will be achieved for a spurious depth.
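The local window-based matching described above can be made concrete with a minimal winner-take-all block-matching sketch: for each reference pixel, an SSD cost over a square window is evaluated at every candidate disparity and the lowest-cost disparity wins. This is an illustrative baseline, not the algorithm proposed in this paper; the function name and parameters are hypothetical.

```python
import numpy as np

def block_match(left, right, max_disp=16, win=5):
    """Illustrative winner-take-all block matching (not the paper's method).

    For each pixel in the left (reference) image, pick the disparity d
    minimizing the sum of squared differences (SSD) between a win x win
    window in the left image and the window shifted d pixels in the right
    image. Border pixels and out-of-range disparities are skipped.
    """
    h, w = left.shape
    r = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1].astype(float)
            best_cost, best_d = np.inf, 0
            # Only disparities keeping the right-image window in bounds.
            for d in range(min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1,
                             x - d - r:x - d + r + 1].astype(float)
                cost = np.sum((patch - cand) ** 2)
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

On well-textured regions this recovers the shift, but it exhibits exactly the failure modes discussed above: in textureless areas many disparities tie at near-zero cost, and in occluded regions the true disparity matches background against foreground and loses to a spurious one.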
This implies that occlusion regions should ideally be recognized in the depth computation process and treated differently. Many smoothness-enforcing algorithms either ignore this fact or assume that spurious depths give low matching scores and iteratively detect the occlusion regions. Efforts have also been made to detect occlusion boundaries in a preprocessing stage to guide the subsequent matching process. Geiger et al. [Geiger95], Belhumeur et al. [Belhumeur96], and Intille et al. [Intille94] define objective functions that account for the co-occurrence of depth boundaries and unmatched regions. However, this is done within a one-dimensional ordering constraint along scan lines under a simplified stereo model. Therefore, these algorithms typically produce jagged surface boundaries, since they do not enforce surface smoothness across scan lines. In some cases [e.g. Belhumeur96] 2D smoothness is enforced