Integrated Stereo Vision: A Multiresolution Approach

K. Sunil Kumar and U. B. Desai
Department of Electrical Engineering
Indian Institute of Technology - Bombay
Bombay 400 076 (India)
email: ubdesai@ee.iitb.ernet.in

Abstract

The problem of stereo vision is ill-posed in the sense of Hadamard. Motivated by the work of Gamble et al., Clark and Yuille, Toborg and Hwang, and Barnard, we formulate and set up the problem of stereo vision in an integrated multiresolution framework. The energy functions constructed are interactive in the sense that there is feedforward as well as feedback of information between the different modules. Simulations show that there is considerable improvement in the obtained depth map and a reduction in computational time when the modules are integrated and a multiresolution approach is used.

1 Introduction

Extracting the third dimension from two 2-D intensity images is an inverse optics problem and hence is ill-posed in the sense of Hadamard. Regularization is used to restrict the solution space by imposing suitable constraints and so make the problem well-posed. A number of techniques have been developed in the literature to infer depth; [5] refers to a few. Binocular stereo techniques, unlike the various shape-from-X techniques, are able to determine the exact depth information from two 2-D images, provided the camera geometry is known.

Though there exists a large number of stereo vision algorithms, most of them can be categorized as having three main modules built into their structure. They are (a) the Feature Extraction module, (b) the Matching module and (c) the Interpolation module. Algorithms which do not use integration traverse through the three modules sequentially, without any interaction between the modules. A feature extraction algorithm is used to obtain salient features (e.g., edges)
in both the images of the stereo pair, then a matching algorithm is used to establish correspondence between feature points in the left and right images. The difference between the position of the feature point in the left image and the matched point in the right image gives the disparity map and hence the depth. Next, an interpolation algorithm is used to interpolate on the sparse depth data to obtain the complete 3-D scene. Traversing sequentially through the modules results in a poor reconstruction because correct matches and dense disparity are conflicting requirements; this necessitates the use of integration [6].

This work is supported by an MHRD (India) project on "Computer Vision".

Integration is a process where the various modules involved in the outcome of a problem work synergistically. Each module works not only to achieve its own objective but also takes into consideration the outcome of the other modules; this helps in achieving the overall objective of the problem. The role of integration has been shown to be effective, and [8 - 19] carries a rich literature in this regard. A number of algorithms appear in the vision literature which use integration in one form or the other [1], [2], [3], [4], [9] (refer to [6] for some more).

In this paper, motivated by the work of Gamble et al. [1], Clark and Yuille [9], Toborg and Hwang [2] and Barnard [4], we present a new approach to integrate stereo modules over scales. In [6] it is shown that integration between the modules enhances the depth map obtained, compared to the case where no integration is used, but not unexpectedly the time taken to compute the depth map is high. This is due to the poor disparity initialization. In this paper we exploit the multiresolution approach to overcome the bad disparity initialization.
Moreover, this approach performs the integrated computations in much less time (at least seven and a half times faster [7]). The multiresolution approach has been used quite extensively in the computer vision literature (for example [3], [4], and references in [5]).

2 Problem Formulation

$z_{m,n}$ represents the value of the depth of the scene on a 2-D lattice ($P \times P$) at the point $(m,n)$. Here $P = 2^{\Omega}$; $\Omega$ is an integer and $0 \le m, n \le (P-1)$. Let $x^{L}_{m,n}$, $x^{R}_{m,n}$ be the 2-D intensity images of the scene $z_{m,n}$, obtained by the left and the right disparate cameras respectively. The problem of stereo vision is to estimate $z_{m,n}$ given $x^{L}_{m,n}$, $x^{R}_{m,n}$ for $0 \le m, n \le (P-1)$ and the camera geometry.

2.1 A Brief on the Notation Used

$x^{R,\Omega}_{i,j}$ is the intensity value of the $(i,j)$th pixel in the right image at resolution $\Omega$. Note that for $x^{R,\Omega}_{i,j}$, $i, j$ can take values in the range $(0, 2^{\Omega}-1)$. Similarly, $x^{L,\Omega}_{i,j}$ is the intensity of the left image at resolution $\Omega$.

1051-4651/94 $04.00 © 1994 IEEE
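The resolution indexing of Section 2.1 can be illustrated with a minimal sketch, in which the image at resolution k has 2^k x 2^k pixels. The text above does not say how the coarser images are produced, so the 2x2 block averaging used here is an assumption, as are the helper names `downsample`, `build_pyramid` and `upsample_disparity`. The last helper shows one common way a coarse disparity estimate could initialize the next finer level (replicate each value and double it, since pixel offsets double with resolution); it is not taken from the paper.

```python
from typing import Dict, List

Image = List[List[float]]


def downsample(image: Image) -> Image:
    """Halve each side by averaging non-overlapping 2x2 blocks (assumed rule)."""
    half = len(image) // 2
    return [
        [
            (image[2 * i][2 * j] + image[2 * i][2 * j + 1]
             + image[2 * i + 1][2 * j] + image[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(half)
        ]
        for i in range(half)
    ]


def build_pyramid(image: Image, coarsest: int = 0) -> Dict[int, Image]:
    """Return {k: image at resolution k}; level k has 2**k x 2**k pixels."""
    size = len(image)
    finest = size.bit_length() - 1  # log2(size) when size is a power of two
    if size < 1 or size != 1 << finest or any(len(row) != size for row in image):
        raise ValueError("image must be square with a power-of-two side")
    if not 0 <= coarsest <= finest:
        raise ValueError("coarsest level must lie between 0 and the finest level")
    pyramid = {finest: [[float(v) for v in row] for row in image]}
    # Work from the finest level down to the requested coarsest level.
    for k in range(finest - 1, coarsest - 1, -1):
        pyramid[k] = downsample(pyramid[k + 1])
    return pyramid


def upsample_disparity(disparity: Image) -> Image:
    """Hypothetical coarse-to-fine initialization: replicate each value into a
    2x2 block and double it, because pixel offsets double with resolution."""
    side = 2 * len(disparity)
    return [
        [2.0 * disparity[i // 2][j // 2] for j in range(side)]
        for i in range(side)
    ]


if __name__ == "__main__":
    left = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    levels = build_pyramid(left)
    for k in sorted(levels):
        print(k, levels[k])
```

In a coarse-to-fine scheme of this kind, matching would start at the coarsest level, where the disparity range is small, and each result would seed the next finer level through `upsample_disparity`.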