Scene reconstruction and geometrical rectification from stereo images

Antonio Javier GALLEGO SÁNCHEZ, Rafael MOLINA CARMONA, Carlos VILLAGRÁ ARNEDO
Grupo de Informática Industrial e Inteligencia Artificial (I3A)
Universidad de Alicante
Ctra. San Vicente del Raspeig, S/N, 03080 Alicante (Spain)
Tel. +34 965 90 3900  Fax. +34 965 90 3902
{ajgallego, rmolina, villagra}@dccia.ua.es

ABSTRACT

A system to reconstruct three-dimensional scenes from stereo images is presented. The reconstruction is based on a dense disparity image obtained by a window correlation process; a geometrical rectification is applied before generating a three-dimensional matrix that stores the spatial occupation. The geometrical rectification is essential to correct the conical perspective of the camera and to obtain realistic scenes. Three rectification approaches are proposed, one based on a linear function and two on logarithmic functions. The results show that rectification adequately corrects the reconstruction, and that the best choice depends on the features of the original images.

Keywords: stereoscopic vision, disparity images, geometrical rectification, three-dimensional scene reconstruction.

1. INTRODUCTION

A central aspect of current artificial intelligence research is the perception of the environment by artificial systems. In particular, stereoscopic vision opens new paths that will allow these systems to capture the three-dimensional structure of their environment without any physical contact. For instance, an early solution to three-dimensional reconstruction with stereo technology was developed at Carnegie Mellon University.
The possibility of composing several three-dimensional views from the camera transforms is set out, to build the so-called "3D evidence grid" [1].

Most proposals in this field are based on disparity maps obtained by extracting scene characteristics, such as corners or edges; alternatively, this extraction can be done once the disparity map is obtained [2][3]. Some solutions use a background image to obtain objects and silhouettes, computing a difference with the image of the scene to be reconstructed [4][5]. It is also common to use a general model or a priori knowledge as a reference, so that the reconstruction is built using this knowledge as a model; this is the case for the reconstruction of faces or known objects [6]. For instance, in [2] the author reconstructs some basic objects from a stereo image using primitives, such as cubes or boxes, and displays the objects in 3D. Moreover, several views of a scene, or a sequence of them, can be used for the reconstruction [7]; or the reconstruction can be based on other types of sensors, such as laser range sensors. None of these algorithms can be applied in a general manner, and their fields of application are limited.

Our proposal has some advantages in this respect: it makes no assumptions about the scene or the object structure, no characteristics are extracted, no objects are segmented in search of a known shape, and only a stereo pair is used, not a sequence. The proposed method reconstructs a three-dimensional scene from a dense disparity map (a map containing depth information for every pixel in the image) obtained from a binocular camera. A geometrical rectification is applied so that the reconstruction shows the same aspect as the real scene, removing the conical perspective (see section 3) that camera images show.
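As a rough illustration of this pipeline, and not the authors' actual implementation, the following Python sketch back-projects a dense disparity map into a three-dimensional matrix of spatial occupation. The pinhole relation Z = f·B/d is the standard depth-from-disparity formulation; the focal length, baseline, grid size and depth range used here are illustrative assumptions.

```python
import numpy as np

def disparity_to_occupancy(disparity, f=500.0, baseline=0.12,
                           grid_shape=(64, 64, 64), max_depth=10.0):
    """Back-project a dense disparity map into a voxel occupancy grid.

    Assumes the standard pinhole relation Z = f * B / d, where f is the
    focal length in pixels, B the camera baseline in metres and d the
    disparity; all parameter values here are illustrative, not those of
    the paper.
    """
    h, w = disparity.shape
    occupancy = np.zeros(grid_shape, dtype=bool)
    ys, xs = np.nonzero(disparity > 0)           # ignore unmatched pixels
    d = disparity[ys, xs].astype(float)
    z = f * baseline / d                         # depth from disparity
    x = (xs - w / 2) * z / f                     # back-project to the
    y = (ys - h / 2) * z / f                     # camera reference frame
    # Map metric coordinates to voxel indices, clipped to the grid.
    gx = np.clip(((x / max_depth + 0.5) * grid_shape[0]).astype(int),
                 0, grid_shape[0] - 1)
    gy = np.clip(((y / max_depth + 0.5) * grid_shape[1]).astype(int),
                 0, grid_shape[1] - 1)
    gz = np.clip((z / max_depth * grid_shape[2]).astype(int),
                 0, grid_shape[2] - 1)
    occupancy[gx, gy, gz] = True
    return occupancy
```

Note that this sketch applies no geometrical rectification: it reproduces the conical perspective of the camera, which is precisely what the rectification of section 3 is designed to correct.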
For instance, if the image of a corridor were reconstructed with no rectification, the walls, the floor and the ceiling would appear with a slope, seeming to converge at a point.

In the second section a more detailed description of the problem is given, with special focus on stereo vision and disparity maps. The proposed model is explained in the third section, including several solutions. Experiments are shown in the fourth section and, finally, some conclusions and future work are presented.

2. PROBLEM DESCRIPTION

Stereo vision techniques are based on the possibility of extracting three-dimensional information from a scene, using two or more images taken from different viewpoints. We focus on the basic case of two images of a scene, for which a correspondence between the pixels of both images must be obtained. Let us consider that the camera objectives are parallel, so that the search area for each pixel in the image from the left camera (the reference image) is reduced to the same row of the image from the right camera. This row is named the epipolar line (figure 1): every pixel on it has a corresponding pixel in the other image, placed in a different column due to the separation of the cameras. The absolute difference between the two columns is called the disparity. The further an object is from the cameras, the smaller its disparity, and vice versa.
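The window-correlation search along the epipolar line can be sketched as follows. This is a minimal sum-of-absolute-differences (SAD) block matcher written for clarity rather than speed; the window size and disparity range are illustrative assumptions, not the parameters used in the paper.

```python
import numpy as np

def sad_disparity(left, right, window=5, max_disp=16):
    """Dense disparity by window correlation (sum of absolute differences).

    For each pixel of the left (reference) image, the matching pixel is
    searched along the same row of the right image (the epipolar line);
    the column offset with the lowest SAD cost is taken as the disparity.
    Window size and disparity range are illustrative.
    """
    h, w = left.shape
    half = window // 2
    disp = np.zeros((h, w), dtype=np.int32)
    L = left.astype(np.float64)
    R = right.astype(np.float64)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = L[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            # With parallel objectives the match lies at a smaller or
            # equal column in the right image, so search offsets d >= 0.
            for d in range(0, min(max_disp, x - half) + 1):
                cand = R[y - half:y + half + 1,
                         x - d - half:x - d + half + 1]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Distant objects yield small disparities and nearby objects large ones, which is the inverse relation exploited by the depth reconstruction.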