Proceedings of the 2003 IEEE International Conference on Robotics & Automation, Taipei, Taiwan, September 14-19, 2003

Registration and Segmentation for 3D Map Building
A Solution based on Stereo Vision and Inertial Sensors

Jorge Lobo, Luis Almeida, João Alves and Jorge Dias
Institute of Systems and Robotics, University of Coimbra, Portugal, {jlobo,laa,jalves,jorge}@isr.uc.pt

Abstract - This article presents a technique for registration and segmentation of dense depth maps provided by a stereo vision system. The vision system uses inertial sensors to give a reference for camera pose. The maps are registered using a modified version of the ICP (Iterative Closest Point) algorithm that integrates the inertial sensor data into the registration of the dense depth maps. Depth maps obtained by vision systems are very point-of-view dependent, providing discrete layers of detected depth aligned with the camera. In this work we use inertial sensors to recover camera pose and rectify the maps to a reference ground plane, enabling the segmentation of vertical and horizontal geometric features as well as map registration. We propose a real-time methodology to segment these dense depth maps, supporting segmentation of structures, object recognition, robot navigation or any other task that requires a three-dimensional representation of the physical environment. The aim of this work is a fast real-time system that can be applied to autonomous robotic systems or to automated car driving systems, modelling the road and identifying obstacles and roadside features in real-time.

I INTRODUCTION

The registration of 3D surfaces has applications ranging from building terrain maps, or depth maps of the sea floor, for autonomous robots, to recognition of objects and reconciling various medical imaging modalities.
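As an illustration of the ground-plane rectification step described in the abstract, the sketch below is our own construction (function names and the use of NumPy are assumptions, not the paper's implementation): it rotates a camera-frame point cloud so that an accelerometer-derived gravity direction becomes the world vertical, after which horizontal and vertical surfaces can be separated by height or surface normal.

```python
import numpy as np

def rotation_to_vertical(gravity_cam):
    """Rotation taking the measured gravity direction (camera frame) onto the
    world down axis (0, 0, -1), via the standard vector-to-vector Rodrigues form."""
    g = gravity_cam / np.linalg.norm(gravity_cam)
    z = np.array([0.0, 0.0, -1.0])           # gravity points down in the world frame
    v = np.cross(g, z)                        # rotation axis (unnormalized)
    c = g @ z                                 # cosine of the rotation angle
    if np.isclose(c, -1.0):                   # degenerate 180-degree case
        return np.diag([1.0, -1.0, -1.0])
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])       # cross-product (skew) matrix
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

def rectify(points_cam, gravity_cam):
    """Rotate an (N, 3) camera-frame point cloud into a gravity-aligned frame."""
    R = rotation_to_vertical(gravity_cam)
    return points_cam @ R.T
```

In the rectified frame, a ground plane becomes a near-constant-height layer and vertical structures project to compact footprints, which is what makes the fast segmentation possible.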
This article describes a technique for dense map registration based on data from inertial sensors and depth maps provided by a stereo vision system, within the context of autonomous robots. The field of robot mapping and localization has been a very active domain, but only a few applications have used dense data from vision sensors. Recently many computer vision researchers have explored techniques to combine images obtained from different points of view, but the relation between techniques from both domains is still somewhat under-exploited [1][2]. This article proposes a technique suitable for robot mapping based on data obtained by computer vision techniques. One of the very important tasks in computer vision is to extract depth information of the world. Stereoscopy is a technique to extract depth information from two images of a scene taken from different view points. This information can be integrated into a single entity called a dense depth map. In humans and in animals the vestibular system in the inner ear gives inertial information essential for navigation, orientation, body posture control and equilibrium. Neural interactions of the human vision and vestibular systems occur at a very early processing stage [3][4]. In this work we use the vertical reference provided by the inertial sensors to perform a fast segmentation of depth maps obtained from a real-time stereo algorithm.

In our previous work on inertial sensor data integration in vision systems, the inertial data was directly used with the image data [5][6][7]. In this work we use the inertial data to perform a fast segmentation of pre-computed depth maps obtained from the vision system. The map registration technique proposed in this article uses a modified version of the ICP (Iterative Closest Point) algorithm [8]. The technique is a modified approach that explores the inertial sensor data for dense map registration.
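For reference, a minimal sketch of the generic point-to-point ICP iteration on which the method builds (closed-form SVD alignment of matched pairs; the paper's modified version additionally exploits the inertial data, which is not shown here, and all names below are ours):

```python
import numpy as np

def icp_step(source, target):
    """One ICP iteration: match each source point to its nearest target point,
    then solve in closed form for the rigid transform aligning the pairs."""
    # Brute-force nearest-neighbour correspondence (fine for a sketch)
    d2 = ((source[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    matched = target[d2.argmin(axis=1)]
    # Closed-form least-squares rigid alignment via SVD
    mu_s, mu_m = source.mean(0), matched.mean(0)
    H = (source - mu_s).T @ (matched - mu_m)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_m - R @ mu_s
    return R, t

def icp(source, target, iters=10):
    """Iterate match-and-align until the source cloud settles onto the target."""
    src = source.copy()
    for _ in range(iters):
        R, t = icp_step(src, target)
        src = src @ R.T + t
    return src
```

The quality of the nearest-neighbour matches is what dominates ICP's cost and robustness, which is why a good initial pose estimate, here supplied by the inertial vertical reference, matters.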
The aim of stereo systems is to achieve an adequate throughput and precision to enable video-rate dense depth mapping. The throughput of a stereo machine can be measured by the product of the number of depth measurements per second (pixel/sec) and the range of disparity search (pixels); the former determines the density and speed of depth measurement and the latter the dynamic range of distance measurement [9][10][11][12]. The group of T. Kanade at CMU [13] succeeded in producing a video-rate stereo machine based on the multi-baseline stereo algorithm.
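To make the throughput figure of merit and the disparity-to-depth relation concrete, a small sketch (the numbers are hypothetical, not taken from the cited systems):

```python
def stereo_throughput(measurements_per_sec, disparity_range_px):
    """Figure of merit defined above: depth measurements per second
    times the disparity search range in pixels."""
    return measurements_per_sec * disparity_range_px

def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Standard parallel-rig relation Z = f * B / d, with focal length f in
    pixels, baseline B in metres and disparity d in pixels."""
    return f_px * baseline_m / disparity_px

# Hypothetical example: 320x240 depth maps at 30 Hz with a 32-pixel search.
throughput = stereo_throughput(320 * 240 * 30, 32)
```

Note how a wider disparity search raises the throughput requirement while also extending the near range: with the hypothetical rig above, halving the minimum disparity doubles the maximum measurable depth.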