3D Pose Estimation and Mapping with Time-of-Flight Cameras

Stefan May, David Droeschel, Dirk Holz and Christoph Wiesen
Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
stefan.may@iais.fraunhofer.de

Stefan Fuchs
German Aerospace Center (DLR), Institute of Robotics and Mechatronics
82234 Wessling, Germany
stefan.fuchs@dlr.de

Abstract— This paper presents a method for precise 3D environment mapping. It employs only a 3D Time-of-Flight (ToF) camera and no additional sensors. The camera pose is estimated using visual odometry. Imprecision of depth measurements caused by external interfering factors, e.g. sunlight or reflectivities, is properly handled by several filters. Pose tracking and mapping are performed on-the-fly during exploration, which even allows for hand-guided operation. A final refinement step, comprising error distribution after loop closure and surface smoothing, further increases the precision of the resulting 3D map.¹

I. INTRODUCTION

Since their invention nearly a decade ago, Time-of-Flight (ToF) cameras have attracted attention in many fields, e.g. automotive engineering, industrial engineering, mobile robotics and surveillance. So far, 3D laser scanners and stereo camera systems are mostly used for these tasks due to their high measurement range and precision. Stereo vision requires the matching of corresponding points from two images to obtain depth information, which laser scanners provide directly, but with the drawback of a lower frame rate. In contrast to laser scanners, ToF cameras allow for higher frame rates and thus enable the consideration of motion. However, the high frame rate has to be balanced against measurement precision. Depending on external interfering factors (e.g. sunlight) and scene configurations, e.g.
distances, orientations and reflectivities, the same scene entails large fluctuations in distance measurements from different perspectives. These influences cause systematic errors besides noise and have to be handled by the application. As a result, laser scanners are mostly used for mapping purposes, e.g. [14], [20], [3], [11], [19].

In this paper we present a mapping approach which deals with large variations in the precision of distance measurements. Mapping is performed on-the-fly with no additional sensory information about the sensor's ego-motion. The approach comprises: feature-based ego-motion estimation, filtering of imprecise data, and registration of newly acquired data into a consistent 3D environment map. After loop closure, a refinement step distributes the error and smoothes the measurements, yielding a precise 3D map.

¹ A video showing the performance of the approach is available at http://www.iais.fraunhofer.de/3325.html

Fig. 1. a) Scenario used for mapping. b) 3D point cloud registered with data taken from a Swissranger SR-3k device (false color code relates distance to origin of coordinate system).

The remainder of this paper is organized as follows: Section II elaborates on 3D mapping approaches and applications related to ToF cameras. Section III describes ToF camera errors caused by external interfering factors. Section IV presents our mapping approach, including 3D pose estimation, error handling and mapping. Section V illustrates experimental results that support our case for employing real-time capable ToF sensors in pose estimation and mapping tasks. Finally, Section VI concludes with an outlook on future work.

II. RELATED WORK

Surface reconstruction is a basic task for object detection, manipulation and environment modelling. Generally, the object's surface is reconstructed by merging measurements from different views. This approach requires depth data and sensor pose data.
When both pose and depth are unknown, structure from motion is a solution: corresponding features in consecutive images are used to estimate the ego-motion of the sensor, and based on this ego-motion information the depth is estimated up to an unknown scale factor. If only depth information but no pose is given, e.g. when using a stereo camera or a laser scanner system without inertial sensors, the Iterative Closest Point (ICP) algorithm can be used to register point clouds acquired from different perspectives [2]. Finally, if both pose and depth are known, the registration procedure is dispensable and the data can simply be merged. In any case, the quality of surface reconstruction depends on the precision of sensor pose estimation and depth measurement.
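To illustrate the registration step referenced above, the following is a minimal point-to-point ICP sketch in Python/NumPy, not the implementation used in this paper: correspondences are found by brute-force nearest-neighbor search, and the rigid transform is re-estimated in each iteration via SVD. Function names and parameters are illustrative; a practical system would use a k-d tree for matching and outlier rejection, as in [2].

```python
import numpy as np

def best_fit_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping paired points P onto Q
    via the SVD of the cross-covariance matrix (Kabsch/Horn method)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = cq - R @ cp
    return R, t

def icp(source, target, iterations=30):
    """Point-to-point ICP: alternate between nearest-neighbor matching
    and rigid-transform estimation; returns the accumulated (R, t)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    src = source.copy()
    for _ in range(iterations):
        # Brute-force nearest neighbors (for brevity; use a k-d tree in practice)
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        matched = target[np.argmin(d, axis=1)]
        R, t = best_fit_transform(src, matched)
        src = src @ R.T + t
        # Compose incremental transform with the running estimate
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

For small inter-frame motion (as with a high-frame-rate ToF camera), most initial correspondences are already correct and the iteration converges quickly; for large displacements, ICP needs a sufficiently good initial pose estimate, which motivates the feature-based ego-motion estimation used in this paper.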