Unsupervised Intrinsic and Extrinsic Calibration of a Camera-Depth Sensor Couple

Filippo Basso, Alberto Pretto and Emanuele Menegatti

This research has been partially supported by Telecom Italia SPA with the grant “Service Robotics”, by the University of Padova with the grants “DVL-SLAM” and “TIDY-UP: Enhanced Visual Exploration for Robot Navigation and Object Recognition”, and by the European Commission under FP7-600890-ROVINA. Basso, Pretto and Menegatti are with the Department of Information Engineering, University of Padova, Italy. Email: {filippo.basso, emg}@dei.unipd.it. Pretto is also with the Department of Computer, Control, and Management Engineering “Antonio Ruberti”, Sapienza University of Rome, Italy. Email: pretto@dis.uniroma1.it

Abstract— The availability of affordable depth sensors in conjunction with common RGB cameras (even in the same device, e.g. the Microsoft Kinect) provides robots with a complete and instantaneous representation of both the appearance and the 3D structure of the surrounding environment. This type of information enables robots to safely navigate, perceive and actively interact with other agents inside the working environment. Clearly, in order to obtain a reliable and accurate representation, not only must the intrinsic parameters of each sensor be precisely calibrated, but the extrinsic parameters relating the two sensors must also be precisely known. In this paper, we propose a human-friendly and reliable calibration framework that enables easy estimation of both the intrinsic and extrinsic parameters of a camera-depth sensor couple. Real-world experiments using a Kinect show improvements in both the 3D structure estimation and the depth-color association tasks.

I. INTRODUCTION

Typical robotic tasks like SLAM, navigation, object recognition and many others highly benefit from having color and depth information fused together. While color information is almost always provided by RGB cameras, there are plenty of sensors able to provide depth information: time-of-flight (ToF) cameras, laser range scanners and sensors based on structured light. Even if some devices are able to provide both color and depth data (e.g. the popular low-cost Microsoft Kinect, composed of two very close sensors), as far as we know there are no integrated sensors able to natively provide both types of information yet.

In this paper we focus on Kinect-like devices (among others, the Asus Xtion Pro Live). These sensors provide colored point clouds that suffer from an inaccurate association between depth and RGB data, due to an imperfect alignment between the camera and the depth sensor. Moreover, depth images suffer from a geometric distortion that is typically irregular and position dependent. Finally, we have noticed that, for increasing distances, there is an increasing bias (i.e., a systematic error) in the depth measurements. These devices are factory calibrated, so each sensor is sold with its own calibration parameter set stored inside a non-volatile memory. However, the quality of this calibration is only adequate for gaming purposes.
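To make the role of the camera-depth alignment concrete, the following minimal sketch shows how a 3D point measured by the depth sensor is typically colored: it is mapped into the RGB camera frame through the rigid body transform (R, t) and then projected through the camera intrinsics. The function name and the simple pinhole model are illustrative assumptions, not this paper's implementation; any error in (R, t) shifts the projected pixel and produces exactly the misassociation of Fig. 1(a).

import numpy as np

def color_of_depth_point(p_depth, R, t, K_rgb, rgb_image):
    # Extrinsics: map the point from the depth sensor frame to the
    # RGB camera frame via the rigid body transform (R, t).
    p_rgb = R @ p_depth + t

    # Intrinsics: pinhole projection of the 3D point onto the image.
    u, v, w = K_rgb @ p_rgb
    row, col = int(round(v / w)), int(round(u / w))

    # A miscalibrated (R, t) shifts (row, col) by several pixels and
    # assigns the wrong color to the point (Fig. 1(a), left cloud).
    height, width = rgb_image.shape[:2]
    if 0 <= row < height and 0 <= col < width:
        return rgb_image[row, col]
    return None  # the point projects outside the RGB image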
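Similarly, the two depth error sources can be pictured as two corrections applied to each raw measurement: a dense per-pixel map for the irregular, position-dependent distortion, and a distance-dependent term for the systematic bias. The map-plus-polynomial form and the numbers below are only assumed for illustration; they are not the error model derived later in this paper.

import numpy as np

def correct_depth(raw_depth, undistortion_map, bias_poly):
    # Geometric distortion: irregular and position dependent, so it is
    # represented here as a dense per-pixel multiplicative map rather
    # than a parametric lens model (an assumption for illustration).
    z = raw_depth * undistortion_map

    # Systematic error: the bias grows with the measured distance; a
    # low-degree polynomial in z is one plausible (assumed) form.
    bias = np.polyval(bias_poly, z)
    return z - bias

# Example: a 480x640 depth image, an identity map, and a bias that
# reaches about 4 cm at 4 m (illustrative numbers only).
depth = np.full((480, 640), 4.0)
corrected = correct_depth(depth, np.ones_like(depth), [0.01, 0.0])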
Fig. 1. Some results of our calibration procedure: (a) The imperfect alignment between the camera and the depth sensor produces inaccuracies in the depth-color association (left point cloud); a better alignment obtained with our calibration procedure results in a more accurate association (right point cloud). (b) A point cloud of a planar surface (a wall) without depth distortion correction (top), and the same cloud after the application of the proposed undistortion map (bottom).

Moreover, the depth distortion is not modeled in the factory calibration. A proper calibration method for robust robotics applications should precisely estimate the misalignment as well as both the systematic and the distortion errors.

We propose a novel calibration method that employs a simple data-collection procedure, only needs a minimally structured environment, and does not require any parameter tuning or extensive interaction with the calibration software. Moreover, even if the principal targets of the method are the Kinect-like devices mentioned above, it is designed to be usable also with heterogeneous, even non-close, camera-depth sensor couples. Given a calibrated camera and an uncalibrated depth sensor, the proposed method automatically infers the intrinsic parameter set of the depth sensor and the alignment between the two sensors, i.e. the rigid body transformation that relates the two sensor frames. For the depth sensor, we employ an error model that