Relative Pose Calibration Between Visual and Inertial Sensors

Jorge Lobo and Jorge Dias
Institute of Systems and Robotics, University of Coimbra
3030-290 Coimbra, Portugal
{jlobo,jorge}@isr.uc.pt

Abstract— This paper proposes an approach to calibrate off-the-shelf cameras and inertial sensors so as to obtain a useful integrated system that can be used in static and dynamic situations. The rotation between the camera and the inertial sensor can be estimated, while calibrating the camera, by having both sensors observe the vertical direction, using a vertical chessboard target and gravity. The translation between the two can be estimated using a simple passive turntable and static images, provided that the system can be adjusted to turn about the inertial sensor null point in several poses. Simulation and real data results are presented to show the validity and simple requirements of the proposed method.

Index Terms— computer vision, inertial sensors, sensor fusion, calibration.

I. INTRODUCTION

Inertial sensors coupled to cameras can provide valuable data about camera ego-motion and about how world features are expected to be oriented. Object recognition and tracking benefit from both static and inertial information. Several human vision tasks rely on the inertial data provided by the vestibular system; artificial systems should also exploit this sensor fusion. In our previous work we explored some of the benefits of combining the two sensing modalities, and showed how gravity can be used as a vertical reference [1][2]. We now focus on how the two sensors can be cross-calibrated so that they can be used together in static and dynamic situations.

The rotation between the camera and the inertial sensor can be estimated by having both sensors observe the vertical direction, using a vertical visual target for the camera and gravity for the inertial sensors.
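Fitting one rotation to a set of matched vertical observations is an instance of the orthogonal Procrustes problem. As a minimal sketch (not the paper's implementation), the function below solves it in NumPy via the SVD form of the solution; the function name and the idea of stacking per-pose verticals as row vectors are illustrative assumptions:

```python
import numpy as np

def vertical_rotation(cam_verticals, imu_verticals):
    """Estimate the rotation R with R @ v_cam ~ v_imu from matched unit
    vectors (the vertical seen by each sensor at several poses).
    Orthogonal Procrustes fit, solved here via SVD."""
    A = np.asarray(cam_verticals, dtype=float)  # N x 3, camera-frame verticals
    B = np.asarray(imu_verticals, dtype=float)  # N x 3, inertial-frame verticals
    H = A.T @ B                                 # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (det = +1) rather than a reflection
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    return Vt.T @ D @ U.T
```

At least two non-collinear vertical observations (i.e. two distinct sensor poses) are needed for the rotation to be uniquely determined.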
Standard camera calibration can be performed on the same set of images, both procedures using the same visual target, such as a vertical chessboard target, which simplifies the whole calibration procedure.

The translation between the two will not be important in some applications, but if the inertial sensor is attached to the camera system with a significant lever arm, it will have to be taken into account for fast motions. Using a simple passive turntable, and positioning the integrated camera and inertial system centered on the inertial sensor, the lever arm can be estimated. By observing the inertial sensor outputs, the system can be adjusted to turn about the sensor's null point in several poses. The lever arm can then be estimated from static images of a suitably placed visual target taken before and after each rotation.

The problem of estimating the rotation between the inertial sensor and the camera is a particular case of the well-known orthogonal Procrustes problem for 3D attitude estimation [3]. Instead of having two sets of points, we have two sets of unit vectors corresponding to the observed vertical in each sensor at several poses. In our work we used the unit quaternion derivation of the method [4].

Standard hand-eye calibration [5][6] can be applied to estimate the translation, using the approach of rotating about the inertial sensor center. However, since the target is repositioned after each turn, the method is not applied to the full data set as in traditional hand-eye calibration. We used an implementation of the full hand-eye calibration [5] to provide a comparison of the results, using only a camera with a fixed lever arm, by keeping a constant pivot point.

II. STAND-ALONE SENSOR CALIBRATION

A. Camera Calibration

Camera calibration has been extensively studied, and standard techniques are well established. For this work camera calibration was performed using the Camera Calibration Toolbox for Matlab [7].
The C implementation of this toolbox is included in the Intel Open Source Computer Vision Library [8]. The calibration uses images of a chessboard target in several positions and recovers the camera's intrinsic parameters, as well as the target positions relative to the camera. The calibration algorithm is based on Zhang's work on the estimation of planar homographies for camera calibration [9], but the closed-form estimation of the internal parameters from the homographies is slightly different, since the orthogonality of vanishing points is explicitly used and the distortion coefficients are not estimated at the initialization phase.

The calibration toolbox was also used to recover the camera extrinsic parameters in the subsequent relative pose calibration.

B. Inertial Sensor Calibration

Inertial navigation systems also have established calibration techniques, but these rely on high-end sensors and actuators. Nevertheless, in order to use off-the-shelf inertial sensors attached to a camera, appropriate modelling and calibration techniques are required. Some of the inertial