CamOdoCal: Automatic Intrinsic and Extrinsic Calibration of a Rig with Multiple Generic Cameras and Odometry

Lionel Heng, Bo Li, and Marc Pollefeys
Computer Vision and Geometry Lab, ETH Zürich, Switzerland

Abstract— Multiple cameras are increasingly prevalent on robotic and human-driven vehicles. These cameras come in a variety of wide-angle, fish-eye, and catadioptric models. Furthermore, wheel odometry is generally available on the vehicles on which the cameras are mounted. For robustness, vision applications tend to use wheel odometry as a strong prior for camera pose estimation, and in these cases, an accurate extrinsic calibration is required in addition to an accurate intrinsic calibration. To date, there is no known work on automatic intrinsic calibration of generic cameras, and more importantly, automatic extrinsic calibration of a rig with multiple generic cameras and odometry. We propose an easy-to-use automated pipeline that handles both intrinsic and extrinsic calibration; we do not assume that there are overlapping fields of view. At the beginning, we run an intrinsic calibration for each generic camera. The intrinsic calibration is automatic and requires a chessboard. Subsequently, we run an extrinsic calibration which finds all camera-odometry transforms. The extrinsic calibration is unsupervised, uses natural features, and only requires the vehicle to be driven around for a short time. The intrinsic parameters are optimized in a final bundle adjustment step in the extrinsic calibration. In addition, the pipeline produces a globally-consistent sparse map of landmarks which can be used for visual localization. The pipeline is publicly available as a standalone C++ package.

I. INTRODUCTION

There has been an explosive growth in the use of cameras on robotic and human-driven vehicles.
From the robotic perspective, a camera offers a rich source of visual information which greatly enhances robot perception in contrast to lidar line scanners; recent advances in computing hardware facilitate real-time image processing. From the automotive perspective, multiple cameras are useful for driver assistance applications which help improve road safety. Image processing applications utilizing multiple cameras on a vehicle require both an accurate intrinsic calibration for each camera and an accurate extrinsic calibration. An accurate intrinsic camera calibration consists of an optimal set of parameters for a camera projection model that relates 2D image points to 3D scene points; these optimal parameters correspond to minimal reprojection error. An accurate extrinsic calibration corresponds to accurate camera poses with respect to a reference frame on the vehicle, usually the odometry frame. Accurate calibration allows feature points from one camera to be reprojected into another camera with low reprojection errors; furthermore, odometry data can be used in conjunction with the camera extrinsics to efficiently compute a good initial estimate of the camera poses which can then be refined via local bundle adjustment with minimal correction.

We use the unified projection model described by Mei et al. in [1], which works well in practice for regular, wide-angle, fish-eye, and catadioptric cameras. This model differs from the unified Taylor model proposed by Scaramuzza et al. [2] in that Mei's model takes a parametric form and explicitly models a generic camera, while Scaramuzza's model is represented by an arbitrary Taylor polynomial. As a result, a closed-form Jacobian matrix can be formulated for Mei's model but not for Scaramuzza's. Contrary to claims that Mei's model has limited accuracy when applied to fish-eye cameras, we find from experimental data that it works very well for fish-eye cameras in practice.
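To illustrate why the unified projection model admits closed-form derivatives, the sketch below implements its core geometry: projection onto a unit sphere followed by a perspective projection from a center shifted by the mirror parameter ξ. This is a minimal sketch, not the pipeline's C++ implementation; the parameter names (`xi`, `fx`, `fy`, `cx`, `cy`) follow common convention, and the model's radial/tangential distortion terms are omitted.

```python
import numpy as np

def project_mei(P, xi, fx, fy, cx, cy):
    """Project a 3D point with the unified (Mei) model; distortion omitted."""
    # 1. Project the scene point onto the unit sphere centred at the origin.
    Ps = P / np.linalg.norm(P)
    # 2. Perspective division from a centre shifted by xi along the z-axis
    #    (the mirror parameter).
    mu = Ps[0] / (Ps[2] + xi)
    mv = Ps[1] / (Ps[2] + xi)
    # 3. Apply the generalised camera matrix.
    return np.array([fx * mu + cx, fy * mv + cy])

def unproject_mei(u, v, xi, fx, fy, cx, cy):
    """Lift a pixel back to a unit-norm viewing ray (inverse sphere projection)."""
    mu, mv = (u - cx) / fx, (v - cy) / fy
    r2 = mu * mu + mv * mv
    # Closed-form inverse of the sphere projection step.
    factor = (xi + np.sqrt(1.0 + (1.0 - xi * xi) * r2)) / (r2 + 1.0)
    return np.array([factor * mu, factor * mv, factor - xi])
```

With ξ = 0 the model reduces to a plain pinhole camera; increasing ξ models the stronger radial compression of wide-angle, fish-eye, and catadioptric optics. Because every step is an elementary algebraic function of the parameters, an analytical Jacobian can be chained together, which is the property exploited later to speed up the non-linear optimization.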
Our pipeline has two stages: intrinsic and extrinsic calibration. In the intrinsic calibration stage, we require a large chessboard on which all squares have an equal known dimension. For each camera, with the chessboard held in a wide variety of poses, we use an automatic chessboard corner detector to find all interior corners on the chessboard in every image until we accumulate a minimum number of corners. We then use the set of estimated corner coordinates to find the parameters of Mei's model by generating an initial estimate of the intrinsic parameters and refining the parameters via non-linear optimization; an analytical Jacobian is used to significantly speed up the optimization.

In the extrinsic calibration stage, we separately run monocular VO with sliding window bundle adjustment for each camera. It is possible for the VO to break occasionally in poorly-textured areas. Nevertheless, we use all sets of VO estimates to find an initial estimate of the camera-odometry transform for each camera; each set of VO estimates has a different scale. We triangulate the inlier feature point correspondences generated by monocular VO using the initial camera-odometry transforms and odometry data. The resulting sparse map is then optimized via bundle adjustment; the odometry poses are kept fixed while all 3D scene points and the camera-odometry transforms are optimized. At this point, the camera-odometry transforms become more accurate; however, they are not sufficiently accurate for reprojection of feature points from one camera to another with sub-pixel reprojection errors. To solve this issue and still adhere to the assumption of no overlapping fields of view, we find feature point correspondences across different cameras. Starting from the first odometry pose, we maintain a local frame history for each camera, and we find feature point correspondences between each camera's current frame and every frame in every other camera's frame history.
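The triangulation step above can be made concrete with a small numeric sketch: each camera's world pose is the composition of an odometry pose with the (fixed) camera-odometry transform, and a correspondence is triangulated linearly from two such poses. This is an illustrative numpy sketch under stated assumptions, not the package's C++ code; the helper names are hypothetical, and standard DLT triangulation with normalized image coordinates stands in for whatever triangulation routine the pipeline actually uses.

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 rigid-body transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def camera_pose(T_world_odo, T_odo_cam):
    # A camera's world pose is its odometry pose composed with the
    # camera-odometry transform being calibrated.
    return T_world_odo @ T_odo_cam

def triangulate(T_world_cam1, T_world_cam2, x1, x2):
    """Linear (DLT) triangulation from two normalized observations [u, v]."""
    P1 = np.linalg.inv(T_world_cam1)[:3]  # 3x4 world -> camera-1 projection
    P2 = np.linalg.inv(T_world_cam2)[:3]
    # Each observation contributes two linear constraints on the
    # homogeneous scene point X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]
```

Because the odometry poses are metric, scene points triangulated this way inherit the correct scale, which is what lets the subsequent bundle adjustment hold the odometry poses fixed and refine only the scene points and the camera-odometry transforms.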
For each frame pair, we rectify the two images on a common image

2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 3-7, 2013. Tokyo, Japan. 978-1-4673-6357-0/13/$31.00 ©2013 IEEE