CamOdoCal: Automatic Intrinsic and Extrinsic Calibration of a Rig with Multiple Generic Cameras and Odometry

Lionel Heng, Bo Li, and Marc Pollefeys
Computer Vision and Geometry Lab, ETH Zürich, Switzerland

Abstract— Multiple cameras are increasingly prevalent on robotic and human-driven vehicles. These cameras come in a variety of wide-angle, fish-eye, and catadioptric models. Furthermore, wheel odometry is generally available on the vehicles on which the cameras are mounted. For robustness, vision applications tend to use wheel odometry as a strong prior for camera pose estimation, and in these cases, an accurate extrinsic calibration is required in addition to an accurate intrinsic calibration. To date, there is no known work on automatic intrinsic calibration of generic cameras, and more importantly, automatic extrinsic calibration of a rig with multiple generic cameras and odometry. We propose an easy-to-use automated pipeline that handles both intrinsic and extrinsic calibration; we do not assume that there are overlapping fields of view. At the beginning, we run an intrinsic calibration for each generic camera. The intrinsic calibration is automatic and requires a chessboard. Subsequently, we run an extrinsic calibration which finds all camera-odometry transforms. The extrinsic calibration is unsupervised, uses natural features, and only requires the vehicle to be driven around for a short time. The intrinsic parameters are optimized in a final bundle adjustment step in the extrinsic calibration. In addition, the pipeline produces a globally-consistent sparse map of landmarks which can be used for visual localization. The pipeline is publicly available as a standalone C++ package.

I. INTRODUCTION

There has been an explosive growth in the use of cameras on robotic and human-driven vehicles.
From the robotic perspective, a camera offers a rich source of visual information which greatly enhances robot perception in contrast to lidar line scanners; recent advances in computing hardware facilitate real-time image processing. From the automotive perspective, multiple cameras are useful for driver assistance applications which help improve road safety. Image processing applications utilizing multiple cameras on a vehicle require both an accurate intrinsic calibration for each camera and an accurate extrinsic calibration. An accurate intrinsic camera calibration consists of an optimal set of parameters for a camera projection model that relates 2D image points to 3D scene points; these optimal parameters correspond to minimal reprojection error. An accurate extrinsic calibration corresponds to accurate camera poses with respect to a reference frame on the vehicle, usually the odometry frame. Accurate calibration allows feature points from one camera to be reprojected into another camera with low reprojection errors; furthermore, odometry data can be used in conjunction with the camera extrinsics to efficiently compute a good initial estimate of the camera poses which can then be refined via local bundle adjustment with minimal correction.

We use the unified projection model described by Mei et al. in [1], which works well in practice for regular, wide-angle, fish-eye, and catadioptric cameras. This model differs from the unified Taylor model proposed by Scaramuzza et al. [2] in that Mei's model takes a parametric form and explicitly models a generic camera, while Scaramuzza's model is represented by an arbitrary Taylor polynomial. As a result, a closed-form Jacobian matrix can be formulated for Mei's model but not for Scaramuzza's. Contrary to claims that Mei's model has limited accuracy when applied to fish-eye cameras, we find from experimental data that it works very well for fish-eye cameras in practice.
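To illustrate why the unified projection model admits closed-form derivatives, the sketch below implements its core geometry: projection onto a unit sphere followed by a perspective projection from a center shifted by the mirror parameter ξ. This is a minimal sketch, not the pipeline's C++ implementation; the parameter names (`xi`, `fx`, `fy`, `cx`, `cy`) follow common convention, and the model's radial/tangential distortion terms are omitted.

```python
import numpy as np

def project_mei(P, xi, fx, fy, cx, cy):
    """Project a 3D point with the unified (Mei) model; distortion omitted."""
    # 1. Project the scene point onto the unit sphere centred at the origin.
    Ps = P / np.linalg.norm(P)
    # 2. Perspective division from a centre shifted by xi along the z-axis
    #    (the mirror parameter).
    mu = Ps[0] / (Ps[2] + xi)
    mv = Ps[1] / (Ps[2] + xi)
    # 3. Apply the generalised camera matrix.
    return np.array([fx * mu + cx, fy * mv + cy])

def unproject_mei(u, v, xi, fx, fy, cx, cy):
    """Lift a pixel back to a unit-norm viewing ray (inverse sphere projection)."""
    mu, mv = (u - cx) / fx, (v - cy) / fy
    r2 = mu * mu + mv * mv
    # Closed-form inverse of the sphere projection step.
    factor = (xi + np.sqrt(1.0 + (1.0 - xi * xi) * r2)) / (r2 + 1.0)
    return np.array([factor * mu, factor * mv, factor - xi])
```

With ξ = 0 the model reduces to a plain pinhole camera; increasing ξ models the stronger radial compression of wide-angle, fish-eye, and catadioptric optics. Because every step is an elementary algebraic function of the parameters, an analytical Jacobian can be chained together, which is the property exploited later to speed up the non-linear optimization.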
Our pipeline has two stages: intrinsic and extrinsic calibration. In the intrinsic calibration stage, we require a large chessboard on which all squares have an equal known dimension. For each camera, with the chessboard held in a wide variety of poses, we use an automatic chessboard corner detector to find all interior corners on the chessboard in every image until we accumulate a minimum number of corners. We then use the set of estimated corner coordinates to find the parameters of Mei's model by generating an initial estimate of the intrinsic parameters and refining the parameters via non-linear optimization; an analytical Jacobian is used to significantly speed up the optimization.

In the extrinsic calibration stage, we separately run monocular VO with sliding window bundle adjustment for each camera. It is possible for the VO to break occasionally in poorly-textured areas. Nevertheless, we use all sets of VO estimates to find an initial estimate of the camera-odometry transform for each camera; each set of VO estimates has a different scale. We triangulate the inlier feature point correspondences generated by monocular VO using the initial camera-odometry transforms and odometry data. The resulting sparse map is then optimized via bundle adjustment; the odometry poses are kept fixed while all 3D scene points and the camera-odometry transforms are optimized. At this point, the camera-odometry transforms become more accurate; however, they are not sufficiently accurate for reprojection of feature points from one camera to another with sub-pixel reprojection errors. To solve this issue and still adhere to the assumption of no overlapping fields of view, we find feature point correspondences across different cameras. Starting from the first odometry pose, we maintain a local frame history for each camera, and we find feature point correspondences between each camera's current frame and every frame in every other camera's frame history.
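The triangulation step above can be made concrete with a small numeric sketch: each camera's world pose is the composition of an odometry pose with the (fixed) camera-odometry transform, and a correspondence is triangulated linearly from two such poses. This is an illustrative numpy sketch under stated assumptions, not the package's C++ code; the helper names are hypothetical, and standard DLT triangulation with normalized image coordinates stands in for whatever triangulation routine the pipeline actually uses.

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 rigid-body transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def camera_pose(T_world_odo, T_odo_cam):
    # A camera's world pose is its odometry pose composed with the
    # camera-odometry transform being calibrated.
    return T_world_odo @ T_odo_cam

def triangulate(T_world_cam1, T_world_cam2, x1, x2):
    """Linear (DLT) triangulation from two normalized observations [u, v]."""
    P1 = np.linalg.inv(T_world_cam1)[:3]  # 3x4 world -> camera-1 projection
    P2 = np.linalg.inv(T_world_cam2)[:3]
    # Each observation contributes two linear constraints on the
    # homogeneous scene point X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]
```

Because the odometry poses are metric, scene points triangulated this way inherit the correct scale, which is what lets the subsequent bundle adjustment hold the odometry poses fixed and refine only the scene points and the camera-odometry transforms.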
For each frame pair, we rectify the two images on a common image

2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 3-7, 2013. Tokyo, Japan. 978-1-4673-6357-0/13/$31.00 ©2013 IEEE