CamOdoCal: Automatic Intrinsic and Extrinsic Calibration of a Rig
with Multiple Generic Cameras and Odometry
Lionel Heng, Bo Li, and Marc Pollefeys
Computer Vision and Geometry Lab, ETH Zürich, Switzerland
Abstract— Multiple cameras are increasingly prevalent on
robotic and human-driven vehicles. These cameras come in
a variety of wide-angle, fish-eye, and catadioptric models.
Furthermore, wheel odometry is generally available on the
vehicles on which the cameras are mounted. For robustness,
vision applications tend to use wheel odometry as a strong
prior for camera pose estimation, and in these cases, an
accurate extrinsic calibration is required in addition to an
accurate intrinsic calibration. To date, there is no known work
on automatic intrinsic calibration of generic cameras, and
more importantly, automatic extrinsic calibration of a rig with
multiple generic cameras and odometry.
We propose an easy-to-use automated pipeline that handles
both intrinsic and extrinsic calibration; we do not assume
that there are overlapping fields of view. At the beginning,
we run an intrinsic calibration for each generic camera. The
intrinsic calibration is automatic and requires a chessboard.
Subsequently, we run an extrinsic calibration which finds
all camera-odometry transforms. The extrinsic calibration is
unsupervised, uses natural features, and only requires the
vehicle to be driven around for a short time. The intrinsic
parameters are optimized in a final bundle adjustment step in
the extrinsic calibration. In addition, the pipeline produces a
globally-consistent sparse map of landmarks which can be used
for visual localization. The pipeline is publicly available as a
standalone C++ package.
I. INTRODUCTION
There has been an explosive growth in the use of cameras
on robotic and human-driven vehicles. From the robotic
perspective, a camera offers a rich source of visual informa-
tion which greatly enhances robot perception in contrast to
lidar line scanners; recent advances in computing hardware
facilitate real-time image processing. From the automotive
perspective, multiple cameras are useful for driver assistance
applications which help improve road safety. Image pro-
cessing applications utilizing multiple cameras on a vehicle
require both an accurate intrinsic calibration for each camera
and an accurate extrinsic calibration. An accurate intrinsic
camera calibration consists of an optimal set of parameters
for a camera projection model that relates 2D image points
to 3D scene points; these optimal parameters correspond to
minimal reprojection error. An accurate extrinsic calibration
corresponds to accurate camera poses with respect to a
reference frame on the vehicle, usually the odometry frame.
Accurate calibration allows feature points from one camera
to be reprojected into another camera with low reprojection
errors. Furthermore, odometry data can be used in
conjunction with the camera extrinsics to efficiently compute
a good initial estimate of the camera poses, which can then be
refined via local bundle adjustment with minimal correction.
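As a concrete illustration, the initial camera pose estimate described above is simply the composition of the current odometry pose with the fixed camera-odometry transform. The following minimal C++ sketch shows this composition with plain 4x4 homogeneous matrices; the function and frame names are ours for illustration, not identifiers from the pipeline:

```cpp
#include <array>

using Mat4 = std::array<std::array<double, 4>, 4>;

// Multiply two 4x4 rigid-body transforms.
Mat4 mul(const Mat4& A, const Mat4& B) {
    Mat4 C{};  // zero-initialized
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// Predict a camera pose in the world frame from the current odometry
// pose T_world_odo and the fixed camera-odometry extrinsic T_odo_cam.
Mat4 predictCameraPose(const Mat4& T_world_odo, const Mat4& T_odo_cam) {
    return mul(T_world_odo, T_odo_cam);
}
```

The predicted pose then serves as the starting point for local bundle adjustment, which only needs to apply a small correction.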
We use the unified projection model described by Mei et
al. in [1] which works well in practice for regular, wide-
angle, fish-eye, and catadioptric cameras. This model differs
from the unified Taylor model proposed by Scaramuzza et
al. [2] in that Mei's model has an explicit parametric form
for a generic camera, whereas Scaramuzza's model is
represented by an arbitrary Taylor polynomial. As a result, a
closed-form Jacobian matrix can be formulated for Mei's
model but not for Scaramuzza's. Contrary to claims that
Mei's model has limited accuracy when applied to fish-eye
cameras, we find experimentally that it works very well for
fish-eye cameras in practice.
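For reference, the core of the unified projection model first projects a 3D point onto the unit sphere, shifts the projection center by the mirror parameter ξ along the z-axis, and then applies a generalized camera matrix. A minimal sketch follows; distortion terms are omitted for brevity, and the function and parameter names are ours, not the pipeline's:

```cpp
#include <array>
#include <cmath>

// Project a 3D point (X, Y, Z) with the unified model (distortion omitted).
// xi: mirror parameter; gamma1, gamma2: generalized focal lengths;
// (u0, v0): principal point.
std::array<double, 2> projectUnified(double X, double Y, double Z,
                                     double xi,
                                     double gamma1, double gamma2,
                                     double u0, double v0)
{
    // 1. Project the point onto the unit sphere.
    double norm = std::sqrt(X * X + Y * Y + Z * Z);
    double xs = X / norm, ys = Y / norm, zs = Z / norm;
    // 2. Perspective projection from a center shifted by xi along z.
    double mx = xs / (zs + xi);
    double my = ys / (zs + xi);
    // 3. Apply the generalized camera matrix.
    return { gamma1 * mx + u0, gamma2 * my + v0 };
}
```

With ξ = 0 the model reduces to a standard pinhole projection, which is one reason it covers regular as well as fish-eye and catadioptric cameras.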
Our pipeline has two stages: intrinsic and extrinsic cal-
ibration. In the intrinsic calibration stage, we require a
large chessboard on which all squares have an equal known
dimension. For each camera, with the chessboard held in a
wide variety of poses, we use an automatic chessboard corner
detector to find all interior corners on the chessboard in every
image until we accumulate a minimum number of corners.
We then use the set of estimated corner coordinates to find
the parameters of Mei's model by generating an initial esti-
mate of the intrinsic parameters and refining the parameters
via non-linear optimization; an analytical Jacobian is used to
significantly speed up the optimization.
In the extrinsic calibration stage, we separately run monoc-
ular VO with sliding window bundle adjustment for each
camera. It is possible for the VO to break occasionally in
poorly-textured areas. Nevertheless, we use all sets of VO
estimates to find an initial estimate of the camera-odometry
transform for each camera; each set of VO estimates has
a different scale. We triangulate the inlier feature point
correspondences generated by monocular VO using the initial
camera-odometry transforms and odometry data. The result-
ing sparse map is then optimized via bundle adjustment; the
odometry poses are kept fixed while all 3D scene points and
the camera-odometry transforms are optimized. At this point,
the camera-odometry transforms become more accurate;
however, they are not sufficiently accurate for reprojection of
feature points from one camera to another camera with sub-
pixel reprojection errors. To solve this issue and still adhere
to the assumption of no overlapping fields of view, we
find feature point correspondences across different cameras.
Starting from the first odometry pose, we maintain a local
frame history for each camera, and we find feature point
correspondences between each camera’s current frame and
every frame in every other camera’s frame history. For each
frame pair, we rectify the two images on a common image
2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 3-7, 2013, Tokyo, Japan. 978-1-4673-6357-0/13/$31.00 ©2013 IEEE