Automatic Registration of Mobile LiDAR and Spherical Panoramas

Ruisheng Wang, NOKIA, Chicago, IL, USA, ruisheng.wang@nokia.com
Frank P. Ferrie, Centre for Intelligent Machines, McGill University, Canada, ferrie@cim.mcgill.ca
Jane Macfarlane, NOKIA, Chicago, IL, USA, jane.macfarlane@nokia.com

Abstract

We present an automatic mutual information (MI) registration method for mobile LiDAR and panoramas collected from a driving vehicle. The suitability of MI for registration of aerial LiDAR and aerial oblique images has been demonstrated in [17], under the assumption that minimization of joint entropy (JE) is a sufficient approximation of maximization of MI. In this paper, we show that this assumption is invalid for ground-level data: the entropy of a LiDAR image cannot be regarded as approximately constant under small pose perturbations. Instead of minimizing the JE, we directly maximize MI to estimate corrections to the camera poses. Our method automatically registers mobile LiDAR with spherical panoramas over an approximately 4-kilometer drive, and is the first example we are aware of that tests mutual information registration in a large-scale context.

1. Introduction

Image-to-range registration is a prerequisite for many applications. The registration result is critical not only for texture-mapping 3D models of large-scale scenes, but also for applications such as image-based upsampling of range data [6, 8, 21, 24], image-guided range segmentation [4, 2], and 3D scene modeling [5]. The problem of image-to-range registration involves the alignment of 2D images with 2D projections of 3D range data, and consists of estimating the relative camera pose with respect to the range sensors.

There has been a considerable amount of research on registering images with range data. Existing methods range from keypoint-based matching [3, 7, 11] and structural-feature-based matching [13, 14, 20, 23] to mutual-information-based registration [17]. The range data include terrestrial or aerial LiDAR, and the images include vertical or oblique aerial images as well as ground-level images.

Keypoint-based matching [3, 11] relies on the similarity between laser intensity images and the corresponding camera images. First, each pixel of the laser intensity image is encoded with its corresponding 3D coordinate. Feature points are then extracted from both images using either SIFT [16] or Förstner operators [10]. A robust matching strategy based on RANSAC [9] and/or epipolar geometry constraints is employed to determine the correspondence pairs for computing the fundamental matrix. Sensor registration is then achieved through a robust camera spatial resection. Ding et al. [7] registered oblique aerial images with a 3D model generated from aerial LiDAR data based on 2D and 3D corner features in the 2D images and the 3D LiDAR model. The correspondence between extracted corners was established using a Hough transform and generalized M-estimator sample consensus. The resulting corner matches are used in Lowe's algorithm [15] to refine camera parameters estimated from a combination of vanishing point computation and GPS/IMU readings. In general, feature point extraction and robust matching are the keys to successful registration for this type of approach.
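To make the keypoint-based pipeline concrete, the following is a minimal sketch of its 2D matching stage using OpenCV. The file names are placeholders, and the cited methods differ in detail (e.g., Förstner operators rather than SIFT, and additional geometric verification), so this illustrates the general scheme rather than any particular author's implementation.

```python
import cv2
import numpy as np

# Grayscale laser intensity image (each pixel also carries a 3D
# coordinate) and the camera image; file names are placeholders.
intensity_img = cv2.imread("laser_intensity.png", cv2.IMREAD_GRAYSCALE)
camera_img = cv2.imread("camera.png", cv2.IMREAD_GRAYSCALE)

# SIFT keypoints and descriptors in both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(intensity_img, None)
kp2, des2 = sift.detectAndCompute(camera_img, None)

# Nearest-neighbor matching with Lowe's ratio test.
matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# RANSAC enforces the epipolar constraint while estimating the
# fundamental matrix; the surviving inliers are the correspondence pairs.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)
pairs1, pairs2 = pts1[mask.ravel() == 1], pts2[mask.ravel() == 1]

# Since each inlier in the intensity image maps to a known 3D point,
# these 2D-2D matches yield 2D-3D pairs for robust camera resection.
```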
Instead of matching points, structural-feature-based methods [13, 14, 20, 23] match structural features in both 2D and 3D space to estimate the relative camera pose. Directly matching single line features is error-prone because of the noise in both LiDAR and image data, as well as the limited robustness of the detection algorithms. High-level structural features help increase the robustness of both detection and matching. Wang and Neumann [23] registered aerial images with aerial LiDAR by matching so-called "3 Connected Segments", in which each linear feature contains three segments connected into a chain. They used a two-level RANSAC algorithm to refine the putative feature matches, and estimated the camera pose using the method described in [12]. Liu et al. [13, 14, 20] extracted so-called "rectangular parallelepiped" features, composed of vertical or horizontal 3D rectangular parallelepipeds in the LiDAR and 2D rectangles in the images, to estimate the camera translation with a hypothesis-and-test scheme. The camera rotation was estimated from at least two vanishing points. Since vanishing points are required, these methods work well for ground-level data but are not efficient for aerial data with weak perspective effects.
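In contrast to these feature-based approaches, MI-based registration [17], like the method presented here, compares a rendered LiDAR image with the camera image directly. As a point of reference for the objective, below is a minimal histogram-based sketch of an MI estimate between two co-registered grayscale images; render_lidar and candidate_poses are hypothetical placeholders for a projection routine and a pose search, not the implementation described in this paper. Since MI(A, B) = H(A) + H(B) - H(A, B), maximizing MI reduces to minimizing the joint entropy only when the marginal entropies are approximately constant, which is the assumption that fails for ground-level data.

```python
import numpy as np

def mutual_information(img_a, img_b, bins=64):
    """Histogram-based MI estimate between two co-registered
    grayscale images with values in [0, 255]."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    p_ab = joint / joint.sum()          # joint distribution
    p_a = p_ab.sum(axis=1)              # marginal of img_a
    p_b = p_ab.sum(axis=0)              # marginal of img_b
    nz = p_ab > 0                       # avoid log(0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / np.outer(p_a, p_b)[nz])))

# Hypothetical pose search: render the LiDAR points into the panorama
# under each candidate pose correction and keep the one maximizing MI.
# best_pose = max(candidate_poses,
#                 key=lambda T: mutual_information(panorama, render_lidar(T)))
```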