IEEE TRANSACTIONS ON ROBOTICS

Efficient homography-based tracking and 3D reconstruction for single viewpoint sensors

Christopher Mei, Ezio Malis, Member, IEEE, and Patrick Rives, Member, IEEE

Abstract— This paper addresses the problem of visual tracking for single viewpoint sensors and in particular how to generalise tracking to omnidirectional cameras. We analyse efficient minimisation approaches for the intensity-based cost function (sum of squared differences). The inverse compositional algorithm represents the state of the art in tracking, but it is not well adapted to handling occlusions or changes in illumination and cannot be applied to tracking 3-dimensional objects with motion estimation. In this article, we study an alternative, dubbed efficient second-order minimisation (ESM), that provides second-order convergence at the cost of a first-order approach. We show how this algorithm can be applied to 3D tracking and provide variants with better computational complexities. These results have applications in motion estimation, structure from motion, visual servoing and SLAM. The tracking algorithm was validated using an omnidirectional sensor mounted on a mobile robot.

Index Terms— visual tracking, structure from motion, omnidirectional vision

I. INTRODUCTION

WIDE field of view cameras are becoming increasingly popular in mobile robotics as they offer advantages in tasks such as motion estimation, autonomous navigation and localisation [1]. Recent research with omnidirectional cameras has focused on ego-motion estimation [2], [3] and visual servoing [4], [5]. Visual tracking, a fundamental step in many computer vision and robotic applications, has by contrast been the subject of very few articles in this context.

The tracking approach presented in this work minimises a dissimilarity measure, more specifically the sum of squared differences (SSD), between a reference template and the current image taken with a large field of view sensor.
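To make the dissimilarity measure concrete, here is a minimal sketch of the SSD cost between a reference template and a current patch of the same size; the function name `ssd_cost` and the toy patches are hypothetical, introduced only for this illustration and not part of the paper's implementation.

```python
import numpy as np

def ssd_cost(template, current):
    """Sum of squared differences between a reference template
    and a current (already warped) patch of the same size."""
    diff = template.astype(np.float64) - current.astype(np.float64)
    return float(np.sum(diff ** 2))

# Toy example: identical patches give zero cost; a uniform
# offset of 1 over a 4x4 patch gives a cost of 16.
ref = np.arange(16, dtype=np.float64).reshape(4, 4)
print(ssd_cost(ref, ref))        # 0.0
print(ssd_cost(ref, ref + 1.0))  # 16.0
```

Tracking then amounts to finding the warp parameters that minimise this cost over the template region.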
This leads to a non-linear optimisation problem that can be solved for small displacements (the type of movement expected in a scene at video rate). The advantages of SSD tracking are precision (all the image information is used, leading to sub-pixel accuracy) and speed (∼ 100 Hz). These properties make such techniques particularly well adapted to robotic tasks such as visual servoing and to providing input to simultaneous localisation and mapping (SLAM) algorithms. The downside is the need for a strong overlap between the reprojected and the real object for the system to converge.

The apparent difficulty of tracking with omnidirectional sensors comes from the non-linear projection model, which results in changes of shape in the image that make the direct use of methods such as KLT [6], [7] nearly impossible. Parametric models [8], [9], [10], such as the homography-based approach presented in this article, are well adapted to this problem. Previous related work using homography-based tracking for perspective cameras includes [11], [12], which extend the work proposed by Hager [8]. Homographies have also been used for visual servoing with central catadioptric cameras [5], [13]; these works share with our approach the notion of homographies for points belonging to the sphere of the unified projection model.

The single viewpoint property means it would be possible to track in an unwarped perspective view. This is however undesirable for the following reasons:
1) it introduces a discontinuity in the Jacobian (at least two planes are needed to represent the 360 deg field of view),
2) the non-uniform resolution is not taken into account, and
3) the approach is inefficient (in terms of speed and memory usage).

To our knowledge, this is the only work on SSD tracking for omnidirectional sensors. The closest work is that of Barreto et al. [4].
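To make the homography-based parametrisation concrete, the following is a minimal sketch (not the paper's implementation) of warping 2D points by a 3×3 homography in homogeneous coordinates; the helper name `warp_homography` and the sample points are assumptions made for this example.

```python
import numpy as np

def warp_homography(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of 2D points,
    returning the projectively normalised warped points."""
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])  # lift to homogeneous
    warped = (H @ pts_h.T).T
    return warped[:, :2] / warped[:, 2:3]                 # divide out the scale

# The identity homography leaves points unchanged.
H = np.eye(3)
pts = np.array([[1.0, 2.0], [3.0, 4.0]])
print(warp_homography(H, pts))
```

In the tracking loop, such a warp maps the reference template into the current image so that the SSD cost can be evaluated over the parameters of H.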
The authors propose a method for tracking omnidirectional lines using a contour-to-point tracker, thereby avoiding the problem of quadric-based catadioptric line fitting. For 3D plane-based tracking, closely related work is that of Cobzas and Sturm [14] for perspective cameras, where the authors assume the plane positions have been pre-calculated. Compared to methods such as [15], [16], the proposed 3D tracking algorithm assumes the same motion for all the planes and iterates until convergence. This adds extra robustness, as we will see in Section IV-B.

The article is organised in the following way. We start by introducing the concept of spherical perspective projection, which is adapted to large field of view sensors. We then detail the geometric transformation considered in this work: planar homographies. Homographies have the advantage of encompassing all perspective planar deformations and thus make it possible to track over long sequences without drift. Photometric deformations are not considered, for clarity; affine photometric models can be introduced without changing the underlying results using, for example, [8]. More advanced models, taking into account for example specularities, are still the object of research and not the topic of this article. Section III discusses the problem of minimisation, how to obtain second-order convergence and how to apply it to single- and multi-plane tracking with single viewpoint sensors. Simulated and real data experiments validate the proposed algorithms and confirm their advantages over standard algorithms such as the inverse compositional. The algorithm was evaluated on the motion estimation of a mobile robot and the results are compared to the precise odometry, considered as ground truth.
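Since the development that follows relies on the sphere of the unified projection model, the sketch below illustrates how a point on the normalised image plane can be lifted to the unit sphere using the standard inverse projection with mirror parameter ξ (ξ = 0 corresponds to a perspective camera, ξ = 1 to a parabolic mirror). The function name and sample values are illustrative assumptions, not the paper's code.

```python
import numpy as np

def lift_to_sphere(x, y, xi):
    """Lift a point (x, y) on the normalised image plane to the
    unit sphere of the unified projection model with mirror
    parameter xi."""
    r2 = x * x + y * y
    eta = (xi + np.sqrt(1.0 + (1.0 - xi * xi) * r2)) / (r2 + 1.0)
    return np.array([eta * x, eta * y, eta - xi])

p = lift_to_sphere(0.3, -0.2, 1.0)
print(np.linalg.norm(p))  # ≈ 1.0: the lifted point lies on the unit sphere
```

Homographies for single viewpoint sensors act on such sphere points, which is what allows one minimisation framework to cover both perspective and catadioptric cameras.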