IEEE TRANSACTIONS ON ROBOTICS

Efficient homography-based tracking and 3D reconstruction for single viewpoint sensors

Christopher Mei, Ezio Malis, Member, IEEE, and Patrick Rives, Member, IEEE

Abstract— This paper addresses the problem of visual tracking for single viewpoint sensors and in particular how to generalise tracking to omnidirectional cameras. We analyse efficient minimisation approaches for the intensity-based cost function (sum of squared differences). The inverse compositional algorithm represents the state of the art in tracking, but it is not well adapted to handling occlusions or changes in illumination and cannot be applied to tracking 3-dimensional objects with motion estimation. In this article, we study an alternative, dubbed efficient second-order minimisation (ESM), that provides second-order convergence at the cost of a first-order approach. We show how this algorithm can be applied to 3D tracking and provide variants with better computational complexities. These results have applications in motion estimation, structure from motion, visual servoing and SLAM. The tracking algorithm was validated using an omnidirectional sensor mounted on a mobile robot.

Index Terms— visual tracking, structure from motion, omnidirectional vision

I. INTRODUCTION

WIDE field of view cameras are becoming increasingly popular in mobile robotics as they offer advantages in tasks such as motion estimation, autonomous navigation and localisation [1]. Recent research with omnidirectional cameras has focused on ego-motion estimation [2], [3] and visual servoing [4], [5]. Visual tracking, a fundamental step in many computer vision and robotic applications, has by contrast been the subject of very few articles in this context.

The tracking approach presented in this work minimises a dissimilarity measure, more specifically the sum of squared differences (SSD), between a reference template and the current image taken with a large field of view sensor.
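To make the dissimilarity measure concrete, here is a minimal sketch of the SSD cost between a reference template and a current patch of the same size; the function name `ssd_cost` and the toy patches are hypothetical, introduced only for this illustration and not part of the paper's implementation.

```python
import numpy as np

def ssd_cost(template, current):
    """Sum of squared differences between a reference template
    and a current (already warped) patch of the same size."""
    diff = template.astype(np.float64) - current.astype(np.float64)
    return float(np.sum(diff ** 2))

# Toy example: identical patches give zero cost; a uniform
# offset of 1 over a 4x4 patch gives a cost of 16.
ref = np.arange(16, dtype=np.float64).reshape(4, 4)
print(ssd_cost(ref, ref))        # 0.0
print(ssd_cost(ref, ref + 1.0))  # 16.0
```

Tracking then amounts to finding the warp parameters that minimise this cost over the template region.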
This leads to a non-linear optimisation problem that can be solved for small displacements (the type of movement expected in a scene at video rate). The advantages of SSD tracking are precision (all the image information is used, leading to sub-pixel accuracy) and speed (∼ 100 Hz). These properties make such techniques particularly well adapted to robotic tasks such as visual servoing and to providing input to simultaneous localisation and mapping (SLAM) algorithms. The downside is the need for a strong overlap between the reprojected and the real object for the system to converge.

The apparent difficulty of tracking with omnidirectional sensors comes from the non-linear projection model, which results in changes of shape in the image that make the direct use of methods such as KLT [6], [7] nearly impossible. Parametric models [8], [9], [10], such as the homography-based approach presented in this article, are well adapted to this problem. Previous related work using homography-based tracking for perspective cameras includes [11], [12], which extend the work proposed by Hager [8]. Homographies have also been used for visual servoing with central catadioptric cameras [5], [13]; these works share with our approach the notion of homographies for points belonging to the sphere of the unified projection model.

The single viewpoint property means it would be possible to track in an unwarped perspective view. This is however undesirable for the following reasons:
1) it introduces a discontinuity in the Jacobian (at least two planes are needed to represent the 360 deg field of view),
2) the non-uniform resolution is not taken into account, and
3) the approach is inefficient (in terms of speed and memory usage).

To our knowledge, this is the only work on SSD tracking for omnidirectional sensors. The closest work is that of Barreto et al. [4].
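To make the homography-based parametrisation concrete, the following is a minimal sketch (not the paper's implementation) of warping 2D points by a 3×3 homography in homogeneous coordinates; the helper name `warp_homography` and the sample points are assumptions made for this example.

```python
import numpy as np

def warp_homography(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of 2D points,
    returning the projectively normalised warped points."""
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])  # lift to homogeneous
    warped = (H @ pts_h.T).T
    return warped[:, :2] / warped[:, 2:3]                 # divide out the scale

# The identity homography leaves points unchanged.
H = np.eye(3)
pts = np.array([[1.0, 2.0], [3.0, 4.0]])
print(warp_homography(H, pts))
```

In the tracking loop, such a warp maps the reference template into the current image so that the SSD cost can be evaluated over the parameters of H.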
The authors propose a method for tracking omnidirectional lines using a contour-to-point tracker, thereby avoiding the problem of quadric-based catadioptric line fitting. For 3D plane-based tracking, closely related work is that of Cobzas and Sturm [14] for perspective cameras, where the authors assume the plane positions have been pre-calculated. Compared to methods such as [15], [16], the proposed 3D tracking algorithm assumes the same motion for all the planes and iterates until convergence. This adds extra robustness, as we will see in Section IV-B.

The article is organised in the following way. We start by introducing the concept of spherical perspective projection, which is adapted to large field of view sensors. We then detail the geometric transformation considered in this work: planar homographies. Homographies have the advantage of encompassing all perspective planar deformations and thus make it possible to track over long sequences without drift. Photometric deformations are not considered, for clarity; affine photometric models can be introduced without changing the underlying results using, for example, [8]. More advanced models, taking into account for example specularities, are still the object of research and not the topic of this article. Section III discusses the problem of minimisation, how to obtain second-order convergence and how to apply it to single- and multi-plane tracking with single viewpoint sensors. Simulated and real data experiments validate the proposed algorithms and confirm their advantages over standard algorithms such as the inverse compositional. The algorithm was evaluated on the motion estimation of a mobile robot and the results are compared to the precise odometry, considered as ground truth.
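Since the development that follows relies on the sphere of the unified projection model, the sketch below illustrates how a point on the normalised image plane can be lifted to the unit sphere using the standard inverse projection with mirror parameter ξ (ξ = 0 corresponds to a perspective camera, ξ = 1 to a parabolic mirror). The function name and sample values are illustrative assumptions, not the paper's code.

```python
import numpy as np

def lift_to_sphere(x, y, xi):
    """Lift a point (x, y) on the normalised image plane to the
    unit sphere of the unified projection model with mirror
    parameter xi."""
    r2 = x * x + y * y
    eta = (xi + np.sqrt(1.0 + (1.0 - xi * xi) * r2)) / (r2 + 1.0)
    return np.array([eta * x, eta * y, eta - xi])

p = lift_to_sphere(0.3, -0.2, 1.0)
print(np.linalg.norm(p))  # ≈ 1.0: the lifted point lies on the unit sphere
```

Homographies for single viewpoint sensors act on such sphere points, which is what allows one minimisation framework to cover both perspective and catadioptric cameras.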