3D SSD tracking with estimated 3D planes

Dana Cobzas a,*, Martin Jagersand a, Peter Sturm b

a Computing Science, University of Alberta, Edmonton, Canada T6G 2E8
b INRIA Rhône-Alpes, Montbonnot 38334, France

* Corresponding author. Tel.: +1 780 492 2564. E-mail addresses: dana@cs.ualberta.ca (D. Cobzas), jag@cs.ualberta.ca (M. Jagersand), Peter.Sturm@inrialpes.fr (P. Sturm).

Image and Vision Computing 27 (2009) 69–79. doi:10.1016/j.imavis.2006.10.008

Received 14 November 2005; received in revised form 24 August 2006; accepted 20 October 2006

Abstract

We present a tracking method in which full camera position and orientation are tracked from intensity differences in a video sequence. The camera pose is computed from 3D planes, and hence does not depend on point correspondences. The plane-based formulation also allows additional constraints to be added naturally, e.g., perpendicularity between wall, floor, and ceiling surfaces, or co-planarity of wall surfaces. A particular feature of our method is that the full 3D pose change is computed directly from temporal image differences, without committing to a particular intermediate (e.g., 2D feature) representation. We experimentally compared our method with regular 2D SSD tracking and found it more robust and stable, because 3D consistency is enforced even in the low-level registration of image regions. This yields better results than first computing (and hence committing to) 2D image features and only then computing 3D pose from them.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Visual tracking; SSD tracking; Image registration; Plane tracking; 3D model estimation

1. Introduction

In visual tracking, the pose of an object or the motion of the camera is estimated over time from image motion information. Some applications, such as video surveillance, only require that the target object be tracked in 2D image space. Other applications, such as augmented reality and robotics, need full 3D camera motion. In this paper, we concentrate on tracking full 3D pose.

One way to classify tracking methods is into feature-based and registration-based. In feature-based approaches, features from a (usually a priori) 3D model are matched with features in the current image. Commonly, a feature detector is used to detect either special markers or natural image features. Pose estimation techniques can then be used to compute the camera position from the 2D–3D correspondences (a minimal sketch of this pipeline is given at the end of this section). Many approaches use image contours (edges or curves) that are matched with an a priori CAD model of the object [14,17,7]. Most systems compute pose parameters by linearizing with respect to object motion. A characteristic of these algorithms is that the feature detection is relatively decoupled from the pose computation, although past pose is sometimes used to limit search ranges, and the global model can be used to exclude feature mismatches [14,1].

In registration-based tracking, the pose computation is based on directly aligning a reference intensity patch with the current image so that each pixel intensity is matched as closely as possible. These methods assume that the change in location and appearance of the target between consecutive frames is small. Image constancy can be exploited to derive efficient gradient-based schemes using normalized correlation or a sum-of-squared-differences (e.g., L2 norm) criterion, giving the technique its popular name, SSD tracking. Unlike the feature-based approaches, which build the definition of what is to be tracked into the low-level routine (e.g., a line feature tracker tracks just lines), in registration-based tracking any distinct pattern of intensity variation can be tracked.
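To make the SSD criterion concrete, the following is a minimal sketch of a Lucas–Kanade style SSD tracker for pure 2D translation, written in Python/NumPy. It is not the method of this paper (which estimates full 3D pose from planes); it only illustrates the underlying registration principle. The function name, parameters, and the nearest-pixel warp are illustrative simplifications.

```python
# Minimal SSD (Lucas-Kanade style, inverse-compositional) tracker
# for pure 2D translation. Illustrative sketch only; bounds checking
# and subpixel interpolation are omitted for brevity.
import numpy as np

def ssd_track_translation(template, image, p0=(0.0, 0.0), n_iters=20):
    """Estimate a translation p such that image[y+p[1], x+p[0]] ~ template."""
    p = np.asarray(p0, dtype=np.float64)
    h, w = template.shape
    # Template gradients: constant across iterations in the
    # inverse-compositional formulation, so precompute once.
    gy, gx = np.gradient(template.astype(np.float64))
    J = np.stack([gx.ravel(), gy.ravel()], axis=1)   # Jacobian, (h*w) x 2
    H = J.T @ J                                      # 2x2 Gauss-Newton Hessian
    for _ in range(n_iters):
        # Warp the current image by the translation estimate
        # (nearest-pixel crop; assumes the patch stays in bounds).
        x0, y0 = int(round(p[0])), int(round(p[1]))
        warped = image[y0:y0 + h, x0:x0 + w].astype(np.float64)
        error = (warped - template).ravel()          # SSD residual
        dp = np.linalg.solve(H, J.T @ error)         # Gauss-Newton step
        p = p - dp                                   # inverse-compositional update
        if np.linalg.norm(dp) < 1e-3:                # converged
            break
    return p
```

The same Gauss–Newton structure carries over to richer warps (affine, homography, or the plane-induced 3D motion used in this paper); only the Jacobian and the parameter update change.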
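For comparison, the feature-based pipeline described earlier reduces, in its simplest form, to pose estimation from 2D–3D correspondences. The sketch below uses OpenCV's solvePnP on made-up data; the point coordinates and calibration matrix are assumptions for illustration, not values from this paper.

```python
# Hypothetical feature-based pose estimation from 2D-3D correspondences
# using OpenCV's solvePnP. All numeric values below are made up.
import numpy as np
import cv2

# Known 3D model points (e.g., corners of a planar patch, in metres).
object_pts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]],
                      dtype=np.float64)
# Their detected 2D projections in the current image (pixels).
image_pts = np.array([[320, 240], [420, 238], [424, 342], [318, 344]],
                     dtype=np.float64)
# Assumed intrinsic calibration matrix (focal length 800 px,
# principal point at the image centre).
K = np.array([[800,   0, 320],
              [  0, 800, 240],
              [  0,   0,   1]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
print("rotation:\n", R)
print("translation:\n", tvec)
```

Note how this two-stage scheme commits to the detected 2D features before any pose is computed; the method presented in this paper avoids that commitment by computing the 3D pose change directly from image intensity differences.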