Robotics and Autonomous Systems 55 (2007) 597–607
www.elsevier.com/locate/robot

A variational method for the recovery of dense 3D structure from motion

Hicham Sekkati, Amar Mitiche∗
Institut national de la recherche scientifique, INRS-EMT, Place Bonaventure, 800, rue de la Gauchetiere ouest, Suite 6900, Montreal, Quebec, Canada, H5A 1K6

Received 3 March 2005; received in revised form 16 November 2006; accepted 16 November 2006
Available online 19 December 2006

Abstract

The purpose of this study is to investigate a variational formulation of the problem of three-dimensional (3D) interpretation of temporal image sequences, based on the 3D brightness constraint and anisotropic regularization. The method allows movement of both the viewing system and objects, and does not require the computation of image motion prior to 3D interpretation. Interpretation follows from the minimization of a functional with two terms: a term of conformity of the 3D interpretation to the first-order spatio-temporal variations of the image sequence, and a regularization term based on anisotropic diffusion to preserve the boundaries of the interpretation. The Euler–Lagrange partial differential equations corresponding to the functional are solved efficiently via the half-quadratic algorithm. Results of several experiments on synthetic and real image sequences are given to demonstrate the validity of the method and its implementation.

© 2007 Published by Elsevier B.V.

Keywords: Image sequence analysis; Optical flow; 3D from 2D; Anisotropic regularization

1. Introduction

The recovery of the shape of real objects from image motion, referred to as structure-from-motion perception, is a fundamental problem in computer vision. It occurs in many useful applications, such as robotics, real-object modeling, 2D-to-3D film conversion, augmented-reality rendering of visual data, internet and medical imaging, among others.
Computer vision methods which compute image motion before recovering structure are known as two-stage, or indirect, methods. Those which compute structure without prior image-motion estimation are known as direct methods. Direct methods use an explicit model of image motion in terms of the 3D variables to be estimated. For instance, in this study we assume that environmental objects are rigid, and we express image motion in terms of the parameters of rigid motion and depth.

One can also make a distinction between dense and sparse recovery of structure from motion. Sparse recovery, where depth is computed at a sparse set of points of the image positional array, has been the subject of numerous well-documented studies [1–5]. Dense recovery, however, where one seeks to compute depth and 3D motion over the whole image positional array, has been significantly less researched, in spite of the many studies on dense estimation of image motion [6–8]; understandably so, because practical applications have appeared only recently.

∗ Corresponding author. E-mail address: mitiche@inrs-telecom.uquebec.ca (A. Mitiche).

This study addresses the problem of dense recovery of structure from motion, more precisely the problem of estimating dense maps of depth and 3D motion from a temporal sequence of monocular images. One must differentiate this problem from the problem of estimating depth in stereoscopy ([9,10], for instance). Although one can argue that the two problems are conceptually similar, because one can be considered a discrete version of the other, their input and the processing of this input are dissimilar. As indicated in [11], one can readily see a difference from an abstract point of view, because stereoscopy involves the geometric notion of displacement between views, and image temporal sequences the kinematic notion of motion of the viewing system and viewed objects.
A displacement is defined by an initial position and a final position, intermediate positions being immaterial. Consequently, the notions of time and velocity are irrelevant. With motion, in contrast, time and velocity are fundamental dimensions. One can also readily see a difference from a more practical point of view, because both the viewing system and viewed objects can move when acquiring temporal image sequences.

0921-8890/$ - see front matter © 2007 Published by Elsevier B.V.
doi:10.1016/j.robot.2006.11.006
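As background for the direct-method model mentioned above, in which image motion is written explicitly in terms of rigid 3D motion parameters and depth, a standard parameterization in the style of Longuet-Higgins and Prazdny can be sketched. The notation (f, Z, T, Ω, g, λ) is illustrative and need not match the paper's:

```latex
% Rigid-motion image velocity model (illustrative notation).
% A scene point at depth Z projects to image coordinates (x, y)
% under focal length f; the relative rigid motion has translation
% T = (T_1, T_2, T_3) and rotation \Omega = (\omega_1, \omega_2, \omega_3).
\begin{align}
  u &= \frac{x\,T_3 - f\,T_1}{Z}
     + \frac{xy}{f}\,\omega_1
     - \Bigl(f + \frac{x^2}{f}\Bigr)\omega_2
     + y\,\omega_3, \\
  v &= \frac{y\,T_3 - f\,T_2}{Z}
     + \Bigl(f + \frac{y^2}{f}\Bigr)\omega_1
     - \frac{xy}{f}\,\omega_2
     - x\,\omega_3.
\end{align}
% Substituting (u, v) into the brightness (optical flow) constraint
% I_x u + I_y v + I_t = 0 gives a "3D brightness constraint" directly
% in the unknowns (Z, T, \Omega); a two-term functional of the general
% kind the abstract describes might then be sketched as
\begin{equation}
  E(Z, T, \Omega) \;=\; \int \bigl(I_x u + I_y v + I_t\bigr)^2 \,\mathrm{d}\mathbf{x}
   \;+\; \lambda \int g\bigl(\lVert \nabla Z \rVert\bigr)\,\mathrm{d}\mathbf{x},
\end{equation}
% with g an edge-preserving function driving the anisotropic
% regularization and \lambda > 0 a weighting constant.
```

The first term enforces conformity of the 3D interpretation to the first-order spatio-temporal image variations without computing optical flow separately; the second preserves depth boundaries, as stated in the abstract.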