A Linear Model for Simultaneous Estimation of 3D Motion and Depth

Hanno Scharr 1,2,3 and Ralf Küsters 1,2

1 Institute for Chemistry and Dynamics of the Geosphere, Institute III: Phytosphere, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
2 Interdisciplinary Center for Scientific Computing, Ruprecht Karls University, Im Neuenheimer Feld 368, 69120 Heidelberg, Germany
3 Intel Research, 2200 Mission College Blvd, Santa Clara, CA 95054, USA
{Hanno.Scharr,Ralf.Kuesters}@iwr.uni-heidelberg.de

IEEE Workshop on Motion and Video Computing, 2002, Orlando, Florida, USA

ABSTRACT

A novel model for simultaneous estimation of full 3D motion and 3D position in world coordinates from multi-camera sequences is presented. To this end, scene flow [29] and disparity are estimated in a single estimation step. The key idea is to interpret the sequences of two or more cameras as one 4D data set. In this 4D space a given motion model, a camera model and a brightness change model are combined into a brightness change constraint equation (BCCE) that accounts for changes due to both object motion and different camera positions. The estimation of the parameters in this constraint equation is demonstrated using a weighted total least squares estimator called the structure tensor. An evaluation of systematic errors and noise stability for a 5-camera setup is shown, as well as results on synthetic sequences with ground truth and on data acquired under controlled laboratory conditions.

KEY WORDS
optical flow, scene flow, multiple cameras, disparity, depth, 3D motion

1 Introduction

Motion estimation and disparity estimation are standard tasks for optical flow algorithms, well known from early publications (e.g. [16, 21] and many more). This paper aims to combine scene flow [29] (i.e. 3D optical flow estimated from 2D sequences) and disparity estimation within a single optical-flow-like estimation step.
As the developed model has the same form as the usual brightness change constraint equations (BCCE), no special estimation framework has to be established. We use the so-called structure tensor method [3, 15, 17], but other methods can be applied as well (e.g. the ones in [1, 14]).

In state-of-the-art optical flow algorithms for motion estimation, an image sequence of a single fixed camera is interpreted as data in a 3D x-y-t-space. In this space a BCCE defines a linear model for the changes of gray values due to local object motion and other parameters, e.g. illumination changes or physical processes (compare e.g. [13]). The result of the calculation is then a displacement vector field, i.e. local object motion, and, if an appropriate model is used, quantities of brightness change. Using a moving camera looking at a fixed scene, identical algorithms can be used to determine object depth, known as structure from camera motion (e.g. [20]).

The basic idea of the new estimation technique presented here is to interpret the camera position s as a new dimension (see Fig. 1). Hence all image sequences acquired by a multi-camera setup (or a pseudo multi-camera setup as in our target application mentioned below) are combined to sample a 4D volume in x-y-s-t-space. If a 2D camera grid is used (as e.g. in [22]), we get a 5D volume in x-y-s_x-s_y-t-space. This paper is restricted to the 4D case.

In [5] a 2D manifold is constructed combining all 1D trajectories of a surface point acquired by multiple cameras. This comes very close to the idea presented here. The advantage of our approach is that it is an extension of the usual optical flow framework and can consequently be combined easily with the other methods and extensions stated in the related work section.

In our target application, plant growth is to be studied using a single camera on a linear moving stage. For convenience, let the camera translate along its x-axis and denote its position by s (see Fig. 1).
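To make the estimation step concrete, the following sketch shows the weighted total least squares idea behind the structure tensor for the classic 3D x-y-t case described above; the 4D x-y-s-t model of this paper adds the s-derivative and the disparity parameter in the same fashion. This is an illustrative NumPy sketch, not the authors' implementation; all function and variable names are our own, and for simplicity one parameter set is estimated for a whole block instead of per pixel with a local weighting window.

```python
import numpy as np

def structure_tensor_flow(g):
    """Total least squares optical flow for a 3D block g[x, y, t].

    Solves the BCCE  g_x*u + g_y*v + g_t = 0  by taking the eigenvector
    of the structure tensor J = sum (grad g)(grad g)^T belonging to the
    smallest eigenvalue, then normalizing its t-component to 1.
    """
    gx, gy, gt = np.gradient(g.astype(float))
    # Drop boundary samples where np.gradient uses one-sided differences.
    core = (slice(1, -1),) * 3
    G = np.stack([d[core].ravel() for d in (gx, gy, gt)])  # 3 x N gradients
    J = G @ G.T                                            # 3x3 structure tensor
    _, V = np.linalg.eigh(J)                               # eigenvalues ascending
    e = V[:, 0]                                            # null-space direction
    return e[:2] / e[2]                                    # (u, v)

# Synthetic check: a pattern translating with (u, v) = (1, 1) pixels/frame.
x = np.arange(32)[:, None, None]
y = np.arange(16)[None, :, None]
t = np.arange(7)[None, None, :]
g = np.sin(0.3 * (x - t)) + np.sin(0.4 * (y - t))
u, v = structure_tensor_flow(g)
```

For this synthetic translating pattern the estimator recovers u and v essentially exactly; in the 4D model the gradient vector gains a fourth component g_s and the eigenvector additionally yields the disparity.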
From other plant growth studies (e.g. [24]) we know that plant