Optimal shape from motion estimation with missing and degenerate data

Manuel Marques and João Costeira
Institute for Systems and Robotics - Instituto Superior Técnico
Av. Rovisco Pais - 1049-001 Lisboa PORTUGAL
{manuel,jpc}@isr.ist.utl.pt

Abstract

Reconstructing a 3D scene from a moving camera is one of the most important issues in the field of computer vision. In this scenario, not all points are known in all images (e.g. due to occlusion), thus generating missing data. The state of the art handles the missing points in this context by enforcing rank constraints on the point track matrix. However, quite frequently, close-up views tend to capture planar surfaces, producing degenerate data. If a single frame is degenerate, the whole sequence will produce high errors in the shape reconstruction, even though the observation matrix satisfies the rank 4 constraint. In this paper, we propose to solve the structure from motion problem with degenerate data by introducing a new factorization algorithm that imposes the full scaled orthographic model in a single optimization procedure. By imposing all model constraints, a unique (correct) 3D shape is estimated regardless of the data degeneracies. Experiments show that remarkably good reconstructions are obtained with an approximate model such as orthography.

1. Introduction

One of the most important issues in computer vision is the structure from motion problem, where an object's structure and motion are obtained from image measurements. Under the orthographic camera model, this task can be solved by the Tomasi-Kanade method [12]. More complex cameras have been considered in [8, 10].

Note that these algorithms are adequate only if all feature points are visible in each image. Problems occur when some measurements are missing, which happens nearly always in real situations.
The problem we propose to solve in this paper is the 3D reconstruction of an object's shape from an image stream with missing data. The assumed camera model is the orthographic model. To minimize perspective effects, the image sequence should be produced by close-up views of the objects (these views quite frequently are of planar surfaces).

This work was partially supported by the Fundacao para a Ciencia e Tecnologia (ISR/IST pluriannual funding) through the POSC Program that includes FEDER funds, and project POCTI/AUR/48123/2002.

The missing data problem can be formulated as an optimization given by:

Problem 1

$$(A, B)^* = \arg\min_{A, B} \left\| (Z - A B) \odot D \right\|_F$$

where Z and D are the measurement and mask matrices, respectively, and $\odot$ denotes the element-wise (Hadamard) product. The mask matrix is a binary matrix identifying the known data with 1 and the unknown data with 0.

According to the orthographic camera model, matrix Z has rank 4, and this fact is used to estimate the unknown data. In [12], the missing data is sequentially replaced using complete subsets of the data. However, this first approach does not solve Problem 1 for generic configurations of D, as proved by Jacobs [7]. The approach proposed in [7] is a non-iterative, sub-optimal algorithm in which the measurement matrix satisfies the rank constraint. Like Jacobs' approach, several algorithms [5, 11, 13] are known as batch algorithms, because the solution (sub-optimal in the presence of noise) is found in one global step. For this reason, methods of this type can be used to initialize iterative algorithms, among which alternation algorithms play an important role. Alternation algorithms are based on the fact that if either A or B is known, there is a closed-form solution for the other that minimizes Problem 1. The approach of Guerreiro and Aguiar [4] is similar to that of Aanaes et al. [1]: both algorithms project the data onto a subspace in each iteration.
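As an illustration of the alternation idea, the following Python/NumPy sketch fixes one factor and solves for the other by least squares over the known entries only. It is a minimal sketch under stated assumptions, not the method of [4] or [1]: the function names `masked_als` and `masked_error`, the random initialization, the fixed iteration count, and the synthetic data are all choices made here for illustration.

```python
import numpy as np

def masked_als(Z, D, rank=4, iters=100, seed=0):
    """Alternation sketch for min over A, B of ||(Z - A B) * D||_F.

    Assumes every row and every column of D has at least one known entry.
    """
    rng = np.random.default_rng(seed)
    m, n = Z.shape
    A = rng.standard_normal((m, rank))  # hypothetical random initialization
    B = rng.standard_normal((rank, n))
    for _ in range(iters):
        # A fixed: each column of B is a least-squares fit on its known rows.
        for j in range(n):
            rows = D[:, j] > 0
            B[:, j] = np.linalg.lstsq(A[rows], Z[rows, j], rcond=None)[0]
        # B fixed: each row of A is a least-squares fit on its known columns.
        for i in range(m):
            cols = D[i] > 0
            A[i] = np.linalg.lstsq(B[:, cols].T, Z[i, cols], rcond=None)[0]
    return A, B

def masked_error(Z, D, A, B):
    """Cost of Problem 1: Frobenius norm of the residual on known entries."""
    return np.linalg.norm((Z - A @ B) * D)

# Synthetic rank-4 measurements with roughly 20% of the entries unknown.
rng = np.random.default_rng(0)
Z = rng.standard_normal((20, 4)) @ rng.standard_normal((4, 30))
D = (rng.random(Z.shape) < 0.8).astype(float)

A1, B1 = masked_als(Z, D, iters=1)
A50, B50 = masked_als(Z, D, iters=50)
print(masked_error(Z, D, A1, B1), masked_error(Z, D, A50, B50))
```

Because each half-step is an exact minimizer with the other factor held fixed, the cost cannot increase from one iteration to the next. Nothing in this sketch guarantees that the limit is the global minimum of Problem 1.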
The convergence of the alternation methods above is initially good, but they are very susceptible to flatlining. Buchanan [2] later presented a Newton method to improve convergence.

The constraint used by these algorithms, the rank constraint, is not enough to obtain a correct estimate when the sequence contains images in which the 3D points with known projections belong to 1D or 2D subspaces. This happens because the optimization problem (Problem 1) then has infinitely many minima.
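The non-uniqueness can be seen in a small numerical example. The Python/NumPy sketch below is an illustrative construction, not taken from the experiments of this paper: the scene, the single image row `a_true`, the mask `d`, and the perturbation of size 5 are all hypothetical. A frame observes only points on the plane z = 0, so the motion coefficient that multiplies the z-coordinate has no effect on the known entries.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scene: 6 coplanar points (z = 0) and 6 points off the plane.
n_planar, n_other = 6, 6
xyz = rng.standard_normal((3, n_planar + n_other))
xyz[2, :n_planar] = 0.0
shape = np.vstack([xyz, np.ones(n_planar + n_other)])  # 4 x n shape matrix

# One image row of a degenerate frame: only the planar points are known.
a_true = rng.standard_normal(4)
d = np.zeros(n_planar + n_other)
d[:n_planar] = 1.0
z_row = a_true @ shape

def masked_residual(a):
    """Residual of this row on the known entries, as in Problem 1."""
    return np.linalg.norm((z_row - a @ shape) * d)

# Change only the coefficient that multiplies the z-coordinate.
a_alt = a_true + np.array([0.0, 0.0, 5.0, 0.0])

res_true = masked_residual(a_true)
res_alt = masked_residual(a_alt)
# The two solutions disagree on the entries that were never observed.
gap_unobserved = np.linalg.norm(((a_true - a_alt) @ shape) * (1 - d))
rank_observed = np.linalg.matrix_rank(shape[:, :n_planar])

print(res_true, res_alt, gap_unobserved, rank_observed)
```

Both `a_true` and `a_alt` reach a residual of (numerically) zero on the known entries, and the same holds for any other value of the third coefficient, while the predicted projections of the unobserved points differ. The rank constraint alone therefore cannot select the correct motion for such a frame.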