Simultaneous Motion Segmentation and Structure from Motion

Luca Zappella, IIiA University of Girona, zappella@eia.udg.edu
Alessio Del Bue, IIT Genova, alessio.delbue@iit.it
Xavier Lladó, IIiA University of Girona, llado@eia.udg.edu
Joaquim Salvi, IIiA University of Girona, qsalvi@eia.udg.edu

Abstract

This paper presents a novel approach to simultaneously compute the motion segmentation and the 3D reconstruction of a set of 2D points extracted from an image sequence. Starting from an initial segmentation, our method proposes an iterative procedure that corrects the misclassified points while reconstructing the 3D scene, which is composed of independently moving objects. This optimization procedure relies on two well-known principles: first, in multi-body Structure from Motion the matrix describing the 3D shape is sparse; second, the segmented 2D points must yield a valid 3D reconstruction under the rotational metric constraints. Our formulation results in a bilinear optimization where sparsity and metric constraints are enforced at each iteration of the algorithm. The final result is the corrected segmentation, the 3D structure of the moving objects and an orthographic camera matrix for each motion and each frame. Results are shown on synthetic sequences, and a preliminary application to real sequences of the Hopkins155 database is presented.

1. Introduction

The inference of the 3D position of moving objects in a scene is one of the most important tasks in Computer Vision. In complex scenarios where several bodies move rigidly, it is first necessary to cluster the motions belonging to different objects before performing any reconstruction task. In particular, Motion Segmentation (MS) from feature trajectories consists of grouping the trajectories according to the different motions they follow throughout a video sequence. MS is a low-level task and a fundamental step for any further motion analysis.
Its importance is reflected in the active research on this topic from the early days of computer vision to the present. Different strategies have been used to tackle MS, as described in [17]: image difference, statistics, wavelets, optical flow, layers and manifold clustering, to cite a few. Recently, the Hopkins155 database [14] has become a standard benchmark for the evaluation of MS techniques. A few algorithms [7, 9, 18] have reported low misclassification rates on the Hopkins155 database, which indicates that MS algorithms are becoming more reliable. Once a segmentation is available, with image trajectories assigned to each object, higher-level tasks such as 3D reconstruction can take place. In particular, uncalibrated Structure from Motion (SfM) is required in several applications. Given one object that moves throughout a video sequence and its tracked 2D features, the aim of SfM is to recover both the 3D coordinates of the points (up to a scale factor) and the motion of the whole structure in each frame (up to an arbitrary initial rotation). Numerous techniques have been proposed to solve the SfM problem; one of the most successful is the factorization algorithm of Tomasi and Kanade [12], developed in the early 1990s. The key idea of their method is to express the geometric invariants present in the data as a bilinear model of the 3D shape and motion components. Tomasi and Kanade's algorithm extracts these components globally, using all the information contained in the trajectory matrix of the moving shape. The algorithm was later extended to more general camera models for rigid objects [4, 10, 13] and, more recently, to non-rigid objects [2]. All these techniques share a common assumption: only one object moves in the scene.
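The bilinear shape-and-motion model behind factorization can be illustrated with a minimal NumPy sketch. It assumes a single rigid object, noise-free tracks and an orthographic camera, and the helper names `random_rotation`, `synthetic_trajectories` and `affine_factorization` are hypothetical. It is not the full algorithm of [12]: the metric upgrade that enforces the rotational constraints on each camera is omitted, so the recovered motion and shape are only defined up to an invertible 3x3 transformation.

```python
import numpy as np

def random_rotation(rng):
    """Return a random 3x3 rotation matrix (orthonormal, det = +1) via QR."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis to turn a reflection into a rotation
    return q

def synthetic_trajectories(n_frames, n_points, seed=0):
    """Build a 2F x P trajectory matrix W of one rigid object seen by an
    orthographic camera: w_fp = R_f[:2] @ s_p + t_f (noise-free, hypothetical data)."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((3, n_points))   # 3D shape, one column per point
    rows = []
    for _ in range(n_frames):
        R = random_rotation(rng)[:2]          # 2x3 orthographic camera rows
        t = rng.standard_normal((2, 1))       # 2D image translation
        rows.append(R @ S + t)
    return np.vstack(rows)

def affine_factorization(W):
    """Rank-3 factorization of a trajectory matrix (assumed affine camera).
    Returns motion M (2F x 3), shape S (3 x P) and all singular values.
    M and S are defined only up to an invertible 3x3 matrix Q (M Q, Q^-1 S);
    the metric upgrade that would fix Q is NOT performed here."""
    W_c = W - W.mean(axis=1, keepdims=True)   # centering removes translations
    U, sigma, Vt = np.linalg.svd(W_c, full_matrices=False)
    root = np.sqrt(sigma[:3])
    M = U[:, :3] * root                       # scale columns by sqrt(sigma)
    S = root[:, None] * Vt[:3]                # scale rows by sqrt(sigma)
    return M, S, sigma

W = synthetic_trajectories(n_frames=10, n_points=30)
M, S, sigma = affine_factorization(W)
W_c = W - W.mean(axis=1, keepdims=True)
print(M.shape, S.shape)                        # (20, 3) (3, 30)
print(np.allclose(M @ S, W_c))                 # True: centered W has rank 3
print(sigma[3] / sigma[0] < 1e-10)             # True: 4th singular value ~ 0
```

The check on the fourth singular value shows why a single rigid object is assumed: with several independently moving objects, the centered trajectory matrix no longer has rank 3, and a plain rank-3 truncation mixes their motions.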
In order to reconstruct multiple objects, it is necessary either to rely on an MS technique that feeds the SfM algorithm with one object at a time or to develop a different framework for multi-body SfM.

1.1. Related work on multi-body SfM

The early attempts to solve multi-body SfM tackled the problem using algebraic approaches [5]. After algebraic approaches, which are very sensitive to noise, re-