What Is Computed by Structure from Motion Algorithms?

Cornelia Fermüller and Yiannis Aloimonos

Computer Vision Laboratory, Center for Automation Research,
University of Maryland, College Park, MD 20742-3275, USA
{fer,yiannis}@cfar.umd.edu

Abstract. In the literature we find two classes of algorithms which, on the basis of two views of a scene, recover the rigid transformation between the views and subsequently the structure of the scene. The first class contains techniques which require knowledge of the correspondence or the motion field between the images and are based on the epipolar constraint. The second class contains so-called direct algorithms which require knowledge about the value of the flow in one direction only and are based on the positive depth constraint. Algorithms in the first class achieve the solution by minimizing a function representing deviation from the epipolar constraint, while direct algorithms find the 3D motion that, when used to estimate depth, produces a minimum number of negative depth values. This paper presents a stability analysis of both classes of algorithms. The formulation is such that it allows comparison of the robustness of algorithms in the two classes as well as within each class. Specifically, a general statistical model is employed to express the functions which measure the deviation from the epipolar constraint and the number of negative depth values, and these functions are studied with regard to their topographic structure, specifically as regards the errors in the 3D motion parameters at the places representing the minima of the functions. The analysis shows that for algorithms in both classes which estimate all motion parameters simultaneously, the obtained solution has an error such that the projections of the translational and rotational errors on the image plane are perpendicular to each other.
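As a rough illustration (not taken from the paper), the two criteria can be sketched in Python. The sketch assumes calibrated homogeneous image coordinates `x1`, `x2` and a candidate motion `(R, t)`; it evaluates the discrete epipolar residuals x2ᵀ[t]×R x1 and, as a simplified stand-in for the positive depth constraint used by direct methods (which operate on normal flow rather than full correspondences), counts negative triangulated depths:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def epipolar_residuals(x1, x2, R, t):
    """Per-point deviation x2^T E x1 from the epipolar constraint, E = [t]_x R.

    x1, x2: (N, 3) homogeneous image points in the two views.
    For the true motion and noise-free points, all residuals are zero.
    """
    E = skew(t) @ R
    return np.einsum('ni,ij,nj->n', x2, E, x1)

def negative_depth_count(x1, x2, R, t):
    """Count points whose triangulated depth is negative under motion (R, t).

    With X2 = R X1 + t and X_k = Z_k x_k, each correspondence gives the
    linear system  Z1 (R x1) - Z2 x2 = -t,  solved here by least squares.
    A wrong motion candidate tends to drive many depths negative.
    """
    count = 0
    for p, q in zip(x1, x2):
        A = np.stack([R @ p, -q], axis=1)           # 3x2 system matrix
        z1, z2 = np.linalg.lstsq(A, -t, rcond=None)[0]
        if z1 < 0 or z2 < 0:
            count += 1
    return count
```

An epipolar-based method would minimize, e.g., the sum of squared residuals over candidate motions, whereas a direct method would minimize `negative_depth_count`; the paper's analysis characterizes the error structure at the minima of both objectives.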
Furthermore, the estimated projection of the translation on the image lies on a line through the origin and the projection of the real translation.

1 Introduction

Structure from motion (SFM), one of the central problems in computational vision, amounts to the following: given multiple views of a scene, recover the rigid transformation between any two views and the structure of the imaged scene. Existing publications treat the problem using two broad classes of approaches, differential (where the views are close together) and discrete (where the views are far apart). Although the formalisms used in these two approaches are not identical, the underlying geometric structure of the problem remains essentially the same. Here the analysis is done in the framework of