Introduction to Multiview Rank Conditions and their Applications: A Review ∗

Jana Košecká ⋄, Yi Ma †
⋄ Department of Computer Science, George Mason University
† Electrical & Computer Engineering Department, University of Illinois at Urbana-Champaign
e-mail: kosecka@cs.gmu.edu, yima@uiuc.edu

ABSTRACT

Understanding the representations of 3D scenes as encoded in multiple views taken by a camera from different vantage points is central to many tasks in image and video analysis. These tasks range from recovering the camera motion and the 3D structure of the scene to detecting and characterizing multiple motions in video. We will demonstrate that the natural representation of a 3D scene in 2D images is in terms of the incidence relations among different geometric primitives, which can be concisely characterized by rank conditions of multiview matrices. The proposed rank conditions capture all existing independent multilinear constraints and enable truly global geometric analysis of multiple views comprising different geometric features. In addition to the analysis, we present natural factorization-based linear algorithms for structure and motion recovery, image transfer, and matching across multiple views, applicable in both the calibrated and the uncalibrated setting. We will demonstrate the approach experimentally on the problem of multi-frame structure and motion recovery using point and line features and their incidence relations.

1 INTRODUCTION

Analysis, alignment and characterization of the content of multiple images of a scene captured by a camera from different vantage points are central to many tasks in video and image analysis. Most past research on video coding, compression and multimedia applications involving video originated in the image processing community and focused predominantly on 2D image processing techniques to encode the information in the image stream.
On the other hand, a large amount of work in the computer vision community has been devoted to the problem of recovering 3D models of environments from multiple views. Applications include building 3D models from photographs (generally referred to as image-based rendering techniques), with uses in architectural preservation, computer graphics and special effects in the movie industry, as well as augmented reality systems, human-computer interaction, and object-level modeling for retail purposes.

* The work is supported by NSF grant IIS-0118732.

Since it is the 3D structure of the environment that gives rise to video and photographic content, this structure should be exploited in the analysis. It is therefore central to understand how the 3D structure is encoded in multiple views of the scene and how the projections of the 3D world relate to the camera displacements. When the observed motion of the objects in the scene and/or of the camera is rigid, these relationships are to a large extent characterized by geometric constraints between the observable geometric primitives and the rigid-body motion.

Characterization of the existing geometric constraints has a long history in both computer vision and photogrammetry. The basic formulations of the intrinsic geometric constraints governing perspective projections of point features in two views originated in photogrammetry and were later revived in the computer vision community in the early eighties [1]. A natural extension of the two-view relationships is to consider multiple views and different feature primitives. In the computer vision literature, the fundamental, structure-independent relationships between image features and camera displacements were characterized by the so-called multilinear matching constraints [2, 3, 4, 5].
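To make the two-view and multiview constraints mentioned above concrete, the following Python fragment is a minimal sketch under stated assumptions: calibrated cameras, noise-free synthetic data, the first view taken as the reference frame, and homogeneous image points satisfying lambda_i x_i = R_i lambda_1 x_1 + T_i, where lambda_i is the depth of the point in view i. Multiplying this relation by hat(x_i), the skew-symmetric matrix with hat(x_i) v = x_i × v, eliminates lambda_i and leaves hat(x_i) R_i x_1 lambda_1 + hat(x_i) T_i = 0. Stacking these equations for views 2, ..., m shows that the matrix with rows [hat(x_i) R_i x_1, hat(x_i) T_i] has (lambda_1, 1) in its null space and therefore rank at most one; for two views this condition is equivalent to the bilinear epipolar constraint x_2^T hat(T_2) R_2 x_1 = 0. The scene, the helper names (hat, epipolar_residual, rank_one_score, and the others) and the use of a normalized Gram determinant of the two columns instead of an SVD (to avoid dependencies) are illustrative assumptions, not the algorithms developed in this paper.

```python
import math

# Hypothetical helpers; calibrated cameras, noise-free data, view 1 is the
# reference frame and points move as X_i = R_i X_1 + T_i.

def hat(v):
    """Skew-symmetric matrix with hat(v) w == v x w."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transform(R, T, X):
    """Rigid-body motion X_i = R X + T."""
    return [a + b for a, b in zip(matvec(R, X), T)]

def project(X):
    """Calibrated perspective projection to homogeneous image coordinates."""
    return [X[0] / X[2], X[1] / X[2], 1.0]

def epipolar_residual(x1, xi, R, T):
    """Bilinear constraint x_i^T hat(T) R x_1, zero for a true correspondence."""
    return dot(xi, matvec(matmul(hat(T), R), x1))

def rank_one_score(x1, views):
    """Stack [hat(x_i) R_i x_1 | hat(x_i) T_i] over views 2..m and return the
    normalized Gram determinant of its two (nonzero) columns: 0 iff rank <= 1."""
    a, b = [], []
    for xi, R, T in views:
        a += matvec(hat(xi), matvec(R, x1))
        b += matvec(hat(xi), T)
    na, nb = dot(a, a), dot(b, b)
    return (na * nb - dot(a, b) ** 2) / (na * nb)

# Hypothetical scene: one 3D point seen from three camera positions.
X1 = [0.3, -0.2, 4.0]
R2, T2 = rot_y(0.1), [-1.0, 0.0, 0.2]
R3, T3 = matmul(rot_z(0.05), rot_y(-0.1)), [1.0, 0.5, 0.0]

x1 = project(X1)
x2 = project(transform(R2, T2, X1))
x3 = project(transform(R3, T3, X1))
y3 = project(transform(R3, T3, [1.0, 1.0, 5.0]))  # image of a different point

epi_good = epipolar_residual(x1, x2, R2, T2)
epi_bad = epipolar_residual(x1, y3, R3, T3)
good_score = rank_one_score(x1, [(x2, R2, T2), (x3, R3, T3)])
bad_score = rank_one_score(x1, [(x2, R2, T2), (y3, R3, T3)])

print(f"epipolar residual, true match:  {epi_good:.2e}")
print(f"epipolar residual, wrong match: {epi_bad:.2e}")
print(f"rank score, consistent views:   {good_score:.2e}")
print(f"rank score, wrong third view:   {bad_score:.2e}")
```

Running it should give values near machine precision for the consistent correspondences and clearly nonzero values when the third image is replaced by the image of a different point. With more than two views, the stacked matrix forces all views to agree on the single depth lambda_1 of the point in the first frame; with measurement noise one would threshold these quantities rather than test for exact zeros.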
These multilinear constraints were used extensively for feature matching, point and line transfer to a new view, and motion and structure recovery from three views [6, 7]. This line of work recently culminated in the publication of two monographs on the topic [8, 9].

In this paper we present a new characterization of the existing multiview constraints in terms of rank conditions of appropriate multiple view matrices, introduced in [10, 11]. We start by introducing the rank conditions among multiple views of point and line features separately. We will demonstrate in an intuitive way that the rank conditions of these multiview matrices capture the relationships among all previously known multilinear constraints and generalize the previously studied trilinear constraints involving mixed point and line features to a multiview setting. As we will see, the linear formulation of the problem gives rise to natural algorithms for geometric feature matching, feature transfer across