Robust Real-Time SFM in a Combined Formulation of Tracking and Reconstruction

Olaf Kähler, Joachim Denzler
Chair for Computer Vision, Friedrich Schiller University of Jena
Email: {kaehler,denzler}@informatik.uni-jena.de

Abstract

It was recently observed that a combined formulation of tracking and reconstruction increases the robustness and accuracy of both steps in structure-from-motion problems [9]. However, the benefits come at the cost of higher computational complexity. In this work, we present strategies for an efficient implementation of such a combined approach. We identify the time-consuming steps in the system and analyze opportunities for simplifying and parallelizing the original problem. We present an evaluation of the overall system and show that frame rates of 5 fps and beyond are achieved on current hardware without significant losses in robustness and accuracy.

1 Introduction

Recovering the scene structure from images of a single, moving camera has long been a topic of research in computer vision. Especially for applications like autonomous robot navigation, robust and efficient online solutions are required. In this work, we present a combined approach for tracking and reconstruction [9] that outperforms other approaches in terms of both tracking robustness and reconstruction accuracy. As the main contribution, we show techniques for improving the run-time of such a system to frame rates of 5 fps and beyond, while maintaining the improved accuracy and robustness.

Approaching feature tracking and geometric reconstruction as a combined problem allows feedback between the two steps. The tracking results are implicitly revised if they are not consistent with the reconstructed 3D geometry. Additionally, an explicit model of the tracking error is avoided, in contrast to the Gaussian or non-Gaussian models needed in classical formulations of 3D reconstruction [13].
This extends the known structure-from-motion approaches and makes it possible to handle noisy, blurred, low-quality images such as those in figure 1.

The feedback is in line with explicit multi-view motion constraints [14], model-based tracking [3, 10] and multi-view plane fitting [5]. As in all these works, a piecewise planar scene is exploited in [9] to establish a link from the 3D geometry to the visual scene appearance. However, both scene structure and camera motion are recovered at the same time in a single, combined optimization problem. This leads to a direct 3D reconstruction from image intensities [8], which is optimal in the case of Gaussian noise on the grey values.

Solving the combined optimization problem requires a high computational effort. In the original work [9], the authors present an online framework running at 0.3 fps. We show extensions to their approach that allow frame rates beyond 5 fps on comparable hardware. These speedups are achieved mostly by simplifications similar to [4] and a parallelization of the original problem.

We first give a theoretical review of the approach in section 2. The practical aspects of an efficient implementation are then addressed in section 3. A thorough evaluation is presented in section 4, focusing especially on run-time performance and applications to robot navigation. Finally, a short summary concludes the paper in section 5.

2 Theoretical Overview

As structure-from-motion is typically divided into feature tracking and geometric reconstruction, we first review independent solutions for these two steps that exploit planar scene structure. Based on these concepts, we then develop the combined approach and theoretically analyze its benefits.

VMV 2008 O. Deussen, D. Keim, D. Saupe (Editors)