Creating Compact Architectural Models by Geo-registering Image Collections

Radek Grzeszczuk, Nokia Research Center, radek.grzeszczuk@nokia.com
Jana Košecka, George Mason University, kosecka@cs.gmu.edu
Ramakrishna Vedantham, Nokia Research Center, ramakrishna.vedantham@nokia.com
Harlan Hile, University of Washington, harlan@cs.washington.edu

Abstract

We present a method for automatically constructing compact, photo-realistic architectural 3D models. This method uses simple 3D building outlines obtained from existing GIS databases to bootstrap reconstruction and works with both structured and unstructured image datasets. We propose an optimal view-selection algorithm for selecting a small set of views for texture mapping that best describe the structure, while minimizing warping and stitching artifacts, and producing a consistent visual representation. The proposed method is fully automatic and can process large structured datasets in close to real-time, making it suitable for large scale urban modeling and 3D map construction.

1. Introduction

3D modeling of urban areas is entering a phase of rapid development driven by cheaper sensors, better algorithms and enhanced computational capabilities. This enables new navigation systems with more immersive, easier-to-follow instructions. The scale of modeling required by such applications necessitates algorithms that are fast. The profusion of mobile networked devices with restricted computing power and bandwidth, such as cell phones and PDAs, also suggests that these models should be compact and efficient.

In this work we present a novel, fully-automated pipeline for constructing compact 3D models. The method works both with structured image datasets (characterized by a large number of sequential views) and with unstructured image datasets (such as those found on popular photo sharing sites).
For the structured image datasets, which typically come from surveying vehicles equipped with precise sensors for accurately registering camera poses, our modeling pipeline can produce high-quality 3D models very fast, processing hundreds of images per second.

A key component of our approach is to leverage building outlines (footprint and height) obtained from existing Geographic Information System (GIS) databases to register unstructured photo collections with the building, perform view selection, and synthesize a 3D model. In our fully automatic 3D reconstruction pipeline, we developed a robust geo-registration technique for the unstructured image collections. We propose a novel, optimal view-selection algorithm, which from a large number of views can select a small subset that best describes the structure, while minimizing warping and stitching artifacts. From the selected views, we automatically synthesize a compact 3D model of high quality that we can easily re-target to devices with a broad range of computational and bandwidth capabilities.

The dominant existing approach to modeling uses highly structured data collected by cameras mounted on a driving vehicle equipped with inertial GPS sensors [15] and augmented with 3D LiDAR data [9, 25]. The resulting models tend to have visual artifacts (holes, local deformations) due to difficulties during the dense reconstruction stage. Cornelis et al. [4] use a highly-optimized 3D reconstruction pipeline that can run close to real-time by assuming that the building facades can be approximated with surfaces ruled in the vertical direction. They extend the reconstruction framework by integrating it with a car detection and localization module that instantiates virtual placeholder models to cover parts of the facades that have missing data.

Alternative approaches use unstructured photo collections [19].
Instead of creating 3D models, they focus on novel ways to explore and navigate the image collections [18].

Several approaches use human assistance in the modeling stage. A pioneering example is the work of Debevec et al. [5], where a human instantiates a simple geometric primitive and manually selects a small number of carefully calibrated views to texture map the model. Werner and Zisserman [23] propose an automated technique that achieves similar results. Dick et al. [6] utilize prior knowledge about architecture to reconstruct the geometry. More recent examples include [17, 24].

The approach we present here uses building extents and automated geo-registration and is applicable to urban areas for both structured and unstructured image collections. The buildings in GIS databases are typically represented as sets of extruded and stacked polygons, each with an associated height. Our approach to creating visually pleasing 3D models for these types of urban structures complements the existing