High Resolution 3D Content Creation using Unconstrained and Uncalibrated Cameras

Hoang Minh Nguyen, Burkhard C. Wünsche, Patrice Delmas, Christof Lutteroth, Wannes van der Mark
Department of Computer Science, University of Auckland, New Zealand
hngu039@aucklanduni.ac.nz, burkhard@cs.auckland.ac.nz, p.delmas@cs.auckland.ac.nz, lutteroth@cs.auckland.ac.nz, w.vandermark@auckland.ac.nz

Abstract— An increasing number of applications require 3D content. However, its creation from real-world data either necessitates expensive equipment, artistic skills, or is constrained, for example, by the range of the utilized sensors. Image-based modeling is rapidly increasing in popularity since cameras are very affordable, widely available, and have a wide image acquisition range suitable for objects of vastly different size. The technique is especially suitable for mobile robotics involving low-cost equipment and robots with a light payload, for example, small UAVs. In this paper we describe a novel image-based modeling system, which produces high-quality 3D content automatically from a collection of unconstrained and uncalibrated 2D images. The system estimates camera parameters and a 3D scene geometry using Structure-from-Motion (SfM) and Bundle Adjustment techniques. The point cloud density of 3D scene components is enhanced by exploiting silhouette information of the scene. This hybrid approach dramatically improves the reconstruction of objects with few visual features, for example, unicolored objects, and improves surface smoothness. A high quality texture is created by parameterizing the reconstructed objects using a segmentation and charting approach, which also works for objects that are not homeomorphic to a sphere. The resulting parameter space contains one chart for each surface segment.
A texture map is created by back-projecting the best fitting input images onto each surface segment, and smoothly fusing them together over the corresponding chart using graph-cut techniques.

I. INTRODUCTION

A key task in mobile robotics is the exploration and mapping of an unknown environment using the robot's sensors. SLAM algorithms can create a map in real time using different sensors. While the resulting map is suitable for navigation, it usually does not contain a high quality reconstruction of the surrounding 3D scene, e.g., for use in virtual environments, simulations, and urban design.

High quality reconstructions can be achieved using image input and conventional modeling systems such as Maya, Lightwave, 3D Max or Blender. However, the process is time-consuming, requires artistic skills, and involves considerable training and experience in order to master the modeling software. The introduction of specialized hardware has simplified the creation of models from real physical objects. Laser scanners can create highly accurate 3D models, but are expensive and have a limited range and resolution. RGB-D sensors, such as the Kinect, have been successfully used for creating large scale reconstructions. In 2011 the KinectFusion algorithm was presented, which uses the Kinect depth data to reconstruct a 3D scene using the Kinect sensor like a handheld laser scanner [28]. Since then a wide variety of new applications have been proposed, such as complete 3D mappings of environments [16]. The Kinect is very affordable, but has a very limited operating range (0.8 - 3.5 m), a limited resolution and field-of-view, and it is sensitive to environmental conditions [30].

Reconstructing 3D scenes from optical sensor data has considerable advantages such as the low price of sensors, the ability to capture objects of vastly different size, and the ability to capture highly detailed color and texture information.
Furthermore, optical sensors are very lightweight and have a low energy consumption, which makes them ideal for mobile robots, such as small Unmanned Aerial Vehicles (UAVs).

This paper proposes a novel system that employs a hybrid multi-view image-based modeling approach coupled with a surface parameterization technique as well as surface and texture reconstruction for automatically creating a high quality reconstruction of 3D objects using uncalibrated and unconstrained images acquired with consumer-level cameras. This makes the technique particularly suitable for reconnaissance and surveillance, e.g., producing 3D reconstructions with high resolution texture maps from video data of miniature UAVs.

One key challenge is that reconstructing 3D scenes from a sequence of images requires knowing where each photo was taken and in what direction the camera was pointing (extrinsic parameters), as well as the internal camera settings, such as zoom and focus (intrinsic parameters), which influence how incoming light is projected onto the retinal plane. Our algorithm automatically estimates the intrinsic and extrinsic parameters of the camera being used and computes the 3D coordinates of a sparse set of points in the scene. This is accomplished using Structure-from-Motion and Bundle Adjustment techniques.

In order to deal with feature-poor objects, additional 3D points are extracted and added by exploiting the silhouette information of the object. The benefit of integrating shape-from-silhouette and shape-from-correspondence approaches is that the new hybrid system is capable of handling both featureless objects and objects with concave regions. These classes of objects often pose great difficulty for algorithms
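To make the role of the intrinsic and extrinsic parameters concrete, the following is a minimal numpy sketch of the standard pinhole projection model that Structure-from-Motion estimates; the specific focal length, principal point, and camera pose values are illustrative assumptions, not values from the paper.

```python
import numpy as np

def project(K, R, t, X):
    """Project a 3D world point X into pixel coordinates.

    R, t are the extrinsic parameters (camera pose); K holds the
    intrinsic parameters (focal length, principal point).
    """
    X_cam = R @ X + t      # world -> camera coordinates (extrinsics)
    x = K @ X_cam          # camera -> homogeneous image coordinates (intrinsics)
    return x[:2] / x[2]    # perspective division

# Illustrative intrinsics: focal length f in pixels, principal point (cx, cy).
f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0.0, cx],
              [0.0, f, cy],
              [0.0, 0.0, 1.0]])

# Illustrative extrinsics: no rotation, world origin 5 units in front of the camera.
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

# A point on the optical axis projects to the principal point.
print(project(K, R, t, np.array([0.0, 0.0, 0.0])))  # -> [320. 240.]
```

Structure-from-Motion inverts this relationship: given matched pixel observations across several images, it searches for the K, R, t of each camera and the 3D points X whose projections best reproduce those observations, with Bundle Adjustment minimizing the total reprojection error.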