CHISEL: Real Time Large Scale 3D Reconstruction Onboard a Mobile Device using Spatially-Hashed Signed Distance Fields

Matthew Klingensmith, Carnegie Mellon Robotics Institute, mklingen@andrew.cmu.edu
Ivan Dryanovski, The Graduate Center, City University of New York, idryanovski@gc.cuny.edu
Siddhartha S. Srinivasa, Carnegie Mellon Robotics Institute, siddh@cs.cmu.edu
Jizhong Xiao, The City College of New York, jxiao@ccny.cuny.edu

Abstract—We describe CHISEL: a system for real-time house-scale (300 square meter or more) dense 3D reconstruction onboard a Google Tango [1] mobile device, using a dynamic spatially-hashed truncated signed distance field [2] for mapping and visual-inertial odometry for localization. By aggressively culling parts of the scene that do not contain surfaces, we avoid needless computation and wasted memory. Even under very noisy conditions, we produce high-quality reconstructions through the use of space carving. We are able to reconstruct and render very large scenes at a resolution of 2-3 cm in real time on a mobile device without the use of GPU computing. The user is able to view and interact with the reconstruction in real time through an intuitive interface. We provide both qualitative and quantitative results on publicly available RGB-D datasets [3], and on datasets collected in real time from two devices.

I. INTRODUCTION

Recently, mobile phone manufacturers have started adding high-quality depth and inertial sensors to mobile phones and tablets. The devices we use in this work, Google's Tango [1] phone and tablet, have very small active infrared projection depth sensors combined with high-performance IMUs and wide field of view cameras (Section IV-A). Other devices, such as the Occipital Inc. Structure Sensor [4], have similar capabilities. These devices offer an onboard, fully integrated sensing platform for 3D mapping and localization, with applications ranging from mobile robots to handheld, wireless augmented reality.
Real-time 3D reconstruction is a well-known problem in computer vision and robotics [5]. The task is to extract the true 3D geometry of a real scene from a sequence of noisy sensor readings online. Solutions to this problem are useful for navigation, mapping, object scanning, and more. The problem can be broken down into two components: localization (i.e., estimating the sensor's pose and trajectory) and mapping (i.e., reconstructing the scene geometry and texture).

Consider house-scale (300 square meter) real-time 3D mapping and localization on a Tango-like device. A user (or robot) moves around a building, scanning the scene. At house-scale, we are only concerned with features at a resolution of about 2-3 cm (walls, floors, furniture, appliances, etc.). To facilitate scanning, real-time feedback is given to the user on the device's screen. The user can export the resulting 3D scan without losing any data. Fig. 1 shows an example of this use case (Section III) in progress.

[Fig. 1: CHISEL running on Google's Tango [1] device. (a) CHISEL creating a map of an entire office building floor on a mobile device in real time. (b) Reconstructed apartment scene at a voxel resolution of 2 cm.]

House-scale mapping requires that the 3D reconstruction algorithm run entirely onboard, and fast enough to allow real-time interaction. Importantly, the entire dense 3D reconstruction must fit inside the device's limited (2-4 GB) memory (Section IV-A). Because some mobile devices lack sufficiently powerful discrete graphics processing units (GPUs), we choose not to rely on general-purpose GPU computing to make the problem tractable, either in creating or in rendering the 3D reconstruction. 3D mapping algorithms involving occupancy grids [6], keypoint mapping [7], or point clouds [8-10] already exist for mobile phones at small scale.
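To illustrate why spatial hashing keeps a dense TSDF within a mobile memory budget, the following is a minimal C++ sketch (not the paper's actual implementation; all names are hypothetical) of a sparse voxel map that allocates fixed-size chunks lazily and indexes them with a large-prime spatial hash, so that unobserved space costs no memory:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Integer coordinates of a voxel chunk in the chunk grid.
struct ChunkID {
    int x, y, z;
    bool operator==(const ChunkID& o) const {
        return x == o.x && y == o.y && z == o.z;
    }
};

// Spatial hash over chunk coordinates using large primes, in the style
// commonly used for sparse voxel hashing.
struct ChunkHash {
    std::size_t operator()(const ChunkID& c) const {
        return (static_cast<std::size_t>(c.x) * 73856093u) ^
               (static_cast<std::size_t>(c.y) * 19349669u) ^
               (static_cast<std::size_t>(c.z) * 83492791u);
    }
};

// One truncated signed distance sample with an integration weight.
struct Voxel {
    float sdf = 0.0f;
    float weight = 0.0f;
};

class SparseTSDF {
public:
    SparseTSDF(float voxel_size, int chunk_dim)
        : voxel_size_(voxel_size), chunk_dim_(chunk_dim) {}

    // Returns the voxel containing world point (wx, wy, wz), allocating
    // its chunk on first access. Empty space is never allocated.
    Voxel& at(float wx, float wy, float wz) {
        const int gx = static_cast<int>(std::floor(wx / voxel_size_));
        const int gy = static_cast<int>(std::floor(wy / voxel_size_));
        const int gz = static_cast<int>(std::floor(wz / voxel_size_));
        const ChunkID id{FloorDiv(gx, chunk_dim_), FloorDiv(gy, chunk_dim_),
                         FloorDiv(gz, chunk_dim_)};
        std::vector<Voxel>& chunk = chunks_[id];  // created on first access
        if (chunk.empty()) {
            chunk.resize(static_cast<std::size_t>(chunk_dim_) * chunk_dim_ *
                         chunk_dim_);
        }
        const int lx = gx - id.x * chunk_dim_;
        const int ly = gy - id.y * chunk_dim_;
        const int lz = gz - id.z * chunk_dim_;
        return chunk[(static_cast<std::size_t>(lz) * chunk_dim_ + ly) *
                         chunk_dim_ + lx];
    }

    std::size_t num_chunks() const { return chunks_.size(); }

private:
    // Floor division that rounds toward negative infinity for negative a.
    static int FloorDiv(int a, int b) {
        return static_cast<int>(
            std::floor(static_cast<double>(a) / static_cast<double>(b)));
    }

    float voxel_size_;
    int chunk_dim_;
    std::unordered_map<ChunkID, std::vector<Voxel>, ChunkHash> chunks_;
};
```

With 3 cm voxels and 16x16x16-voxel chunks, only chunks actually touched by depth observations occupy memory, which is what makes a 2-3 cm house-scale reconstruction plausible within a 2-4 GB budget.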
But most existing approaches require either offline post-processing or cloud computing to create high-quality 3D reconstructions at the scale we are interested in. Many state-of-the-art real-time 3D reconstruction algo-