Fast and Accurate Environment Modeling using Three-Dimensional Occupancy Grids

Katrin Pirker, Matthias Rüther, Horst Bischof
Institute for Computer Graphics and Vision
University of Technology, Graz, Austria
{kpirker,ruether,bischof}@icg.tugraz.at

Gerald Schweighofer
Institute of Digital Image Processing
Joanneum Research GmbH, Graz, Austria
gerald.schweighofer@joanneum.at

Abstract

Building a dense and accurate environment model from range image data is hampered by sensor noise, extensive memory consumption, and long computation times. We present an approach that reconstructs 3D environments in real time using a probabilistic occupancy grid. Operating on depth image pyramids reduces computation time, whereas a weighted interpolation scheme between neighboring pyramid layers boosts accuracy. In our experiments we compare our method with a state-of-the-art mapping procedure, and the results show that our method performs better. Finally, we demonstrate its viability by mapping a large indoor environment.

1. Introduction

With the launch of Microsoft’s Kinect [1], dense three-dimensional environmental mapping has become increasingly popular. It is a basic prerequisite in many vision applications such as human pose estimation [6], visual inspection, augmented reality, and object detection [3]. Furthermore, it enables high-level robotic tasks such as path planning [18], obstacle avoidance, and object manipulation.

Currently, most existing dense modeling algorithms use either a surface-based or a volumetric representation of the environment. Surface-based approaches fit geometric primitives such as planes, spheres, or cubes to three-dimensional point cloud data [13] or construct a triangular mesh from it. Although such methods can be computed efficiently, they face severe problems: updating an existing surface mesh with new incoming sensor data or modeling dynamic environments requires sophisticated remeshing techniques.
Because of that, many algorithms perform surface modeling in an offline step after a reconstruction pipeline. Algorithms operating online usually require some regularization during surface reconstruction to deal with disturbed or wrong depth measurements [11]. Moreover, surface-based approaches do not distinguish between free, occupied, and unseen space, which is necessary for many robotic tasks. With point cloud or surface-based representations, memory consumption also increases with the number of incoming sensor readings.

Volumetric approaches instead represent the surroundings as an evenly spaced grid in which each grid cell (voxel) holds a random variable describing whether it is occupied [20]. Because of their probabilistic nature, occupancy grids allow for easy integration of noisy sensor data. Furthermore, they can be used for any spatial reasoning task (e.g. grasping, path planning), since they also model free and unseen space. Different sensor modalities such as stereo and laser can simply be integrated into a single environment map [9]. Unfortunately, their accuracy depends strongly on the degree of discretization and the underlying probabilistic model. Because of the huge amount of data provided by currently available range image devices, sparse environment representations are preferred in order to remain real-time capable.

Hence, there is a need for efficient and accurate algorithms that construct realistic environments from dense, noisy range image data in reasonable time. In this work we propose a fast and accurate mapping routine operating on data from a range image device. To account for sensor noise, our environment is represented as a three-dimensional occupancy grid. To gain efficiency and accuracy, we combine pyramidal depth image processing with an interpolation scheme between neighboring pyramid layers. Furthermore, a GPU implementation allows for fast pyramid generation as well as parallel voxel processing during occupancy grid mapping.
Although we present its application using a depth camera, the approach can easily be adapted to any other range scanner. Synthetic experiments and a comparison to ground truth data attest to the accuracy of our approach. We demonstrate that the interpolation scheme outperforms state-of-the-art algorithms when evaluated against ground truth. Implementing the occupancy operations and pyramid generation on the GPU allows for real-time performance.
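
To make the notion of a probabilistic occupancy grid more concrete, the following minimal sketch shows how noisy measurements could be fused per voxel. It is not the update rule of this paper, which is not specified in this section; it assumes the widely used log-odds formulation of occupancy grid mapping. The function names (`logit`, `update_log_odds`, `probability`, `integrate`), the clamping bounds, the measurement probabilities 0.7 and 0.3, and the dictionary-based sparse grid are all hypothetical choices made for illustration.

```python
import math

def logit(p):
    """Convert a probability into log-odds."""
    return math.log(p / (1.0 - p))

def update_log_odds(l_prev, p_meas, l_prior=0.0, l_min=-4.0, l_max=4.0):
    """Fuse one measurement into a voxel's log-odds value.

    p_meas is the (assumed) inverse sensor model output: > 0.5 for a hit,
    < 0.5 for a ray passing through the voxel. The clamping bounds are an
    assumption that keeps voxels able to change state again later.
    """
    l_new = l_prev + logit(p_meas) - l_prior
    return max(l_min, min(l_max, l_new))

def probability(l):
    """Convert log-odds back into an occupancy probability."""
    return 1.0 / (1.0 + math.exp(-l))

def integrate(grid, voxel, p_meas):
    """Update a sparse grid stored as {(i, j, k): log_odds}.

    Voxels missing from the dictionary are unseen (log-odds 0, p = 0.5).
    """
    grid[voxel] = update_log_odds(grid.get(voxel, 0.0), p_meas)

grid = {}
for _ in range(3):
    integrate(grid, (4, 2, 7), 0.7)   # three depth readings end in this voxel
integrate(grid, (4, 2, 6), 0.3)       # one ray passes through this voxel

print(probability(grid[(4, 2, 7)]))   # well above 0.5: likely occupied
print(probability(grid[(4, 2, 6)]))   # below 0.5: likely free
print((0, 0, 0) in grid)              # False: unseen space stays unallocated
```

In this sketch, repeated consistent readings push a voxel toward occupied or free, a single outlier only shifts the value slightly, and unseen voxels cost no memory, which illustrates the three-way distinction between free, occupied, and unseen space discussed above.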