Occlusion Handling in Video-based Augmented Reality Using the Kinect Sensor for Indoor Registration

J. Adrián Leal-Meléndrez, Leopoldo Altamirano-Robles, and Jesus A. Gonzalez
{jalm,robles,jagonzalez}@ccc.inaoep.mx
National Institute for Astrophysics, Optics and Electronics, Computer Science Department, Luis Enrique Erro No. 1, Tonantzintla, Puebla, Mexico
http://ccc.inaoep.mx

Abstract. Video-based Augmented Reality (VAR) aims to add 3D virtual objects (3D VOs) to a real-world video sequence in order to provide additional, useful information that facilitates tasks such as computer-aided surgery, simulation in a real environment, satellite positioning, and interior design, among others. To achieve a consistent and convincing augmented scene, the VOs must be properly occluded by real objects (the occlusion problem in VAR). In this paper, we present a strategy based on the Kinect sensor to solve this problem. In the occlusion stage, we evaluate the distances between real objects and VOs. Then, the parts of each VO that are occluded by a real object are calculated and removed. We found that the Kinect sensor is appropriate for handling occlusions in indoor environments, dynamic scenarios, and real-time applications. Experiments showed results comparable with the state of the art in both occlusion handling and processing time.

Keywords: occlusion handling, video-based augmented reality, hidden surface removal, Kinect

1 Introduction

Augmented Reality (AR) could be the answer to the growing demand for new user interfaces in which space is not restricted to a screen and controls become unnecessary. AR adds 3D virtual objects (3D VOs) to a real scene, superimposing computer-generated graphics on real-world scenes in such a way that both look like part of the same 3D scene [6]. In this way, a user can receive useful information in real time and in the most adequate place (the real environment) and be guided through a given task.
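The occlusion stage summarized in the abstract (compare the distances of real and virtual objects, then remove the occluded parts of the VO) can be illustrated with a per-pixel depth comparison. The following is only a minimal sketch under stated assumptions, not the implementation used in this paper: the function name `composite_with_occlusion`, the list-of-rows image layout, the use of `None` for "no VO here" or "no valid sensor reading", and the policy of drawing the VO where the sensor reading is missing are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B)
Depth = Optional[float]               # distance to the camera; None = no value


def composite_with_occlusion(
    real_rgb: List[List[Pixel]],
    real_depth: List[List[Depth]],
    virtual_rgb: List[List[Pixel]],
    virtual_depth: List[List[Depth]],
) -> List[List[Pixel]]:
    """Overlay a rendered VO on a video frame, hiding its occluded pixels.

    A VO pixel is kept only where the VO is closer to the camera than the
    real surface measured by the depth sensor at the same pixel.
    """
    out: List[List[Pixel]] = []
    for y, row in enumerate(real_rgb):
        out_row: List[Pixel] = []
        for x, real_px in enumerate(row):
            vd = virtual_depth[y][x]
            rd = real_depth[y][x]
            if vd is None:
                # The VO does not cover this pixel: keep the video frame.
                out_row.append(real_px)
            elif rd is None or vd < rd:
                # VO is in front of the real surface. Assumption: where the
                # sensor has no reading, treat the pixel as unoccluded.
                out_row.append(virtual_rgb[y][x])
            else:
                # A real object is in front: this part of the VO is removed.
                out_row.append(real_px)
        out.append(out_row)
    return out


# Usage: a 1x3 frame with the VO at 2.0 m across the first two pixels.
real_rgb = [[(10, 10, 10), (20, 20, 20), (30, 30, 30)]]
real_depth = [[1.0, 3.0, None]]       # real surface at 1 m, 3 m, no reading
virtual_rgb = [[(255, 0, 0), (255, 0, 0), (255, 0, 0)]]
virtual_depth = [[2.0, 2.0, None]]    # VO absent in the third pixel
result = composite_with_occlusion(real_rgb, real_depth, virtual_rgb, virtual_depth)
```

In this example the first VO pixel is hidden by the real object at 1 m, the second remains visible in front of the surface at 3 m, and the third shows the video frame because the VO does not cover it. A real-time system would typically perform the same comparison on the GPU depth buffer rather than in a Python loop.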
Nowadays, several applications in areas such as medicine, entertainment, education, and architecture, among others, use AR; soon, even more areas will benefit from it.

An important task in creating a realistic synthetic scene is to align virtual and real objects in two ways: geometrically (spatial precision) and semantically (graphic credibility) [4]. Spatial precision requires the 3D VOs to be appropriately registered in the real world, which means that they must always be in the right position and orientation with respect to the world. Graphic credibility, on the other hand, refers to the realism of the scene, i.e., the illusion that virtual and real elements coexist in the same place and time. Graphic credibility has two main branches: photo-realism,