Augmented Reality through Real-time Tracking of Video Sequences Using a Panoramic View

Christophe Dehais, Matthijs Douze, Géraldine Morin, Vincent Charvillat
IRIT/ENSEEIHT, 2 rue Camichel, 31071 Toulouse cedex 7, France
{dehais, douze, morin, charvi}@enseeiht.fr

Abstract

We propose a 2D approach for Augmented Reality (AR) applications in which the real scene is modelled as a static panorama. We adapt a sparse tracking method based on homographies to track the orientation and zoom parameters of the camera throughout a video sequence. AR scenarios (synthetic object insertion, real object or character extraction) can be performed in arbitrary static environments, from wide outdoor scenes to virtually augmented desktops or conference rooms.

Introduction

AR applications need to register virtual augmentations with respect to the real scene in order to blend them in seamlessly. We tackle this problem by registering each frame of a video sequence into a panorama (Figure 1(a)). This yields a mapping from which the orientation and zoom parameters of the camera are recovered. This information can then be used to set up virtual augmentations, for example generated from a CAD model rendered in real time or from a compatible panorama rendered offline. The framework also allows extracting Video Object Planes (VOPs) and integrating them into a virtual environment.

Possible uses of this method include, for example, a conference room whose background is known but where the attendees move, or CCTV systems where louts have specific types of motion.

In the first section, we review the different approaches used to design augmented reality applications. Section 2 then presents the context and the different stages of our approach. Section 3 details the tracking method we use. Finally, in Section 4, experimental results demonstrate the use of our tracking method in different augmented reality scenarios.
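For a camera that only rotates and zooms about its optical centre, each video frame is related to a reference view by a homography of the form H = K' R K⁻¹, from which the camera orientation can be read off. As a minimal illustrative sketch of this recovery step (not the paper's actual implementation; it assumes a fixed, known focal length `f` and principal point `(cx, cy)`, and all names are hypothetical):

```python
import numpy as np

def rotation_from_homography(H, f, cx, cy):
    """Recover the camera rotation from a homography H mapping a
    reference frame to the current frame, assuming a purely rotating
    camera with fixed intrinsics: H = K R K^{-1}, so R = K^{-1} H K
    up to an unknown scale factor."""
    K = np.array([[f, 0.0, cx],
                  [0.0, f, cy],
                  [0.0, 0.0, 1.0]])
    R = np.linalg.inv(K) @ H @ K
    # Remove the unknown scale: a rotation matrix has determinant 1.
    R /= np.cbrt(np.linalg.det(R))
    # Re-orthonormalise via SVD to absorb noise in the estimated H.
    U, _, Vt = np.linalg.svd(R)
    R = U @ Vt
    # Pan (yaw) and tilt (pitch) angles, in radians.
    pan = np.arctan2(R[0, 2], R[2, 2])
    tilt = -np.arcsin(R[1, 2])
    return R, pan, tilt

# Toy check: a pure 10-degree pan generates H = K R K^{-1}.
f, cx, cy = 800.0, 320.0, 240.0
a = np.deg2rad(10.0)
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
H = K @ R_true @ np.linalg.inv(K)
R, pan, tilt = rotation_from_homography(H, f, cx, cy)
```

In this noise-free example the recovered pan is 10 degrees and the tilt is zero; with a real tracked homography the SVD step absorbs estimation noise.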
1. Related work

The registration of augmentation objects with respect to a real scene can rely on geometric information: this is the case for pose estimation. Depending on how much information is known about the scene, different solutions can be considered.

Many approaches try to recover 3D information about the scene using vision-based techniques or physical sensors [1]. Among vision-based methods, some track the pose of an object (often polyhedral) whose wireframe model is known [8, 4].

Methods inspired by image-based rendering aim at using as little 3D information as possible. In [5], Hung et al. propose to augment a panorama by compositing it with video objects. A virtual 3D reference frame has to be localised on the panorama, and the augmenting objects are generated by "view morphing" (synthesized from images taken at different angles). Our approach requires less user interaction: it only needs the registration of the initial image.

2. System overview

Our system works in two stages. The first is performed offline: we acquire a panoramic view of an arbitrary static environment. This reference view can be captured by a dedicated device such as an omnidirectional camera, or composed by mosaicing a range of views of the scene (left part of Figure 1(a)). We can then alter this panorama to create a new, augmented panorama. In Figure 2(a), our reference panorama is augmented with a virtual desktop (this can be done easily through image editing, or more rigorously by using 3D landmarks). We may also render a compatible virtual background (Figure 2(d)).

During the second stage, we shoot the scene with a classical video camera (right part of Figure 1(a)). The optical cen-

0-7695-2128-2/04 $20.00 (C) 2004 IEEE
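Registering each incoming frame into the panorama, as in the second stage above, amounts to chaining 3×3 homographies: a frame-to-frame update composed with the previous frame-to-panorama mapping. The following sketch shows the composition step only (illustrative helper names, not the paper's code; the actual tracker is described in Section 3):

```python
import numpy as np

def compose(H_ab, H_bc):
    """Chain two homographies: if H_ab maps frame a into frame b and
    H_bc maps frame b into frame c, the product maps a into c.
    Normalised so that the bottom-right entry is 1."""
    H_ac = H_bc @ H_ab
    return H_ac / H_ac[2, 2]

def map_point(H, x, y):
    """Apply homography H to the pixel (x, y) via homogeneous coordinates."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Toy example with two translational homographies: frame 0 shifts by
# (5, 0) into frame 1, which shifts by (0, 3) into the panorama.
H_01 = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
H_1p = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]])
H_0p = compose(H_01, H_1p)          # frame 0 -> panorama
u, v = map_point(H_0p, 0.0, 0.0)    # pixel (0, 0) lands at (5, 3)
```

In a live system the frame-to-frame homography would be re-estimated at every frame and composed onto the accumulated frame-to-panorama mapping, so augmentations drawn in panorama coordinates stay registered with the video.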