3-D Live: Real Time Interaction for Mixed Reality

Simon Prince¹, Adrian David Cheok¹, Farzam Farbiz¹, Todd Williamson², Nik Johnson², Mark Billinghurst³, Hirokazu Kato⁴

¹ National University of Singapore, {elesp,eleadc,eleff}@nus.edu.sg
² Zaxel Systems, {toddw,nik}@zaxel.com
³ University of Washington, grof@hitl.washington.edu
⁴ Hiroshima City University, kato@sys.im.hiroshima-cu.ac.jp

ABSTRACT
We describe a real-time 3-D augmented reality video-conferencing system. With this technology, an observer sees the real world from his viewpoint, but modified so that the image of a remote collaborator is rendered into the scene. We register the image of the collaborator with the world by estimating the 3-D transformation between the camera and a fiducial marker. We describe a novel shape-from-silhouette algorithm, which generates the appropriate view of the collaborator and the associated depth map at 30 fps. When this view is superimposed upon the real world, it gives the strong impression that the collaborator is a real part of the scene. We also demonstrate interaction in virtual environments with a "live" fully 3-D collaborator. Finally, we consider interaction between users in the real world and collaborators in a virtual space, using a "tangible" AR interface.

Keywords
Video-Conferencing, Augmented Reality, Image Based Rendering, Shape from Silhouette, Interaction

INTRODUCTION
Science fiction has presaged many of the great advances in computing and communication. In 2001: A Space Odyssey, Dr Floyd calls home using a videophone – an early on-screen appearance of 2-D video-conferencing. This technology is now commonplace. More recently, the Star Wars films depicted 3-D holographic communication. In this paper we apply computer graphics to create what may be the first real-time "holo-phone".

Existing conferencing technologies have a number of limitations. Audio-only conferencing removes visual cues vital for conversational turn-taking.
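The registration step mentioned in the abstract — estimating the 3-D transformation between the camera and a fiducial marker — can be sketched from first principles. The following is an illustrative sketch only, not the authors' implementation: given the four detected corners of a square planar marker and the camera intrinsics K, a homography is fitted by direct linear transform and decomposed into a rotation and translation. All function names are our own.

```python
import numpy as np

def homography_dlt(obj_pts, img_pts):
    """Direct linear transform: homography mapping marker-plane
    points (x, y) to image points (u, v)."""
    A = []
    for (x, y), (u, v) in zip(obj_pts, img_pts):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 3)  # null vector, reshaped to 3x3

def marker_pose(img_pts, marker_size, K):
    """Recover the camera-from-marker rotation R and translation t
    from the four detected corners of a square marker.

    img_pts must be ordered to match the marker corners
    (-s,-s), (s,-s), (s,s), (-s,s) on the marker plane (z = 0).
    """
    s = marker_size / 2.0
    obj_pts = [(-s, -s), (s, -s), (s, s), (-s, s)]
    H = np.linalg.inv(K) @ homography_dlt(obj_pts, img_pts)
    # Scale so the first two columns have unit norm (rotation columns).
    lam = (np.linalg.norm(H[:, 0]) + np.linalg.norm(H[:, 1])) / 2.0
    H /= lam
    if H[2, 2] < 0:          # marker must lie in front of the camera
        H = -H
    r1, r2, t = H[:, 0], H[:, 1], H[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    # Re-orthonormalize: detection noise makes R only approximately a rotation.
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t
```

Once (R, t) is known, virtual content can be rendered with a camera at that pose so it appears locked to the marker in the observer's view.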
This leads to increased interruptions and overlap [8], and difficulty in disambiguating between speakers and in determining willingness to interact [14]. Conventional 2-D video-conferencing improves matters, but large user movements and gestures cannot be captured [13], there are no spatial cues between participants [29] and participants cannot easily make eye contact [30]. Participants can only be viewed in front of a screen and the number of participants is limited by monitor resolution. These limitations disrupt fidelity of communication [34] and turn taking [10], and increase interruptions and overlap [11]. Collaborative virtual environments restore spatial cues common in face-to-face conversation [4], but separate the user from the real world. Moreover, non-verbal communication is hard to convey using conventional avatars, resulting in reduced presence [29].

We define the "perfect video avatar" as one where the user cannot distinguish between a real human present in the scene and a remote collaborator. Perhaps closest to this goal of perfect tele-presence is the Office of the Future work [27], the Virtual Video Avatar of Ogi et al. [25], and the work of Mulligan and Daniilidis [23][24]. All of these systems use multiple cameras to construct a geometric model of the participant, and then use this model to generate the appropriate view for remote collaborators. Although impressive, these systems currently do not generate the whole 3-D model – one cannot move 360° around the virtual avatar. Moreover, since the output of these systems is mediated via projection screens, the display is not portable.

The goal of this paper is to present a solution to these problems, by introducing an augmented reality (AR) video-conferencing system. Augmented reality refers to the real-time insertion of computer-generated three-dimensional content into a real scene (see [2], [3] for reviews).
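The idea of constructing a geometric model of a participant from multiple cameras can be illustrated with the classic voxel-carving form of shape-from-silhouette. Note this is a generic textbook variant for illustration, not the paper's own (image-based, 30 fps) algorithm: a voxel is kept only if it projects inside the foreground silhouette of every calibrated camera, yielding the visual hull.

```python
import numpy as np

def visual_hull(silhouettes, projections, bounds, res=32):
    """Voxel shape-from-silhouette: a voxel survives only if it projects
    inside the foreground silhouette of every camera.

    silhouettes: list of boolean HxW masks (True = foreground)
    projections: list of 3x4 camera projection matrices
    bounds:      ((x0, x1), (y0, y1), (z0, z1)) working volume
    """
    (x0, x1), (y0, y1), (z0, z1) = bounds
    xs = np.linspace(x0, x1, res)
    ys = np.linspace(y0, y1, res)
    zs = np.linspace(z0, z1, res)
    X, Y, Z = np.meshgrid(xs, ys, zs, indexing="ij")
    pts = np.stack([X, Y, Z, np.ones_like(X)], axis=-1).reshape(-1, 4)
    occupied = np.ones(len(pts), dtype=bool)
    for sil, P in zip(silhouettes, projections):
        uvw = pts @ P.T                         # project all voxels at once
        u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
        v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
        h, w = sil.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        keep = np.zeros(len(pts), dtype=bool)
        keep[inside] = sil[v[inside], u[inside]]  # foreground test
        occupied &= keep                          # intersect over all views
    return occupied.reshape(res, res, res)
```

With many cameras surrounding the subject, the intersection of these silhouette cones approximates the person's shape from any direction, which is what allows a full 360° view of the avatar.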
Typically, the observer views the world through a head-mounted display (HMD) with a camera attached to the front. The video is captured, modified and relayed to the observer in real time.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists requires prior specific permission and/or a fee.
CSCW'02, November 16-20, 2002, New Orleans, Louisiana, USA.
Copyright 2002 ACM 1-58113-560-2/02/0011 $5.00

Figure 1: Observers view the world via a head-mounted display (HMD) with a front-mounted camera. Our system detects markers in the scene and superimposes live video content rendered from the appropriate viewpoint in real time.
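The final step of the pipeline in Figure 1 is compositing the rendered collaborator into the live camera frame. Because the system produces a depth map alongside each rendered view, occlusion can be resolved per pixel. The sketch below shows this idea only; the array names, the binary foreground mask, and the availability of a depth estimate for the real scene are our assumptions, not details from the paper.

```python
import numpy as np

def composite(frame, frame_depth, render, render_depth, mask):
    """Per-pixel depth composition: the rendered collaborator replaces
    the camera pixel wherever it is present (mask) AND nearer to the
    camera than the real scene at that pixel.

    frame, render:             HxWx3 colour images
    frame_depth, render_depth: HxW depth maps (smaller = nearer)
    mask:                      HxW boolean foreground mask of the render
    """
    visible = mask & (render_depth < frame_depth)
    out = frame.copy()
    out[visible] = render[visible]
    return out
```

Run once per video frame, this produces the effect described in the abstract: the collaborator appears embedded in the real scene, correctly occluded by nearer real objects.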