The Impact of Avatar Realism and Eye Gaze Control on Perceived Quality of Communication in a Shared Immersive Virtual Environment

Maia Garau, Mel Slater, Vinoba Vinayagamoorthy, Andrea Brogni, Anthony Steed, M. Angela Sasse
Department of Computer Science, University College London (UCL), Gower St., London WC1E 6BT
{m.garau, m.slater, v.vinayagamoorthy, a.brogni, a.steed, a.sasse}@cs.ucl.ac.uk

ABSTRACT
This paper presents an experiment designed to investigate the impact of visual and behavioral realism in avatars on perceived quality of communication in an immersive virtual environment. Participants were paired by gender and randomly assigned to either a CAVE-like system or a head-mounted display. Both were represented by a humanoid avatar in the shared 3D environment. The avatars' visual appearance was either basic and genderless (like a "match-stick" figure) or more photorealistic and gender-specific. Similarly, eye gaze behavior was either random or inferred from voice, reflecting different levels of behavioral realism. Our comparative analysis of 48 post-experiment questionnaires confirms earlier findings from non-immersive studies using semi-photorealistic avatars, where inferred gaze significantly outperformed random gaze. However, responses to the lower-realism avatar were adversely affected by inferred gaze, revealing a significant interaction effect between appearance and behavior. We discuss the importance of aligning visual and behavioral realism for increased avatar effectiveness.

Keywords
Virtual Reality, immersive virtual environments, avatars, mediated communication, photo-realism, behavioral realism, social presence, copresence, eye gaze.

INTRODUCTION
This paper presents an experiment that investigates participants' subjective responses to dyadic social interaction in a shared, immersive virtual environment (IVE). It focuses on the impact of avatar realism on perceived quality of communication.
Specifically, it explores the relative impact of two logically distinct aspects of avatar realism: appearance and behavior.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CHI 2003, April 5–10, 2003, Ft. Lauderdale, Florida, USA. Copyright 2003 ACM 1-58113-630-7/03/0004…$5.00.

One of the chief appeals of IVEs as a communication medium is that they enable remotely located people to meet and interact in a shared 3D space. This is of particular benefit for tasks such as remote acting rehearsal [19], where preserving spatial relationships among participants is paramount. However, one significant limitation is low avatar expressiveness compared with the rich feedback available through live human faces on video.

Improving avatar expressiveness poses complex challenges, with technical limitations as well as theoretical goals to consider. Technically, one of the central constraints is the tension between "realism and real time" [20]. In terms of an avatar's appearance, increased photo-realism comes at the expense of computational complexity, introducing significant and unwanted delays to real-time communication. In terms of behavior, if the goal is to replicate each person's real movement, tracking can seem an attractive solution. Systems such as Eyematic [10] have shown compellingly that it is possible to track eye movement and drive an avatar in real time using a simple desktop camera. However, in immersive CAVE-like systems¹, where users wear stereoscopic goggles and move freely about the space, it can be difficult to provide a robust solution.
At the same time, tracking other body and facial behaviors can be invasive, as well as expensive in terms of rendering. Research on nonverbal behavior in face-to-face communication [1] can offer valuable leads on how to improve avatar expressiveness without resorting to full tracking.

In the study presented in this paper, we focus on a single behavior: eye gaze. We investigate whether it is possible to improve people's communication experience by inferring their avatar's eye movements from information readily available in the audio stream. We build on previous research conducted in a non-immersive setting [14][17], where random eye gaze was compared with gaze inferred from speaking and listening turns in the conversation.

¹ CAVE is a trademark of the University of Illinois at Chicago. In this paper we use the term 'Cave' to describe the generic technology as described in [9] rather than the specific commercial product.
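The inferred-gaze idea described above can be pictured as a simple stochastic state machine: the avatar alternates between looking at its partner and looking away, with the duration of each state drawn from distributions that differ depending on whether the avatar's owner is currently speaking or listening (as detected from the audio stream). The Python sketch below is illustrative only: the function names, the exponential sampling, and the numeric means are our assumptions for demonstration, not the specific model used in [14] or [17].

```python
import random

# Placeholder mean durations in seconds. A real implementation would take
# these values from published face-to-face gaze statistics, which report
# that people gaze at their partner more while listening than while speaking.
MEAN_DURATIONS = {
    # role: (mean "gaze at partner" duration, mean "gaze averted" duration)
    "speaking":  (2.0, 4.0),
    "listening": (4.0, 2.0),
}

def next_gaze_event(role, currently_at_partner):
    """Return (new_target, duration_seconds) for the avatar's next gaze state.

    `role` is "speaking" or "listening", inferred from the audio stream
    (e.g. by thresholding microphone energy over a short window).
    """
    mean_at, mean_away = MEAN_DURATIONS[role]
    if currently_at_partner:
        # Switch to averted gaze; hold it for an exponentially sampled time.
        return "averted", random.expovariate(1.0 / mean_away)
    return "at_partner", random.expovariate(1.0 / mean_at)

# Drive a short simulation: alternate gaze states for 20 seconds of speaking.
t, at_partner = 0.0, False
while t < 20.0:
    target, duration = next_gaze_event("speaking", at_partner)
    at_partner = (target == "at_partner")
    t += duration
```

Because the model needs only a speaking/listening flag, it avoids eye tracking entirely, which is what makes it attractive in CAVE-like systems where camera-based tracking is impractical.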