A Realistic Video Avatar System for Networked Virtual Environments

Vivek Rajan, Satheesh Subramanian, Damin Keenan, Andrew Johnson, Daniel Sandin, Thomas DeFanti
Electronic Visualization Laboratory
University of Illinois at Chicago, Chicago, IL, USA
vrajan@evl.uic.edu

Abstract

With the advancements in collaborative virtual reality applications, there is a need to represent users with a higher degree of realism for better immersion. Representing users with facial animation in an interactive collaborative virtual environment is a daunting task. This paper proposes an avatar system for a realistic representation of users. Working towards this goal, the paper presents a technique for head model reconstruction in tracked environments, in which the model is rendered by view-dependent texture mapping of video. The key feature of the proposed system is that it takes advantage of the tracking information available in a VR system throughout the entire process.

1. Introduction

The word avatar, derived from Hindu mythology, stands for an incarnation or embodiment of human form. In the context of a virtual reality environment, an avatar is a graphical representation of the human form. The human face is endowed with a myriad of gestures which need to be represented in an avatar in order to impart it with realism in the context of a networked virtual environment. With continuing research and new technologies, combined with growing expectations, there is a need for avatars that are more human-like.

In this paper, we describe how view-dependent texture mapping can be used to produce a realistic head for an avatar, eliminating in the process the constraints posed by background and lighting requirements. The process essentially involves projecting real-time video images onto a 3D model of the user's head to render the head of the avatar.
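The core of view-dependent texture mapping is choosing how much each camera's video frame contributes to the rendered surface for the current viewpoint. The following is a minimal sketch of one common weighting scheme, blending cameras by the angle between the render direction and each camera's direction; the function name and the specific weighting are illustrative assumptions, not necessarily the scheme used in this system.

```python
import numpy as np

def view_dependent_weights(view_dir, camera_dirs):
    """Blend weights for view-dependent texture mapping (illustrative sketch).

    Each camera's video frame contributes in proportion to how closely
    its direction matches the current render direction; cameras facing
    away from the viewer contribute nothing.
    """
    view_dir = view_dir / np.linalg.norm(view_dir)
    weights = []
    for cam_dir in camera_dirs:
        cam_dir = cam_dir / np.linalg.norm(cam_dir)
        # Cosine of the angle between render view and camera view,
        # clamped at zero so back-facing cameras are excluded.
        weights.append(max(0.0, float(np.dot(view_dir, cam_dir))))
    total = sum(weights)
    # Normalize so the blended texture contributions sum to one.
    return [w / total for w in weights] if total > 0 else weights
```

As the tracked viewpoint moves, the weights shift smoothly between cameras, which is what avoids the visible seams of a fixed frontal projection.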
This paper also presents an automated image-based model generation technique for reconstructing the head model of a user, making the system cost-effective and usable.

2. Previous Work

A large amount of research has gone into representing users with facial animation. In general, the methods for representing facial expression fall into two categories. One method is to extract various facial parameters from the video and use the parameters to animate a model. The second method is to use the video directly by texture mapping it onto some model. It is observed that video used directly for rendering avatars achieves a higher degree of realism than a generic model modified using parameters extracted from the video data. This section reviews some of the significant methods.

One of the earlier methods in this area was video texturing of the face [1]. This technique uses a video image of the user's face as a texture map on a simple model. The image captured by the camera is processed to extract the subset of the image containing the user's face using a simple background subtraction algorithm. The texture mapping is done using a simple frontal projection. This technique is a compromise between mapping onto a simple shape (e.g., a box or ellipsoid), which would give unnatural results, and mapping onto a full-featured human head model, where more precise image-feature alignment would be necessary.

In model-based coding of facial expressions, as in the previous technique, the user's head-and-shoulder video image is captured by a camera, but instead of transmitting whole facial images, the images are analyzed and a set of parameters describing the facial expression is extracted [3] [9]. This method can be used in combination with texture mapping.
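The background subtraction step mentioned in the video-texturing approach above can be sketched as a per-pixel difference against a stored image of the empty scene. This is a minimal illustration under simple assumptions (fixed camera, static lighting, a hand-picked threshold); the cited work's exact algorithm and parameters may differ.

```python
import numpy as np

def extract_foreground(frame, background, threshold=30):
    """Simple background subtraction (illustrative sketch).

    frame, background: HxWx3 uint8 images from the same fixed camera.
    Returns a boolean foreground mask and the bounding box
    (top, left, bottom, right) of the detected region, or None
    if nothing differs from the background.
    """
    # Per-pixel absolute difference against the stored background;
    # cast to a wider type first to avoid uint8 wrap-around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    # A pixel is foreground if any channel differs by more than threshold.
    mask = (diff > threshold).any(axis=-1)
    if not mask.any():
        return mask, None
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    bbox = (int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1)
    return mask, bbox
```

The bounding box then selects the sub-image containing the user's face, which is what gets texture mapped onto the model.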
The model needs an initial image of the face together with a set of parameters describing the position of the facial features within the texture image in order to fit the texture to the face. Once this is done, the texture is fixed with respect to the face and does not change, but it is deformed together with the face. This differs from the approach dis-