Talking Avatar for Web-based Interfaces

José Nunes, Luís Sá and Fernando Perdigão
Instituto de Telecomunicações, Coimbra, Portugal
{josenunes, luis, fp}@co.it.pt

Abstract

In this paper we present an approach for creating interactive, speaking avatar models based on standard face images. We start from a 3D human face model that can be adjusted to a particular face. To adjust the 3D model from a 2D image, a new two-step method is presented. First, a process based on Procrustes analysis finds the best match for the input key points, yielding the rotation, translation and scale needed to best fit the model to the photo. Then, using the resulting model, we refine the face mesh by applying a linear transform to each vertex. For visual speech animation, we consider a total of 15 different mouth positions, the visemes, to accurately model the articulation of the Portuguese language. For normalization purposes, each viseme is defined relative to the generic neutral face. Given a sequence of visemes and their instants of occurrence, the animation is rendered with linear time interpolation.

Keywords: talking heads; speech animation; model adjustment

I. INTRODUCTION

Human-computer interaction plays an increasingly important role in today's computer systems. The use of virtual animated characters in current digital support systems can greatly benefit user experience and interaction. These virtual characters, or simply avatars, can be applied to a wide range of applications in entertainment, personal communications, commerce, or education [1]. With the development of new web standards and technologies, it is now possible to deploy standard computer graphics applications in Internet environments while keeping a good balance between visual quality and latency [2]. The solution we present in this work is primarily a system for avatar creation and animation that can be deployed on web platforms.
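The Procrustes-based fitting step outlined in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the model and photo key points are given as matching (N, 2) arrays, and uses the standard SVD-based solution of the similarity Procrustes problem. The function name `fit_similarity` is our own.

```python
import numpy as np

def fit_similarity(model_pts, photo_pts):
    """Least-squares rotation R, scale s and translation t mapping the
    model key points onto the photo key points (Procrustes analysis),
    so that photo_pts ~ s * model_pts @ R.T + t."""
    model_pts = np.asarray(model_pts, dtype=float)
    photo_pts = np.asarray(photo_pts, dtype=float)
    mu_m = model_pts.mean(axis=0)
    mu_p = photo_pts.mean(axis=0)
    M = model_pts - mu_m          # centered model points
    P = photo_pts - mu_p          # centered photo points
    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, D, Vt = np.linalg.svd(P.T @ M)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    S = np.diag([1.0, d])         # guard against a reflection solution
    R = U @ S @ Vt
    s = (D * np.diag(S)).sum() / (M ** 2).sum()
    t = mu_p - s * R @ mu_m
    return R, s, t
```

Applying the returned transform to every model vertex gives the coarse fit; the per-vertex linear refinement described in the abstract is then applied on top of it.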
Although there are several approaches concerning model presentation, e.g. [1, 3], and model deformation [4, 5, 6], this work presents a new approach for avatar creation, achieved with only a photo, with no special requirements and no prior learning.

II. SYSTEM OVERVIEW

In this section we give a general view of the avatar framework and how it can be used to enhance web interface systems. It is a distributed application with a client-server architecture: a Graphical User Interface (GUI) that includes visual models, media and animation, and a server that provides a range of services. Applications using the avatar therefore adopt the communication model shown in Fig. 1. In this model, one or more clients can be connected simultaneously, accessing a webpage where the avatar is presented. This site is the GUI that gives users access to visual information, such as models, images and their animation; to media, which is basically audio playback; and to interaction, i.e. the set of interactive user controls, such as buttons and toolboxes. The application relies on services provided by a remote server, which in this case may include audio synthesis, phonetic transcription and viseme conversion, semantics, and adjustment algorithms.

Figure 1. System communication using the avatar.

Apart from the main application, there is also a program that lets users create their own avatars from a face image. It is a client-server application, with a GUI for choosing and adjusting points on a photo, and web services for model computation.

III. BASE MODEL

In order to represent all types of faces, regardless of gender, race or age, we have modeled a generic human head, resulting in a simplified representation of the face region with fixed depth information. Topologically, the model consists of a polygon mesh with 152 connected vertices forming 276 triangular faces.
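A mesh of this kind can be represented compactly as an array of vertex positions plus an array of triangle indices. The sketch below is an illustrative container under that assumption (the class name, field layout and `transformed` helper are ours, not the paper's code); the counts in the comment match the base model.

```python
import numpy as np

class FaceMesh:
    """Triangle-mesh container: an (N, 3) array of vertex positions
    and an (M, 3) integer array of triangle vertex indices.
    The base model has N = 152 vertices and M = 276 triangles."""

    def __init__(self, vertices, triangles):
        self.vertices = np.asarray(vertices, dtype=float)  # (N, 3) x, y, z
        self.triangles = np.asarray(triangles, dtype=int)  # (M, 3) vertex ids

    def transformed(self, R, s, t):
        """Apply a similarity transform (scale s, rotation R, translation t)
        to every vertex; connectivity is unchanged."""
        return FaceMesh(s * self.vertices @ np.asarray(R).T + t, self.triangles)
```

Keeping connectivity separate from geometry means the fitting step only rewrites the vertex array, while the triangle list stays fixed for all adjusted avatars.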
Since all human faces are nearly symmetric, we first adjusted and simplified one side, and then obtained the other side by mirroring. The mouth region, used for speech animation, has 23 vertices. There are independent models for the eyes, tongue and teeth. The model is shown in Fig. 2 from three views.

Figure 2. Base model polygonal mesh.
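The animation process described in the abstract, linear time interpolation between visemes defined relative to the neutral face, can be sketched as follows. The array shapes, viseme identifiers and timeline format here are illustrative assumptions, not the paper's interface.

```python
import numpy as np

def animate(neutral, visemes, timeline, t):
    """Mouth-vertex positions at playback time t.

    neutral:  (V, 3) neutral-face mouth vertices
    visemes:  dict viseme_id -> (V, 3) displacement from the neutral face
              (each viseme is defined relative to the generic neutral face)
    timeline: list of (time, viseme_id) pairs, sorted by time
    """
    # Clamp to the first/last viseme outside the timeline.
    if t <= timeline[0][0]:
        return neutral + visemes[timeline[0][1]]
    if t >= timeline[-1][0]:
        return neutral + visemes[timeline[-1][1]]
    # Linear interpolation between the two surrounding visemes.
    for (t0, v0), (t1, v1) in zip(timeline, timeline[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)
            return neutral + (1 - a) * visemes[v0] + a * visemes[v1]
```

Because every viseme is stored as a displacement from the same neutral face, blends between any pair of visemes are well defined without per-pair correspondence data.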