Hungarian Talking Head

László Czap
University of Miskolc, Department of Automation
czap@mazsola.iit.uni-miskolc.hu

János Mátyás
North Hungarian Training Centre, 3518 Miskolc, Erenyő u. 1.
matyasj@mail.erak.hu

Facial animation has progressed significantly over the past few years, and a variety of algorithms and techniques now make it possible to create highly realistic characters. Based on the authors' speechreading study and the development of 3D modelling, a Hungarian talking head has been created. Our general approach is to use both static and dynamic observations of natural speech to guide the facial modelling. An evaluation of Hungarian consonants and vowels is presented for classifying visemes - the smallest perceptible visual units of the articulation process. A three-level dominance model that takes coarticulation into account has been introduced: each articulatory feature is assigned to a dominant, flexible or uncertain class. The evaluation was based on analysing the standard deviation and the trajectories of the features. The acoustic speech and the articulation are linked to each other by a synchronising process, and a filtering and smoothing algorithm has been developed to adapt to the tempo of either synthesized or natural speech. The basic emotions defined by Ekman can be expressed in a scalable manner. Eye blinking, gaze, head movement and eyebrow raising can be controlled either randomly or through special commands.

1 Introduction

The intelligibility of speech can be improved by showing the articulation of the speaker. This visual support is essential in noisy environments and for hearing-impaired people. An artificial talking head can be a natural supplement to sophisticated acoustic speech synthesis. Pioneering work on face animation for modelling articulation started about two decades ago. The development of 3D body modelling, the evolution of computers and advances in the analysis of human utterances have enabled the development of realistic models. Over the last decade the area has been developing dynamically and more and more applications have appeared. Teaching hearing-impaired people to speak can be aided by an accurately articulating virtual speaker, whose face can be made transparent to show the details of the utterance better than a human speaker can.

Figure 1: Photorealistic and transparent visualization

Audio-visual speech recognition and synthesis can open up new prospects in the human-machine interface. Virtual speakers and actors can increase the freedom of artists in multimedia applications.

2 Speech animation

The first visual speech synthesizers were based on a 2D head model, recalling previously stored images of a speaker. Phases between keyframes were sometimes produced by image morphing. A 2D model can hardly reproduce head movements, gestures and emotions. Progress in solid modelling directed researchers' interest towards three-dimensional modelling. One type of 3D model simulates facial expressions by tensing muscles; such models produce realistic results, but the analysis of real muscular tensions is difficult. Surface models, which animate textured polygons, seem promising because their features can be analysed on human speakers.

2.1 The visual unit of speech

The visual counterpart of the shortest acoustic unit, the phoneme, is called a viseme. The set of visemes has fewer elements than that of phonemes, since the utterances of several phonemes are visually identical: voicing, for example, is invisible, and phonemes with the same place of articulation that differ only in duration or intensity belong to the same viseme class.
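To make this many-to-one relation concrete, the following minimal Python sketch collapses a phoneme sequence into a (shorter) viseme sequence. The phoneme symbols, viseme class labels and groupings are illustrative assumptions, not the classification derived in this paper.

```python
# Minimal sketch of a many-to-one phoneme-to-viseme mapping.
# All labels and groupings below are hypothetical examples.

PHONEME_TO_VISEME = {
    # bilabials look alike regardless of voicing
    "p": "BILABIAL", "b": "BILABIAL", "m": "BILABIAL",
    # labiodentals: voiced/voiceless pair shares one viseme
    "f": "LABIODENTAL", "v": "LABIODENTAL",
    # short and long vowels of the same quality share a viseme
    "o": "ROUNDED_MID", "o:": "ROUNDED_MID",
}

def phonemes_to_visemes(phonemes):
    """Map a phoneme sequence to its viseme sequence,
    collapsing consecutive repetitions of the same viseme."""
    visemes = []
    for p in phonemes:
        v = PHONEME_TO_VISEME.get(p, "NEUTRAL")  # fallback for unmapped phonemes
        if not visemes or visemes[-1] != v:
            visemes.append(v)
    return visemes

print(phonemes_to_visemes(["p", "o", "b", "o:"]))
# -> ['BILABIAL', 'ROUNDED_MID', 'BILABIAL', 'ROUNDED_MID']
```

Collapsing adjacent identical visemes also shortens the sequence of articulation targets the animation has to reach.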
The static positions of the speech organs for Hungarian phonemes can be found in the standard literature. Figure 2 shows
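The three-level dominance model summarised above can likewise be sketched in code. The fragment below is a minimal illustration of dominance-weighted blending of per-feature viseme targets, in the spirit of dominance-function coarticulation models; the feature names, weights and exponential falloff are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of dominance-weighted coarticulation: each viseme
# carries per-feature targets (e.g. lip opening) and a per-feature
# dominance class. "dominant" features impose their target, "flexible"
# ones yield to neighbours, "uncertain" ones barely constrain the face.
# Class assignments, weights and falloff are assumed values.

import math

DOMINANCE_WEIGHT = {"dominant": 1.0, "flexible": 0.5, "uncertain": 0.1}

def blended_feature(t, segments, feature):
    """Blend one articulatory feature at time t as a dominance-weighted
    average of nearby viseme targets.
    Each segment is (centre_time, targets, dominance_classes)."""
    num = den = 0.0
    for centre, targets, dom in segments:
        # weight falls off exponentially with distance from the viseme centre
        w = DOMINANCE_WEIGHT[dom[feature]] * math.exp(-abs(t - centre))
        num += w * targets[feature]
        den += w
    return num / den if den else 0.0

segments = [
    (0.0, {"lip_opening": 0.0}, {"lip_opening": "dominant"}),  # bilabial closure
    (0.2, {"lip_opening": 0.8}, {"lip_opening": "flexible"}),  # open vowel
]
print(blended_feature(0.1, segments, "lip_opening"))  # stays close to the closure
```

In this toy example the dominant bilabial closure keeps the blended lip opening low even near the neighbouring flexible vowel target, which is the qualitative behaviour the dominant/flexible/uncertain classification is meant to capture.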