LANGUAGE AND SPEECH, 1993, 36(2,3), 331-351

ARTICULATORY REPRESENTATION AND SPEECH TECHNOLOGY*

O. SCHMIDBAUER, F. CASACUBERTA, M.J. CASTRO, G. HEGERL, H. HÖGE, J.A. SANCHEZ, and I. ZLOKARNIK

Siemens AG, Munich, and Universidad Politécnica de Valencia

In this paper we demonstrate the feasibility and usefulness of articulation-based approaches in two major areas of speech technology: speech recognition and speech synthesis. Our articulatory recognition model estimates probabilities of categories of manner and place of articulation, which establish the articulatory feature vector. The transformation from the articulatory level to the symbolic level is performed by hidden Markov models or multi-layer perceptrons. Evaluations show that the articulatory approach is a good basis for speaker-independent and speaker-adaptive speech recognition. We are now working on a more realistic articulatory model for speech recognition. An algorithm based on an analysis-by-synthesis model maps the acoustic signal to 10 articulatory parameters which describe the position of the articulators. EMA (electromagnetic articulograph) measurements recorded at the University of Munich provide good initial estimates of tongue coordinates. In order to improve articulatory speech synthesis we investigated an accurate physical model for the generation of the glottal source with the aid of a numerical simulation. This model takes into account nonlinear vortical flow and its interaction with sound waves. The simulation results can be used to improve the articulatory synthesis model developed by Ishizaka and Flanagan (1972).

Key words: automatic speech recognition, speech synthesis, articulatory modeling

INTRODUCTION

Speech recognition, speech synthesis, low-bit speech coding, and speaker verification are important issues in current research on future man-machine interfaces. Speech technology has reached a level which is sufficient for many simple applications.
* This work was sponsored by the ESPRIT project ACCOR, basic research action 3279. Address correspondence to Otto Schmidbauer, Siemens AG, P.O. Box 830953, D-8000 Munich 83, FRG.

However, considerable research is still necessary to cope with the challenges of robust continuous speech recognition and natural-sounding speech synthesis. Most of today's speech recognition systems use spectral parameters as a link between the speech signal and the symbolic (phonetic) level. In this spectral approach the relation between the speech signal and the spectral parameters is very close, whereas the link to the symbolic level is problematic. The symbolic information is encoded in a highly variable manner, since the spectral realization of the phonetic symbols varies