New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer

Javier Latorre *, Koji Iwano, Sadaoki Furui

Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, 2-12-1 8E-602, Ookayama, Meguro-ku, 152-8552 Tokyo, Japan

Received 20 September 2005; received in revised form 10 May 2006; accepted 11 May 2006

Speech Communication 48 (2006) 1227–1242, www.elsevier.com/locate/specom
0167-6393/$ - see front matter. doi:10.1016/j.specom.2006.05.003

* Corresponding author. Tel.: +81 3 5734 3481; fax: +81 3 5734 3480. E-mail address: latorre@furui.cs.titech.ac.jp (J. Latorre).

Abstract

In this paper we present a new method for synthesizing multiple languages with the same voice, using HMM-based speech synthesis. Our approach, which we call HMM-based polyglot synthesis, consists of mixing speech data from several speakers in different languages to create a speaker- and language-independent (SI) acoustic model. We then adapt the resulting SI model to a specific speaker in order to create a speaker-dependent (SD) acoustic model. Using the SD model, it is possible to synthesize any of the languages used to train the SI model with the voice of that speaker, regardless of the speaker's language. We show that the performance obtained with our method is better than that of methods based on phone mapping for both adaptation and synthesis. Furthermore, for languages not included during training, the performance of our approach also equals or surpasses the performance of any monolingual synthesizer based on the languages used to train the multilingual one. This means that our method can be used to create synthesizers for languages where no speech resources are available.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Multilingual; Polyglot synthesis; Voice adaptation; Cross-language synthesis; Phone mapping

1. Introduction

A side-effect of the globalization process is that speaking two or more languages has become a daily routine for many people. For business, to communicate with members of other linguistic groups in our own communities, or even to understand other members of our own families, the knowledge of more than one language has become a must for many of us. Historically, in such circumstances there has always been a language, e.g., English, that became the "lingua franca". English is indeed the main language for business and international relationships; however, there are other languages such as Spanish, Japanese and Chinese which are becoming more and more important (Graddol, 2004). Just think that more than 46 million people in the USA speak a language other than English at home (Shin and Bruno, 2003), and that in the European Union there are 20 official languages as well as additional languages that are official only inside some member states. In this scenario, it is expected that multilingual capability