2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel

Abstract: In the present research, we explore several methods for transforming phoneme models from a language with trained acoustic models (the source language) to another, untrained language (the target language). One approach uses acoustic distance measures to automatically define the mapping from source to target phonemes. This is achieved by training basic models for the target language on a limited amount of training data and calculating the distance between the source and target models. Naturally, this approach requires some data from the target language. Another approach, which also requires some target-language data, is to use acoustic adaptation to adjust the source-language acoustic models so that they better match the acoustic properties of the target-language data. Phoneme recognition results of these approaches are compared to a reference recognizer that is well trained on the target language.

I. INTRODUCTION

As Automatic Speech Recognition (ASR) based services become ubiquitous, there is an increasing need to support more and more languages. One of the main challenges this introduces is the need to overcome the lack of representative language resources (speech and text databases) in many languages. To resolve this problem, accessible and well-trained models from other languages are sometimes used for training and recognition in the under-resourced language.

This paper presents ongoing work on using phonetic search for a keyword-spotting application in under-resourced languages. In our research, we explored several methods for transforming phoneme models from a language with well-trained Acoustic Models (AM), referred to as the source language, to another, untrained language (the target language).
The aim of such a process is to find a transformation between the source-language phonemes and the target-language phonemes using only a small amount of speech data. The set of source acoustic models can comprise a single language (monolingual) or two or more languages (multilingual). A monolingual source model set was studied in [9], and multilingual source model sets were explored in [11], [12], [14], [16]. In this paper we describe our current work using monolingual source models.

A major factor in choosing which method to use for the source-to-target transformation is the amount of speech data available in the target language. When no speech data is available, the primary approach is to define a phonetic mapping from the phoneme set of the source language to the phoneme set of the target language, based on similar phonetic properties between phonemes [9], [13]. When some speech data is available, other methods become relevant. Here we explore two such methods. The first is to use the (limited) target speech data to train coarse acoustic models for the target language and use them to calculate the distance between each pair of target and source phonemes. The resulting distance matrix can then be used to automatically generate the phonetic mapping, thus alleviating the need for expert knowledge. This method is described in Section II.B. The second method, described in Section II.C, is to use the target-language data to learn an acoustic transformation that adapts the source-language acoustic models to better match the target-language data.

II. METHODS

A. Languages and resources

Three languages were investigated in this study: American English and Levantine Arabic as source languages, and Spanish as the target language. The phonemic inventory of each language was set as follows: English, 39 phonemes based on the DARPA phonetic alphabet [1]; Arabic, 38 phonemes based on the Buckwalter transliteration [2]; Spanish, 31 phonemes based on SAMPA [3].
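The distance-based mapping described above (Section II.B) reduces to computing a source-target distance matrix and taking, for each target phoneme, the closest source phoneme. The paper does not specify the distance measure at this point; the sketch below assumes each phoneme is summarized by a single diagonal-covariance Gaussian and uses a symmetric Kullback-Leibler divergence, a common choice for comparing such models. The function and variable names (`sym_kl_diag_gauss`, `map_phonemes`) are illustrative, not from the paper.

```python
import numpy as np

def sym_kl_diag_gauss(mu1, var1, mu2, var2):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    kl12 = 0.5 * np.sum(var1 / var2 + (mu2 - mu1) ** 2 / var2
                        - 1.0 + np.log(var2 / var1))
    kl21 = 0.5 * np.sum(var2 / var1 + (mu1 - mu2) ** 2 / var1
                        - 1.0 + np.log(var1 / var2))
    return kl12 + kl21

def map_phonemes(target_models, source_models):
    """Map each target phoneme to its acoustically closest source phoneme.

    Each model is a (mean, variance) pair of 1-D numpy arrays; the coarse
    target models would be trained on the limited target-language data.
    """
    mapping = {}
    for t_ph, (t_mu, t_var) in target_models.items():
        best_src, _ = min(
            source_models.items(),
            key=lambda kv: sym_kl_diag_gauss(t_mu, t_var, kv[1][0], kv[1][1]))
        mapping[t_ph] = best_src
    return mapping
```

In practice the phoneme models would be HMM states with Gaussian mixtures rather than single Gaussians, but the argmin over a distance matrix is the same.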
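The adaptation method of Section II.C learns an acoustic transformation from the target-language data. The paper does not give the transform here; as a generic illustration in the spirit of MLLR-style mean adaptation, the sketch below fits a single global affine transform by least squares from paired source-model and target-data means, then applies it to all source means. All names and the least-squares formulation are assumptions for illustration only.

```python
import numpy as np

def estimate_global_transform(source_means, target_means):
    """Fit an affine transform W = [A; b] minimizing ||X W - target||^2,
    where source_means and target_means are paired (n_phonemes, dim) arrays."""
    n = source_means.shape[0]
    X = np.hstack([source_means, np.ones((n, 1))])  # augment with bias column
    W, *_ = np.linalg.lstsq(X, target_means, rcond=None)  # shape (dim+1, dim)
    return W

def adapt_means(means, W):
    """Apply the estimated affine transform to a set of model means."""
    X = np.hstack([means, np.ones((means.shape[0], 1))])
    return X @ W
```

A single global transform needs only a little adaptation data; with more target speech, per-phoneme-class transforms (as in regression-class MLLR) become feasible.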
Results are also compared to a standard monolingual acoustic model for Spanish (trained on 80 hours of speech).

The speech databases used in this work are the following: American English: Macrophone [4], a collection of read sentences; Levantine Arabic: the Levantine Arabic Conversational Telephone Speech corpus [5] and the Fisher Levantine Arabic Conversational Telephone Speech corpus [6]; Spanish: SpeechDat(II) FDB-4000 [7].

Cross-Language Phoneme Recognition for Under-Resourced Languages

Noam Lotner, Ella Tetariy, Vered Silber-Varod, Vered Aharonson, Ami Moyal
ACLP, Afeka Center for Language Processing, Afeka Academic College of Engineering, 218 Bney Efraim Rd., Tel Aviv 69107

Yossi Bar-Yosef, Irit Opher, Ruth Aloni-Lavi
NICE Systems, 22 Zarhin Street, P.O. Box 4122, Ra'anana 43622