REVISTA DO DETUA, VOL. 1, Nº 1, JANEIRO 2006

Machine Learning of European Portuguese Grapheme-To-Phone Conversion using a Richer Feature Set

António Teixeira, Catarina Oliveira¹ and Lurdes Castro Moutinho¹

¹ Centro de Línguas e Culturas, Universidade de Aveiro

Abstract – This study presents an evaluation of two self-learning methods, Memory Based Learning (MBL) and Transformation Based Learning (TBL), on European Portuguese grapheme-to-phone conversion. Combinations (parallel and cascade) of the two systems were also tested. The usefulness of syllable-related information in machine learning approaches is also investigated. Systems with good performance were obtained both with a single self-learning method and with combinations. The best performance was obtained with MBL and with the parallel combination. The use of syllable information improves performance in all systems tested, and the effect is statistically significant. Our best machine-learning-based systems present a Word Error Rate and a Mean Normalized Levenshtein Distance similar to those recently obtained for German when using similar features.

Resumo (translated from Portuguese) – In this work, two machine learning methods (MBL and TBL) are tested, along with combinations of these methods (in parallel and in cascade), applied to the task of grapheme-to-phone conversion for European Portuguese. The value of using syllable information in this type of automatic approach is also investigated. The best results are achieved with MBL and with a parallel combination of the two methods. In all systems tested, including syllable-related information improves performance, and the difference is statistically significant. The best-performing systems show an error rate and a Levenshtein Distance similar to those recently obtained for German using the same training models.

Keywords – Grapheme-to-Phone, Portuguese, Machine Learning, MBL, TBL, Syllable.
Palavras chave (translated from Portuguese) – Grapheme-to-Phone Conversion, Portuguese, Machine Learning, MBL, TBL, Syllable.

I. INTRODUCTION

Phonetisation, i.e., the conversion of graphemes to a set of phones, poses some well-known problems, since there is no perfect correspondence between graphemes and their oral realization.

Like most work on European Portuguese (EP) grapheme-to-phone (g2p) conversion, we have already explored the rule-based approach, with good but not perfect results.

Since many researchers consider the data-driven approach capable of better results, at least for some languages with a complex relation between pronunciation and spelling, we considered it worthwhile to try this complementary approach for EP.

This paper describes the development of EP g2p conversion modules based on machine learning methods. We investigated the use of Memory Based Learning (MBL), Transformation Based Learning (TBL), and hybrid approaches.

Following recent results on the use of richer feature sets to improve machine learning systems, namely the use of syllable [1] and morphological [2] information, we also tested the impact of syllable information on system performance. This effort was possible due to the availability of an automatic syllabification procedure based on orthographic input [3].

The paper is structured as follows: the next section summarizes work on European Portuguese g2p and recent developments in the area; section III describes our systems based on machine learning; the next two sections present our evaluation, the relevant results, and a brief discussion; the last section presents the conclusions.

II. GRAPHEME-TO-PHONE CONVERSION (G2P)

A. Portuguese g2p

Several approaches have been adopted over the years for grapheme-to-phone conversion for European Portuguese, especially (but not exclusively) within the scope of the DIXI system, the first text-to-speech system specifically designed from scratch for Portuguese, developed by the speech processing group of INESC in cooperation with the CLUL phonetic group. The first version of this system [4], based on Klatt's formant synthesizer, comprises a rule-based g2p conversion module with about 200 rules, essentially the same as those proposed in CORSO I [5].

Later, the rule-based approach for letter-to-phone conversion was compared with two self-learning methods, one based on a multi-layered neural network and another based on table look-up [6]. Despite the fairly good results of the neural network, the classical rule-based method showed better performance. The table look-up approach did not yield very good results.

The second version of the synthesizer (now designated DIXI+) integrates an approach based on CARTs (Classification and Regression Trees) [7].

Recently, other g2p approaches (rule-based, data-driven, and hybrid) have been implemented as Weighted Finite State Transducers (WFSTs) [8]. The best results were obtained with the rule-based approach. The WFST-based rule approach was also compared with the previous rule-based DIXI system, and both methods achieved similar results. The FST-based grapheme-to-phone module developed for EP was later ported to the other official language