Development of a Prosodic Database for an Argentine Spanish Text to Speech System

Jorge Gurlekian
LIS-CONICET, Buenos Aires, Argentina.
jag@fmed.uba.ar

Hernán Rodríguez
Universidad Nacional de La Plata, 7 entre 47 y 48, 1900 La Plata, Argentina.
hernan292@uol.com.ar

Laura Colantoni
LIS-CONICET, Buenos Aires, Argentina.
lcolanto@hotmail.com

Humberto Torres
LIS-CONICET, Buenos Aires, Argentina.
htorres@hotmail.com

Abstract

This project involved the design and development of a relational, SQL-based database used to generate an intonational model for an Argentine Spanish text-to-speech system. The first stage in populating the database was the bulk loading of text, divided into three co-indexed files: sentences, orthographic words, and phonological syllables. A software tool that performs phonemic transcription and syllabic segmentation of the text was developed to make this indexing possible. A large set of sentences was loaded first; a subset of 741 sentences was then selected according to criteria based on syllable occurrences in all word positions, both stressed and unstressed. This subset contained 97% of all Spanish syllables extracted from a widely used Spanish dictionary. The utterances were recorded at 16 kHz / 16 bits using an interactive program. Two professional announcers were instructed to produce a variety of accent patterns and intonational phrases to prevent monotony. Trained phoneticians then labeled the speech signals with a spectral analysis tool, using ToBI tiers (Beckman & Ayers, 1994). The ToBI conventions were adapted to account for the prosodic patterns of Argentine Spanish: frequency values were scaled using the ERB scale, and bitonal accents were redefined. The second stage in populating the database was the incorporation of the labeled files. Waveforms were kept outside the database but linked to it, to allow the identification, playback, and selection of specific segments.
1 Introduction

The general goal of this project was to design and implement the first database for training an Argentine Spanish text-to-speech (TTS) system. Current TTS systems offer moderate intelligibility at the segmental level and low quality at the suprasegmental level (rhythmic and intonational patterns). Our proposal involves the study of prosodic patterns in order to achieve high naturalness in addition to intelligibility. Prosodic characteristics were defined, labeled, and stored so that the TTS system could be trained automatically. Several tools were developed for tasks such as corpus creation, recording, and labeling. The construction of the database involved the following steps: creation and loading of a sentence corpus representative of Argentine Spanish; recording of the corpus; and labeling and loading of the files.
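To make the idea of co-indexed sentences, orthographic words, and phonological syllables concrete, the following is a minimal sketch of one possible relational layout. It is not the schema used in this project: the table names, column names, the `wav_path` link to an external waveform, the example sentence, and the choice of SQLite (through Python's standard `sqlite3` module) are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: three tables linked by keys, so every syllable can be
# traced back to its word and every word to its sentence.
SCHEMA = """
CREATE TABLE sentence (
    sentence_id INTEGER PRIMARY KEY,
    text        TEXT NOT NULL,
    wav_path    TEXT              -- waveform stays outside the database
);
CREATE TABLE word (
    word_id     INTEGER PRIMARY KEY,
    sentence_id INTEGER NOT NULL REFERENCES sentence(sentence_id),
    position    INTEGER NOT NULL, -- word index within the sentence
    orthography TEXT NOT NULL
);
CREATE TABLE syllable (
    syllable_id INTEGER PRIMARY KEY,
    word_id     INTEGER NOT NULL REFERENCES word(word_id),
    position    INTEGER NOT NULL, -- syllable index within the word
    phonemes    TEXT NOT NULL,
    stressed    INTEGER NOT NULL CHECK (stressed IN (0, 1))
);
"""


def build_demo_db():
    """Create an in-memory database holding one illustrative sentence."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    cur = conn.cursor()
    # Assumed example data: "la casa", with a made-up waveform path.
    cur.execute(
        "INSERT INTO sentence (text, wav_path) VALUES (?, ?)",
        ("la casa", "wav/s0001.wav"),
    )
    sentence_id = cur.lastrowid
    # Each word carries its syllables as (phonemes, stressed) pairs.
    words = [("la", [("la", 0)]), ("casa", [("ka", 1), ("sa", 0)])]
    for word_pos, (orthography, syllables) in enumerate(words):
        cur.execute(
            "INSERT INTO word (sentence_id, position, orthography) "
            "VALUES (?, ?, ?)",
            (sentence_id, word_pos, orthography),
        )
        word_id = cur.lastrowid
        cur.executemany(
            "INSERT INTO syllable (word_id, position, phonemes, stressed) "
            "VALUES (?, ?, ?, ?)",
            [(word_id, syl_pos, ph, st)
             for syl_pos, (ph, st) in enumerate(syllables)],
        )
    conn.commit()
    return conn


def stressed_syllables(conn):
    """Return (sentence text, word, syllable) for every stressed syllable."""
    return conn.execute(
        """
        SELECT s.text, w.orthography, y.phonemes
        FROM syllable AS y
        JOIN word     AS w ON w.word_id = y.word_id
        JOIN sentence AS s ON s.sentence_id = w.sentence_id
        WHERE y.stressed = 1
        ORDER BY s.sentence_id, w.position, y.position
        """
    ).fetchall()
```

Calling `stressed_syllables(build_demo_db())` walks the index from syllable to word to sentence. A selection criterion such as syllable coverage by position and stress could be expressed as similar joins over these tables.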