Proceedings of the 6th International Conference on Spoken Language Processing, vol. III, pp. 267-270. Beijing, China. 2000

DESIGN AND IMPLEMENTATION OF A GREEK TEXT-TO-SPEECH SYSTEM BASED ON CONCATENATIVE SYNTHESIS

Costas Christogiannis, Yiannis Stavroulas, Yiannis Vamvakoulas, Theodora Varvarigou, Agatha Zappa
Telecommunications Laboratory, Department of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou, 15773, Athens, GREECE

Chilin Shih
Speech Synthesis Research Department, Bell Laboratories, Lucent Technologies, 700 Mountain Avenue, Murray Hill, NJ, USA, 07974

Amalia Arvaniti
Department of Foreign Languages and Literatures, University of Cyprus, P.O. Box 20537, Nicosia 1678, CYPRUS

ABSTRACT

This paper presents the work carried out so far on the development of the Greek Text-To-Speech (GRTTS) system by NTUA. The system is based on concatenative synthesis and follows the Bell Labs approach to this technique. To translate the input text into continuous synthetic speech, the following modules have already been studied and implemented: (i) a module for the linguistic analysis of the input text; (ii) the acoustic inventory module. At the same time, the duration module of the GRTTS, which computes the appropriate temporal structure of the synthesized speech, is under development. The objective of these studies, in combination with the concatenative synthesis technique, which is one of the simplest methods for speech synthesis, is to bypass most of the problems encountered by other synthesis methods such as articulatory and formant synthesis. The major objective is to minimize abrupt discontinuities and thus maximize the naturalness of the synthesized utterances.

1. INTRODUCTION

The TTS system for Modern Greek (GRTTS) is based on a modular architecture developed by Bell Labs [1].
The overall system can be seen as a pipeline comprising a number of modules, where each module handles a discrete stage of the TTS process:

- The Transcription Module consists of three processing steps. The Lexical Analysis step receives the raw input text and performs tasks such as classification of the words into grammatical categories, expansion of abbreviations, numerals, dates etc., and syntactic analysis. The Transcription step handles the actual transcription into a phonetic representation. The Prosodic Formatting step is concerned with the prosodic formation of the sentences, the application of sandhi effects and the syllabification of the transcribed text.
- The Duration Module computes the duration of the phones on the basis of a number of factors, such as stress and/or their syllabic position.
- The Intonation Module determines the intonational contour of the sentences.
- Finally, the Synthesizer Module receives the augmented phonetic transcript and produces the synthesized speech waveform from the glottal source and other parameters.

In the present paper we describe (i) the development of the transcription module, the first module of the GRTTS, which performs the linguistic analysis of an input text in Modern Greek; (ii) the design and construction of the acoustic database to be incorporated in the synthesizer module of the GRTTS; and (iii) the progress on the study of the duration module. In Section 2 we define the phones for Modern Greek. In Section 3 we present how morphological analysis of the input text is performed and we describe the finite state transducers (FSTs) developed for the morphological analysis and handling of the various categories of Greek words such as abbreviations, dates, numerals and ordinals. Section 4 describes the selection of specific diphone segments as the elementary speech units for our inventory.
In Section 5 we present information on duration modeling, which assigns a duration to each phoneme, taking into account various contextual factors. Finally, our conclusions are summarized in Section 6.
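As a rough illustration of the modular pipeline described above, the following Python sketch chains four stages that each augment a shared utterance record. It is only a minimal sketch and not the GRTTS implementation: the names `Utterance`, `run_pipeline` and the stage functions are hypothetical, and every stage body is a placeholder (one "phone" per letter, constant duration and pitch values) chosen purely to show how data flows from one module to the next.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Utterance:
    """Hypothetical record passed from module to module."""
    text: str
    phones: List[str] = field(default_factory=list)
    durations_ms: List[float] = field(default_factory=list)
    f0_hz: List[float] = field(default_factory=list)
    total_ms: float = 0.0
    trace: List[str] = field(default_factory=list)


Stage = Callable[[Utterance], Utterance]


def transcription(u: Utterance) -> Utterance:
    # Placeholder (assumption): one "phone" per letter.
    # A real module would do lexical analysis, transcription
    # and prosodic formatting here.
    u.phones = [c for c in u.text.lower() if c.isalpha()]
    u.trace.append("transcription")
    return u


def duration(u: Utterance) -> Utterance:
    # Placeholder (assumption): constant 80 ms per phone instead of
    # a model driven by stress and syllabic position.
    u.durations_ms = [80.0] * len(u.phones)
    u.trace.append("duration")
    return u


def intonation(u: Utterance) -> Utterance:
    # Placeholder (assumption): flat 120 Hz contour.
    u.f0_hz = [120.0] * len(u.phones)
    u.trace.append("intonation")
    return u


def synthesizer(u: Utterance) -> Utterance:
    # Placeholder: reports the total length instead of a waveform.
    u.total_ms = sum(u.durations_ms)
    u.trace.append("synthesizer")
    return u


PIPELINE: List[Stage] = [transcription, duration, intonation, synthesizer]


def run_pipeline(text: str, stages: List[Stage] = PIPELINE) -> Utterance:
    """Run each stage in order; each stage augments the utterance."""
    u = Utterance(text)
    for stage in stages:
        u = stage(u)
    return u
```

Calling `run_pipeline("Ab c")` runs the four stages in order and returns the augmented record. Because each stage only reads and extends that record, a single module (for example the duration module) could be replaced without touching the others, which is the main appeal of the pipeline design.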