Prosodic phrasing in a Polish text-to-speech system

Morena Danieli*, Beata Dobrzyńska#, Alberto Pacchiotti* & Elena Cabrio*
*Loquendo Voice Technologies, Torino, Italy
#CELI Language and Information Technology, Torino, Italy
Corresponding author: M. Danieli (morena.danieli@loquendo.com)

Abstract

This contribution presents the linguistic research underlying the implementation of prosodic phrasing in a Polish text-to-speech system¹. Although concatenative text-to-speech technology has dramatically improved the acoustic quality of synthesized voices in the past few years, the naturalness and expressivity of present text-to-speech systems are still unsatisfactory. In particular, these systems usually read with a neutral intonation, and their prosody is not always natural-sounding. Some authors, for example Prevost (1996) and Hiyakumoto et al. (1997), showed that information structure in conjunction with domain semantics can be used to produce intonational patterns with appropriate variations in the prominence and type of pitch accents. However, their approach has two limitations. First, it relies on domain-dependent semantic knowledge, which is not usually available to general-purpose text-to-speech applications, since these can in principle read any type of text in any semantic domain. Second, the domain semantics approach does not consider discourse structure (Grosz & Sidner, 1986; Moser & Moore, 1996; Danieli & Bazzanella, 2002; Kruijff-Korbayová & Steedman, 2003). As Grosz & Hirschberg (1992) convincingly showed for English, discourse structure is marked intonationally, although the relationship between that structure and intonational features is a complex one. In developing a prosodic module for the Polish text-to-speech system, we hypothesized that discourse structure is prosodically relevant in Polish, and that discourse markers are good cues for recognizing discourse structure in unknown texts.
These hypotheses were tested by means of perceptual tests and through the analysis of the annotations of a small textual corpus. In this paper we report the results of the perceptual tests, which showed statistically significant associations between specific discourse structures and acoustic-prosodic features. This result was confirmed by the evaluation of the tagged corpus, which a native speaker of Polish had annotated with prosodic phrase boundaries. The linguistic analysis of the tagging showed that the positions corresponding to discourse markers, as well as the initial and final boundaries of parentheticals, were tagged with short pauses. The next step of our work was a corpus-based study to retrieve Polish discourse markers along with their frequencies. The discourse markers that received lower ranks in the frequency distribution were selected for special treatment by the prosodic phrasing parser (Gili-Fivela & Quazza, 1997). In short, that parser computes two kinds of structure for the input text in parallel: a one-level syntactic chunking and the prosodic phrasing. As shown by the perceptual tests, the lexically driven augmentation with discourse markers improved the intelligibility and expressiveness of the Polish text-to-speech system. The presentation will describe the methodology summarized above and the related experimental results, with the aim of showing that in Polish, as in other languages, discourse markers are good predictors of prosodic phrasing.

¹ Loquendo TTS®
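The two lexical steps summarized in the abstract, counting candidate discourse markers in a corpus and placing a short pause at each marker, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the marker list (`jednak`, `więc`, `natomiast`, `ponadto`), the `<pause>` tag, the function names, and the choice to place the pause after the marker. It is not the Loquendo parser and does not reproduce the syntactic chunking it performs.

```python
import re
from collections import Counter

# Illustrative marker list (assumption; not the list used in the paper).
CANDIDATE_MARKERS = frozenset({"jednak", "więc", "natomiast", "ponadto"})
PAUSE = "<pause>"  # hypothetical short-pause tag


def marker_frequencies(texts, markers=CANDIDATE_MARKERS):
    """Count occurrences of each candidate marker across a list of texts."""
    counts = Counter()
    for text in texts:
        # \w+ is Unicode-aware for str patterns, so Polish letters are kept.
        tokens = re.findall(r"\w+", text.lower())
        counts.update(tok for tok in tokens if tok in markers)
    return counts


def mark_pauses(sentence, markers=CANDIDATE_MARKERS):
    """Insert a pause tag after each marker that is not the last word."""
    words = sentence.split()
    out = []
    for i, word in enumerate(words):
        out.append(word)
        # Compare the word without surrounding punctuation, case-insensitively.
        core = word.strip(".,;:!?").lower()
        if core in markers and i < len(words) - 1:
            out.append(PAUSE)
    return " ".join(out)
```

Calling `marker_frequencies(corpus).most_common()` yields the markers ordered by frequency, from which a ranked subset could be chosen; `mark_pauses` then annotates new input text with candidate short-pause positions for that subset.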