Umeå University, Department of Philosophy and Linguistics
PHONUM 9 (2003), 117-120
Available online at http://www.ling.umu.se/fonetik2003/

The acoustic and morpho-syntactic context of prosodic boundaries in dialogs

Mattias Heldner and Beáta Megyesi
Department of Speech, Music and Hearing, KTH

This study investigates the structuring of speech in terms of prosodic boundaries. In particular, the relation between boundaries as perceived by listeners, and their acoustic and linguistic realizations as uttered by speakers is examined.

1. Introduction

The structuring of speech in terms of prosodic boundaries is fundamental for spoken communication. By reflecting the speakers’ internal organization of the information, prosodic boundaries facilitate the listeners’ processing of the message. This study is viewed as a step towards a general model of the structuring of speech with applications in speech technology; e.g. to predict prosodic boundaries from input texts for speech synthesis, to produce natural sounding boundaries in synthetic speech, and to predict boundaries from input speech for automatic speech recognition and understanding.

To arrive at such a model, however, several kinds of information have to be taken into account. Perceptual classifications of prosodic boundaries in a speech material and detailed acoustic and linguistic descriptions of these boundaries and their context will be required. Each type of information has its own problems and limitations. For example, although most researchers agree that several boundary strengths must be assumed, there is no general agreement on issues such as the number and types of boundaries that need to be distinguished. This is perhaps reflected in the multitude of prosodic transcription systems available; several different systems have been proposed for Swedish (e.g. Bruce, 1995; Horne, Strangert & Heldner, 1995).
Moreover, an extensive literature has shown that phenomena such as silent pauses, final lengthening and F0 resets are involved in the acoustic signaling of prosodic boundaries in Swedish (Bruce, Granström, Gustafson & House, 1993; Fant & Kruckenberg, 2002). Capturing such phenomena automatically in real-world speech, however, is a non-trivial task. Furthermore, it is also known that prosodic and linguistic structures are related (e.g. Strangert, 1990; Gustafson-Capkova & Megyesi, 2002), and prosodic boundaries in text-to-speech (TTS) systems are often predicted on the basis of content/function words, part-of-speech (PoS), or phrase structure (Ostendorf & Veilleux, 1994; Taylor & Black, 1998). Yet it remains to be explored which kinds of linguistic features, and what level of analytic detail, are needed to make correct predictions about prosody.

In this paper, we investigate weak and strong perceived boundaries and their acoustic and linguistic context in spontaneous dialogs in Swedish. At present, the acoustic features reflect silent pauses and final lengthening. The linguistic features include information about content and function words, and parts-of-speech with and without subcategorization features.
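
To make the feature types just listed more concrete, the following Python sketch shows one way such features could be assembled for each word boundary in a time-aligned transcript. It is an illustration under stated assumptions, not the procedure used in this study: the `Word` record, the `CONTENT_TAGS` set, the dot-separated tag format (e.g. `NN.UTR.SIN`), and the lengthening proxy (seconds per syllable relative to the mean over the supplied words) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical set of coarse PoS tags treated as content words (an assumption).
CONTENT_TAGS = {"NN", "VB", "JJ", "AB", "PM"}


@dataclass
class Word:
    text: str
    start: float       # word onset in seconds
    end: float         # word offset in seconds
    n_syllables: int   # assumed to be available from the alignment
    tag: str           # assumed format: coarse PoS, then subcategories, dot-separated


def coarse_pos(tag: str) -> str:
    """Strip subcategorization features, keeping only the coarse PoS."""
    return tag.split(".", 1)[0]


def boundary_features(words):
    """Return one feature dict per boundary between adjacent words."""
    # Seconds per syllable for every word; a crude stand-in for lengthening.
    rates = [(w.end - w.start) / w.n_syllables for w in words]
    avg_rate = mean(rates)
    feats = []
    for i in range(len(words) - 1):
        left, right = words[i], words[i + 1]
        feats.append({
            # Silent pause: gap between the two words (overlaps clipped to 0).
            "pause": max(0.0, right.start - left.end),
            # Values above 1 mean the pre-boundary word is slower than average.
            "lengthening": rates[i] / avg_rate,
            "left_is_content": coarse_pos(left.tag) in CONTENT_TAGS,
            "right_is_content": coarse_pos(right.tag) in CONTENT_TAGS,
            # PoS without and with subcategorization features.
            "left_pos": coarse_pos(left.tag),
            "left_pos_full": left.tag,
            "right_pos": coarse_pos(right.tag),
            "right_pos_full": right.tag,
        })
    return feats
```

Keeping both the coarse and the full tag for each neighboring word makes it possible to compare predictions with and without subcategorization features using the same boundary records.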