Silence and Discourse Context in Read Speech and Dialogues in Swedish

Sofia Gustafson-Čapková & Beáta Megyesi

Computational Linguistics, Department of Linguistics, Stockholm University, Sweden
sofia@ling.su.se
Centre for Speech Technology, KTH, Stockholm, Sweden
bea@speech.kth.se

Abstract

In this study, we investigate the correlation between silent pauses and discourse boundaries, where boundaries are defined in terms of theme shift. We examine three speaking styles in Swedish: professional and non-professional reading, and elicited spontaneous dialogues. Considerable attention is given to the syntactic and discourse context in which pauses appear, as well as to the characteristics of the discourse structure in terms of pauses.

1. Introduction

During the last decade, researchers have shown an increasing interest in the relationship between prosody and discourse structure. Many researchers have investigated this relationship for different languages in order to detect topic structure.

Swerts and Geluykens [18] show that speakers in monologue use pauses of various lengths to signal information flow in terms of topic structure.

Hirschberg [8] points out that features used for indicating topic structure in texts include speaking rate, duration of inter-phrase pause, loudness and pitch. She also reports that phrases introducing a new topic are characterized by an initially wider pitch range, are preceded by a longer pause, and are louder and slower than other phrases.

Shriberg et al. [15] have used a prosodic model for automatic topic segmentation, which performs as well as or better than word-based statistical language models. The authors also report that new topics are realized by some combination of silent pauses, low boundary tones and/or pitch range resets.

The relation between prosody and discourse structure is also investigated by van Donzel [19].
She studied prosodic features of discourse boundaries for Dutch on the basis of clause, sentence and paragraph division, as well as the prosodic features of information structure in the New-Given taxonomy [13]. She reports that discourse boundaries in spontaneous speech are realized by silent pauses and boundary tones, similarly to the findings of Shriberg et al., but with high rather than low boundary tones. The stronger the boundary, the more probable the combination of the two cues.

Pauses often indicate prosodic phrase boundaries which highlight the organization of the message [1], [2], [6], [8], [11], [18]. Therefore, we have chosen to study pausing in various speaking styles and relate pausing strategies to discourse structure. More specifically, the aim of our study is to investigate the discourse structure in terms of theme shift and its relation to pausing in three different speaking styles in Swedish: professional news announcement, non-professional reading, and elicited spontaneous dialogues.

We analyze the materials from two different perspectives. First, we investigate the discourse position of pauses. Second, we study the discourse context itself and the presence of pauses. The results from the former can be useful for predicting discourse boundaries given audio data, and the results from the latter might be useful for predicting silent intervals in text-to-speech systems.

In cases where discourse structure and silent intervals do not coincide, other types of linguistic information, such as part-of-speech and phrasal structure, might help in the prediction of discourse boundaries and/or in the prediction of pauses in a text-to-speech system.

In the next section, we give a summary of our data and methodology, as well as a brief overview of the findings reported in our previous studies on the production and perception of pauses ([4], [9]). In section 3, the results on the correlation between discourse structure and pausing are presented.
Lastly, in section 4, we summarize the results and suggest directions for future research.

2. Acoustic Pauses and Discourse Contexts

In this study, we use the same speech data for each speaking style as in our previously reported studies (see [4] and [9]). The read speech material consists of recordings of Swedish radio news [14] read by four professional and four non-professional readers. The spontaneous speech data [5] consists of recordings of two Swedish map task dialogues, each with two dialogue participants. The data sets consist of 920 words each.

In order to investigate the duration, frequency, type and position of acoustic pauses, the speech data was processed automatically by a pause detector. Silent intervals longer than or equal to 100 ms were defined as the acoustic correlate of pausing. Pauses may include natural physical phenomena such as breathing and swallowing. However, particles expressing feedback/back-channelling (e.g. mmm, aaa, aha) in dialogues are not allowed inside pauses. The output of the automatic detection was checked manually to ensure consistency.

As mentioned in the first section, various studies have shown that prosody might signal discourse structure in terms of topic structure ([8], [18]). In these studies, topic units can be seen as discourse segments. However, the nature of discourse segments is hard to define. In the literature, a variety of features is used when describing discourse segment boundaries, such as cue words (e.g. [10]), referring expressions and intonation (e.g. [12]), among others.

Speech Prosody 2002, Aix-en-Provence, France, April 11-13, 2002. ISCA Archive: http://www.isca-speech.org/archive

In this study, for a definition of a discourse segment we use the notion of theme. Theme is defined as a chunk with one underlying intention. In other words, a discourse segment is a se-