Disfluencies in Polish and Thai task-oriented dialogue

Janusz Kleśta, Maciej Karpiński
Adam Mickiewicz University
ul. Międzychodzka 5, 60-371 Poznań, POLAND
{janusz.klesta|maciej.karpinski}@amu.edu.pl

ABSTRACT

The present paper discusses the problem of handling disfluencies in spontaneous speech corpora. The major types of disfluencies found in recordings of task-oriented dialogue in two structurally different and geographically distant languages, Polish and Thai, are named, exemplified and tentatively categorized.

1. Introduction

Corpora of spontaneous speech provide valuable material both for fundamental linguistic research and for the design of computer dialogue systems. In practice, however, analysing their content poses serious theoretical and technical problems of various kinds. For example, the relationship of many phenomena observed in spontaneous speech corpora to the language system remains highly controversial. These phenomena include repetitions, hesitations, pauses and other natural components of spontaneous utterances [1]. They often provide important background for the correct interpretation of the message conveyed, and they may also constitute a vital part of the message themselves [2]. Moreover, certain types of such disfluencies in the speech signal are even produced intentionally and used as stylistic devices in conversational exchange. Their close analysis is, therefore, important not only from a theoretical point of view but also for practical reasons, as synthetic speech deprived of these inherent dialogue elements will inevitably lack naturalness and may be