ANNOTATION SPECIFICATIONS OF A DIALOGUE CORPUS FOR MODELLING PHONETIC CONVERGENCE IN TECHNICAL SYSTEMS

Grażyna Demenko, Jolanta Bachan
Institute of Linguistics, Adam Mickiewicz University in Poznań, Poland
lin@amu.edu.pl, jolabachan@gmail.com

Abstract: The present paper describes the creation of a spoken dialogue corpus and its annotation specification for the analysis and objective evaluation of phonetic convergence in human-human communication. The analysis of the corpus will serve for the creation of convergence models which could be implemented in spoken dialogue systems based on spontaneous, expressive speech. The corpus consists of 13 hours of dialogues between 16 pairs of Polish native speakers and controlled dialogues with a teacher. The speakers knew each other and were of similar age, but could not see each other during the recording. In each recording session the pair of speakers conducted 4 dialogues in neutral scenarios and 6 dialogues in expressive scenarios, 3 dialogues with the teacher, 2 repetition tasks and 1 reading, which provided about 1 hour of speech for each pair. The corpus is being annotated on several layers: orthographic transcription of text, prosody, noise, flow of speaking turns, dialogue acts, agreement and disagreement intervals, extraordinary events and speaker's attitude. This combination of scenarios and the annotation specifications are novel, and promise to provide an empirical foundation for both linguistic and computational dialogue modelling of both face-to-face and man-machine dialogue. The results of preliminary analyses were used for the selection of recording scenarios for German speakers. The next step of the ongoing project is to record dialogues between Polish L1 speakers and speakers with German L1 and Polish L2.

1 Introduction

Phonetic convergence in a dialogue is a natural phenomenon.
The notion of phonetic convergence is related to Communication Accommodation Theory (CAT), established in the 1970s, which regards interpersonal conversation as a dynamic adaptive exchange [7, 8]. Phonetic convergence in dialogue involves adaptation of segmental and suprasegmental features of speech to those of the interlocutor, with the function of cooperatively or manipulatively signalling social common ground [10]. The main assumption of CAT is that this exchange involves both linguistic and nonverbal behaviour between two human interlocutors. The phenomenon of inter-speaker accommodation in spoken dialogues is well known in psycholinguistics, communication and cognitive sciences [6]. The features that undergo accommodation include lexical, syntactic, prosodic, gestural and postural features, as well as turn-taking behaviour [11]. The function of inter-speaker accommodation is to support predictability, intelligibility and efficiency of communication, to achieve solidarity with, or dissociation from, a partner and to control social impressions. The significant role of such adaptive behaviour in spoken human-to-human dialogues has important implications for human-computer interaction. In the context of speech technology applications, communication accommodation is important for a variety of reasons: models of convergence can be used to improve the naturalness of synthesised speech (e.g. in the context of spoken dialogue systems, SDS); accounting for accommodation can improve the prediction of user expectations and user satisfaction/frustration in real time (in on-line monitoring) and is essential in establishing a more sophisticated inter-