Study of the Phenomenon of Phonetic Convergence thanks to Speech Dominoes

Amélie Lelong & Gérard Bailly

GIPSA-Lab, Speech & Cognition dpt., UMR 5216 CNRS/Grenoble INP/UJF/U. Stendhal, 38402 Grenoble Cedex, France
{amelie.lelong, gerard.bailly}@gipsa-lab.grenoble-inp.fr

hal-00603164, version 1 - 24 Jun 2011. Author manuscript, published in "Analysis of Verbal and Nonverbal Communication and Enactment: The Processing Issue, A. Esposito, A. Vinciarelli, K. Vicsi, C. Pelachaud and A. Nijholt (Ed.) (2011) 280-293"

Abstract. During an interaction, people are known to adapt to each other. Phonetic adaptation has been studied notably for prosodic parameters such as loudness, speech rate or fundamental frequency. In most cases, results are contradictory, and the effectiveness of phonetic convergence during an interaction remains an open issue. This paper describes an experiment based on a children's game known as speech dominoes, which enabled us to collect several hundred syllables uttered by different speakers in different conditions: alone before any interaction vs. after it, and in a mediated interaction vs. a face-to-face interaction. Speech recognition techniques were then applied to globally characterize a possible phonetic convergence.

Keywords: face-to-face interaction, phonetic convergence, mutual adaptation

Introduction

The Communication Accommodation Theory (CAT), introduced by Giles et al. [1], postulates that individuals accommodate their communication behavior either by becoming closer to their interlocutor (convergence) or, on the contrary, by increasing their differences (divergence). People can adapt to each other in different ways. For example, conversational partners notably adapt to each other's choice of words and references [2] and also converge on certain syntactic choices [3]. Zoltan-Ford [4] has shown that users of dialog systems converge lexically and syntactically to the spoken responses of the system. Ward et al. [5] demonstrated that adaptive systems mimicking this behavior facilitate learning.

This alignment [6] may have several benefits, such as easing comprehension [7], facilitating the exchange of messages whose meaning is highly context-dependent [8], disclosing the ability and willingness to perceive, understand or accept new information [9], and maintaining social glue or resonance [10]. Researchers have also examined the adaptation of phonetic dimensions such as pitch [11], speech rate [12], loudness [13] and dispersions of vocalic targets [14], as well as more global alignment such as turn-taking [15]. However, the results of these studies show only weak convergence and, in some cases, no convergence at all. In the perceptual study conducted by Pardo [16], disparities between talkers have been