Can a spoken dialog system be used as a tool to study convergence?

José Lopes 1,2, Andrew Fandrianto 3, Maxine Eskenazi 3 and Isabel Trancoso 1,2
1 Instituto Superior Técnico, Lisboa, Portugal
2 INESC-ID Lisboa, Portugal
3 Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
jose.david.lopes@l2f.inesc-id.pt

Abstract

The finding that people entrain to one another in a conversation (Brennan, 1996) has fostered much interest in this phenomenon within a variety of research communities, such as psychology. Members of the automatic speech processing community have viewed it as a potential functionality that, if present in human-machine interaction, could be capitalized upon to improve system performance (Lopes et al, 2011). Beyond the benefits to SDS research, we argue here that automated systems can, in turn, benefit research in other areas. We believe that, for the study of entrainment, SDS can provide platforms on which to run studies, offering in some ways more control over conditions than human-human studies do.

We use the term entrainment here, following Brennan and others. The term can describe the behavior of just one of the speakers; if both speakers entrain, there should be convergence.

The literature does show that humans can be made to change their speech patterns to imitate the output of a spoken dialog system (SDS). Stoyanchev and Stent (2009) studied entrainment in a set of dialogs, using two verbs and two prepositions as primes. They confirmed that callers can adapt their choice of terms to the terms used by the automated system. Parent and Eskenazi (2010) directly manipulated the system primes in the Let's Go spoken dialog system (real, not paid, callers; Raux et al (2005)) and observed caller adaptation over time. The authors found that users do adapt and are more likely to do so in the first few turns following the first appearance of the prime.
The same was done in European Portuguese with the Noctívago spoken dialog system (Lopes et al, 2011), with the result that the enlisted callers entrained to all of the primes that were proposed by the system. Despite confirming the presence of entrainment, not all proposed primes were copied. Looking in more detail at the lexical and prosodic levels, both Lopes et al (2011) and Parent and Eskenazi (2010) found differences in how often words were copied. Less frequent words were copied less often when they were introduced as new primes while the system was already offering a very frequent prime (for example, “help” > “assistance”). Conversely, when an infrequent word had been in use and was replaced with a more frequent prime, the latter was easily copied (“start a new query” to “start a new request”). Lopes et al (2011) observed in Noctívago that if a very frequent and contextually appropriate word had been used (like “agora”, now), callers would continue to use it whether or not the system still used it in its prompts. However, the primes proposed here, for example “imediatamente” (immediately) and “neste momento” (right now), are all much longer and not necessarily more natural than “agora”. Neither study found any influence of the part of speech on the likelihood of a word being copied. Both studies confirmed that continued exposure to the primes increases the likelihood of their uptake. For some words, individual choices may not follow lexical frequency in the language. This can be due to individual preference, local uses, professional uses or a myriad of other reasons. In a relatively short dialog, like