Talk is silver, silence is golden: A cross cultural study on the usage of pauses in speech

Birgit Endrass, Matthias Rehm, Elisabeth André
University of Augsburg
Eichleitnerstr. 30, D-86135 Augsburg, Germany
{endrass|rehm|andre}@informatik.uni-augsburg.de

Yukiko I. Nakano
Tokyo University of Agriculture and Technology
2-24-16 Nakacho, Koganei-shi, Tokyo 184-8588, Japan
nakano@cc.tuat.ac.jp

ABSTRACT
In this paper we examine the usage of pauses in speech. We concentrate on cultural differences, with the aim of later building a computational model for virtual agents. By adapting the agents’ conversation management behavior to cultural background, we hope to achieve better acceptance in a given culture. We therefore take a closer look at the occurrence of pauses in speech and at features such as their length and placement. To ground our model in empirical data, we analyzed the occurrences of pauses in speech in the CUBE-G video corpus, recorded in the two participating cultures, Germany and Japan. In a preliminary study we counted the pauses that occurred in videos of approximately five minutes' duration. First we took into account pauses that lasted for more than 1 second, and then only those among them that lasted for more than 2 seconds. Comparing the two cultures, we found that Japanese subjects used significantly more pauses of both lengths than German subjects.

Author Keywords
Embodied conversational agent, pauses in speech, cross-cultural communication

ACM Classification Keywords
H.5.2 [Information interfaces and presentation (e.g., HCI)]: User Interfaces - interaction styles, Natural language, Theory and methods

INTRODUCTION
Knapp and Vangelisti [11] examine personal relationships and their impact on interpersonal communication.
To describe how silence can deepen a friendship between men, they cite Roger Rosenblatt, who wrote an article for Time magazine called “The Silent Friendship of Men”:

(…) Older Story: Wordsworth goes to visit Coleridge at his cottage, walks in, sits down and does not utter a word for three hours. Neither does Coleridge. Wordsworth then arises and, as he leaves, thanks his friend for a perfect evening. (…)

Would the same “conversation” have taken place if Mrs. Wordsworth and Mrs. Coleridge had met? Or if Wordsworth and Coleridge had never met before? There are differences in the usage of silence in speech. But where do they come from? Some are evoked by gender or age, others by personal relationships. The use of pauses also varies across cultures. We want to use tendencies in the frequency of pauses in speech, described in the literature and confirmed by our corpus study, to adapt the dialogue model for embodied conversational agents (ECAs) to a specific cultural model.

ECAs can be regarded as a special case of multimodal dynamic interaction systems. They support the idea that humans prefer to interact with an artefact that possesses some human-like qualities. In The Media Equation [15] the authors state that people respond to computers as if they were humans. Thus people might also build up social relationships with virtual agents. To enhance the believability of those agents, they could be extended with a cultural background. According to Hofstede [8], human behavior depends on human nature, culture and personality. Although cultural background plays an important role in human interaction and virtual agents communicate with the user in a natural way, so far little effort has been made to integrate cultural differences into technical systems. We believe that by realising culture specific dialogue management styles for ECAs, their believability could be enhanced.
As the usage of pauses in speech is one important aspect of dialogue management, we want to take a closer look at their occurrences to answer the following questions: How often and when do pauses take place? How long do they last? Who breaks the silence? What kind of speech acts are followed by pauses, and which utterances are used for start-ups? As a starting point we concentrate on