MushyPeek – an experiment framework for controlled investigation of human-human interaction control behaviour

Jens Edlund, Jonas Beskow & Mattias Heldner
KTH Centre for Speech Technology

Abstract

This paper describes MushyPeek, an experiment framework that allows us to manipulate interaction control behaviour – including turn-taking – in a setting quite similar to face-to-face human-human dialogue. The setup connects two subjects to each other over a VoIP telephone connection and simultaneously provides each of them with an avatar representing the other. The framework is exemplified with the first experiment we tried in it – a test of the effectiveness of interaction control gestures in an animated lip-synchronised talking head.

Introduction

People take a great number of things into consideration in order to manage the flow of the interaction when conversing face-to-face. We call this interaction control – the term is wider than turn-taking and does not presuppose the existence of "turns". Examples of features that play a part in interaction control include auditory cues such as pitch, intensity, pauses, disfluencies, hyperarticulation, etc.; visual cues such as gaze, facial expressions, gestures and mouth movements; and cues such as pragmatic, semantic and syntactic completeness. People commonly use these cues in combination and seem to mix them or shift between them seamlessly.

In order to fully understand human interaction control, we need to know how these features work in combination. To reach that goal, however, it seems fair to first get a handle on how the cues are used and perceived on their own. We have previously tested the perception and the production of a number of such cues in various user experiments, often involving users talking to different configurations of a spoken dialogue system where the interaction behaviour can be varied in a controlled manner (e.g. Edlund & Nordstrand, 2002; Bell et al., 2001), but also by analysing human-machine as well as human-human dialogues (Edlund & Heldner, 2005; Edlund et al., 2005).

This paper describes MushyPeek, a different experimental design that allows us to investigate interaction control behaviour by manipulating certain parameters in something quite similar to face-to-face human-human dialogue. Similar methods have been used by others; we were especially inspired by Gratch et al. (2006).

In this experimental setup, two subjects are connected to each other over a VoIP telephone connection. Furthermore, each speaker sees an avatar, a visual representation of the other speaker, at all times. (We use avatar to denote a virtual representation of a human, whereas a virtual representation of a system as a creature would be an ECA, an embodied conversational agent, in our terminology.) For the avatars, we use SynFace lip-synchronised talking heads (Beskow et al., 2004). Finally, the setup includes a simple model of multi-party interaction, which functions as the control mechanism for the manipulation of the interaction, and a number of logging mechanisms. In the following, we will present this experimental setup and include a brief report of the first experiment we have undertaken in it as an illustration.
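To make the role of the control mechanism concrete, the Python sketch below shows one minimal way the pieces could be wired together: voice activity events from the two speech channels are mapped onto gesture commands for the corresponding avatars, and every decision is logged. It is an illustrative sketch only; the event names, the send_gesture callback and the direct mapping from speech onset and offset to turn-taking and turn-yielding gestures are simplifying assumptions made for illustration, not the actual MushyPeek implementation.

    import logging

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    log = logging.getLogger("mushypeek")

    class InteractionController:
        """Toy two-party turn model: maps voice activity events from the
        two speech channels onto gesture commands for the avatars."""

        def __init__(self, send_gesture):
            # send_gesture(speaker, gesture) sends a command to the avatar
            # representing `speaker`, shown on the other participant's screen.
            self.send_gesture = send_gesture

        def on_voice_activity(self, speaker, is_speaking):
            # When a participant starts speaking, their avatar produces a
            # turn-taking gesture; when they stop, a turn-yielding gesture.
            gesture = "turn_taking" if is_speaking else "turn_yielding"
            self.send_gesture(speaker, gesture)
            log.info("speaker=%s speaking=%s -> %s", speaker, is_speaking, gesture)

    # Hypothetical wiring: print gesture commands instead of driving SynFace.
    controller = InteractionController(send_gesture=lambda who, g: print(who, g))
    controller.on_voice_activity("A", True)   # participant A starts speaking
    controller.on_voice_activity("A", False)  # participant A stops speaking

In the actual framework, a decision point of this kind is where the experimental manipulation would be applied, for instance by varying which interaction control gestures, if any, the avatar produces.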
The MushyPeek framework

In order to better investigate people's turn-taking behaviour, we have designed an experiment framework in which two interlocutors speak freely. The participants are placed in separate rooms, and each participant is equipped with a headset connected to a Voice-over-IP call. Currently, we use Skype (http://www.skype.com/) for this. On both sides, the call is enhanced with SynFace – a lip-synchronised animated talking head functioning as an avatar, representing each participant. These