Investigating Mutual Gaze

Frank Broz, Hatice Kose-Bagci, Chrystopher L. Nehaniv, Kerstin Dautenhahn
School of Computer Science
University of Hertfordshire, UK
{f.broz, h.kose-bagci, c.l.nehaniv, k.dautenhahn}@herts.ac.uk

Abstract

The role of gaze in interaction has been an area of increasing interest to the field of human-robot interaction. Mutual gaze, the pattern of behavior that arises when humans look directly at each other’s faces, sends important social cues and is, in humans, a developmental precursor to joint attention and language learning. In preparation for learning a computational model of mutual gaze, data from human-human pairs in a conversational task were collected using a gaze-tracking system and a face detection algorithm. The results presented show the potential of this automated method. Future applications for interaction-based robot language learning are discussed.

1. Introduction

Mutual gaze is an ongoing process between two interactors jointly regulating their amount of eye contact, rather than an atomic act performed by a single person (Argyle, 1988). This important social phenomenon also becomes significant at an early developmental stage; even young infants are responsive to being the object of a caretaker’s gaze (Hains and Muir, 1996). Mutual gaze behavior is the basis of, and developmental precursor to, more complex gaze behaviors such as visual joint attention (Farroni, 2003). It is also a component of the turn-taking “proto-conversations” between infants and caretakers that set the stage for language learning (Trevarthen and Aitken, 2001). Mutual gaze is known to play a role in regulating conversational turn-taking in adults (Kleinke, 1986). There is evidence that children in the earliest stages of language acquisition also coordinate their gaze patterns with conversational turns, shifting towards an adult-like pattern as they gain more language skills (D’Odorico et al., 1997).

Recently, the field of human-robot interaction has become increasingly interested in the role of gaze in a variety of conversational tasks, and robots have been programmed to produce natural-appearing mutual gaze behavior. But in these existing systems, robots either respond to human gaze without taking any action to regulate the duration of mutual gaze itself (e.g., Yoshikawa et al., 2006), or they produce behavior based on a model with realistic timings that is not responsive to real-time gaze information (e.g., Mutlu et al., 2006). For a robot to successfully negotiate humanlike mutual gaze, it must both be responsive to the human’s immediate gaze behavior and possess an internal model of mutual gaze based on time and other significant factors. Robotic systems designed to learn language through interaction by exploiting the structure of child-directed speech, such as that of Saunders et al. (2010), could especially benefit from a gaze model that supports social engagement. A promising way of building such models is by using data collected from human-human pairs.

2. Experiment

The purpose of this research is twofold. One goal is to collect data for the design of a gaze controller for a robot that is capable of producing socially appropriate mutual gaze behavior. The other is to verify and further investigate human-human mutual gaze behavior. Studies of mutual gaze in the psychology literature have been conducted using human observation to encode the gaze data. If we hope to build on these findings to produce gaze behavior for robots, it would be useful to first confirm that we can replicate them in data collected using the automated gaze detection methods necessary for robot control.

In light of these goals, an exploratory study was conducted involving a face-to-face conversational task between pairs of participants. To record their gaze direction and the location of their faces, each participant wore a head-mounted gaze-tracking system, and face detection software (based on the OpenCV library) was applied to video from each tracker’s scene camera to locate the other participant’s face in the wearer’s visual field.
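The per-frame analysis this setup enables can be illustrated with a short sketch. The code below is a minimal illustration under stated assumptions, not the authors’ implementation: it assumes a standard Haar-cascade face detector from OpenCV and a gaze tracker that reports the point of regard in scene-camera pixel coordinates, and all function and variable names are hypothetical. A frame counts as “looking at the partner’s face” when the tracked gaze point falls inside the detected face’s bounding box, and mutual gaze holds on frames where this is true for both participants at once.

    import cv2

    # Standard frontal-face Haar cascade shipped with OpenCV (an assumption;
    # the paper does not specify which OpenCV detector was used).
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect_partner_face(frame):
        """Return the largest detected face as (x, y, w, h), or None."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(
            gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None  # face tracking lost on this frame
        return max(faces, key=lambda f: f[2] * f[3])

    def gaze_on_face(gaze_point, face_box):
        """True if the tracker's point of regard lies inside the face box."""
        if face_box is None:
            return False
        gx, gy = gaze_point
        x, y, w, h = face_box
        return x <= gx <= x + w and y <= gy <= y + h

    # Mutual gaze on a given frame: both participants' gaze points land on
    # the other's face at the same time.
    # mutual = (gaze_on_face(gaze_a, detect_partner_face(scene_a)) and
    #           gaze_on_face(gaze_b, detect_partner_face(scene_b)))

Classifying each synchronized frame this way would yield a binary gaze-at-face time series per participant, from which episodes of mutual gaze and their durations can be read off.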
Seven pairs of people participated in the experiment, recruited from the staff and student population of the University of Hertfordshire. Because the level of familiarity between interactors has been shown to have an impact on the level of eye contact, the pairs were all workplace acquaintances. During the experiment, the two participants were seated face to face, approximately six feet apart, with a desk between them (this distance was selected so that a direct comparison could be made with existing results). They engaged in an unconstrained conversation for ten minutes. The participants were instructed to speak about emotionally neutral topics (nothing very personal or potentially upsetting). During the conversation, data were recorded for three trials of eighty seconds each.

3. Results

Two pairs had their data excluded from analysis because of obvious face-tracking errors (where the face