Generating Robot Gesture Using a Virtual Agent Framework

Maha Salem, Stefan Kopp, Ipke Wachsmuth, Frank Joublin

M. Salem is at the Research Institute for Cognition and Robotics, Bielefeld, Germany msalem@cor-lab.uni-bielefeld.de
S. Kopp is at the Sociable Agents Group, Bielefeld University, Germany skopp@techfak.uni-bielefeld.de
I. Wachsmuth is at the Artificial Intelligence Group, Bielefeld University, Germany ipke@techfak.uni-bielefeld.de
F. Joublin is at the Honda Research Institute Europe, Offenbach, Germany frank.joublin@honda-ri.de

Abstract— One of the crucial aspects in building sociable, communicative robots is to endow them with expressive non-verbal behaviors. Gesture is one such behavior, frequently used by human speakers to illustrate what they express in speech. The production of gestures, however, poses a number of challenges with regard to motor control for arbitrary, expressive hand-arm movement and its coordination with other interaction modalities. We describe an approach to enable the humanoid robot ASIMO to flexibly produce communicative gestures at run-time, building upon the Articulated Communicator Engine (ACE) that was developed to allow virtual agents to realize planned behavior representations on the spot. We present a control architecture that tightly couples ACE with ASIMO’s perceptuo-motor system for multi-modal scheduling. In this way, we combine conceptual representation and planning with motor control primitives for meaningful arm movements of a physical robot body. First results of realized gesture representations are presented and discussed.

I. INTRODUCTION

Lifelike acting in a social robot evokes social communicative attributions to the robot and thereby conveys intentionality. That is, the robot makes the human interaction partner believe that it has, e.g., internal states, communicative intent, beliefs and desires [4]. To induce such beliefs, a robot companion should produce social cues. Forming an integral part of human communication, hand and arm gestures are primary candidates for extending the communicative capabilities of social robots. Non-verbal expression via gesture is frequently used by human speakers to emphasize, supplement or even complement what they express in speech. Pointing to objects or giving spatial directions are good examples of how information can be conveyed in this manner. This additional expressiveness is an important feature of social interaction, and humans are known to be highly attentive to it. Similarly, humanoid robots that are intended to engage in natural and fluent human-robot interaction should produce communicative gestures for comprehensible and believable behavior.

In contrast to task-oriented movements like reaching or grasping, human gestures are derived to a certain extent from some kind of internal representation of ‘shape’ [11], especially when iconic or metaphoric gestures are used. The characteristic shape and dynamical properties exhibited by gestural movement allow humans to distinguish gestures from subsidiary movements and to recognize them as meaningful non-verbal behavior [24]. As a consequence, the generation of gestures for artificial humanoid bodies demands a high degree of control and flexibility concerning the shape and time properties of the gesture, while ensuring a natural appearance of the movement. Ideally, if such non-verbal behaviors are to be realized, they have to be derived from conceptual, to-be-communicated information.
The present paper focuses on the implementation of communicative gestures that have to meet the aforementioned constraints. The overall objective of this research is to enable a physical robot to flexibly produce speech and co-verbal gesture at run-time and to subsequently evaluate the resulting communicative behavior in human-robot interaction studies. For this, we explore how we can transfer existing concepts from the domain of virtual conversational agents to the platform of a humanoid robot. In [21], we address the production of speech as a further output modality and its synchronization with gesture. Future work will include an evaluation of the generated multi-modal robot behavior.

II. RELATED WORK

To date, the generation of robot gesture and the evaluation of its effects remain largely unexplored. Traditional robotics research has mainly focused on the recognition rather than the synthesis of gesture. In the few existing cases of gesture synthesis, however, the models typically address object manipulation and fulfill little or no communicative function, e.g., [2]. Furthermore, gesture generation is often based on the recognition of previously perceived gestures, thereby focusing on imitation learning, e.g., [1]. In most cases in which robot gesture is actually generated with a communicative intent, the arm movements are not produced at run-time but are pre-recorded for demonstration purposes, e.g., [23] and [7].

Crucially, many approaches are realized on less sophisticated platforms with less complex robot bodies (e.g., fewer degrees of freedom (DOF), limited mobility, etc.) that show no or only a few humanoid traits. However, it is not only the behavior but also the appearance of a robot that affects the way human-robot interaction is experienced [19]. Consequently, the importance of the robot’s design should not be underestimated if the intention is to ultimately use it as a research platform, e.g., to study the effect of robot gesture on humans. MacDorman and Ishiguro consider android robots a key testing ground for social, cognitive, and neuroscientific theories, providing an experimental apparatus that can be controlled more precisely than any human actor [16].