Generating Robot Gesture Using a Virtual Agent Framework
Maha Salem, Stefan Kopp, Ipke Wachsmuth, Frank Joublin

M. Salem is at the Research Institute for Cognition and Robotics,
Bielefeld, Germany msalem@cor-lab.uni-bielefeld.de
S. Kopp is at the Sociable Agents Group, Bielefeld University, Germany
skopp@techfak.uni-bielefeld.de
I. Wachsmuth is at the Artificial Intelligence Group, Bielefeld University,
Germany ipke@techfak.uni-bielefeld.de
F. Joublin is at the Honda Research Institute Europe, Offenbach, Germany
frank.joublin@honda-ri.de
Abstract— One of the crucial aspects in building sociable,
communicative robots is to endow them with expressive non-
verbal behaviors. Gesture is one such behavior, frequently
used by human speakers to illustrate what they express in
speech. The production of gestures, however, poses a number of
challenges with regard to motor control for arbitrary, expressive
hand-arm movement and its coordination with other interaction
modalities. We describe an approach to enable the humanoid
robot ASIMO to flexibly produce communicative gestures at
run-time, building upon the Articulated Communicator Engine
(ACE) that was developed to allow virtual agents to realize
planned behavior representations on the spot. We present a
control architecture that tightly couples ACE with ASIMO’s
perceptuo-motor system for multi-modal scheduling. In this
way, we combine conceptual representation and planning with
motor control primitives for meaningful arm movements of a
physical robot body. First results of realized gesture represen-
tations are presented and discussed.
I. INTRODUCTION
Lifelike acting in a social robot evokes social commu-
nicative attributions to the robot and thereby conveys in-
tentionality. That is, the robot makes the human interaction
partner believe that it has, e.g., internal states, communicative
intent, beliefs and desires [4]. To induce such beliefs, a robot
companion should produce social cues. Forming an integral
part of human communication, hand and arm gestures are
primary candidates for extending the communicative capa-
bilities of social robots. Non-verbal expression via gesture is
frequently used by human speakers to emphasize, supplement
or even complement what they express in speech. Pointing
to objects or giving spatial direction are good examples
of how information can be conveyed in this manner. This
additional expressiveness is an important feature of social
interaction to which humans are known to be well attentive.
Similarly, humanoid robots that are intended to engage in
natural and fluent human-robot interaction should produce
communicative gestures for comprehensible and believable
behavior.
In contrast to task-oriented movements like reaching or
grasping, human gestures are derived to a certain extent
from some kind of internal representation of ‘shape’ [11],
especially when iconic or metaphoric gestures are used. Such
characteristic shape and dynamical properties exhibited by
gestural movement allow humans to distinguish gestures from
subsidiary movements and to recognize them as meaningful
non-verbal behavior [24]. As a consequence, the generation
of gestures for artificial humanoid bodies demands a high
degree of control and flexibility concerning shape and time
properties of the gesture, while ensuring a natural appearance
of the movement. Ideally, if such non-verbal behaviors are
to be realized, they have to be derived from conceptual, to-
be-communicated information.
The present paper focuses on the implementation of
communicative gestures that meet the aforementioned
constraints. The overall objective of this research is to enable
a physical robot to flexibly produce speech and co-verbal
gesture at run-time and to subsequently evaluate the resulting
communicative behavior in human-robot interaction studies.
For this, we explore how we can transfer existing concepts
from the domain of virtual conversational agents to the
platform of a humanoid robot. In [21], we address the
production of speech as a further output modality and its
synchronization with gesture. Future work will include an
evaluation of the generated multi-modal robot behavior.
II. RELATED WORK
To date, both the generation of robot gesture and the
evaluation of its effects remain largely unexplored. Traditional
robotics research has mainly focused on the recognition rather
than the synthesis of gesture. In the few existing cases of
gesture synthesis, the modeled movements typically serve
object manipulation and fulfill little or no communicative
function, e.g. [2]. Furthermore, gesture generation is often based on
the recognition of previously perceived gestures, thereby fo-
cusing on imitation learning, e.g. [1]. In most cases in which
robot gesture is actually generated with a communicative
intent, these arm movements are not produced at run-time,
but are pre-recorded for demonstration purposes, e.g. [23]
and [7].
Crucially, many approaches are realized on less sophisticated
platforms with less complex robot bodies (e.g., fewer degrees
of freedom (DOF), limited mobility, etc.) that show few or no
humanoid traits. However, it is not only
the behavior but also the appearance of a robot that affects
the way human-robot interaction is experienced [19]. Con-
sequently, the importance of the robot’s design should not
be underestimated if the intention is to ultimately use it as
a research platform, e.g., to study the effect of robot ges-
ture on humans. MacDorman and Ishiguro consider android
robots a key testing ground for social, cognitive, and neuro-
scientific theories, providing an experimental apparatus that
can be controlled more precisely than any human actor [16].