Increasing the Expressiveness of Virtual Agents–
Autonomous Generation of Speech and Gesture
for Spatial Description Tasks
Kirsten Bergmann
Sociable Agents Group, CITEC
Bielefeld University
P.O. 100 131, D-33615 Bielefeld
kbergman@techfak.uni-bielefeld.de
Stefan Kopp
Sociable Agents Group, CITEC
Bielefeld University
P.O. 100 131, D-33615 Bielefeld
skopp@techfak.uni-bielefeld.de
ABSTRACT
Embodied conversational agents are required to be able to
express themselves convincingly and autonomously. Based
on an empirical study on spatial descriptions of landmarks
in direction-giving, we present a model that allows virtual
agents to automatically generate, i.e., select the content and
derive the form of coordinated language and iconic gestures.
Our model simulates the interplay between these two modes
of expressiveness on two levels. First, two kinds of knowledge
representation (propositional and imagistic) are utilized to
capture the modality-specific contents and processes of content planning. Second, specific planners are integrated to
carry out the formulation of concrete verbal and gestural
behavior. A probabilistic approach to gesture formulation
is presented that incorporates multiple contextual factors as
well as idiosyncratic patterns in the mapping of visuo-spatial
referent properties onto gesture morphology. Results from a
prototype implementation are described.
Categories and Subject Descriptors
I.2.0 [Artificial Intelligence]: General—Cognitive Simulation; I.2.1 [Artificial Intelligence]: Applications and Expert Systems—Natural Language Interfaces; I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence—Intelligent Agents; D.2.2 [Software Engineering]: Design Tools and Techniques—User Interfaces
General Terms
Design, Experimentation, Theory
Keywords
Gesture, language, expressiveness, multimodal output, embodied conversational agents
1. INTRODUCTION
One key issue in endowing virtual agents with human-like expressiveness, i.e., richness and versatility, is
the autonomous generation of language and accompanying
gestures. Current literature on gesture research states that
the question “why different gestures take the particular physical form they do is one of the most important yet largely
unaddressed questions in gesture research” [1, p. 499]. This
holds especially for iconic gestures, for which information
has to be mapped from some mental image into (at least
partly) resembling gestural form. This transformation is
neither direct nor straightforward but involves a number of
issues like the composability of a suitable linguistic context,
the choice of gestural representation technique (e.g., placing, drawing, etc.), or the low-level choices of morphological
features such as handshape or movement trajectory.
In this paper, we present a novel approach to generating coordinated speech and iconic gestures in virtual agents.
It comprises an architecture that simulates the interplay
between these two modes of expressiveness on two levels.
First, two kinds of knowledge representation (propositional and imagistic) are utilized to capture the modality-specific
contents and processes of content planning (i.e., what to
convey). Second, specific planners are integrated to carry
out the formulation of concrete verbal and gestural behavior
(i.e., how to convey it best). The overall interplay of these
modules is modeled as a multi-agent cooperation process in
order to meet the low-latency and real-time requirements
that hold for behavior generation in interactive agents.
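To make this division of labour concrete, the following minimal Python sketch illustrates such a two-level pipeline. It is purely illustrative: all class and function names in it (Proposition, ImagisticDescription, plan_content, formulate_speech, formulate_gesture) are hypothetical placeholders, not the interfaces of the implemented system.

# Illustrative-only sketch of the two-level generation pipeline described
# above; all names are hypothetical placeholders, not the system's API.
from dataclasses import dataclass, field

@dataclass
class Proposition:
    """Propositional content, e.g. a predicate over the referent."""
    predicate: str
    args: tuple

@dataclass
class ImagisticDescription:
    """Imagistic content: a schematic visuo-spatial model of the referent."""
    referent: str
    extents: dict = field(default_factory=dict)   # e.g. {'width': 2.0, 'height': 6.0}
    main_axis: str = "vertical"

def plan_content(landmark: str):
    """Level 1: decide what to convey, in both representations."""
    props = [Proposition("type", (landmark, "church")),
             Proposition("relation", (landmark, "left-of", "townhall"))]
    image = ImagisticDescription(landmark, {"width": 2.0, "height": 6.0}, "vertical")
    return props, image

def formulate_speech(props):
    """Level 2a: verbalize the propositional content (placeholder realizer)."""
    return "There is a church to the left of the town hall."

def formulate_gesture(image):
    """Level 2b: map imagistic content onto gesture form features."""
    return {"technique": "shaping", "handshape": "C", "movement": image.main_axis}

if __name__ == "__main__":
    props, image = plan_content("church-1")
    print(formulate_speech(props))
    print(formulate_gesture(image))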
In the following, we will put special focus on the above-
described puzzle of gesture formulation. After discussing
related work in the next section, we report in Section
3 on an empirical study on spontaneous speech and gesture
use in VR direction-giving. Its results indicate that a model
for autonomous generation of expressive gestures must take
into account both inter-personal commonalities in terms of contextual factors constraining the involved decisions [11], and intra-personal systematics as apparent in idiosyncratic
gesture patterns. We thus present in Section 4, after introducing the overall architecture of our framework, a simulation account going beyond recent systems that either derive iconic gestures from systematic meaning-form mappings (e.g., [12]), or model the individual gesturing patterns of specific speakers (e.g., [16]). Based on data from our annotated corpus, we employ machine learning and adaptation algorithms to build Bayesian networks that allow us to model both kinds of constraining factors. Finally, Section 5 presents results from a prototype implementation of our model in a spatial-description task domain.
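As a purely illustrative sketch of this kind of probabilistic choice (not the networks learned from our corpus), the following Python fragment picks a gestural representation technique as the maximum-a-posteriori value of a toy conditional probability table conditioned on two invented contextual factors; the variables, values, and probabilities are made up for illustration only.

# Illustrative-only sketch: MAP choice of a representation technique from
# a toy conditional probability table. Variables and numbers are invented,
# not the learned parameters reported in this paper.
from collections import defaultdict

# P(technique | referent_shape, previously_introduced) -- toy numbers
CPT = {
    ("longish", False): {"drawing": 0.55, "shaping": 0.30, "placing": 0.15},
    ("longish", True):  {"placing": 0.60, "drawing": 0.25, "shaping": 0.15},
    ("boxlike", False): {"shaping": 0.50, "drawing": 0.30, "placing": 0.20},
    ("boxlike", True):  {"placing": 0.65, "shaping": 0.20, "drawing": 0.15},
}

def choose_technique(shape: str, introduced: bool) -> str:
    """Return the most probable technique given the contextual evidence."""
    dist = CPT.get((shape, introduced), defaultdict(float, {"shaping": 1.0}))
    return max(dist, key=dist.get)

if __name__ == "__main__":
    print(choose_technique("longish", False))   # -> 'drawing'
    print(choose_technique("boxlike", True))    # -> 'placing'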