Increasing the Expressiveness of Virtual Agents – Autonomous Generation of Speech and Gesture for Spatial Description Tasks

Kirsten Bergmann
Sociable Agents Group, CITEC
Bielefeld University
P.O. 100 131, D-33615 Bielefeld
kbergman@techfak.uni-bielefeld.de

Stefan Kopp
Sociable Agents Group, CITEC
Bielefeld University
P.O. 100 131, D-33615 Bielefeld
skopp@techfak.uni-bielefeld.de

ABSTRACT
Embodied conversational agents are required to be able to express themselves convincingly and autonomously. Based on an empirical study on spatial descriptions of landmarks in direction-giving, we present a model that allows virtual agents to automatically generate coordinated language and iconic gestures, i.e., to select the content and derive the form of both modalities. Our model simulates the interplay between these two modes of expressiveness on two levels. First, two kinds of knowledge representation (propositional and imagistic) are utilized to capture the modality-specific contents and processes of content planning. Second, specific planners are integrated to carry out the formulation of concrete verbal and gestural behavior. A probabilistic approach to gesture formulation is presented that incorporates multiple contextual factors as well as idiosyncratic patterns in the mapping of visuo-spatial referent properties onto gesture morphology. Results from a prototype implementation are described.

Categories and Subject Descriptors
I.2.0 [Artificial Intelligence]: General—Cognitive Simulation; I.2.1 [Artificial Intelligence]: Applications and Expert Systems—Natural Language Interfaces; I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence—Intelligent Agents; D.2.2 [Software Engineering]: Design Tools and Techniques—User Interfaces

General Terms
Design, Experimentation, Theory

Keywords
Gesture, language, expressiveness, multimodal output, embodied conversational agents

1. INTRODUCTION
One key issue in the endowment of virtual agents with human-like expressiveness, i.e., richness and versatility, is the autonomous generation of language and accompanying gestures. Current literature on gesture research states that the question of "why different gestures take the particular physical form they do is one of the most important yet largely unaddressed questions in gesture research" [1, p. 499]. This holds especially for iconic gestures, for which information has to be mapped from some mental image into a (at least partly) resembling gestural form. This transformation is neither direct nor straightforward but involves a number of issues such as the composability of a suitable linguistic context, the choice of gestural representation technique (e.g., placing, drawing, etc.), or the low-level choices of morphological features such as handshape or movement trajectory.
In this paper, we present a novel approach to generating coordinated speech and iconic gestures in virtual agents. It comprises an architecture that simulates the interplay between these two modes of expressiveness on two levels.
First, two kinds of knowledge representation (propositional and imagistic) are utilized to capture the modality-specific contents and processes of content planning (i.e., what to convey). Second, specific planners are integrated to carry out the formulation of concrete verbal and gestural behavior (i.e., how to convey it best). The overall interplay of these modules is modeled as a multi-agent cooperation process in order to meet the low-latency and realtime requirements that hold for behavior generation in interactive agents.
In the following, we put special focus on the above-described puzzle of gesture formulation. After discussing related work in the following section, we report in Section 3 on an empirical study on spontaneous speech and gesture use in VR direction-giving. Its results indicate that a model for the autonomous generation of expressive gestures must take into account both inter-personal commonalities, in terms of contextual factors constraining the involved decisions [11], and intra-personal systematicities, as apparent in idiosyncratic gesture patterns. In Section 4, after introducing the overall architecture of our framework, we thus present a simulation account that goes beyond recent systems which either derive iconic gestures from systematic meaning-form mappings (e.g., [12]) or model the individual gesturing patterns of specific speakers (e.g., [16]). Based on data from our annotated corpus, we employ machine learning and adaptation algorithms to build Bayesian networks that allow us to model both kinds of constraining factors. Finally, Section 5 presents results from applying an implementation of our model in a spatial description task domain.
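To give a rough impression of what such a probabilistic mapping involves, the following minimal sketch (in Python) shows how a conditional probability table for one morphological decision, here the representation technique, could be estimated from annotated corpus data and queried with contextual evidence. A speaker variable among the parents stands in for idiosyncratic preferences, while the remaining parents stand in for inter-personal contextual factors. All feature names, values, and the toy corpus entries are illustrative assumptions for exposition only; they are not the actual annotation scheme, network structure, or learning procedure described later in the paper.

from collections import Counter, defaultdict

# Toy stand-in for annotated corpus entries: each observation pairs
# contextual/referent features (evidence variables) with one observed
# morphological feature (here: the representation technique). All
# feature names and values are illustrative, not the real coding scheme.
corpus = [
    {"speaker": "P1", "shape": "round",   "info_state": "new",   "technique": "drawing"},
    {"speaker": "P1", "shape": "round",   "info_state": "given", "technique": "placing"},
    {"speaker": "P2", "shape": "longish", "info_state": "new",   "technique": "drawing"},
    {"speaker": "P2", "shape": "round",   "info_state": "new",   "technique": "placing"},
    # ... many more annotated instances in a real corpus
]

def learn_cpt(data, target, parents, alpha=1.0):
    """Estimate P(target | parents) by relative frequency with
    add-alpha smoothing -- a simple stand-in for network learning."""
    counts = defaultdict(Counter)
    values = set()
    for row in data:
        counts[tuple(row[p] for p in parents)][row[target]] += 1
        values.add(row[target])
    cpt = {}
    for key, counter in counts.items():
        total = sum(counter.values()) + alpha * len(values)
        cpt[key] = {v: (counter[v] + alpha) / total for v in values}
    return cpt

# One node of the network: the representation technique conditioned on a
# visuo-spatial referent property, the discourse context, and the speaker.
# The speaker parent captures intra-personal (idiosyncratic) preferences;
# the other parents capture inter-personal contextual factors.
parents = ["shape", "info_state", "speaker"]
cpt_technique = learn_cpt(corpus, "technique", parents)

def choose_technique(shape, info_state, speaker):
    """Return the most probable technique given the evidence; a real
    system would need back-off for unseen evidence combinations."""
    dist = cpt_technique.get((shape, info_state, speaker))
    return max(dist, key=dist.get) if dist else None

print(choose_technique("round", "new", "P1"))  # -> "drawing" on this toy data

In the full model, such conditional distributions are organized in a Bayesian network over several interdependent gesture features rather than a single table, as detailed in Section 4.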