A Multimodal Discourse Ontology for Meeting Understanding

John Niekrasz and Matthew Purver

Center for the Study of Language and Information, Stanford University,
Cordura Hall, 210 Panama St., Stanford, CA 94305-4115, USA
{niekrasz, mpurver}@csli.stanford.edu

Abstract. In this paper, we present a multimodal discourse ontology that serves as a knowledge representation and annotation framework for the discourse understanding component of an artificial personal office assistant. The ontology models components of natural language, multimodal communication, multi-party dialogue structure, meeting structure, and the physical and temporal aspects of human communication. We compare our models to those from the research literature and from similar applications. We also highlight some annotations which have been made in conformance with the ontology, as well as some algorithms which have been trained on these data, and suggest elements of the ontology that may be of immediate interest for further annotation by human or automated means.

1 Introduction

People can communicate with great efficiency and expressiveness during natural interaction with others. This is perhaps the greatest reason that face-to-face conversations remain such a significant part of our working lives despite the numerous technologies available that allow communication by other means. Nevertheless, businesses spend millions of dollars each year conducting meetings that are often seen as highly inefficient [1], and there is great interest in researching these interactions to better understand them, create technology to facilitate them, and assist in the recording and dissemination of their content.
To do this in a manner that is truly useful to organizations and desirable to individuals, automated “meeting understanding” should encompass not only the annotation of video and audio for playback, but also the extraction of relevant information at the level of semantics and pragmatics: what subjects were discussed, what decisions were made, and what tasks were assigned [2]. Because natural multi-party interactions are vastly complex, and because the information we wish to extract is equally complex, of many different types, and expressed in many different modalities, a meeting understanding system must have an integrated and expressive model of meetings, discourse, and language supporting it in order to manage its knowledge effectively.

For our meeting understanding system, a component of the Cognitive Assistant that Learns and Organizes (CALO), knowledge integration and expression

S. Renals and S. Bengio (Eds.): MLMI 2005, LNCS 3869, pp. 162–173, 2006.
© Springer-Verlag Berlin Heidelberg 2006