Multimodal Interaction Analysis in a Smart House

Pilar Manchón, University of Seville, pmanchon@us.es
Carmen del Solar, University of Seville, carsolval@alum.us.es
Gabriel Amores, University of Seville, jgabriel@us.es
Guillermo Pérez, University of Seville, gperez@us.es

ABSTRACT
This article is an extended version of a paper presented at LREC 2006 [6]. It describes the motivation, collection and format of the MIMUS corpus, together with an in-depth, issue-focused analysis of the data. MIMUS [8] is the result of multimodal Wizard-of-Oz (WoZ) experiments conducted at the University of Seville as part of the TALK project. The main objective of the MIMUS corpus was to gather information about different users and their performance, preferences and usage of a multimodal, multilingual natural dialogue system in the Smart Home scenario in Spanish. The focus group is composed of wheelchair-bound users, chosen both for their special motivation to use this kind of technology and for their specific needs. Throughout this article, the WoZ platform, experiments, methodology, annotation schemes and tools, and all relevant data will be discussed, as well as the results of the in-depth analysis of these data. The corpus comprises a set of three related experiments. Owing to the limited scope of this article, only selected results from the first two experiments (1A and 1B) will be discussed. The article focuses on subjects' preferences, multimodal behavioural patterns, and willingness to use this kind of technology.

Categories and Subject Descriptors
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

General Terms
Design, Experimentation, Human Factors

Keywords
Multimodal Corpus, HCI, Multimodal Experiments, Multimodal Entries, Multimodal Interaction, Mixed-modality Events

1. INTRODUCTION
MIMUS was collected with the ultimate goal of gaining sufficient information to re-design and configure the MIMUS multimodal dialogue system.
The original speech-only system (Delfos) was developed in previous European projects. The MIMUS corpus was used to design the new system extensions and to define its configuration and overall behavior. Although there is extensive research in this area [8][9][10][11], most of it has been carried out in languages other than Spanish, most often English. This new corpus provides additional information for Spanish speakers and the Smart House domain.

First, the WoZ platform will be briefly described. Then the full set of experiments, as well as the motivation behind them, will be discussed. Data, annotation tools and methodology, inter-annotator agreement, and the final format will then be described and justified. Building on this supporting information, the data from experiments 1A and 1B will be analyzed, presented and interpreted. Experiment 2 will not be described in this paper. Finally, some conclusions will be drawn and future research directions proposed.

2. THE MIMUS WOZ PLATFORM
The platform is based on Delfos, the spoken dialogue system developed at the University of Seville. All the original spoken functionality of Delfos, as well as the new multimodal additions, is available in MIMUS.

In terms of hardware, the platform consists of a PC (used by the wizard), a tablet PC (used by the subject), a WiFi router connecting both PCs, and a set of real home devices arranged as presented in the graphical interface. These were real working devices (lamps, radio, blind) that would turn on/off or open/close as the subject instructed the system to do so.

Figure 1: The subject's touchscreen display

Additionally, several wizard and subject software agents have been developed. The wizard agent set consists of:

1. A Wizard Helper: a control panel that enables the wizard to "talk" to the user (via the synthesizer) and to remotely play audio and video files.
2. A Device Manager, which enables the wizard to control the home devices and to see what the subject is clicking on.
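To make the Device Manager's role concrete, the following is a minimal sketch of how such an agent might track and update the state of the home devices (lamps, radio, blind) in response to wizard commands. All class and method names here are illustrative assumptions, not the actual MIMUS implementation.

```python
from dataclasses import dataclass

@dataclass
class Device:
    """One controllable home device, e.g. a lamp, the radio, or the blind."""
    name: str
    kind: str           # "lamp", "radio", or "blind"
    state: str = "off"  # lamps/radio: "on"/"off"; blind: "open"/"closed"

class DeviceManager:
    """Hypothetical wizard-side agent: tracks device state and applies commands."""

    def __init__(self, devices):
        self.devices = {d.name: d for d in devices}

    def apply(self, name, action):
        """Apply a wizard command such as ("kitchen lamp", "on") and
        return the device's new state."""
        dev = self.devices[name]
        if dev.kind in ("lamp", "radio") and action in ("on", "off"):
            dev.state = action
        elif dev.kind == "blind" and action in ("open", "closed"):
            dev.state = action
        else:
            raise ValueError(f"{action!r} is not a valid action for a {dev.kind}")
        return dev.state
```

In the real platform the wizard's commands would additionally be relayed over the WiFi link to switch the physical devices; the sketch only captures the state-tracking logic.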
Subjects could use the pen to click/tap on the screen and/or talk through the microphone. All tasks could be fully performed using either speech alone or the graphical interface alone, as well as both in combination (multimodally). Subjects were free to choose.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICMI'07, November 12–15, 2007, Nagoya, Aichi, Japan.
Copyright 2007 ACM 978-1-59593-817-6/07/0011 $5.00