From Question Answering to Spoken Dialogue: Towards an Information Search Assistant for Interactive Multimodal Information Extraction

Rieks op den Akker, Harry Bunt, Simon Keizer, Boris van Schooten

Faculty of Arts, Tilburg University, Tilburg, Netherlands
Faculty EWI, Universiteit Twente, Enschede, Netherlands

infrieks@cs.utwente.nl, harry.bunt@uvt.nl, s.keizer@uvt.nl, schooten@cs.utwente.nl

Abstract

This paper gives an overview of the issues that arise when extending simple question answering (QA) with dialogue capabilities in the design of a multimodal interactive information extraction system for a large, though restricted, domain. We present the way in which these issues are approached in the IMIX program. The IMIX demonstrator system, under development in this program, may be considered the most difficult case of QA: it answers non-factoid questions in a large domain and accepts speech input as well. We describe our approach to adding dialogue capabilities to this system. We look at QA from a dialogue system perspective and from an HCI perspective, and consider the consequences of our choice of interaction metaphor, the ‘information search assistant’.

1. Introduction

This paper presents ongoing research within the framework of the IMIX (Interactive Multimodal Information Extraction) program, a Dutch national multiproject research effort concerning Dutch language and speech technology. IMIX brings together academic partners¹ and partners from industry² to collaborate on research involving question answering, speech recognition, speech and language generation, automatic ontology generation, and dialogue management. The collaboration involves both fundamental research and the development of a demonstrator system, in which the various technologies are combined into an interactive multimodal information extraction system for the domain of medical encyclopedic information.
¹ The universities of Groningen, Tilburg, Twente, Nijmegen and Amsterdam.
² Textkernel, providing support software, and Het Spectrum Publishers and MerckManual, providing electronic documents containing medical encyclopedic information.

The IMIX program now includes two projects concerned with dialogue management: VIDIAM (Dialogue Management and the Visual Channel) and PARADIME (Parallel Agent-based Dialogue Management Engine), which will cooperate in developing a dialogue manager for the demonstrator system.

Currently, a baseline system has been developed, providing question answering (QA) functionality without dialogue. The user can ask isolated, self-contained questions about the medical domain, using either speech or keyboard. The system answers with a matching document fragment from the document database, which may be both spoken and displayed as text. Integrating a dialogue manager into the system will allow it to assist users in finding information through an interactive and cooperative process. In the mixed-initiative dialogues that will be supported, users will additionally be able to ask the system for clarification, correct the system’s interpretation of utterances, and ask follow-up questions; the system, in turn, may ask the user for clarification and ask verification questions. In addition to dialogue functionality, IMIX will also present pictures as part of the answers, and will enable the user to point to both the pictures and the text on the screen as part of the dialogue.

To make clear what dialogue functionality the IMIX demonstrator will have, we choose the metaphor of an ‘Information Search Assistant’: the system should have interactive capabilities in common with an assistant librarian who can help in finding answers to questions by identifying relevant documents and parts of documents.
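As an informal illustration of the interaction types listed above, the following Python sketch enumerates the dialogue moves named in the text and records which party may make them, first in the baseline system and then in the planned mixed-initiative dialogues. All names (`Speaker`, `Move`, `BASELINE`, `MIXED_INITIATIVE`, `is_allowed`) are hypothetical and chosen for this example only; they are not taken from the IMIX software, and the sketch makes no claim about how the IMIX dialogue manager represents dialogue moves.

```python
from enum import Enum, auto


class Speaker(Enum):
    USER = auto()
    SYSTEM = auto()


class Move(Enum):
    # Hypothetical labels for the interaction types named in the text.
    QUESTION = auto()               # isolated, self-contained question
    ANSWER = auto()                 # matching document fragment
    FOLLOW_UP_QUESTION = auto()
    CLARIFICATION_REQUEST = auto()
    CORRECTION = auto()             # correction of the system's interpretation
    VERIFICATION_QUESTION = auto()


# Baseline QA without dialogue: the user asks, the system answers.
BASELINE = {
    Speaker.USER: {Move.QUESTION},
    Speaker.SYSTEM: {Move.ANSWER},
}

# Mixed-initiative dialogue: the baseline moves plus the additions
# described in the text for each party.
MIXED_INITIATIVE = {
    Speaker.USER: BASELINE[Speaker.USER] | {
        Move.FOLLOW_UP_QUESTION,
        Move.CLARIFICATION_REQUEST,
        Move.CORRECTION,
    },
    Speaker.SYSTEM: BASELINE[Speaker.SYSTEM] | {
        Move.CLARIFICATION_REQUEST,
        Move.VERIFICATION_QUESTION,
    },
}


def is_allowed(policy, speaker, move):
    """Return True if `speaker` may make `move` under `policy`."""
    return move in policy[speaker]


if __name__ == "__main__":
    for speaker in Speaker:
        added = MIXED_INITIATIVE[speaker] - BASELINE[speaker]
        names = sorted(m.name for m in added)
        print(f"{speaker.name} gains: {', '.join(names)}")
```

Representing the two configurations as plain sets makes the difference between bare QA and mixed-initiative dialogue explicit: clarification requests become available to both parties, whereas corrections and follow-up questions are user moves and verification questions are system moves.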
The paper is organised as follows. In Section 2, we discuss the dialogue capabilities that the IMIX demonstrator should have, taking into account (1) the requirements determined by the IMIX program as a whole, and by the use of the Information Search Assistant in particular, and (2) the required dialogue functionality from an HCI perspective. In Section 3, we outline the approach to dialogue management that will be taken in the IMIX project, which follows from the requirements on the functionality of the system’s dialogue manager discussed in Section 2, and we indicate a partial system architecture to implement this functionality.

INTERSPEECH 2005, September 4-8, Lisbon, Portugal

2. Dialogue Functionality for QA

The IMIX demonstrator is intended for casual users, i.e., users who have no professional knowledge of the medical domain, who use the system only occasionally, and who have not received any special training in using the system.

The addition of dialogue capabilities to a bare QA system is attractive for several reasons. First, it is well known that users are often unable to express their need for information in a single, self-contained question. This is especially so for casual users (as opposed to frequent, professional users) and for complex information domains. Such users typically do not know precisely what can be asked, since they have no detailed knowledge of the information that is available. Moreover, their desire for information is often not clearly articulated, especially when the information domain is relatively unknown to them. These circumstances make it desirable for the user to be able to not just