A Wizard-of-Oz Experiment for Tutorial Dialogues in Mathematics

Christoph BENZMÜLLER, Armin FIEDLER, Malte GABSDIL, Helmut HORACEK, Ivana KRUIJFF-KORBAYOVÁ, Manfred PINKAL, Jörg SIEKMANN, Dimitra TSOVALTZI, Bao Quoc VO, Magdalena WOLSKA

Department of Computer Science, Department of Computational Linguistics
Saarland University, P.O. Box 15 11 50, D-66041 Saarbrücken, Germany

Abstract. In this paper we report on a Wizard-of-Oz (WOz) experiment that was conducted to collect written empirical data on mathematics tutorial dialogues in German. We present a methodological approach for optimising the gains from WOz empirical studies, and we report the results of applying this approach in our own empirical study.

1 Introduction

In a Wizard-of-Oz (WOz) experiment, the subject interacts through an interface with a human "wizard" who simulates the behaviour of a system [1]. The WOz methodology is commonly used to investigate human-computer interaction in systems that are still under development. In this paper we report on a WOz experiment that was conducted within the DIALOG project [2] to collect empirical data on mathematics tutorial dialogues in German. More specifically, our goal was to collect data on (1) the tutoring process, (2) the students' answers, (3) the dialogue behaviour, and (4) the use of natural language.

We chose the WOz methodology because it allows us to formalise the model we want to implement in our system and to ask the wizard to follow it. In this way, (i) we can collect dialogue data that represent the users' behaviour in interactions following that specific model, and (ii) we obtain early feedback on the model itself. In subsequent experiments in the project, implemented components can take over some of the tasks currently carried out by the wizard, while the overall experimental setup is preserved.
In the DIALOG project, we aim to develop a mathematical tutoring dialogue system with an elaborate natural language dialogue component. Our motivation stems from empirical evidence that natural language dialogue capabilities are necessary for successful tutoring [3]. Moreover, to model mathematics tutorial dialogues, we need a formally encoded mathematical theory, a means of evaluating the student's input in terms of the domain knowledge it demonstrates, and a theory of tutoring.

In Section 2, we show how the preparations for our experiment addressed these issues and describe the formalisations that allowed us to specify the goals of the experiment more robustly. In Section 3, we present the experimental design that builds on these preparations. We discuss our approach in Section 4, review related work in Section 5, and then conclude the paper.

2 Methodology and Formalisations

Our general motivation was to formalise the different areas of interest in order to constrain the data collection. The formalisations ensure consistent behaviour across the wizards. This makes it possible to evaluate the wizards' behaviour in general and to extract robust qualitative information from the data on how to improve it.