Evaluation of a Spoken Dialogue System with Usability Tests and Long-term Pilot Studies: Similarities and Differences

Markku Turunen, Jaakko Hakulinen and Anssi Kainulainen
Speech-based and Pervasive Interaction group, TAUCHI, Department of Computer Sciences
University of Tampere, Tampere, Finland
{Markku.Turunen,Jaakko.Hakulinen,Anssi.Kainulainen}@cs.uta.fi

Abstract

We present findings from a long-term study of a speech-based bus timetable system. After deploying the prototype system, we collected data from real usage for 30 months. In addition, we conducted usability tests to obtain subjective ratings of the pilot system. A comparison of these evaluations shows that the results obtained with usability tests differ significantly from those gained from real usage, and that the data from the initial period of use differs significantly from the data collected afterwards. For example, the differences in help requests, interruptions, speech recognition rejections, silence timeouts, and repeat requests are highly significant, and in some cases, such as explicit quit requests, enormous (65% versus 3%).

Index Terms: spoken dialog systems, evaluation, pilot studies

1. Introduction

We have developed multiple bus timetable systems in various research projects on top of the common Jaspis system architecture [1]. The lessons learned from previous spoken dialogue timetable systems suggest that an open, user-initiative dialogue strategy based on data collected from human-human interaction fails to provide a sufficiently robust interface [2]. Other studies report similar experiences [3]. For the Stopman system, we developed a task-oriented interface that provides the basic functionality in a system-initiative manner, while the rest of the functionality is available through a user-initiative interface. We have conducted many usability studies of the system and collected data from 30 months of public use with real users.
In this paper we present the experiences gained from these different evaluations. In particular, we focus on the differences between the data collected in usability studies and the data from different periods of real use.

2. The Stopman System

The Stopman system provides timetables for each of the roughly 1200 bus stops in the Tampere City area. The aim of the system is to satisfy most callers with the first timetable listing, as demonstrated in Example 1. At the beginning of the call, the system asks the user for a bus stop name (S1). The most fundamental information is included in the initial timetable listing, which explains the length of the prompt shown in the example (S3). After this, the rest of the functions are available. This functionality includes navigation in the timetable, selection of specific bus lines (U3), and specifying a certain time.

S1: Welcome to Stopman. You can ask for help by saying “tell instructions”. Please tell the name of the bus stop, for example “Central Square”.
U1: Alexander Church.
S2: Do you want to retrieve the timetable for stop “Alexander Church”?
U2: Yes.
S3: The time is 10:10. Today, the next buses leaving from bus stop “Alexander Church” are to “Vehmainen” number 5, now (list continues)… Please give your command, for example “tell instructions”.
U3: Select line.
S4: Please tell the line number. You can list the lines by saying “list lines”.
U4: Thirteen.
S5: Next number thirteen buses from “Alexander Church” are to “Hermia” in 1 minute, to “Ikuri” in (list continues)…

Example 1: An example dialogue with the Stopman system.

2.1. System functionality categories

The interaction with the Stopman system consists of 10 types of user input (Table 1). The mandatory input is the name or the number of a bus stop. Meaningful interaction is not possible without it, and all other input is regarded as optional. The second category of user input consists of the two ways to end the call (hanging up or an explicit request).
The remaining categories are help and repeat requests, confirmations, advanced functionality (i.e., any functionality other than the mandatory input), user interruptions, and three kinds of error situations. All functionality is available with both speech and DTMF input, and the system gives help on how to use these modalities.

#   Description              Example
1   Mandatory functionality  Main library
2   End of call              Thanks, goodbye!
3   Help requests            Tell instructions
4   Repeat requests          Repeat the last one
5   Confirmations            Yes
6   Advanced functionality   Select another day
7   ASR rejections           <NOT RECOGNIZED>
8   Missing inputs           <SILENCE>
9   Invalid inputs           <INVALID DTMF>
10  User interruptions       <USER INTERRUPT>

Table 1: Stopman functionality categories.
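To illustrate how the categories in Table 1 could be used when analysing call logs, the following is a minimal sketch under stated assumptions. It assumes that each logged call is a list of category labels in the order the inputs occurred, and it counts, for each category, the share of calls containing at least one such input. This is one plausible way to express rates like those compared in the Abstract; the exact counting procedure is not specified here. The `Category` enum, the `share_of_calls` function, and the log format are hypothetical and are not part of the Stopman implementation.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    """Hypothetical encoding of the ten Stopman input categories (Table 1)."""
    MANDATORY = 1           # bus stop name or number
    END_OF_CALL = 2         # hang-up or explicit quit request
    HELP_REQUEST = 3        # e.g., "Tell instructions"
    REPEAT_REQUEST = 4      # e.g., "Repeat the last one"
    CONFIRMATION = 5        # e.g., "Yes"
    ADVANCED = 6            # e.g., "Select another day"
    ASR_REJECTION = 7       # <NOT RECOGNIZED>
    MISSING_INPUT = 8       # <SILENCE>
    INVALID_INPUT = 9       # <INVALID DTMF>
    USER_INTERRUPTION = 10  # <USER INTERRUPT>


def share_of_calls(calls):
    """Return {category: fraction of calls with at least one input of that category}.

    `calls` is a list of calls; each call is a list of Category values
    (an assumed log format, for illustration only).
    """
    if not calls:
        return {c: 0.0 for c in Category}
    counts = Counter()
    for call in calls:
        # Use a set so each category is counted at most once per call.
        counts.update(set(call))
    return {c: counts[c] / len(calls) for c in Category}


# Usage with a made-up log of three calls (not real Stopman data).
log = [
    [Category.MANDATORY, Category.CONFIRMATION, Category.END_OF_CALL],
    [Category.ASR_REJECTION, Category.MANDATORY, Category.CONFIRMATION,
     Category.HELP_REQUEST],
    [Category.MISSING_INPUT, Category.MANDATORY, Category.CONFIRMATION],
]
rates = share_of_calls(log)
print(rates[Category.CONFIRMATION])  # 1.0
```

Counting per call rather than per input keeps a single caller who repeats the same request many times from dominating the rate; a per-input count would be an equally valid alternative for other questions.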