Evaluating the User Interface of a Ubiquitous Computing System: Doorman

Kaj Mäkelä, Esa-Pekka Salonen, Markku Turunen, Jaakko Hakulinen, and Roope Raisamo
Tampere Unit for Computer-Human Interaction (TAUCHI)
FIN-33014 University of Tampere, Finland
+358 3 215 8558
kaj@cs.uta.fi

1. Introduction

We conducted a Wizard of Oz experiment for a speech-based ubiquitous computing system called Doorman (‘Ovimies’ in Finnish) [4] as part of our iterative development process. We evaluated our current multimodal and spoken-language interaction models with users before constructing the actual speech recogniser.

Ubiquitous computing systems cannot always be tested in laboratories. In many cases the interfaces must therefore be tested at the actual scene of the action, with real-life problems. To gather reliable information about human-computer communication, it is important to observe human behaviour in a situation in which the users believe they are interacting with a real computer system [3].

Wizard of Oz testing [1, 3] is an experimental user interface evaluation method in which the user is led to believe that he or she is interacting with a fully implemented system, even though all or part of the system's interaction is controlled by a human, a wizard, or several of them. Wizard of Oz tests are useful for supporting the design process and evaluating the interface [1, 2, 8]. The method has commonly been used to test natural language dialogue systems [3] and multimodal systems [5, 8]. Here we apply the method to ubiquitous computing applications.

2. System description

The Doorman system helps staff members and visitors on the TAUCHI [6] premises with their communication tasks and everyday lives.
This is done by automatically opening the door after recognising the person or the target of the visit, by guiding visitors to their destination, and by conveying organisational and personal messages, such as e-mail messages, to the staff members. The aim of the system is to serve all users in some way, at least by calling for external help when a problem occurs. The Doorman system has some resemblance to the Office Monitor by Yankelovich and McLain [9].

The Doorman uses spoken language as its main modality for communicating with users. Speech recognition is used for input and speech synthesis for output. The target of a visitor's visit, or the identity of a staff member, is recognised from their speech. The system is installed in two locations, one at the front door and one in the lobby. It gathers information about the situation at the front door with a microphone, a doorbell switch and a door microswitch. The output of the system is presented to the user as synthesised speech via a speaker. Inside, in the lobby, guidance is given multimodally, using speech synthesis together with an anthropomorphic robot that points in the direction the user should go to find the target. The robot itself is implemented with three servo motors controlled by a microcontroller.

The Doorman system is based on a distributed software architecture called Jaspis [7]. Jaspis is a Java-based adaptive speech user interface architecture developed at TAUCHI, originally for spoken dialogue applications, but it has been extended with features that support the development of ubiquitous computing applications.

3. Description of the experiment

The aim of the study was to test and analyse the multimodal dialogue and spoken-language model designed for the system before constructing the actual speech recognisers. We wanted to find out how users actually behave and what kind of language they actually use when talking to this kind of computing system.
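The core idea of testing before the recogniser exists is to hide the substitution behind an unchanged module boundary: the rest of the dialogue system consumes recognition results through an interface, and during the experiment a human wizard supplies those results by picking from a list of predefined alternatives. A minimal Java sketch of this pattern follows; the class and method names (`Recognizer`, `WizardRecognizer`, `select`) are illustrative assumptions, not the actual Jaspis API.

```java
import java.util.List;

public class WizardDemo {

    /** Module boundary: the dialogue system only ever sees this interface,
     *  so a human-driven stand-in is indistinguishable from a real recogniser. */
    interface Recognizer {
        String recognize();
    }

    /** Stand-in for the wizard's control tool: a list of all possible
     *  recognition results, from which the operator selects one. */
    static class WizardRecognizer implements Recognizer {
        private final List<String> alternatives;
        private int selected = 0;

        WizardRecognizer(List<String> alternatives) {
            this.alternatives = alternatives;
        }

        /** Called when the wizard picks an item in the list-based UI. */
        void select(int index) {
            this.selected = index;
        }

        @Override
        public String recognize() {
            return alternatives.get(selected);
        }
    }

    public static void main(String[] args) {
        WizardRecognizer wizard = new WizardRecognizer(
                List.of("open the door", "guide me to the lab"));
        wizard.select(1);                  // wizard's choice in the list UI
        Recognizer rec = wizard;           // the system sees only the interface
        System.out.println(rec.recognize()); // prints "guide me to the lab"
    }
}
```

Restricting the wizard to a fixed list of alternatives, as in this sketch, is what keeps response times short and the simulated recogniser's behaviour consistent across sessions.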
We were also interested in how our current guidance to the target of the visit was perceived and understood.

The experiment was conducted by replacing the forthcoming speech recognition module of the Doorman system with a wizard application operated manually by a human wizard observing the situation. The speech of the users was recorded, and all system tasks and sensor inputs were logged for later analysis.

We implemented an application through which the wizard feeds speech recognition information manually to the system. The control application was designed to be as simple to use as possible, to ensure short response times and to minimise the possibility of errors. The tool provides a simple list-based user interface consisting of all the possible alternatives for the speech recognition results. To keep the behaviour of the system consistent and credible, we formed a set of rules for the human wizards operating the system.

The test was conducted over five days, one of which was used for training and for pilot testing the setup. The test was run approximately four hours per day, on a quite varying schedule. The test sessions lasted from 45 minutes to 1.5 hours each. The test was conducted by two persons: one acted as the wizard and the other gathered recording permissions from visitors.

4. Results

During the experiment, the system was used on 74 occasions: 22 by visitors and 52 by staff members. The results show that the system prompt was formed so that in most of the visitor cases (77%, 17 persons) the users acted and