Human Evaluation of the LOGOS Spoken Dialogue System

Alexandros Lazaridis, Theodoros Kostoulas, Iosif Mporas, Todor Ganchev, Nikos Katsaounos, Stavros Ntalampiras, Nikos Fakotakis
Artificial Intelligence Group, Wire Communications Laboratory
Dept. of Electrical and Computer Engineering, University of Patras, Greece
{alaza, tkost, imporas, tganchev, nkatsaounos, dallas, fakotaki}@wcl.ee.upatras.gr

ABSTRACT
In this work, the evaluation of the LOGOS spoken dialogue system is presented. The system offers user-friendly access to information, entertainment devices and white goods appliances. A short description of the LOGOS system's architecture is given. The user interface of the system is based on a remote control device, a PC keyboard and spoken language. In this paper we focus on the speech interface and the spoken dialogue management. A group of 15 users, aged 23 to 35, evaluated the usability and the degree of acceptance of the LOGOS spoken dialogue system. The users were given task cards describing the scenarios implemented for the home device control and SMS messaging services.

Categories and Subject Descriptors
I.2.7 [Natural Language Processing]: Speech recognition and synthesis

General Terms
Design, Experimentation, Performance, Measurement

Keywords
Smart Home, Dialogue, Spoken Dialogue System, Multimodal Dialogue System, Speech Interaction, Human Evaluation.

1. INTRODUCTION
The progress of technology provides the means for user-friendly access to various kinds of information. Rapid growth in the usage of both spoken and multimodal dialogue systems has been observed over the last decades. To this end, many unimodal and multimodal dialogue systems have been reported in the literature. In [1], an adaptive mixed-initiative spoken dialogue system (MIMIC), which provides movie show-time information, is described. A spoken dialogue system architecture for rapid prototyping is described in [2].
The features that support rapid prototyping include a clear separation of generic dialogue processing algorithms from domain- and language-specific knowledge sources. In [3], a prototype based on plan-based dialogue management is described, in which the system interacts with the user to gather facts, which consequently trigger rules and generate more facts as the interaction progresses [4]. A crucial issue that arises when designing and improving a dialogue system is measuring its acceptability by evaluating its performance. In [5], a spoken language understanding and dialogue system in the appointment scheduling domain was presented, together with an elaborate evaluation of the system based on accuracy measurements: word accuracy, constituent accuracy, and concept accuracy. In [6], findings from a long-term study of a speech-based bus timetable system are presented. The evaluation showed that the results obtained with usability tests differ significantly from those gained from real usage. In [7], efforts in data collection and performance evaluation in support of spoken dialogue system development are described, and two understanding metrics, called query density and concept efficiency, are introduced. In [8], features to enhance the usability of a spoken dialogue system in an automotive environment are discussed.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PETRA'08, July 15-19, 2008, Athens, Greece.
Copyright 2008 ACM 978-1-60558-067-8-15/07/08...$5.00.
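The word- and concept-accuracy measurements used in [5] can be illustrated with a short sketch: word accuracy follows from a Levenshtein alignment of the hypothesis against the reference transcription, and concept accuracy is computed analogously over sequences of semantic concept labels. The function names and the example utterance below are illustrative, not taken from [5].

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def accuracy(ref_tokens, hyp_tokens):
    """Word (or concept) accuracy: 1 - errors / reference length."""
    if not ref_tokens:
        return 1.0 if not hyp_tokens else 0.0
    return 1.0 - edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)

# Toy smart-home utterance: the recogniser drops one word out of five.
ref = "turn on the kitchen light".split()
hyp = "turn on kitchen light".split()
print(round(accuracy(ref, hyp), 2))  # -> 0.8
```

The same `accuracy` function applied to concept-label sequences (e.g. `["action:on", "device:light", "room:kitchen"]`) yields the concept accuracy measure.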
The tests performed to evaluate those features, and the methods used to assess the test results, were detailed. In [9], Kamm et al. focus on the design and evaluation of spoken dialogue systems: first, general user interface principles are discussed and, second, the PARADISE framework for evaluating spoken dialogue systems is described. In [10], a series of issues to be addressed when evaluating the usability of spoken language dialogue systems is discussed, including the types and purposes of evaluation, when to evaluate, and which methods to use.

The present paper describes the experiments performed towards a human evaluation of the LOGOS smart-home spoken dialogue system. The LOGOS system offers user-friendly access to information and smart-home appliances. In the present work, we focus on the speech interface, using it as the only input modality of the system. This paper is organized as follows: Section 2 describes the system's architecture. In Section 3 the experimental setup is detailed. In Section 4 the human evaluation results are reported.

2. LOGOS SYSTEM
The LOGOS system's architecture is illustrated in Figure 1. The multimodal user interface offers user-friendly access to various appliances installed in a smart-home environment. In the present work, we focus on the speech interface and the evaluation of the system's acceptability when using speech as input.
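One common way to quantify such acceptability is the PARADISE framework cited above [9], which combines normalised task success with a weighted sum of normalised dialogue costs: performance = alpha * N(kappa) - sum_i w_i * N(c_i), where N() is z-score normalisation. The sketch below is a toy illustration of this scoring scheme; the weights and per-dialogue figures are invented and are not LOGOS evaluation data.

```python
from statistics import mean, stdev

def z_normalise(values):
    """Z-score normalisation of a measure across dialogues."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

def paradise_performance(kappa, costs, alpha, weights):
    """PARADISE-style score per dialogue.

    kappa:   task-success measure per dialogue
    costs:   dict mapping cost name -> list of values per dialogue
    alpha:   weight on normalised task success
    weights: dict mapping cost name -> regression weight
    """
    nk = z_normalise(kappa)
    ncosts = {name: z_normalise(vals) for name, vals in costs.items()}
    scores = []
    for i in range(len(kappa)):
        cost_term = sum(weights[name] * ncosts[name][i] for name in costs)
        scores.append(alpha * nk[i] - cost_term)
    return scores

# Toy example: three dialogues, one cost factor (number of turns).
kappa = [0.9, 0.7, 0.5]
costs = {"turns": [8.0, 12.0, 16.0]}
scores = paradise_performance(kappa, costs, alpha=1.0, weights={"turns": 0.5})
print([round(s, 2) for s in scores])  # -> [1.5, 0.0, -1.5]
```

In the full framework, `alpha` and the cost weights are fitted by linear regression against user satisfaction ratings; here they are fixed by hand purely for illustration.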