EYE GAZE AND SPEECH FOR DATA ENTRY: A COMPARISON OF DIFFERENT DATA ENTRY METHODS

Yeow Kee Tan, Nasser Sherkat, Tony Allen
Nottingham Trent University, Department of Computing and Mathematics
Burton Street, Nottingham NG1 4BU, United Kingdom
[yeow.tan, nasser.sherkat, tony.allen]@ntu.ac.uk

ABSTRACT

In this paper we present a multimodal interface that employs speech recognition and eye gaze tracking technology for use in data entry tasks. The aim of this work is to compare the usability of this multimodal system against other data entry methods (handwriting, mouse and keyboard, and speech only) when carrying out the data entry task of filling in a form. The relationships between efficiency, effectiveness, ergonomic quality, hedonic quality, naturalness, familiarity and user preference are discussed. The experimental results show that the majority of users prefer the proposed eye and speech system to the other form-filling methods, even though it is neither the fastest nor the most accurate.

1. INTRODUCTION

Although research into multimodal interfaces dates back to 1980, research into interfaces that combine active (e.g. speech recognition) and passive (e.g. eye gaze) inputs is still in its infancy and needs to be explored further in order to identify new combinations that can improve human-computer interaction. This type of interface is known as a blended style interface. Typical examples include IBM’s Manual And Gaze Input Cascaded (MAGIC) pointing system [1], which allows the use of mouse and eye input to select icons on the desktop, and the combination of facial and speech recognition [2], which is able to reduce word error rate by 27% in a noisy environment. These positive results have led us to research combining eye gaze and speech for data entry [3]. An experiment has been carried out in order to compare the usability of the proposed multimodal interface against other data entry methods.
The aspects of efficiency, effectiveness and user preference have been evaluated. In addition to these common usability aspects, ease of use, fun of use, familiarity and naturalness have also been observed in order to achieve a more extensive usability study.

According to Jordan, “Usability as a concept does not seem to include (positive) feelings such as pride, excitement or surprise” [4]. The positive emotions mentioned by Jordan can be classified as Hedonic Quality (HQ). HQ refers to quality attributes with no obvious relation to task/goal fulfillment, e.g. “original”, “innovative”, “exciting” or “exclusive”. These attributes address the human needs for novelty/change (i.e. excitement) and social power (i.e. status, pride) [5]. Note that fun of use differs from ease of use. Carroll and Thomas [6] argued that ease of use implies simplicity, which in turn is partly incompatible with fun. For example, making a system easy to use may also make it boring.

After years of debate, the Human-Computer Interaction (HCI) research community is now gradually accepting joy and fun as important factors in usability [7]. According to the findings in [8], perceived fun has a stronger effect on user satisfaction than perceived usefulness. It was also observed that user satisfaction leads to increased time spent with a software system. This, in turn, may cause the user to use the system more frequently, thereby gaining a better understanding of it (increased familiarity). However, introducing the factor of fun (HQ) may also decrease the simplicity of a system [5]. Whether HQ should be one of the main factors in system development is still debated in the HCI community.

In this paper, naturalness is defined as the regular way in which one human being would pass information to another (e.g. person A telling his/her name to person B).
Familiarity, on the other hand, is defined as the most common way in which a particular task (in this case, the data entry task of filling in a form) is carried out. Note that naturalness and familiarity are different in this case. Naturalness relates to whether the user experiences a human-to-human input style whilst interacting with the system. Familiarity is concerned with the methods the user has previously used to carry out a particular task.

2. SYSTEM IMPLEMENTATION

The experiment in this paper compares the use of handwriting (HW), mouse and keyboard (MK), speech only (SO) and eye and speech (ES) for a specific data entry task. The ES and SO systems use the same interface layout (figure 1). This allows users to fill out a form that consists of 8 fields (television type, title, surname, initials, house number, street name, city
I - 41 0-7803-7965-9/03/$17.00 ©2003 IEEE ICME 2003