A High-Performance Dual-Wizard Infrastructure for Designing Speech, Pen, and Multimodal Interfaces

Phil Cohen 1, Colin Swindells 2, Sharon Oviatt 2, Alex Arthur 1

1 Adapx Inc, Seattle, WA 98104, http://www.adapx.com, phil.cohen@adapx.com
2 Incaa Designs, Seattle, WA 98110, http://www.incaadesigns.org, sharon.oviatt@incaadesigns.org

ABSTRACT
The present paper reports on the design and performance of a novel dual-wizard simulation infrastructure that has been used effectively to prototype next-generation adaptive and implicit multimodal interfaces for collaborative groupwork. This high-fidelity simulation infrastructure builds on past development of single-wizard simulation tools for multiparty multimodal interactions involving speech, pen, and visual input [1]. The new dual-wizard simulation environment supports (1) real-time tracking of, analysis of, and system adaptivity to the paralinguistic features of a user's speech and pen signals (e.g., speech amplitude, pen pressure), as well as the semantic content of the input. The simulation also supports (2) transparent user training, in which users adapt their speech and pen signal features in ways that enhance the reliability of system functioning, i.e., the design of mutually adaptive interfaces. To accomplish these objectives, the new environment is also capable of handling (3) dynamic streaming digital pen input. We illustrate the performance of the simulation infrastructure during longitudinal empirical research in which a user-adaptive interface was designed for implicit system engagement based exclusively on users' speech amplitude and pen pressure [2]. Using this dual-wizard simulation method, the wizards responded successfully to over 3,000 user inputs with 95-98% accuracy, with a joint wizard response time of under 1.0 second during speech interactions and 1.65 seconds during pen interactions.
Furthermore, the interactions they handled involved naturalistic multiparty meeting data in which high school students engaged in peer tutoring, and all participants believed they were interacting with a fully functional system. This type of simulation capability enables a new level of flexibility and sophistication in multimodal interface design, including the development of implicit multimodal interfaces that place minimal cognitive load on users in mobile, educational, and other applications.

Keywords
Wizard-of-Oz, high-fidelity simulation, implicit system engagement, speech amplitude, pen pressure, dual-wizard protocol, collaborative meetings, multi-stream multimodal data, streaming digital pen and paper.

ACM Classification Keywords
H5.2 Information interfaces and presentation: User interfaces, user-centered design, theory and methods, interaction styles, input devices and strategies, evaluation/methodology, voice I/O, natural language.

ACM General Terms
Experimentation, Human Factors, Measurement, Performance.

1. INTRODUCTION
The present paper reports on the design and performance of a novel dual-wizard simulation infrastructure that has been used effectively to prototype next-generation adaptive and implicit multimodal interfaces for collaborative groupwork. This high-fidelity simulation infrastructure builds on past development of single-wizard simulation tools for multiparty multimodal interactions involving speech, pen, and visual input [1]. In the new infrastructure, a dual-wizard simulation environment was developed in which the system's response is a function of the responses of two hidden assistants, each with different but related tasks. Such a methodology is needed to limit the amount of data to which any one wizard must attend in order to perform accurately and quickly.
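To make the division of labor concrete, the fusion of two hidden assistants' judgments can be sketched as a small arbiter that acts only when both wizards have labeled the same user input. This is an illustrative sketch only, not the paper's implementation; the class and method names, and the split of roles into an "engagement" wizard and a "response" wizard, are our assumptions.

```python
import queue

# Hypothetical sketch of a dual-wizard arbiter: one wizard judges whether an
# input is system-directed (engagement), the other selects the system's
# semantic response. The simulated system responds only when the engagement
# wizard has approved the current input.

class DualWizardArbiter:
    def __init__(self):
        self.engagement_events = queue.Queue()  # from wizard 1's console
        self.response_events = queue.Queue()    # from wizard 2's console

    def submit_engagement(self, input_id, engaged):
        self.engagement_events.put((input_id, engaged))

    def submit_response(self, input_id, response):
        self.response_events.put((input_id, response))

    def next_system_action(self):
        """Fuse the two wizards' judgments for one user input.

        Returns the semantic response if the engagement wizard marked the
        input as system-directed, otherwise None (the system stays silent).
        """
        input_id, engaged = self.engagement_events.get()
        resp_id, response = self.response_events.get()
        assert input_id == resp_id, "wizards must label the same input"
        return response if engaged else None

arbiter = DualWizardArbiter()
arbiter.submit_engagement(7, True)
arbiter.submit_response(7, "show worksheet page")
print(arbiter.next_system_action())  # → show worksheet page
```

Because each wizard's queue holds only the data relevant to that wizard's task, neither assistant needs to monitor the full multimodal stream, which is the motivation stated above for the dual-wizard design.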
We illustrate the performance of the simulation infrastructure during longitudinal empirical research in which a user-adaptive interface was designed for implicit system engagement based exclusively on users' paralinguistic signal features, namely speech amplitude and pen pressure [2]. This environment supports: (1) real-time tracking of, analysis of, and system adaptivity to a user's speech amplitude and pen pressure, as well as the semantic content of the input; (2) transparent user training that leads users to adapt their speech and pen signal features in ways that enhance the reliability of implicit system engagement, i.e., the design of mutually adaptive interfaces; and (3) dynamic streaming digital pen input, which, along with speech and video-based capture of meeting participants, imposes significant demands on wizard performance during a study. The remainder of this paper discusses the simulation environment, including the tools and techniques developed to support wizards' rapid and accurate responses. Although a few multi-wizard systems have been developed (e.g., [9, 10]), the early work [10] used overly complex wizard-wizard interaction schemes that precluded rapid performance or actual use in research studies. In contrast, this effort focuses on developing new simulation methods to design interfaces that efficiently support collaborative multi-person meetings. This paper reports dual-wizard response times comparable in performance to single-
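The kind of signal-level decision that implicit engagement automates can be sketched as a rolling-baseline threshold test over one paralinguistic feature stream (speech amplitude or pen pressure). The window size, threshold ratio, and all names below are invented for illustration; in the actual study the engagement behavior was produced by the wizards, not by this rule.

```python
from collections import deque

# Hypothetical sketch of implicit system engagement from paralinguistic
# features alone: engage when a sample rises clearly above the user's
# recent baseline. The ratio and window are illustrative, not calibrated.

class ImplicitEngagementDetector:
    def __init__(self, window=50, ratio=1.4):
        self.samples = deque(maxlen=window)  # recent baseline samples
        self.ratio = ratio                   # required rise over baseline

    def update(self, value):
        """Feed one amplitude/pressure sample; return True if engaged."""
        baseline = (sum(self.samples) / len(self.samples)
                    if self.samples else None)
        self.samples.append(value)
        return baseline is not None and value >= self.ratio * baseline

detector = ImplicitEngagementDetector(window=4, ratio=1.4)
readings = [0.50, 0.52, 0.48, 0.51, 0.90]   # final sample is a clear rise
flags = [detector.update(v) for v in readings]
print(flags)  # → [False, False, False, False, True]
```

A per-user rolling baseline of this sort is one plausible way to support the mutual adaptation described above: as users learn to raise amplitude or pressure when addressing the system, the same rule keeps separating system-directed input from peer-directed input.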