Semi-Automatically Generated High-Level Fusion
for Multimodal User Interfaces
Dominik Ertl, Sevan Kavaldjian, Hermann Kaindl, Jürgen Falb
Vienna University of Technology, Institute of Computer Technology
A–1040 Vienna, Austria
{ertl, kavaldjian, kaindl, falb}@ict.tuwien.ac.at
Abstract
Reliable high-level fusion of several input modalities is hard to achieve, and (semi-)automatically generating it is even more difficult. However, this problem is important to address in order to broaden the scope of semi-automatic user-interface generation.
Our approach starts from a high-level discourse model created by a human interaction designer. Since this model is modality-independent, an annotated discourse is semi-automatically generated from it, which influences the fusion mechanism. Our high-level fusion checks hypotheses from the various input modalities by use of finite state machines. These are modality-independent, and they are automatically generated from the given discourse model. Taken together, our approach provides semi-automatic generation of high-level fusion. It currently supports the following input modalities: graphical user interface, (simple) speech, a few hand gestures, and a bar code reader.
1 Introduction
While semi-automatic generation of user interfaces is still primarily a matter of research, the approaches are becoming increasingly mature. Our own approach is based on high-level discourse modeling, and its focus has been on graphical user interfaces (GUIs), more precisely WIMP (window, icon, menu, pointer) interfaces [4, 9, 15]. More recently, we extended this approach to multimodal interfaces.
Managing the input from several modalities requires high-level fusion. This is a challenging issue on its own, but semi-automatic generation makes it even harder. Still, we can generate finite state machines (FSMs) for checking input hypotheses from the various modalities from our discourse models. The results of these checks are communicative acts that provide an abstract representation of what is believed to be the input.
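The idea of checking input hypotheses against a generated FSM can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the actual generated machinery; all names (states, act types) are hypothetical:

```python
# Minimal sketch: an FSM that accepts or rejects input hypotheses
# depending on the current discourse state. All names are hypothetical;
# in the paper's approach such FSMs are generated from discourse models.

class FusionFSM:
    def __init__(self, transitions, start, accepting):
        # transitions: {(state, communicative_act_type): next_state}
        self.transitions = transitions
        self.state = start
        self.accepting = accepting

    def feed(self, hypothesis):
        """Feed one input hypothesis (e.g., from speech or the GUI).
        Returns True if it is consistent with the current discourse state."""
        key = (self.state, hypothesis)
        if key in self.transitions:
            self.state = self.transitions[key]
            return True
        return False  # hypothesis rejected in this discourse state

    def accepted(self):
        return self.state in self.accepting

# Example: the user must first select a product, then confirm.
fsm = FusionFSM(
    transitions={
        ("start", "select_product"): "selected",
        ("selected", "confirm"): "done",
    },
    start="start",
    accepting={"done"},
)
assert fsm.feed("select_product")      # GUI click hypothesis: consistent
assert not fsm.feed("select_product")  # repeated selection: rejected
assert fsm.feed("confirm")             # e.g., spoken "yes" mapped to confirm
assert fsm.accepted()
```

An accepted hypothesis would then be emitted as a communicative act; a rejected one is discarded or left for another modality's hypothesis.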
Our overall generation process from high-level discourses to multimodal user interfaces is a further development of the one presented in [3], including the new generation of FSMs (see Figure 1). First, the modality-independent discourse is modeled by an interaction designer. Then, the FSMs are generated, and a task-level transformation leads to an annotated discourse. Finally, the annotated discourse is rendered into the final modalities.
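The data flow of this process can be sketched, in a highly simplified and hypothetical form (the real steps are model-driven transformations; these stubs and names only illustrate the ordering of the stages):

```python
# Hypothetical, highly simplified sketch of the pipeline:
# discourse model -> generated FSMs + annotated discourse -> rendered modalities.
# The stub names and data shapes are our own illustration, not the actual tooling.

def generate_fsms(discourse):
    # One fusion FSM per discourse element (simplified to a name here).
    return [f"fsm_for_{pair}" for pair in discourse["adjacency_pairs"]]

def task_level_transform(discourse):
    # Annotate each discourse element with the modalities that may realize it.
    return {pair: ["gui", "speech"] for pair in discourse["adjacency_pairs"]}

def render(annotated):
    # Render the annotated discourse into concrete UI parts per modality.
    return {pair: [f"{m}_widget" for m in mods] for pair, mods in annotated.items()}

discourse = {"adjacency_pairs": ["ask_product", "confirm_order"]}
fsms = generate_fsms(discourse)
ui = render(task_level_transform(discourse))
```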
As a running example, we use a small part of a multimodal user interface of a research robot shopping cart, which is a non-trivial application of our approach. We use this example to show the semi-automatic generation of the fusion mechanism and its runtime behavior.
The remainder of this paper is organized in the following manner. First, we provide some background on high-level fusion for multimodal user interfaces and our approach to interaction design, in order to make this paper self-contained. Then we explain the concept of a modality provider and sketch how the modalities that we currently use for input are fed into the fusion mechanism. After that, we present our annotated discourse model and its semi-automatic generation. Based on that, the core of the paper is dedicated to explaining our new approach to semi-automatic generation of fusion based on finite state machines. Finally, we discuss the fulfillment of the CARE properties [2] by our approach.
2 State of the Art and Background
In this section, we sketch some background information about high-level fusion mechanisms for multimodal user interfaces, and our interaction design approach.
2.1 High-level Fusion for Multimodal User Interfaces
A multimodal user interface offers more than one modality (e.g., speech and GUI) to the user. In order to process
Proceedings of the 43rd Hawaii International Conference on System Sciences - 2010
978-0-7695-3869-3/10 $26.00 © 2010 IEEE