From Writing Dialogue to Designing Conversation: Considering
the potential of Conversation Analysis for Voice User Interfaces
Adam Brandt
Newcastle University
adam.brandt@newcastle.ac.uk
Spencer Hazel
Newcastle University
spencer.hazel@newcastle.ac.uk
Rory Mckinnon
Ufonia Limited
rm@ufonia.com
Kleopatra Sideridou
Newcastle University
k.sideridou2@newcastle.ac.uk
Joe Tindale
Ufonia Limited
jt@ufonia.com
Nikoletta Ventoura
Ufonia Limited
nv@ufonia.com
ABSTRACT
Conversation design at least partly aspires to create Voice User Inter-
faces which emulate human speech production. And yet, there is no
established approach for the development of naturalistic conversa-
tional infrastructure for VUIs; conversation designers are advised to
work from their common-sense understanding of conversation, producing written scripts, based on memory and imagination, which
are later converted into speech. This is a shortcoming in conversa-
tion design which needs to be addressed. In this provocation paper,
we argue that the starting point in the development of any VUI
should be the examination of natural spoken conversation, prefer-
ably from the same interactional context in which the VUI will be
deployed. We provide a short example to illustrate how the current
process of conversation scriptwriting can be a barrier to this, and
demonstrate how this can be overcome using the social scientific
approach of Conversation Analysis (CA).
CCS CONCEPTS
· Computing methodologies → Artificial intelligence; Natural language processing; Discourse, dialogue and pragmatics; ·
Human-centered computing → Interaction design; Interaction
design, theory, concepts and paradigms.
KEYWORDS
Conversation design, Voice user interfaces, Social interaction, Con-
versation Analysis
ACM Reference Format:
Adam Brandt, Spencer Hazel, Rory Mckinnon, Kleopatra Sideridou, Joe
Tindale, and Nikoletta Ventoura. 2023. From Writing Dialogue to Designing
Conversation: Considering the potential of Conversation Analysis for Voice
User Interfaces. In ACM conference on Conversational User Interfaces (CUI
’23), July 19–21, 2023, Eindhoven, Netherlands. ACM, New York, NY, USA,
6 pages. https://doi.org/10.1145/3571884.3603758
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
CUI ’23, July 19–21, 2023, Eindhoven, Netherlands
© 2023 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0014-9/23/07.
https://doi.org/10.1145/3571884.3603758
1 INTRODUCTION
Guidance and principles to underpin the design and development of
Voice User Interfaces (VUIs) remain in their infancy [6, 19–21]. Within
the support that is available, aspirations of ‘naturalness’ are central,
often with a stated aim to emulate language as it is produced by
humans, for humans, in everyday conversation. Conversation de-
signers themselves report the use of language as it is spoken (as
opposed to as it is written) and the use of appropriate prosody (such
as intonations, pauses and stress) as among the most important
characteristics of ‘naturalness’ [14]. Similarly, prevalent conversation
design guides promise to provide support in “creating a natural
sounding conversation” [1] or “craft[ing] conversations that are
natural and intuitive for users” [2].
However, conversation designers report ‘making interaction
which is natural’ among their major challenges [23]. Reasons cited
for this include the limitations of synthesized voice technology
for aspects of speech (like prosody and non-lexical vocalisations),
difficulties in writing scripts for spoken language during the conversation
design process, and the lack of sufficient guidance or
resources to help with this [14].
In essence, this issue stems from a current shortcoming of VUI
design: the disconnection between the dialogue generator and the
speech synthesiser, which needs to be addressed. The root of this
problem is arguably the fact that, both traditionally and currently,
Language Models are trained primarily on text data. As a result,
conversation designers are tasked with producing written text
which must then sound realistic, or at least plausible, when converted
into spoken output by the speech synthesiser.
Despite the reported importance of, and difficulty in, emulating
natural spoken language, there does not appear to be an established
approach for the development of naturalistic conversational
infrastructure for VUIs, with conversation designers advised
to write scripts based on their common-sense understanding of
conversation. Of the more than 100 VUI designers surveyed [23], none
reported examining natural conversation at the design stage.
Instead, common practices included checking similar existing VUIs,
accessing online resources, and discussing with colleagues.
Since as early as the 1990s, there have been calls for speech inter-
face designers to look to natural human conversation for inspiration
[3]. Such calls have increased in the last few years, with a number of
very recent collaborative explorations involving researchers of spo-
ken conversation and conversation design practitioners [22, 24, 28].
This provocation paper comes from another such collaboration –