From Writing Dialogue to Designing Conversation: Considering the potential of Conversation Analysis for Voice User Interfaces

Adam Brandt, Newcastle University, adam.brandt@newcastle.ac.uk
Spencer Hazel, Newcastle University, spencer.hazel@newcastle.ac.uk
Rory Mckinnon, Ufonia Limited, rm@ufonia.com
Kleopatra Sideridou, Newcastle University, k.sideridou2@newcastle.ac.uk
Joe Tindale, Ufonia Limited, jt@ufonia.com
Nikoletta Ventoura, Ufonia Limited, nv@ufonia.com

ABSTRACT
Conversation design at least partly aspires to create Voice User Interfaces which emulate human speech production. And yet, there is no established approach for the development of naturalistic conversational infrastructure for VUIs; conversation designers are advised to work from their common sense understanding of conversation, producing written scripts, based on memory and imagination, which are later converted into speech. This is a shortcoming in conversation design which needs to be addressed. In this provocation paper, we argue that the starting point in the development of any VUI should be the examination of natural spoken conversation, preferably from the same interactional context in which the VUI will be deployed. We provide a short example to illustrate how the current process of conversation scriptwriting can be a barrier to this, and demonstrate how this can be overcome using the social scientific approach of Conversation Analysis (CA).

CCS CONCEPTS
• Computing methodologies → Artificial intelligence; Natural language processing; Discourse, dialogue and pragmatics; • Human-centered computing → Interaction design; Interaction design, theory, concepts and paradigms.

KEYWORDS
Conversation design, Voice user interfaces, Social interaction, Conversation Analysis

ACM Reference Format:
Adam Brandt, Spencer Hazel, Rory Mckinnon, Kleopatra Sideridou, Joe Tindale, and Nikoletta Ventoura. 2023.
From Writing Dialogue to Designing Conversation: Considering the potential of Conversation Analysis for Voice User Interfaces. In ACM conference on Conversational User Interfaces (CUI ’23), July 19–21, 2023, Eindhoven, Netherlands. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3571884.3603758

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
CUI ’23, July 19–21, 2023, Eindhoven, Netherlands
© 2023 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0014-9/23/07.
https://doi.org/10.1145/3571884.3603758

1 INTRODUCTION
Guidance and principles to underpin the design and development of Voice User Interfaces (VUIs) remain in infancy [6, 19–21]. Among what support is available, aspirations of ‘naturalness’ are central, often with a stated aim to emulate language as it is produced by humans, for humans, in everyday conversation. Conversation designers themselves report the use of language as it is spoken (as opposed to as it is written) and the use of appropriate prosody (such as intonations, pauses and stress) as among the most important characteristics of ‘naturalness’ [14]. Similarly, prevalent conversation design guides promise to provide support in “creating a natural sounding conversation” [1] or “craft[ing] conversations that are natural and intuitive for users” [2].

However, conversation designers report ‘making interaction which is natural’ among their major challenges [23].
Reasons cited for this include the limitations of synthesized voice technology for aspects of speech (like prosody and non-lexical vocalisations), difficulties in writing scripts for spoken language during the conversation design process, and the lack of sufficient guidance or resources to help with this [14].

In essence, this issue is due to a current shortcoming of VUI design: the disconnection between the dialogue generator and the speech synthesiser, which needs to be addressed. The root of this problem is arguably the fact that traditionally, and currently, Language Models are trained primarily on text data. Conversation designers are therefore tasked with the challenge of producing written text which then sounds realistic, or at least plausible, when converted into spoken output by the speech synthesiser.

Despite the reported importance of, and difficulty in, emulating natural spoken language, there does not appear to be a fixed approach for the development of naturalistic conversational infrastructure for VUIs, with conversation designers recommended to write scripts based on their common sense understanding of conversation. Among over 100 VUI designers surveyed [23], none reported examining natural conversation at the design stage. Instead, common practices included checking similar existing VUIs, accessing online resources, and discussing with colleagues.

Since as early as the 1990s, there have been calls for speech interface designers to look to natural human conversation for inspiration [3]. Such calls have increased in the last few years, with a number of very recent collaborative explorations involving researchers of spoken conversation and conversation design practitioners [22, 24, 28]. This provocation paper comes from another such collaboration –