Towards Conversationally Intelligent Dialog Systems
Jennifer Smith
SRI International, Menlo Park,
California, United States
jennifer.smith@sri.com
Aaron Spaulding
AI Center, SRI International, Menlo
Park, California, United States
aaron.spaulding@sri.com
Harry Bratt
SRI International, Menlo Park,
California, United States
harry.bratt@sri.com
Dimitra Vergyri
SRI International, Menlo Park,
California, United States
dimitra.vergyri@sri.com
Girish Acharya
AI Center, SRI International, Menlo
Park, California, United States
girish.acharya@sri.com
Kristin Precoda
SRI International, Menlo Park,
California, United States
kristin.precoda@sri.com
Andreas Kathol
SRI International, Menlo Park,
California, United States
andreas.kathol@sri.com
Colleen Richey
SRI International, Menlo Park,
California, United States
colleen.richey@sri.com
ABSTRACT
Spoken dialog systems, lacking the means to address the complex
phenomena of spontaneous speech and conversational dynamics,
force users into a constrained mode of dialog that resembles text-
based interaction more closely than spoken conversation. Turn-
taking is simplified and discourse-related information is lost, as
discourse markers are largely ignored and prosodic information
is not captured or utilized. We hypothesize that incorporating a
few of these key conversational phenomena at specific points in
a dialog will reduce cognitive load in spoken human-computer
interaction and expand the potential application areas of dialog
systems to tasks requiring more complex interactions. In this paper,
we describe our approach to adding conversational intelligence to
dialog systems and our work to date validating the hypothesis that
adding conversational intelligence to existing dialog systems will
significantly reduce users’ cognitive load.
CCS CONCEPTS
· Human-centered computing → Human computer interaction
(HCI); Interaction paradigms; Natural language interfaces; Human
computer interaction (HCI); HCI design and evaluation methods;
User studies; · Computing methodologies → Artificial intelligence;
Natural language processing.
KEYWORDS
Conversational intelligence, Dialog systems, Spontaneous speech,
conversational AI, dialogue complexity, human computer interac-
tion
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
CHI ’22 Extended Abstracts, April 29–May 05, 2022, New Orleans, LA, USA
© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9156-6/22/04...$15.00
https://doi.org/10.1145/3491101.3519842
ACM Reference Format:
Jennifer Smith, Aaron Spaulding, Harry Bratt, Dimitra Vergyri, Girish
Acharya, Kristin Precoda, Andreas Kathol, and Colleen Richey. 2022. To-
wards Conversationally Intelligent Dialog Systems. In CHI Conference on
Human Factors in Computing Systems Extended Abstracts (CHI ’22 Extended
Abstracts), April 29–May 05, 2022, New Orleans, LA, USA. ACM, New York,
NY, USA, 7 pages. https://doi.org/10.1145/3491101.3519842
1 INTRODUCTION
Dialog system designers simplified the problem of converting hu-
man speech into commands that a machine can understand by forc-
ing users to interact according to conversational rules that mimic
text-based interaction [1–7]. These conversational rules limit a
user’s turn to a complete request or command, rather than allowing
the user to converse according to the naturally complex spoken
turn-taking style to which they are accustomed. When humans
communicate in spoken conversation, they do not simply engage
in a sequence of turns consisting of complete and grammatically
correct utterances [8–16]. Instead, they participate in a process of
updating the common ground in which each party actively monitors
themselves and the other to ensure that the information necessary
for the goal of their conversation has been correctly understood.
Forcing users to adapt to a different way of conversing increases the
cognitive load and frustration of interacting with dialog systems
[5, 17].
Dialog system design is moving towards a more naturalistic
communication style [6, 18–21]. Systems like Google Duplex at-
tempt to model human-like conversational behavior by training
systems on huge amounts of in-domain data [22]. While these sys-
tems yield impressive results, their application is limited by their
dependence upon access to data. Systems that fully model human
speech also introduce practical and ethical issues when they are
sophisticated enough, and sound real enough, that they deceive
humans [23]. We can avoid these issues and still reduce the cogni-
tive load required to interact with a dialog system by implementing
smaller changes to existing technology. These improvements will
increase flexibility and usability, supporting human-like conversa-
tional patterns without introducing the additional issues that may
arise when a user believes they are talking to a real person rather