Towards Conversationally Intelligent Dialog Systems

Jennifer Smith, SRI International, Menlo Park, California, United States, jennifer.smith@sri.com
Aaron Spaulding, AI Center, SRI International, Menlo Park, California, United States, aaron.spaulding@sri.com
Harry Bratt, SRI International, Menlo Park, California, United States, harry.bratt@sri.com
Dimitra Vergyri, SRI International, Menlo Park, California, United States, dimitra.vergyri@sri.com
Girish Acharya, AI Center, SRI International, Menlo Park, California, United States, girish.acharya@sri.com
Kristin Precoda, SRI International, Menlo Park, California, United States, kristin.precoda@sri.com
Andreas Kathol, SRI International, Menlo Park, California, United States, andreas.kathol@sri.com
Colleen Richey, SRI International, Menlo Park, California, United States, colleen.richey@sri.com

ABSTRACT
Spoken dialog systems, lacking the means to address the complex phenomena of spontaneous speech and conversational dynamics, force users into a constrained mode of dialog that resembles text-based interaction more closely than spoken conversation. Turn-taking is simplified and discourse-related information is lost, as discourse markers are largely ignored and prosodic information is not captured or utilized. We hypothesize that incorporating a few of these key conversational phenomena at specific points in a dialog will reduce cognitive load in spoken human-computer interaction and expand the potential application areas of dialog systems to tasks requiring more complex interactions. In this paper, we describe our approach to adding conversational intelligence to dialog systems and our work to date validating the hypothesis that adding conversational intelligence to existing dialog systems will significantly reduce users’ cognitive load.

CCS CONCEPTS
· Human-centered computing → Human computer interaction (HCI); Interaction paradigms; Natural language interfaces; Human computer interaction (HCI); HCI design and evaluation methods; User studies; · Computing methodologies → Artificial intelligence; Natural language processing.

KEYWORDS
Conversational intelligence, Dialog systems, Spontaneous speech, conversational AI, dialogue complexity, human computer interaction

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CHI ’22 Extended Abstracts, April 29–May 05, 2022, New Orleans, LA, USA
© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9156-6/22/04...$15.00
https://doi.org/10.1145/3491101.3519842

ACM Reference Format:
Jennifer Smith, Aaron Spaulding, Harry Bratt, Dimitra Vergyri, Girish Acharya, Kristin Precoda, Andreas Kathol, and Colleen Richey. 2022. Towards Conversationally Intelligent Dialog Systems. In CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI ’22 Extended Abstracts), April 29–May 05, 2022, New Orleans, LA, USA. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3491101.3519842

1 INTRODUCTION
Dialog system designers simplified the problem of converting human speech into commands that a machine can understand by forcing users to interact according to conversational rules that mimic text-based interaction [1–7].
These conversational rules limit a user’s turn to a complete request or command, rather than allowing the user to converse according to the naturally complex spoken turn-taking style to which they are accustomed. When humans communicate in spoken conversation, they do not simply engage in a sequence of turns consisting of complete and grammatically correct utterances [8–16]. Instead, they participate in a process of updating the common ground in which each party actively monitors themselves and the other to ensure that the information necessary for the goal of their conversation has been correctly understood. Forcing users to adapt to a different way of conversing increases the cognitive load and frustration of interacting with dialog systems [5, 17].

Dialog system design is moving towards a more naturalistic communication style [6, 18–21]. Systems like Google Duplex attempt to model human-like conversational behavior by training systems on huge amounts of in-domain data [22]. While these systems yield impressive results, their application is limited by their dependence upon access to data. Systems that fully model human speech also introduce practical and ethical issues when they are sophisticated enough, and sound real enough, that they deceive humans [23]. We can avoid these issues and still reduce the cognitive load required to interact with a dialog system by implementing smaller changes to existing technology. These improvements will increase flexibility and usability, supporting human-like conversational patterns without introducing the additional issues that may arise when a user believes they are talking to a real person rather