Impact of Domain and User's Learning Phase on Task and Session Identification in Smart Speaker Intelligent Assistants

Seyyed Hadi Hashemi, University of Amsterdam, Amsterdam, The Netherlands, hashemi@uva.nl
Kyle Williams, Microsoft, Redmond, USA, Kyle.Williams@microsoft.com
Ahmed El Kholy, Microsoft, Redmond, USA, Ahmed.ElKholy@microsoft.com
Imed Zitouni, Microsoft, Redmond, USA, izitouni@microsoft.com
Paul A. Crook², Facebook, Seattle, USA, pacrook@fb.com

ABSTRACT

Task and session identification is a key element of system evaluation and user behavior modeling in Intelligent Assistant (IA) systems. However, identifying tasks and sessions for IAs is challenging due to the multi-task nature of IAs and the differences in the ways they are used on different platforms, such as smart phones, cars, and smart speakers. Furthermore, usage behavior may differ among users depending on their expertise with the system and the tasks they are interested in performing. In this study, we investigate how to identify tasks and sessions in IAs given these differences. To do this, we analyze data based on the interaction logs of two IAs integrated with smart speakers. We fit Gaussian Mixture Models to estimate task and session boundaries and show that a model with 3 components models user inter-activity time better than a model with 2 components. We then show how session boundaries differ for users depending on whether or not they are in a learning phase. Finally, we study how user inter-activity time differs depending on the task that the user is trying to perform. Our findings show that there is no single task or session boundary that can be used for IA evaluation. Instead, these boundaries are influenced by the experience of the user and the task they are trying to perform. Our findings have implications for the study and evaluation of Intelligent Agent Systems.

ACM Reference Format:
Seyyed Hadi Hashemi, Kyle Williams, Ahmed El Kholy, Imed Zitouni, and Paul A. Crook. 2018.
Impact of Domain and User's Learning Phase on Task and Session Identification in Smart Speaker Intelligent Assistants. In The 27th ACM International Conference on Information and Knowledge Management (CIKM '18), October 22–26, 2018, Torino, Italy. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3269206.3271803

Keywords: intelligent assistants; behavioral dynamics; user sessions; mixture models

Work done while interning at Microsoft.
² Work done while at Microsoft.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

CIKM '18, October 22–26, 2018, Torino, Italy
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6014-2/18/10…$15.00
https://doi.org/10.1145/3269206.3271803

1 INTRODUCTION

There is a growing interest in integrating Intelligent Assistant (IA) systems into different devices with the aim of providing enriched experiences for users [3]. For instance, IAs such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Alexa have been integrated with desktop computers, smart phones, and smart speakers. However, user behavior varies across different contexts [10, 11, 17, 34], such as platform and input method. For example, users can click on IA responses and change their viewport when interacting with an IA on smart phones or desktops [19, 38], which is not possible on smart speakers.
Therefore, due to these behavioral dynamics in interacting with IAs, their evaluation on different platforms is challenging, suggesting that different means of evaluation for different platforms may be necessary.

Understanding user behavior and evaluating user satisfaction in interacting with IAs on mobile phones and desktop computers have previously been studied [15, 18–20, 25, 38, 39]; however, to our knowledge, there have been no studies investigating user satisfaction and IA effectiveness for smart speakers, which are becoming increasingly popular. For instance, one study found a 128.9% increase in the number of smart speaker users in the United States in 2017 compared to 2016¹.

In this paper, we use the phrase smart speaker to refer to a wireless speaker device that integrates an intelligent assistant. For the purposes of this study, we focus on devices that have no screen and where the only method of communicating with the device is via voice.

Smart speakers can be used for many tasks, such as arranging meetings and controlling home devices via home automation. This multi-task nature of smart speakers creates a multi-task experience for users, where a task refers to a single goal or information need that the user wishes to satisfy [14]. Furthermore, a series of tasks can be composed to form a session, which refers to a short period of contiguous time spent to fulfill one or multiple tasks [16]. Evaluating the satisfaction of users for tasks and sessions is a critical component of IA evaluation; however, it is not obvious how one should define task and session boundaries for IAs.

Identifying sessions based on a user inactivity threshold as a session timeout is the most common session identification approach in Information Retrieval (IR) [5, 8, 25, 33]. The basic idea is to define an inactivity window that can be used to separate sessions.
The idea was first proposed by Catledge and Pitkow [4], who used client-side tracking to examine browsing behavior. They reported that the mean time between logged events was 9.3 minutes and, choosing to add 1.5 standard deviations to the mean, they proposed

¹ https://www.emarketer.com/Article/Alexa-Say-What-Voice-Enabled-Speaker-Usage-Grow-Nearly-130-This-Year/1015812
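The inactivity-threshold idea can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the choice of mean plus 1.5 standard deviations of the inter-event gaps follows the Catledge and Pitkow heuristic described above, while the function name, the minutes unit, and the toy timestamps are hypothetical.

```python
from statistics import mean, stdev

def split_sessions(timestamps, k=1.5):
    """Segment a sorted list of event timestamps (in minutes) into sessions.

    The session timeout is estimated as mean + k * stdev of the
    inter-event gaps (k = 1.5 mirrors the heuristic in the text).
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    timeout = mean(gaps) + k * stdev(gaps)
    sessions, current = [], [timestamps[0]]
    for prev, t in zip(timestamps, timestamps[1:]):
        if t - prev > timeout:        # inactivity gap exceeds the timeout:
            sessions.append(current)  # close the current session
            current = []
        current.append(t)
    sessions.append(current)
    return timeout, sessions

# Two bursts of activity separated by a long idle gap
timeout, sessions = split_sessions([0, 1, 2, 60, 61, 62])
```

On this toy log only the 58-minute gap exceeds the estimated timeout, so the events split into two sessions; on real IA logs the threshold would be estimated from the full distribution of user inter-activity times.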