Impact of Domain and User’s Learning Phase on Task and
Session Identification in Smart Speaker Intelligent Assistants
Seyyed Hadi Hashemi∗
University of Amsterdam
Amsterdam, The Netherlands
hashemi@uva.nl
Kyle Williams
Microsoft
Redmond, USA
Kyle.Williams@microsoft.com
Ahmed El Kholy
Microsoft
Redmond, USA
Ahmed.ElKholy@microsoft.com
Imed Zitouni
Microsoft
Redmond, USA
izitouni@microsoft.com
Paul A. Crook²
Facebook
Seattle, USA
pacrook@fb.com
ABSTRACT
Task and session identification is a key element of system evaluation
and user behavior modeling in Intelligent Assistant (IA) systems.
However, identifying tasks and sessions for IAs is challenging due
to the multi-task nature of IAs and the differences in the ways they
are used on different platforms, such as smart-phones, cars, and
smart speakers. Furthermore, usage behavior may differ among
users depending on their expertise with the system and the tasks
they are interested in performing. In this study, we investigate how
to identify tasks and sessions in IAs given these differences. To
do this, we analyze data based on the interaction logs of two IAs
integrated with smart speakers. We fit Gaussian Mixture Models to
estimate task and session boundaries and show how a model with
3 components models user inter-activity time better than a model
with 2 components. We then show how session boundaries differ
for users depending on whether they are in a learning phase or not.
Finally, we study how user inter-activity times differ depending
on the task that the user is trying to perform. Our findings show
that there is no single task or session boundary that can be used
for IA evaluation. Instead, these boundaries are influenced by the
experience of the user and the task they are trying to perform.
Our findings have implications for the study and evaluation of
Intelligent Assistant systems.
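As a loose illustration of the mixture-model comparison described above (a sketch on synthetic data; the regime means, variances, and sample sizes are invented, not drawn from the paper's logs), 2- and 3-component Gaussian Mixtures fit to log inter-activity times can be compared via BIC:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical log inter-activity times with three latent regimes,
# e.g. within-task pauses, between-task pauses, and between-session gaps.
gaps = np.concatenate([
    rng.normal(1.5, 0.4, 500),   # short within-task pauses
    rng.normal(4.0, 0.5, 300),   # longer between-task pauses
    rng.normal(8.0, 0.7, 200),   # between-session gaps
]).reshape(-1, 1)

# Fit mixtures with 2 and 3 components and compare their BIC scores.
bic = {k: GaussianMixture(n_components=k, random_state=0).fit(gaps).bic(gaps)
       for k in (2, 3)}
print(bic[3] < bic[2])  # with well-separated regimes, the 3-component fit wins
```

A lower BIC indicates a better fit after penalizing model complexity, which is one common way to justify the choice of component count.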
ACM Reference Format:
Seyyed Hadi Hashemi, Kyle Williams, Ahmed El Kholy, Imed Zitouni,
and Paul A. Crook. 2018. Impact of Domain and User’s Learning Phase
on Task and Session Identification in Smart Speaker Intelligent Assistants.
In The 27th ACM International Conference on Information and Knowledge
Management (CIKM ’18), October 22–26, 2018, Torino, Italy. ACM, New York,
NY, USA, 10 pages. https://doi.org/10.1145/3269206.3271803
Keywords: intelligent assistants; behavioral dynamics; user sessions; mixture models
∗Work done while interning at Microsoft.
²Work done while at Microsoft.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
CIKM ’18, October 22–26, 2018, Torino, Italy
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6014-2/18/10. . . $15.00
https://doi.org/10.1145/3269206.3271803
1 INTRODUCTION
There is a growing interest in integrating Intelligent Assistant (IA)
systems in different devices with the aim of providing enriched ex-
periences for users [3]. For instance, IAs such as Apple Siri, Google
Now, Microsoft Cortana, and Amazon Alexa have been integrated
with desktop computers, smart phones, and smart speakers. How-
ever, user behavior varies across contexts [10, 11, 17, 34], such as
platform and input method. For example, users can click on IA
responses and change their view-port when interacting with an IA on
smart-phones or desktops [19, 38], which is not possible on smart
speakers. Therefore, due to these behavioral dynamics, evaluating
IAs on different platforms is challenging, suggesting that different
means of evaluation for different platforms may be necessary.
Understanding user behavior and evaluating user satisfaction
in interacting with IAs on mobile phones and desktop comput-
ers have previously been studied [15, 18–20, 25, 38, 39]; however,
to our knowledge, there have been no studies investigating user
satisfaction and IA effectiveness for smart speakers, which are be-
coming increasingly popular. For instance, one study found that
there was a 128.9% increase in the number of smart speaker users in
the United States in 2017 compared to 2016¹. In this paper, we use
the phrase smart speaker to refer to a wireless speaker device that
integrates an intelligent assistant. For the purpose of this study, we
focus on devices that have no screen and where the only method
of communicating with the device is via voice.
Smart speakers can be used for many tasks, such as arranging
meetings and controlling home devices via home automation. This
multi-task nature of smart speakers creates a multi-task experience
for users, where a task refers to a single goal or information need
that the user wishes to satisfy [14]. Furthermore, a series of tasks
can be composed to form a session, which refers to a short period
of contiguous time spent to fulfill one or multiple tasks [16]. Eval-
uating the satisfaction of users for tasks and sessions is a critical
component of IA evaluation; however, it is not obvious how one
should define task and session boundaries for IAs.
Identifying sessions based on a user-inactivity threshold as a ses-
sion timeout is the most common session identification approach in
Information Retrieval (IR) [5, 8, 25, 33]. The basic idea is to define
an inactivity window that can be used to separate sessions.
The idea was first proposed by Catledge and Pitkow [4], who used
client-side tracking to examine browsing behavior. They reported
that the mean time between logged events was 9.3 minutes and, by
adding 1.5 standard deviations to the mean, they proposed
¹https://www.emarketer.com/Article/Alexa-Say-What-Voice-Enabled-Speaker-Usage-Grow-Nearly-130-This-Year/1015812