2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII)
Representation Learning for Emotion Recognition
from Smartphone Keyboard Interactions
Surjya Ghosh*‡, Shivam Goenka*, Niloy Ganguly*, Bivas Mitra*, Pradipta De†
*Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, INDIA 721302
†Department of Computer Science, Georgia Southern University, USA
‡Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
Email: {surjya.ghosh, shivamgoenka}@iitkgp.ac.in, {niloy, bivas}@cse.iitkgp.ac.in, pde@georgiasouthern.edu
Abstract—Characteristics of typing on smartphone keyboards vary across individuals and can reveal emotion, much as speech prosody or facial expressions do. Existing works on typing-based
emotion recognition rely on feature engineering to build machine
learning models, while recent speech and facial expression based
techniques have shown the efficacy of learning the features
automatically. Therefore, in this work, we explore the effectiveness of such learning models for keyboard-interaction-based emotion detection. We propose an end-to-end framework that first uses a sequence-based encoding method
to automatically learn the representation from raw keyboard
interaction pattern and subsequently uses this representation to
train a multi-task learning based neural network (MTL-NN)
to identify different emotions. We carry out a 3-week in-the-
wild study involving 24 participants using a custom keyboard
capable of tracing users’ interaction patterns during text entry. We
collect interaction details like touch speed, error rate, pressure
and self-reported emotions (happy, sad, stressed, relaxed) during
the study. Our analysis on the collected dataset reveals that the
representation learnt from the interaction pattern has an average
correlation of 0.901 within the same emotion and 0.811 between
different emotions. As a result, the representation is effective
in distinguishing different emotions with an average accuracy
(AUCROC) of 84%.
Index Terms—Representation learning, Emotion detection,
Keyboard interaction, Smartphone interaction
I. INTRODUCTION
Smartphone keyboard interactions have been studied as an effective modality for emotion detection [1]–[9]. However, the underlying patterns are complex enough that constructing an accurate emotion prediction model from them requires extensive feature engineering. Recent work on emotion detection based on other modalities, such as facial expressions and speech characteristics, has shown that automatic feature extraction can be as effective as feature engineering [10], [11]. Hence, applying automatic feature extraction to smartphone keyboard interactions is a promising approach for building such predictive models.
The existing literature indicates that many emotion detection techniques have adopted representation learning and multi-task learning (MTL), motivated by the success of deep learning in different domains [10]–[13]. For example, Ghosh et al. [13] applied representation learning to
automatically extract the features from speech and glottal flow
signals. Li et al. [10] proposed an attention pooling based
representation learning mechanism to determine emotion from
speech utterance. They used an end-to-end deep convolutional
neural network (CNN) on the spectrogram extracted from
speech utterances, thus overcoming the requirement of manual
feature extraction. It has also been shown that the performance of emotion detection from acoustic signals improves when valence and arousal are modeled together using MTL [12]. In Emo2Vec [11], Xu et al.
showed that word-level representations obtained using MTL
return superior performance on different emotion-related tasks (e.g., emotion analysis, stress detection) from text data. While
representation learning reduces the feature engineering effort
[14], MTL often returns superior performance by sharing the
training knowledge among different related tasks [15], [16].
However, to the best of our knowledge, no prior work investigates the effectiveness of these learning models for emotion detection from smartphone keyboard interaction patterns.
In this paper, we propose an end-to-end framework to determine human emotion from keyboard interaction patterns by leveraging the aforesaid learning algorithms. The framework comprises two phases. In the first phase, we deploy a sequence-based encoder built on Long Short-Term Memory (LSTM) units. It automatically learns the representation from the raw keyboard interaction pattern, thus reducing the feature engineering overhead. We
collate all the keyboard interactions in a typing session.
The interaction details within a session like pressure, speed,
duration, key type (deletion, special character, alphanumeric
etc.) are fed as input to the framework to obtain the session-
level representation. In the second phase, we deploy a multi-
task learning (MTL) based deep neural network (DNN) model
for emotion detection using the learnt representation. In MTL,
learning multiple tasks together helps to share knowledge
among similar tasks, thereby often yielding superior perfor-
mance. In our context, emotion detection of an individual user
is a separate task. As a result, the underlying similarity in
keyboard interaction behavior of different users is leveraged
by MTL to improve the emotion detection performance.
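To make the two-phase design concrete, the sketch below illustrates, in plain numpy, how per-keypress interaction features can be assembled into a session sequence and then routed through a shared layer with one output head per user (the MTL structure). All layer sizes, feature values, and the mean-pooling stand-in for the LSTM encoder are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Phase 1 input: per-keypress features within one typing session ---
# Each keypress becomes a vector: [pressure, speed, duration] + one-hot key
# type (deletion, special character, alphanumeric), as listed in the text.
KEY_TYPES = ["alphanumeric", "special", "deletion"]

def keypress_vector(pressure, speed, duration, key_type):
    onehot = [1.0 if key_type == k else 0.0 for k in KEY_TYPES]
    return np.array([pressure, speed, duration] + onehot)

# A toy session of three keypresses (feature values are made up)
session = np.stack([
    keypress_vector(0.42, 3.1, 0.09, "alphanumeric"),
    keypress_vector(0.55, 2.7, 0.11, "deletion"),
    keypress_vector(0.40, 3.4, 0.08, "alphanumeric"),
])  # shape: (num_keypresses, 6)

# Stand-in for the LSTM encoder: mean-pool the sequence into a fixed-size
# session-level representation. (The paper uses an LSTM; pooling is only a
# placeholder so the pipeline runs end to end.)
def encode_session(seq):
    return seq.mean(axis=0)

# --- Phase 2: MTL network with a shared layer and one head per user/task ---
REP_DIM, HIDDEN, N_USERS, N_EMOTIONS = 6, 16, 24, 4  # illustrative sizes

W_shared = rng.normal(0, 0.1, (REP_DIM, HIDDEN))     # shared across all users
heads = [rng.normal(0, 0.1, (HIDDEN, N_EMOTIONS)) for _ in range(N_USERS)]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(session_seq, user_id):
    """Shared layer -> user-specific head -> probabilities over 4 emotions."""
    rep = encode_session(session_seq)
    h = np.tanh(rep @ W_shared)     # knowledge shared among all users' tasks
    return softmax(h @ heads[user_id])

probs = predict(session, user_id=3)
print(probs.shape)  # (4,): happy, sad, stressed, relaxed
```

During training, gradients from every user's task would update `W_shared` while each head is updated only by its own user's sessions, which is how the shared typing-behavior similarity is exploited.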
We conduct a 3-week in-the-wild study involving 24 participants using an Android-based custom keyboard capable of tracing users’ keyboard interactions. Based on the text entry, a self-report probing mechanism collects four types of emotions (happy, sad, stressed, relaxed). We utilize the collected keyboard interactions and self-report details for model construction.
978-1-7281-3888-6/19/$31.00 ©2019 IEEE