© 2016 Takato Horii et al., published by De Gruyter Open.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.
Paladyn, J. Behav. Robot. 2016; 7:40–54
Research Article Open Access
Takato Horii*, Yukie Nagai, and Minoru Asada
Imitation of human expressions based on emotion estimation by mental simulation
DOI 10.1515/pjbr-2016-0004
Received August 1, 2016; accepted December 20, 2016
Abstract: Humans can express their own emotions and estimate the emotional states of others during communication. This paper proposes a unified model that can both estimate the emotional states of others and generate emotional self-expressions. The proposed model utilizes a multimodal restricted Boltzmann machine (RBM), a type of stochastic neural network. RBMs can abstract latent information from input signals and reconstruct the signals from it. We use these two characteristics to rectify issues affecting previously proposed emotion models: constructing an emotional representation for the estimation and generation of emotion instead of relying on heuristic features, and actualizing mental simulation to infer the emotions of others from their ambiguous signals. Our experimental results showed that the proposed model can extract features representing the distribution of categories of emotion via self-organized learning. Imitation experiments demonstrated that, using our model, a robot can generate expressions better than with a direct mapping mechanism when the expressions of others contain emotional inconsistencies. Moreover, our model can improve the estimated belief in the emotional states of others by generating imaginary sensory signals from defective multimodal signals (i.e., mental simulation). These results suggest that these abilities of the proposed model can facilitate emotional human–robot communication in more complex situations.
Keywords: emotion, human–robot interaction, deep learning, mental simulation, imitation
*Corresponding Author: Takato Horii: Department of Adaptive Machine Systems, Graduate School of Engineering, Osaka University, Osaka, Japan, E-mail: takato.horii@ams.eng.osaka-u.ac.jp
Yukie Nagai: Department of Adaptive Machine Systems, Graduate School of Engineering, Osaka University, Osaka, Japan, E-mail: yukie@ams.eng.osaka-u.ac.jp
Minoru Asada: Department of Adaptive Machine Systems, Graduate School of Engineering, Osaka University, Osaka, Japan, E-mail: asada@ams.eng.osaka-u.ac.jp
1 Introduction
Communicating emotion to others is a significant skill in human–human and human–robot interaction. In attempts to achieve emotional communication, several empathic robots have been developed [1–13]. Breazeal et al. [1] presented a creature robot called Leonardo that can imitate humans' facial expressions. Leonardo learns a direct mapping between a person's facial expression and its own expression by using a neural network. Andra and Robinson [2] developed an android head robot that mimicked the facial expressions of humans, with the aim of social-emotional intervention for autistic children. Their robot tracked the facial feature points of subjects who expressed emotional states and directly converted them into corresponding control points to modify its own facial expression. However, the direct mapping of human expressions may lead to a misalignment of emotional states. For example, humans may show a tearful face when crying with delight. Further, their expressions vary depending on context. Consequently, mapping only the facial expression (i.e., crying) can result in miscommunication of the emotional state (i.e., happiness). Therefore, it is better for robot systems to estimate the emotional states of communication partners and generate expressions based on the estimated states.
Several empathic robots that consider the internal state of others for their own expressions currently exist [3–10]. Trovato et al. [3] and Kishi et al. [4] developed an emotional model for a humanoid robot, KOBIAN, based on psychological studies. Their model represented KOBIAN's internal state, which is modulated by external stimuli. It also had prototypes of facial expressions grounded on specific emotional states and expressed facial patterns as combinations of these prototypes [14]. Further, an anthropomorphic robot called BARTHOC is capable of recognizing human emotion from speech and producing facial expressions corresponding to the six basic emotions [5]. Kismet [6, 7] is one of the most popular social robots to have established emotional communication with humans. The Kismet system extracts features corresponding to three affective values (specifically arousal, valence, and stance)