© 2016 Takato Horii et al., published by De Gruyter Open. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License. Paladyn, J. Behav. Robot. 2016; 7:40–54

Research Article · Open Access

Takato Horii*, Yukie Nagai, and Minoru Asada

Imitation of human expressions based on emotion estimation by mental simulation

DOI 10.1515/pjbr-2016-0004
Received August 1, 2016; accepted December 20, 2016

Abstract: Humans can express their own emotions and estimate the emotional states of others during communication. This paper proposes a unified model that can estimate the emotional states of others and generate emotional self-expressions. The proposed model utilizes a multimodal restricted Boltzmann machine (RBM), a type of stochastic neural network. RBMs can abstract latent information from input signals and reconstruct the signals from it. We use these two characteristics to rectify issues affecting previously proposed emotion models: constructing an emotional representation for the estimation and generation of emotion instead of relying on heuristic features, and actualizing mental simulation to infer the emotions of others from their ambiguous signals. Our experimental results showed that the proposed model can extract features representing the distribution of emotion categories via self-organized learning. Imitation experiments demonstrated that, using our model, a robot can generate expressions better than with a direct mapping mechanism when the expressions of others contain emotional inconsistencies. Moreover, our model can improve the estimated belief in the emotional states of others by generating imaginary sensory signals from defective multimodal signals (i.e., mental simulation). These results suggest that these abilities of the proposed model can facilitate emotional human–robot communication in more complex situations.
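To make the two RBM properties named in the abstract concrete, here is a minimal sketch of a Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1) in numpy. The `RBM` class, its layer sizes, learning rate, and the toy two-part data are illustrative assumptions for this sketch, not the paper's actual multimodal architecture or training data; it only demonstrates the generic mechanism of abstracting an input to a latent representation and reconstructing (imagining) a signal from it, including from a defective input.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal Bernoulli-Bernoulli RBM trained with CD-1 (illustrative
    sketch only; the paper's model is a multimodal RBM)."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible bias
        self.c = np.zeros(n_hidden)    # hidden bias
        self.lr = lr

    def hidden_prob(self, v):
        # abstraction: probability of each latent (hidden) unit given input
        return sigmoid(v @ self.W + self.c)

    def visible_prob(self, h):
        # generation: probability of each input unit given latent state
        return sigmoid(h @ self.W.T + self.b)

    def reconstruct(self, v):
        # abstract to latent features, then regenerate the signal from them
        return self.visible_prob(self.hidden_prob(v))

    def cd1(self, v0):
        # one step of contrastive divergence on a data batch
        h0 = self.hidden_prob(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_prob(h0_sample)
        h1 = self.hidden_prob(v1)
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

# Toy "two-modality" data: each 4-dim pattern is two correlated 2-dim halves.
data = np.array([[1, 1, 1, 1],
                 [0, 0, 0, 0]] * 50, dtype=float)
rbm = RBM(n_visible=4, n_hidden=8)
for _ in range(1000):
    rbm.cd1(data)

# Rough analogue of mental simulation: the second half of the signal is
# missing (zeroed), and the latent representation fills in a reconstruction.
defective = np.array([[1.0, 1.0, 0.0, 0.0]])
imagined = rbm.reconstruct(defective)
print(np.round(imagined, 2))
```

A multimodal RBM extends this idea by giving each modality its own visible layer connected to a shared hidden layer, so that the latent representation learned from joint observations can regenerate one modality's signal when another is degraded, which is the role mental simulation plays in the proposed model.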
Keywords: emotion, human–robot interaction, deep learning, mental simulation, imitation

*Corresponding Author: Takato Horii: Department of Adaptive Machine Systems, Graduate School of Engineering, Osaka University, Osaka, Japan, E-mail: takato.horii@ams.eng.osaka-u.ac.jp
Yukie Nagai: Department of Adaptive Machine Systems, Graduate School of Engineering, Osaka University, Osaka, Japan, E-mail: yukie@ams.eng.osaka-u.ac.jp
Minoru Asada: Department of Adaptive Machine Systems, Graduate School of Engineering, Osaka University, Osaka, Japan, E-mail: asada@ams.eng.osaka-u.ac.jp

1 Introduction

Communicating emotion to others is a significant skill in human–human and human–robot interaction. In attempts to achieve emotional communication, several empathic robots have been developed [1–13]. Breazeal et al. [1] presented a creature robot called Leonardo that can imitate humans' facial expressions. Leonardo learns a direct mapping between a person's facial expression and its own expression by using a neural network. Andra and Robinson [2] developed an android head robot that mimicked the facial expressions of humans with the aim of social-emotional intervention for autistic children. Their robot tracked the facial feature points of subjects who expressed emotional states and directly converted them into corresponding control points to modify its own facial expression. However, direct mapping of human expressions may lead to misalignment of emotional states. For example, humans may show a tearful face when crying with delight, and their expressions vary depending on context. Consequently, mapping only the facial expression (i.e., crying) can result in miscommunication of the emotional state (i.e., happiness). Therefore, it is better for robot systems to estimate the emotional states of communication partners and generate expressions based on the estimated states.
Several empathic robots that consider the internal states of others for their own expressions currently exist [3–10]. Trovato et al. [3] and Kishi et al. [4] developed an emotional model for a humanoid robot, KOBIAN, based on psychological studies. Their model represented KOBIAN's internal state, which is modulated by external stimuli. It also had prototypes of facial expressions grounded on specific emotional states and expressed facial patterns as combinations of these prototypes [14]. Further, an anthropomorphic robot called BARTHOC is capable of recognizing humans' emotions from speech and producing facial expressions corresponding to the six basic emotions [5]. Kismet [6, 7] is one of the most popular social robots that have established emotional communication with humans. The Kismet system extracts features corresponding to three affective values (specifically arousal, valence, and stance)