Recognition and Analysis of Emotion in Indonesian Conversational Speech

Nurul Lubis 1,2, Sakriani Sakti 1, Graham Neubig 1, Tomoki Toda 1, Dessi Lestari 2, Ayu Purwarianti 2, and Satoshi Nakamura 1

1 Nara Institute of Science and Technology
2 Institut Teknologi Bandung

Abstract: The importance of incorporating emotional aspects into human-computer interaction continues to grow. Unfortunately, exploration of the subject in Indonesian is still very limited. This paper presents the first study of emotion recognition in Indonesian conversational speech. We construct our corpus, IDESC, from television talk show recordings covering various topics of discussion, yielding colorful emotional utterances. Using the corpus, we then build a support vector machine (SVM) that classifies Indonesian speech in terms of emotion based on its acoustic features. We perform feature selection and parameter optimization while building the classifier to optimize recognition performance, resulting in an absolute increase of 11.9% in accuracy. Lastly, we analyze our corpus and evaluation results to gain better insight into emotion occurrence in Indonesian speech.

1 Introduction

Human-computer interaction technologies aim at the most natural form of interaction possible by matching that between humans. In that sense, the interaction should focus not only on the completion of certain tasks, but also on engagement with the user on an emotional level. This requires a set of capabilities in a machine: recognizing, interpreting, processing, and simulating human affect. To examine different sides and angles of this problem, a number of emotion recognition challenges have been held from year to year, e.g., INTERSPEECH [1] [2] and AVEC [3] [4]. Along with research and studies in this field, interaction between human and computer has advanced to better facilitate the emotional aspect of the interaction.
For Asian languages, there exist a number of studies and findings related to emotion in computing. For Chinese, researchers have studied the effect of switching the stimulus in users, involving affective systems [5]. In Tagalog, an automated narrative storyteller was constructed with an average precision of 86.75% in expressing a particular emotion [6]. Unfortunately, in Indonesian, research on emotion recognition is nonexistent; even the resources needed to conduct such research are still very lacking.

This paper presents the first study of emotion recognition in Indonesian. We construct a speech corpus from television talk show recordings covering various topics of discussion, yielding colorful emotional utterances. Utilizing the corpus, we then train and evaluate our emotion classifier. We observe and analyze the training and evaluation process to gain better insight into emotion in Indonesian speech.

2 Previous Works

One of the early studies on speech-based emotion recognition was performed on acted utterances in English [7]. The study reports a novel approach for classifying speech based on its emotional content, along with the promising acoustic features for the task. More recently, real-time recognition has been constructed using an acted emotion corpus intended for teaching autistic children about simple and complex emotions [8].

Emotion recognition has been applied in spoken dialogue systems to deliver a more natural experience to the user. This includes studies on emotion triggers in human spoken dialogue [9] and the generation of emotionally colored conversation [10]. In this context, spontaneous, naturalistic, or induced emotional speech data