CONSTRUCTION AND ANALYSIS OF INDONESIAN EMOTIONAL SPEECH CORPUS

Nurul Lubis 1,2, Dessi Lestari 1, Ayu Purwarianti 1, Sakriani Sakti 2, Satoshi Nakamura 2

1 School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia
2 Graduate School of Information Science, Nara Institute of Science and Technology, Japan

13510012@std.stei.itb.ac.id, {dessipuji,ayu}@stei.itb.ac.id, {ssakti,s-nakamura}@is.naist.jp

ABSTRACT

In this paper we present the Indonesian Emotional Speech Corpus (IDESC), the first corpus in Bahasa Indonesia that contains a variety of emotional content. As interaction between humans and computers moves toward the most natural form possible, it becomes increasingly urgent to incorporate emotion into the equation. In Bahasa Indonesia, however, this aspect has yet to be explored. An emotion corpus serves as a foundation for further research on the subject. In constructing IDESC, we aim at natural, real emotion that is applicable to human-computer interaction. The corpus consists of three episodes of Indonesian talk shows in different genres: politics, humanity, and entertainment. Each episode is carefully segmented and labeled according to its emotional content, resulting in 1357 segments totaling 1 hour, 1 minute, and 43 seconds of speech. The corpus is still at an early stage of development, which opens up promising directions for future work.

Index Terms: Bahasa Indonesia, corpus, emotion, speech

1. INTRODUCTION

Emotion is an aspect of human interaction that has yet to be fully replicated in machines, and one that could enable richer and more natural interaction between humans and computers. Over the years, this issue has been addressed repeatedly. The result is the development of the field of affective computing, with complex, emotionally advanced systems such as the Sensitive Artificial Listener [1], a personable in-car assistant [2], and even a system that helps with emotional memory [3].
However, the majority of advances in affective computing have been made for English. A number of emotion challenges have been held over the years to address various issues in the field. In 2009, INTERSPEECH sought to bridge the gap between excellent research on human emotion recognition from speech and the low compatibility of results [4]. They continued to address affective issues in 2010 through one of their sub-challenges [5]. In 2011, the Audio Visual Emotion Challenge (AVEC) was held for the first time, targeting multimedia processing and machine learning methods for automatic emotion analysis [6]. AVEC 2012 then analyzed emotion in terms of its dimensions rather than identifying it as discrete states [7].

In Asian languages, a number of studies and findings in affective computing exist. In Chinese, researchers have studied the effect of switching stimuli on users in a study involving an affective system [8]. In Tagalog, an automated narrative storyteller was constructed that achieved an average precision of 86.75% in expressing a particular emotion [9]. Unfortunately, in Bahasa Indonesia, research on similar topics is almost non-existent; even the resources needed to conduct such research are still severely lacking.

This is why we initiated the construction of an Indonesian corpus for human-computer interaction. Collecting data for this purpose is difficult, however, because the data must mimic the real emotions of system users. Most emotional speech corpora are collected through acting. While acting yields prominent emotional content, the nature of acted emotion does not match that of users' emotions during their interaction with a computer.

Specifically, we construct the speech corpus in Bahasa Indonesia from various talk shows, which contain real conversations and real emotions. The construction aims at a realistic emotion corpus that is applicable to human-computer interaction. We gather speech data from television broadcasts, then segment and annotate it according to its emotional content.
Each step in the construction of IDESC is performed carefully and manually, relying on human judgment. After the construction, we analyze the resulting corpus.

The remainder of this paper is organized as follows. Section 2 describes previous studies and work on emotion corpora. Section 3 explains the construction of IDESC. Section 4 presents an analysis of the constructed corpus. Section 5 concludes the paper with closing remarks and future work.

2. RELATED WORK

In terms of content, emotion corpora have been constructed from various sources and with various data collection methods. Employing actors was popular in earlier constructions because it provides data with prominent emotional states. Researchers then shifted to naturalistic data, as it is potentially more relevant to affective computing.