C. Stephanidis (Ed.): Posters, Part I, HCII 2013, CCIS 373, pp. 337–341, 2013. © Springer-Verlag Berlin Heidelberg 2013

Emotional Speech Conversion Using Pitch-Synchronous Harmonic and Non-harmonic Modeling of Speech

Kwang Myung Jeon and Nam In Park

School of Information and Communications
Gwangju Institute of Science and Technology (GIST)
1 Oryong-dong, Buk-gu, Gwangju 500-712, Korea
{kmjeon,naminpark}@gist.ac.kr

Abstract. In this paper, an emotional speech conversion method using pitch-synchronous harmonic and non-harmonic (PS-HNH) modeling of speech is proposed. The proposed method converts neutral speech into expressive speech by controlling emotional parameters for each syllable of the neutral speech. To this end, the proposed method first carries out syllable labeling by Viterbi decoding using acoustic hidden Markov models of the neutral corpus. Next, PS-HNH analysis is performed on the neutral speech, and the emotional parameters are modified syllable by syllable using the linear modification model of the target emotion. Finally, the modified parameters are synthesized back into emotional speech by PS-HNH synthesis. The performance of the proposed method is evaluated by a subjective AB preference test for four target emotions (fear, sadness, anger, and happiness). The preference test shows that the proposed method gives better speech quality than the conventional method based on speech transformation and representation using adaptive interpolation of weighted spectrum (STRAIGHT).

Keywords: Emotional speech, speech conversion, pitch-synchronous, harmonic and non-harmonic modeling.

1 Introduction

Speech and audio processing are major tasks in improving sound user interface (SUI)-based human-computer interaction (HCI) applications [1].
Various studies have been performed to improve the SUI experience by expressing emotional speech [2][3], reducing unwanted signals [4][5], providing sound-based emotional interaction functionality [6][7], and improving the robustness of speech transmission [8][9]. Among these research fields, the expression of emotional speech by converting neutral speech into emotional speech is particularly important for the SUI to give emotional feedback to users [2][3].

Recent studies related to emotional speech conversion techniques have commonly applied speech transformation and representation using adaptive interpolation of weighted spectrum (STRAIGHT) to control emotional parameters such as fundamental frequency (F0), duration, and intensity of neutral speech segments [2][3]. Due to its flexible speech modification