C. Stephanidis (Ed.): Posters, Part I, HCII 2013, CCIS 373, pp. 337–341, 2013.
© Springer-Verlag Berlin Heidelberg 2013
Emotional Speech Conversion Using Pitch-Synchronous
Harmonic and Non-harmonic Modeling of Speech
Kwang Myung Jeon and Nam In Park
School of Information and Communications
Gwangju Institute of Science and Technology (GIST)
1 Oryong-dong, Buk-gu, Gwangju 500-712, Korea
{kmjeon,naminpark}@gist.ac.kr
Abstract. In this paper, an emotional speech conversion method using pitch-
synchronous harmonic and non-harmonic (PS-HNH) modeling of speech is
proposed. The proposed method converts neutral speech into expressive speech
by controlling emotional parameters for each syllable of the neutral speech. To
this end, the proposed method first carries out syllable labeling by Viterbi
decoding using acoustic hidden Markov models of the neutral corpus. Next,
PS-HNH analysis is performed on the neutral speech, and the resulting emotional
parameters are modified syllable by syllable using a linear modification model
of the target emotion. Finally, the modified parameters are synthesized back
into emotional speech by PS-HNH synthesis. The performance of the proposed
method is evaluated by a subjective AB preference test for four types of target
emotions (fear, sadness, anger, and happiness). The preference test shows
that the proposed method gives better speech quality than the conventional
method that is based on speech transformation and representation using
adaptive interpolation of weighted spectrum (STRAIGHT).
Keywords: Emotional speech, speech conversion, pitch-synchronous, harmonic
and non-harmonic modeling.
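As an illustration of the syllable-wise linear modification step summarized in the abstract, a minimal sketch is given below. It assumes that each emotional parameter (F0, duration, and intensity) of a syllable is mapped by a scale-and-offset rule, p' = a * p + b. This form, the names SyllableParams, LinearModel, and convert_syllables, and all coefficient values are hypothetical and chosen only for illustration; they are not the coefficients or the implementation of the proposed method, and the PS-HNH analysis and synthesis stages are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SyllableParams:
    """Emotional parameters of one labeled syllable (hypothetical container)."""
    f0_hz: float       # mean fundamental frequency of the syllable
    duration_s: float  # syllable duration in seconds
    intensity: float   # linear intensity (gain-like) measure


@dataclass
class LinearModel:
    """Assumed linear model: one (scale, offset) pair per parameter."""
    f0: Tuple[float, float]
    duration: Tuple[float, float]
    intensity: Tuple[float, float]


def apply_linear(value: float, coeff: Tuple[float, float]) -> float:
    """Return scale * value + offset for one parameter."""
    scale, offset = coeff
    return scale * value + offset


def convert_syllables(
    syllables: List[SyllableParams], model: LinearModel
) -> List[SyllableParams]:
    """Apply the assumed linear model to every syllable independently.

    The input list is left unchanged; a new list is returned.
    """
    return [
        SyllableParams(
            f0_hz=apply_linear(s.f0_hz, model.f0),
            duration_s=apply_linear(s.duration_s, model.duration),
            intensity=apply_linear(s.intensity, model.intensity),
        )
        for s in syllables
    ]


if __name__ == "__main__":
    # Illustrative coefficients only: raise F0, shorten duration, raise intensity.
    example_model = LinearModel(f0=(1.2, 10.0), duration=(0.9, 0.0), intensity=(1.5, 0.0))
    neutral = [SyllableParams(180.0, 0.25, 1.0), SyllableParams(200.0, 0.20, 0.8)]
    for before, after in zip(neutral, convert_syllables(neutral, example_model)):
        print(before, "->", after)
```

In this sketch the modification is applied per syllable, so a different coefficient set could be supplied for each target emotion without changing the conversion routine.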
1 Introduction
Speech and audio processing are major tasks in improving sound user interface
(SUI)-based human-computer interaction (HCI) applications [1]. Various studies have
been performed to improve the SUI experience by expressing emotional speech
[2][3], reducing unwanted signals [4][5], providing sound-based emotional interaction
functionality [6][7], and improving the robustness of speech transmission [8][9].
Among these research fields, the expression of emotional speech by converting
neutral speech into emotional speech is particularly important for the SUI to give
emotional feedback to users [2][3].
Recent studies on emotional speech conversion
have commonly applied speech transformation and
representation using adaptive interpolation of weighted spectrum (STRAIGHT) to
control emotional parameters such as fundamental frequency (F0), duration, and
intensity of neutral speech segments [2][3]. Due to its flexible speech modification