EVALUATION OF METHODS FOR PARAMETRIC FORMANT TRANSFORMATION IN VOICE CONVERSION

Emir Turajlic, Dimitrios Rentzos, Saeed Vaseghi, Ching-Hsiang Ho*

Department of Electronics and Computer Engineering, Brunel University, Middlesex UB8 3PH, UK
*Fortune Institute of Technology, Kaohsiung, Taiwan, 842, R.O.C.
(Emir.turajlic, Dimitrios.Rentzos, Saeed.vaseghi)@brunel.ac.uk, ch.ho@center.fjtc.edu.tw

ABSTRACT

This paper explores methods for the estimation and mapping of parametric formant-based models for voice transformation. The main focus is the transformation of the parameters of a vocal tract model from a source speaker to a target speaker. The vocal tract parameters are represented by the linear prediction (LP) model coefficients and the associated formant frequencies, bandwidths and intensities, together with their temporal trajectories. Two methods of vocal tract (formant) mapping are explored: the first is based on non-uniform frequency warping and the second on pole rotation. Both methods transform all formant parameters (frequency, bandwidth and intensity). In addition, the factors that affect the selection of the warping ratios for the mapping functions are presented. An experimental evaluation of voice morphing based on parametric models is presented.

1. INTRODUCTION

Voice conversion has applications in a wide range of voice output systems, such as text-to-speech synthesis, voice editing, karaoke, broadcasting and Internet voice applications. An effective voice conversion system needs two essential components: (a) accurate models of the source and target speakers’ voice characteristics, and (b) an effective signal processing method for mapping the source speaker’s voice to the target speaker’s voice.
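The pole-rotation idea named in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from this paper: a single scalar warping ratio scales the angle of every complex pole, bandwidths are scaled by raising the pole radius to a power (since a pole of radius r has bandwidth B = -(fs/π) ln r, multiplying B by a factor is the same as raising r to that factor), real poles are left unchanged, and formant intensity is not modified. The function name `rotate_lp_poles` and its parameters are hypothetical.

```python
import numpy as np

def rotate_lp_poles(a, warp_ratio, bw_ratio=1.0, eps=1e-9):
    """Hypothetical pole rotation of a monic LP polynomial A(z) = 1 + sum_k a_k z^-k.

    warp_ratio scales each formant frequency (pole angle); bw_ratio scales
    each formant bandwidth, since B = -(fs/pi) ln r means r -> r**bw_ratio.
    """
    poles = np.roots(a)
    upper = poles[poles.imag > eps]              # one pole from each conjugate pair
    real = poles[np.abs(poles.imag) <= eps]      # real poles are left unchanged
    # rotate the angle, keeping it inside (0, pi) so the pair stays complex
    angle = np.clip(np.angle(upper) * warp_ratio, 1e-3, np.pi - 1e-3)
    # with bw_ratio > 0 a pole inside the unit circle stays inside it
    radius = np.abs(upper) ** bw_ratio
    moved = radius * np.exp(1j * angle)
    new_poles = np.concatenate((moved, moved.conj(), real))
    # rebuild the monic polynomial; imaginary parts cancel for conjugate pairs
    return np.real(np.poly(new_poles))
```

For example, `rotate_lp_poles(a, 1.1)` would raise every formant frequency by about 10% while keeping bandwidths fixed; per-formant ratios, which the paper's discussion of warping ratios suggests, would replace the scalar `warp_ratio` with one value per pole.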
There are two broad approaches to voice conversion: (a) non-parametric mapping of the spectral vectors of a source speaker to those of a target speaker using a source-to-target spectral codebook [3,4], and (b) parametric (LPC) model-based methods [5], which perform the mapping by modifying the source model parameters towards estimates of the target model parameters. Parametric modelling allows a more flexible and selective modification of the spectral parameters of the vocal tract, and it also allows modification of the glottal and prosodic parameters. In this paper we consider some of the practical issues in the parametric modelling and mapping of formants in the context of voice conversion.

This paper is organised as follows. Section 2 describes formant estimation. Section 3 compares two methods for the parametric mapping of formant features. Section 4 describes the experimental results and Section 5 concludes the paper.

2. FORMANT TRAJECTORY ESTIMATION

Formant-based spectrum mapping requires accurate formant model estimation that can cope with two problems: the number of formants varies across phonemes, and neighbouring formants (such as F2 and F3) merge and separate over time. These problems can be alleviated by a hidden Markov model (HMM) based formant estimation procedure [1,6], in which LP analysis is performed on the speech and the poles of the LP model are converted into candidate formant features. The pole features are frequency, bandwidth, delta frequency, delta bandwidth and intensity. A 2-D HMM with N left-to-right states distributed across frequency and M states distributed across time is used to classify the formant observations, as shown in figure 2. Once the formant models for the phonemes are available, they are used to estimate the formant trajectories, as illustrated in figure 1.
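The conversion of LP poles into candidate formant features described above can be sketched as follows. The sketch assumes autocorrelation-method LP analysis with a Hamming window (Levinson-Durbin recursion) and the standard pole-to-formant relations F = θ fs/(2π) and B = -(fs/π) ln r for a pole r e^{jθ}. The intensity measure used here, the LP model power spectrum in dB at the pole frequency, is an assumption, and the delta features are omitted because they need candidates from neighbouring frames. The names `lp_coefficients` and `pole_features` are hypothetical.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP analysis (Levinson-Durbin recursion).

    Returns a = [1, a_1, ..., a_p] for A(z) = 1 + sum_k a_k z^-k and the
    final prediction-error power.
    """
    x = frame * np.hamming(len(frame))
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:n + order]    # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err    # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def pole_features(a, fs, gain=1.0):
    """Candidate formant features (frequency, bandwidth, intensity) from LP poles."""
    poles = np.roots(a)
    poles = poles[poles.imag > 0]                    # one pole per conjugate pair
    theta = np.angle(poles)
    freq = theta * fs / (2 * np.pi)                  # F = theta fs / (2 pi)
    bw = -np.log(np.abs(poles)) * fs / np.pi         # B = -(fs / pi) ln r
    # assumed intensity: LP power spectrum gain / |A|^2 in dB at the pole frequency
    mag = np.abs(np.polyval(a, np.exp(1j * theta)))
    intensity = 10 * np.log10(gain / mag ** 2)
    idx = np.argsort(freq)                           # sort candidates by frequency
    return freq[idx], bw[idx], intensity[idx]
```

In a tracker, the per-frame candidates returned by `pole_features(a, fs, gain=err)`, extended with frame-to-frame differences of frequency and bandwidth, would form the observations classified by the formant HMM.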
[Figure 1 block diagram; blocks: Source Speech, Target Speech, Model Estimation, LP Model, Source Speaker HMM Model, Target Speaker HMM Model, Formant Tracking, Formant Trajectory, Formant Mapping, Warping Factors, LPC-Spectrum Warping, Speech Reconstruction, Mapped Speech]
Figure 1: Spectrum Mapping Procedure

I - 724 0-7803-7663-3/03/$17.00 ©2003 IEEE ICASSP 2003
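The LPC-spectrum warping stage of figure 1 can be sketched with a non-uniform, piecewise-linear frequency warping function anchored at 0 Hz, at pairs of source and target formant frequencies, and at fs/2. This is one common form of non-uniform warping, assumed here for illustration; the mapping functions and warping ratios actually used are the subject of section 3, and the names `warping_anchors` and `warp_lp_spectrum` are hypothetical.

```python
import numpy as np

def warping_anchors(src_formants, tgt_formants, fs):
    """Anchor points of a hypothetical piecewise-linear warping function."""
    x = np.concatenate(([0.0], np.sort(src_formants), [fs / 2]))
    y = np.concatenate(([0.0], np.sort(tgt_formants), [fs / 2]))
    # np.interp needs strictly increasing anchors on both axes
    if np.any(np.diff(x) <= 0) or np.any(np.diff(y) <= 0):
        raise ValueError("formants must be distinct and lie in (0, fs/2)")
    return x, y

def warp_lp_spectrum(a, src_formants, tgt_formants, fs, n_bins=257, gain=1.0):
    """Non-uniformly warp the LP power spectrum gain / |A(e^jw)|^2."""
    x, y = warping_anchors(src_formants, tgt_formants, fs)
    freqs = np.linspace(0.0, fs / 2, n_bins)         # output (target) frequency axis
    src_freqs = np.interp(freqs, y, x)               # inverse warp: target -> source
    z = np.exp(2j * np.pi * src_freqs / fs)
    # on the unit circle |polyval(a, z)| = |z^p A(z)| = |A(z)|
    spectrum = gain / np.abs(np.polyval(a, z)) ** 2
    return freqs, spectrum
```

The warped spectrum is returned on a frequency grid. Converting it back to LP coefficients, for example by taking the inverse FFT of the power spectrum as an autocorrelation sequence and re-running LP analysis, would be a separate step.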