EVALUATION OF METHODS FOR PARAMETRIC FORMANT
TRANSFORMATION IN VOICE CONVERSION
Emir Turajlic Dimitrios Rentzos Saeed Vaseghi Ching-Hsiang Ho*
Department of Electronics and Computer Engineering Brunel University, Middlesex UB8 3PH, UK
*Fortune Institute of Technology, Kaohsiung, Taiwan, 842, R.O.C.
(Emir.turajlic, Dimitrios.Rentzos, Saeed.vaseghi)@brunel.ac.uk, ch.ho@center.fjtc.edu.tw
ABSTRACT
This paper explores methods of estimation and mapping of
parametric formant-based models for voice transformation.
The main focus is the transformation of the parameters of a
model of the vocal tract of a source speaker to a target
speaker. The vocal tract parameters are represented with the
linear prediction (LP) model coefficients and the associated
formant frequencies, bandwidths, intensities and their
temporal trajectories. Two methods are explored for vocal
tract (formant) mapping. The first method is based on non-
uniform frequency warping and the second is based on pole
rotation. Both methods transform all parameters of the
formants (frequency, bandwidth and intensity). In addition,
the factors that affect the selection of the warping ratios for
the mapping functions are presented. An experimental
evaluation of voice morphing based on parametric models
is also reported.
1. INTRODUCTION
Voice conversion has applications in all voice output
systems such as text-to-speech synthesis, voice editing,
karaoke, broadcasting and Internet voice applications.
An effective voice conversion system would need two
essential components: (a) accurate models of the source and
the target speakers’ voice characteristics and (b) an
effective signal processing method for mapping the source
speaker’s voice to the target speaker’s voice. There are two
broad approaches to voice conversion: (a) non-parametric
mapping of the spectral vectors of a source speaker to those
of a target speaker using a source-to-target spectral
codebook [3,4], and (b) parametric (LPC) model-based
methods [5] of mapping through the modification of the
source model parameters towards the estimates of the target
model parameters. Parametric modelling allows a more
flexible and selective modification of spectral parameters of
the vocal tract and also allows modification of the glottal
and prosodic parameters. In this paper we consider some of
the practical issues in the parametric modelling and
mapping of formants in the context of voice conversion.
This paper is organised as follows. In section 2 formant
estimation is described. Section 3 compares two different
methods of parametric mapping of formant features.
Section 4 describes experimental results and section 5
concludes the paper.
2. FORMANT TRAJECTORY ESTIMATION
To perform formant-based spectrum mapping, accurate
formant model estimation is needed to deal with the
problems of the variability of the number of formants
across the phonemes and the merging and de-merging of
neighbouring formants (such as F2 and F3) over time. The
problems can be alleviated by using a hidden Markov
model (HMM) based formant estimation procedure [1,6],
where an LP-analysis is performed on speech, and the poles
of the LP model are converted into candidate formant
features. The pole features are frequency, bandwidth, delta
frequency, delta bandwidth and intensity. A 2-D HMM
with N left-to-right states distributed across frequency, and
M states distributed across time, is used to classify the
formant observations as shown in figure 2.
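The conversion of LP poles into candidate formant frequencies and bandwidths can be sketched as follows. This is a minimal illustration rather than the procedure of [1,6]: it assumes the LP polynomial is given as A(z) = 1 + a1 z^-1 + ... + ap z^-p, uses the standard relations F = (fs/2π)·angle(z) and B = -(fs/π)·ln|z| for a pole z, and omits the intensity and delta features. The function names and the helper that builds a test polynomial are hypothetical.

```python
import numpy as np


def lp_poles_to_formant_candidates(a, fs):
    """Convert LP coefficients a = [1, a1, ..., ap] into candidate
    formant frequencies and bandwidths (both in Hz).

    Assumes the standard pole-to-resonance relations; this is an
    illustrative sketch, not the estimator described in the paper.
    """
    poles = np.roots(a)                    # roots of A(z) are the LP poles
    poles = poles[np.imag(poles) > 0]      # keep one pole per conjugate pair
    freqs = np.angle(poles) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(poles)) * fs / np.pi
    order = np.argsort(freqs)              # sort candidates by frequency
    return freqs[order], bandwidths[order]


def resonator_poly(freqs, bandwidths, fs):
    """Hypothetical helper: build A(z) from known resonances, so the
    conversion above can be checked on synthetic data."""
    a = np.array([1.0])
    for f, b in zip(freqs, bandwidths):
        r = np.exp(-np.pi * b / fs)        # pole radius from bandwidth
        theta = 2.0 * np.pi * f / fs       # pole angle from frequency
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    return a


if __name__ == "__main__":
    fs = 8000.0
    a = resonator_poly([500.0, 1500.0, 2500.0], [60.0, 90.0, 120.0], fs)
    f, b = lp_poles_to_formant_candidates(a, fs)
    print(np.round(f, 1), np.round(b, 1))
```

In practice the poles of a real LP analysis include spurious and real-axis poles, which is why the candidates are passed to a classifier such as the HMM described here rather than taken directly as formants.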
Once the formant models for phonemes are available they
are used to estimate the formant trajectories as illustrated in
figure 1 and described in this paper.
[Figure 1 block diagram; labelled blocks: Source Speech, Target Speech, LP Model, Model Estimation, Formant Tracking, Source Speaker HMM Model, Target Speaker HMM Model, Formant Trajectory, Warping Factors, Formant Mapping, LPC-Spectrum Warping, Speech Reconstruction, Mapped Speech]
Figure 1: Spectrum Mapping Procedure
I - 724 0-7803-7663-3/03/$17.00 ©2003 IEEE ICASSP 2003