OBTAINING LIP AND GLOTTAL REFLECTION COEFFICIENTS FROM
VOWEL SOUNDS
Huiqun Deng¹, Rabab K. Ward², Michael P. Beddoes², Douglas O’Shaughnessy¹
¹ INRS-EMT, 800 de la Gauchetière Ouest, bureau 6900, Montréal H5A 1K6, Canada
² Electrical and Computer Engineering Department, University of British Columbia, BC V6T 1Z4, Canada
ABSTRACT
Knowledge about lip and glottal reflection coefficients during
phonation is needed to eliminate their distortion effects on the
estimates of vocal-tract area functions and glottal waves from
vowel sounds. Direct measurements of these coefficients at
human mouths are difficult. This paper presents a method for
estimating them from vowel sounds. The estimation encounters
an ill-defined inverse problem: the number of unknowns is
greater than the number of constraints, and non-unique solutions
exist for a sound. To overcome this problem, this paper uses a
vowel sound produced by a subject whose vocal-tract area
function (VTAF) for the sound is known. The estimates of the
lip and the glottal reflection coefficients are determined as those
that lead to a VTAF solution most similar to the known VTAF
for the sound. The lip and the glottal reflection coefficients
obtained for /a/ and /i/ are presented.
1. INTRODUCTION
It is known that to obtain accurate estimates of vocal-tract area
functions (VTAFs) and glottal waves from vowel sounds, the
effects of incomplete glottal closures and frequency-dependent
lip reflection coefficients contained in the vocal-tract filter
(VTF) estimates must be eliminated [1], [2]. To do so, the
parameters of the lip and the glottal reflection coefficients must
be known. However, existing knowledge about the glottal
reflection coefficient $r_g$ and the lip reflection coefficient
$r_{lip}$ is based on simplified models of glottal impedances and
lip radiation impedances. Glottal impedances were derived from a
rectangular slit model, and lip radiation impedances were
approximated as those of spherical sources, or those of pistons in
a sphere or in a baffle [3]. Experiments show that such
simplified models cannot lead to satisfactory results in the
estimation of VTAFs. More accurate knowledge about $r_g$ and
$r_{lip}$ can be obtained by measuring them directly at the glottis
and the lip opening of a human speaker. However, such
measurements are dangerous and difficult.
This paper presents a signal processing method for estimating
$r_g$ and $r_{lip}$ from vowel sounds. As shown later, determining
$r_g$ and $r_{lip}$ from a vowel sound requires solving an
underdetermined system of equations in $r_g$, $r_{lip}$ and the
VTAF, so the solution set of $r_g$, $r_{lip}$ and VTAF for a sound
is not unique. To overcome this problem, this paper obtains $r_g$
and $r_{lip}$ from the vowel sound of a subject whose VTAF for the
sound has been obtained using an MRI (magnetic resonance imaging)
method. The known vocal-tract area function is used as a guide in
determining the estimates of $r_g$ and $r_{lip}$: the best
estimates of $r_g$ and $r_{lip}$ should lead to the VTAF solution
that is the most similar to the known VTAF among those determined
from other $r_g$ and $r_{lip}$ values. In the next section, the
acoustic models for VTF estimates, $r_g$, $r_{lip}$, and vowel
sound signals are presented. In Section 3, the method for obtaining
$r_g$, $r_{lip}$, and VTAFs from a vowel sound signal is developed.
Section 4 presents the results obtained from the vowel sounds /a/
and /i/. Section 5 contains conclusions.
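The selection criterion described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure developed in Section 3: `solve_vtaf` is a hypothetical callable standing in for whatever maps a candidate pair ($r_g$, $r_{lip}$) to a VTAF solution, the candidates are treated as plain scalars, and the similarity measure (RMS difference of log areas) is an assumed choice.

```python
import itertools
import numpy as np

def vtaf_distance(vtaf, vtaf_ref):
    """RMS difference of log areas (an assumed similarity measure)."""
    a = np.log(np.asarray(vtaf, dtype=float))
    b = np.log(np.asarray(vtaf_ref, dtype=float))
    return float(np.sqrt(np.mean((a - b) ** 2)))

def select_reflection_coefficients(solve_vtaf, vtaf_ref, rg_grid, rlip_grid):
    """Return (r_g, r_lip, distance) for the candidate pair whose VTAF
    solution is closest to the known VTAF.

    solve_vtaf(r_g, r_lip) is a hypothetical solver supplied by the caller.
    """
    best = None
    for r_g, r_lip in itertools.product(rg_grid, rlip_grid):
        d = vtaf_distance(solve_vtaf(r_g, r_lip), vtaf_ref)
        if best is None or d < best[2]:
            best = (r_g, r_lip, d)
    return best

# Toy stand-in for the solver: the VTAF error grows as the candidates
# move away from (0.7, 0.9). These numbers are made up for illustration.
known_vtaf = np.array([1.0, 2.0, 4.0, 3.0])

def toy_solver(r_g, r_lip):
    return known_vtaf * (1.0 + abs(r_g - 0.7) + abs(r_lip - 0.9))

grid = np.linspace(0.5, 1.0, 6)
best_rg, best_rlip, best_dist = select_reflection_coefficients(
    toy_solver, known_vtaf, grid, grid)
```

An exhaustive grid is used here only because it makes the criterion explicit; any search over the candidate parameters would serve the same purpose.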
2. MODELS
2.1 Glottal-vocal-tract filters and vocal-tract filters
The acoustic model for producing a vowel sound is shown in
Fig. 1 [4], where $u_{sc}(t)$ is the equivalent glottal source
signal, $Z_g$ is the glottal impedance, $u_g(t)$ is the glottal
wave (the total volume velocity at the back end of the vocal
tract), $p_1(t)$ is the sound pressure at the back end of the vocal
tract, $Z_{lip}$ is the lip radiation impedance, and $u_{lip}(t)$
and $p_{lip}(t)$ are the total volume velocity and sound pressure
at the lip opening, respectively. The transfer function from the
glottal source to the lip volume velocity is defined as a
glottal-vocal-tract filter (GVTF). The transfer function from the
glottal wave to the volume velocity at the lip opening is defined
as a vocal-tract filter (VTF).
In the discrete-time domain, the vocal tract is modeled as an
acoustic tube with $M$ sections, each having the same length and a
different cross-sectional area. If the vocal-tract length $L$ and
the number of sections $M$ of the tube model are related as
$M = 2LF_s/c$, where $F_s$ is the sampling rate of the sound and
$c$ is the speed of sound, then the Z transform of the GVTF
transfer function is [5], [6]:
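$$
H_{GVTF}(z) \equiv \frac{U_{lip}(z)}{U_{sc}(z)}
= \frac{0.5\,(1+r_g)(1+r_{lip})\prod_{m=1}^{M-1}(1+r_m)\,z^{-M/2}}
{\begin{bmatrix} 1 & r_g \end{bmatrix}
\begin{bmatrix} 1 & r_1 \\ r_1 z^{-1} & z^{-1} \end{bmatrix}
\cdots
\begin{bmatrix} 1 & r_{M-1} \\ r_{M-1} z^{-1} & z^{-1} \end{bmatrix}
\begin{bmatrix} 1 \\ r_{lip} z^{-1} \end{bmatrix}},
\tag{1}
$$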
where
$$
r_g = \frac{Z_g - \rho c / S_1}{Z_g + \rho c / S_1} \tag{2}
$$
$$
r_m = \frac{S_{m+1} - S_m}{S_{m+1} + S_m} \tag{3}
$$
$$
r_{lip} = \frac{\rho c / S_M - Z_{lip}}{\rho c / S_M + Z_{lip}} \tag{4}
$$
$r_1, \ldots, r_{M-1}$ are the reflection coefficients at each
boundary of the tube model of the vocal tract, $S_1$ is the
cross-sectional area at the back end of the vocal tract, and $S_M$
is that near the lips.
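As a rough numerical illustration of Eqs. (1) and (3), the sketch below builds the GVTF denominator polynomial by multiplying out the junction matrices. It rests on several assumptions that are not from the paper: $r_g$ and $r_{lip}$ are treated as real constants rather than frequency-dependent quantities, each junction matrix is taken in the standard lossless-tube form [[1, r_m], [r_m z^-1, z^-1]], and the function names and the example values (tract length, sampling rate, speed of sound, areas) are made up for the demonstration.

```python
import numpy as np

def num_sections(length_m, fs_hz, c_m_s):
    """M = 2 L F_s / c, rounded to the nearest integer."""
    return int(round(2.0 * length_m * fs_hz / c_m_s))

def junction_reflections(areas):
    """Eq. (3): r_m = (S_{m+1} - S_m) / (S_{m+1} + S_m), m = 1..M-1."""
    s = np.asarray(areas, dtype=float)
    return (s[1:] - s[:-1]) / (s[1:] + s[:-1])

def _padd(a, b):
    """Add two polynomials in z^-1 stored as coefficient arrays."""
    out = np.zeros(max(len(a), len(b)))
    out[:len(a)] += a
    out[:len(b)] += b
    return out

def gvtf_denominator(r_g, r_inner, r_lip):
    """Denominator of Eq. (1) as coefficients of z^0, z^-1, ..., z^-M."""
    # Right-most column vector [1, r_lip z^-1]^T.
    top = np.array([1.0])
    bottom = np.array([0.0, r_lip])
    # Apply the junction matrices from r_{M-1} down to r_1.
    for r in reversed(list(r_inner)):
        new_top = _padd(top, r * bottom)
        # Second row carries a factor z^-1: prepend a zero coefficient.
        new_bottom = np.concatenate(([0.0], _padd(r * top, bottom)))
        top, bottom = new_top, new_bottom
    # Left-most row vector [1, r_g].
    return _padd(top, r_g * bottom)

def gvtf_gain(r_g, r_inner, r_lip):
    """Numerator gain 0.5 (1 + r_g)(1 + r_lip) prod(1 + r_m)."""
    return 0.5 * (1.0 + r_g) * (1.0 + r_lip) * np.prod(1.0 + np.asarray(r_inner))

def gvtf_response(r_g, r_inner, r_lip, omega):
    """Evaluate H_GVTF at z = exp(j * omega), omega in radians per sample."""
    den = gvtf_denominator(r_g, r_inner, r_lip)
    m_sections = len(r_inner) + 1
    w = np.asarray(omega, dtype=float)
    d = np.polyval(den[::-1], np.exp(-1j * w))   # sum_k den[k] z^-k
    delay = np.exp(-1j * w * m_sections / 2.0)   # z^{-M/2}
    return gvtf_gain(r_g, r_inner, r_lip) * delay / d

# Example with assumed values: a two-section tube, areas in cm^2.
r_inner = junction_reflections([1.0, 4.0])       # single junction, r_1 = 0.6
den = gvtf_denominator(0.7, r_inner, 0.9)
den_swapped = gvtf_denominator(0.9, r_inner, 0.7)
```

For this two-section case the denominator is 1 + r_1 (r_g + r_lip) z^-1 + r_g r_lip z^-2, so `den` and `den_swapped` coincide: exchanging the two end coefficients leaves the filter unchanged. Under the constant-coefficient assumption, this is a small instance of the non-uniqueness that motivates using a known VTAF as a guide.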
The transfer function of a VTF does not contain the effect of
an open glottis, and should be estimated from the sound signal
I 373 142440469X/06/$20.00 ©2006 IEEE ICASSP 2006