OBTAINING LIP AND GLOTTAL REFLECTION COEFFICIENTS FROM VOWEL SOUNDS

Huiqun Deng¹, Rabab K. Ward², Michael P. Beddoes², Douglas O'Shaughnessy¹

¹ INRS-EMT, 800 de la Gauchetière Ouest, bureau 6900, Montréal H5A 1K6, Canada
² Electrical and Computer Engineering Department, University of British Columbia, BC V6T 1Z4, Canada

ABSTRACT

Knowledge of the lip and glottal reflection coefficients during phonation is needed to eliminate their distorting effects on the estimates of vocal-tract area functions and glottal waves obtained from vowel sounds. Direct measurement of these coefficients on human speakers is difficult. This paper presents a method for estimating them from vowel sounds. The estimation is an ill-posed inverse problem: there are more unknowns than constraints, so a given sound admits non-unique solutions. To overcome this problem, this paper uses a vowel sound produced by a subject whose vocal-tract area function (VTAF) for that sound is known. The lip and glottal reflection coefficients are estimated as the values that yield the VTAF solution most similar to the known VTAF for the sound. The lip and glottal reflection coefficients obtained for /a/ and /i/ are presented.

1. INTRODUCTION

Accurate estimation of vocal-tract area functions (VTAFs) and glottal waves from vowel sounds requires eliminating the effects of incomplete glottal closure and of frequency-dependent lip reflection coefficients, both of which are contained in vocal-tract filter (VTF) estimates [1], [2]. To do so, the parameters of the lip and glottal reflection coefficients must be known. However, existing knowledge of the glottal reflection coefficient $r_g$ and the lip reflection coefficient $r_{lip}$ is based on simplified models of the glottal impedance and the lip radiation impedance.
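The dependence on impedance models mentioned above can be made concrete. Equations (2) and (4) in Section 2.1 express $r_g$ and $r_{lip}$ as comparisons between a termination impedance and the characteristic impedance ρc/S of the adjoining tube section, so whatever simplification goes into the impedance model carries straight into the coefficient. The following is a minimal Python sketch of those equations and of the junction formula (3), assuming purely real (resistive) impedances and the illustrative value ρc = 1.2 × 343 for air; the function names and the constant `RHO_C` are hypothetical and not taken from the paper.

```python
"""Sketch of the reflection-coefficient formulas, eqs. (2)-(4).

Assumptions (not from the paper): impedances are real-valued acoustic
impedances (Pa*s/m^3), areas are in m^2, and rho*c uses illustrative
values for air at roughly 20 degrees C.
"""

# Assumed density of air (kg/m^3) times speed of sound (m/s).
RHO_C = 1.2 * 343.0


def glottal_reflection(z_g: float, s_1: float, rho_c: float = RHO_C) -> float:
    """Eq. (2): r_g = (Z_g - rho*c/S_1) / (Z_g + rho*c/S_1)."""
    z_tube = rho_c / s_1  # characteristic impedance of the first (glottal-end) section
    return (z_g - z_tube) / (z_g + z_tube)


def lip_reflection(z_lip: float, s_m: float, rho_c: float = RHO_C) -> float:
    """Eq. (4): r_lip = (rho*c/S_M - Z_lip) / (rho*c/S_M + Z_lip)."""
    z_tube = rho_c / s_m  # characteristic impedance of the last (lip-end) section
    return (z_tube - z_lip) / (z_tube + z_lip)


def junction_reflection(s_m: float, s_next: float) -> float:
    """Eq. (3): r_m = (S_{m+1} - S_m) / (S_{m+1} + S_m)."""
    return (s_next - s_m) / (s_next + s_m)


if __name__ == "__main__":
    s_1, s_m = 2e-4, 3e-4  # illustrative section areas (2 cm^2 and 3 cm^2)
    print(glottal_reflection(1e12, s_1))  # very large Z_g: close to 1
    print(lip_reflection(0.0, s_m))       # Z_lip = 0: exactly 1.0
```

With these definitions, a matched termination (Z equal to ρc/S) gives zero reflection, a very large glottal impedance drives $r_g$ toward 1, and $Z_{lip} = 0$ gives $r_{lip} = 1$. Treating the radiation load as one more section of area ρc/Z_lip makes the junction formula (3) reproduce (4), which is a quick check that the sign conventions of (2) to (4) agree.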
Glottal impedances have been derived from a rectangular-slit model, and lip radiation impedances have been approximated as those of a spherical source, or of a piston in a sphere or in a baffle [3]. Experiments show that such simplified models do not lead to satisfactory VTAF estimates. More accurate knowledge of $r_g$ and $r_{lip}$ could be obtained by measuring them directly at the glottis and the lip opening of a human speaker; however, such measurements are dangerous and difficult.

This paper presents a signal-processing method for estimating $r_g$ and $r_{lip}$ from vowel sounds. As shown later, determining $r_g$ and $r_{lip}$ from a vowel sound requires solving an underdetermined system of equations in $r_g$, $r_{lip}$, and the VTAF, so a given sound admits more than one set of solutions. To overcome this problem, this paper obtains $r_g$ and $r_{lip}$ from the vowel sound of a subject whose VTAF for that sound has been measured using magnetic resonance imaging (MRI). The known VTAF serves as a guide: the best estimates of $r_g$ and $r_{lip}$ are those whose VTAF solution is the most similar to the known VTAF, compared with the solutions obtained from other $r_g$ and $r_{lip}$ values.

In the next section, the acoustic models for VTF estimates, $r_g$, $r_{lip}$, and vowel sound signals are presented. Section 3 develops the method for obtaining $r_g$, $r_{lip}$, and VTAFs from a vowel sound signal. Section 4 presents the results obtained from the vowels /a/ and /i/, and Section 5 gives conclusions.

2. MODELS

2.1 Glottal-vocal-tract filters and vocal-tract filters

The acoustic model for producing a vowel sound is shown in Fig.
1 [4], where $u_{sc}(t)$ is the equivalent glottal source signal, $Z_g$ is the glottal impedance, $u_g(t)$ is the glottal wave (the total volume velocity at the back end of the vocal tract), $p_1(t)$ is the sound pressure at the back end of the vocal tract, $Z_{lip}$ is the lip radiation impedance, and $u_{lip}(t)$ and $p_{lip}(t)$ are the total volume velocity and sound pressure at the lip opening, respectively. The transfer function from the glottal source to the volume velocity at the lip opening is defined as the glottal-vocal-tract filter (GVTF). The transfer function from the glottal wave to the volume velocity at the lip opening is defined as the vocal-tract filter (VTF).

In the discrete-time domain, the vocal tract is modeled as an acoustic tube of M sections, each of the same length but with its own cross-sectional area. If the vocal-tract length L and the number of sections are related by $M = 2LF_s/c$, where $F_s$ is the sampling rate of the sound and c is the speed of sound, then the z-transform of the GVTF is [5], [6]:

$$
H_{GVTF}(z) \equiv \frac{U_{lip}(z)}{U_{sc}(z)}
= \frac{0.5\,(1+r_g)(1+r_{lip})\prod_{m=1}^{M-1}(1+r_m)\,z^{-M/2}}
{\begin{bmatrix} 1 & -r_g \end{bmatrix}
\begin{bmatrix} 1 & -r_1 \\ -r_1 z^{-1} & z^{-1} \end{bmatrix}
\cdots
\begin{bmatrix} 1 & -r_{M-1} \\ -r_{M-1} z^{-1} & z^{-1} \end{bmatrix}
\begin{bmatrix} 1 \\ -r_{lip} z^{-1} \end{bmatrix}}
\tag{1}
$$

where

$$ r_g = \frac{Z_g - \rho c/S_1}{Z_g + \rho c/S_1} \tag{2} $$

$$ r_m = \frac{S_{m+1} - S_m}{S_{m+1} + S_m}, \quad m = 1, \ldots, M-1 \tag{3} $$

$$ r_{lip} = \frac{\rho c/S_M - Z_{lip}}{\rho c/S_M + Z_{lip}} \tag{4} $$

Here $r_1, \ldots, r_{M-1}$ are the reflection coefficients at the boundaries between adjacent sections of the tube model, $S_1$ is the cross-sectional area at the back end of the vocal tract, $S_M$ is the area of the section nearest the lips, and ρ is the density of air.

I - 373    1-4244-0469-X/06/$20.00 ©2006 IEEE    ICASSP 2006

The transfer function of a VTF does not contain the effect of an open glottis, and should be estimated from the sound signal