IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 49, NO. 8, AUGUST 2002 773

Vibration Parameter Extraction From Endoscopic Image Series of the Vocal Folds

Michael Döllinger*, Ulrich Hoppe, Member, IEEE, Frank Hettlich, Jörg Lohscheller, Stefan Schuberth, and Ulrich Eysholdt

Abstract: An approach is given to extract parameters affecting phonation based upon digital high-speed recordings of vocal fold vibrations and a biomechanical model. The main parameters which affect oscillation are vibrating masses, vocal fold tension, and subglottal air pressure. By combining digital high-speed observations with the two-mass-model by Ishizaka and Flanagan (1972) as modified by Steinecke and Herzel (1995), an inversion procedure has been developed which allows the identification and quantification of laryngeal asymmetries. The problem is regarded as an optimization problem with a nonconvex objective function. For this kind of problem, the choice of appropriate initial values is important. The optimization procedure is based on spectral features of vocal fold movements. The applicability of the inversion procedure is first demonstrated on simulated vocal fold curves. Then, inversion results are presented for a healthy voice and for a hoarse voice as a case of functional dysphonia caused by laryngeal asymmetry.

Index Terms: Digital high-speed glottography, hoarseness, inversion, optimization, two-mass-model, vocal fold vibration.

I. INTRODUCTION

HOARSENESS arises from irregular vibrations of the vocal folds. In most cases, these irregularities are caused by asymmetries between the left and right vocal folds [1]. Most laryngeal asymmetries, such as unilateral vocal fold polyps, paralysis, etc., can be observed directly with standard endoscopes. However, some patients suffer from dysphonia for which no evidence of morphological laryngeal asymmetry can be found. These cases of functional dysphonia exhibit asymmetries only during phonation.
Usually, these asymmetries affect only the oscillations and not the visible anatomy. Therefore, they can only be recognized by means of digital high-speed recordings [2].

A useful biomechanical model of vocal fold vibrations was presented by Ishizaka and Flanagan (two-mass-model) [3]. It assumes that each vocal fold is represented by two coupled oscillators. A large variety of modified versions based on the model of Ishizaka and Flanagan have been proposed [4]–[9].

Manuscript received July 30, 2001; revised April 5, 2002. This work was supported in part by Deutsche Forschungsgemeinschaft under Grant DFG, EY15/10-1 and in part by Sonderforschungsbereich 603 under Grant SFB 603 (sub project B5). Asterisk indicates corresponding author.
*M. Döllinger is with the Department of Phoniatrics and Pediatric Audiology, University of Erlangen-Nürnberg, Bohlenplatz 21, D-91054 Erlangen, Germany (e-mail: boert-rdm@gmx.de).
U. Hoppe, J. Lohscheller, S. Schuberth, and U. Eysholdt are with the Department of Phoniatrics and Pediatric Audiology, University of Erlangen-Nürnberg, D-91054 Erlangen, Germany.
F. Hettlich is with the Institute of Mathematics, University of Karlsruhe, D-76128 Karlsruhe, Germany.
Publisher Item Identifier 10.1109/TBME.2002.800755.

Fig. 1. Schematic representation of the 2MM of the vocal folds used here.

We used the simplified two-mass-model (2MM) by Steinecke and Herzel [7], [10]. This model is sketched as a frontal section in Fig. 1 and can be described as follows. The 2MM consists of two parts describing the myoelastic and the aerodynamic properties of the vocal folds. In the 2MM, the masses on the left and on the right are set into vibration by aerodynamic forces caused by the subglottal pressure, which can be described by the Bernoulli law [11]. The interaction of the glottal flow with the subglottal and supraglottal vocal tract is neglected. This simplification is justified by excised larynx experiments [12].
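The Bernoulli driving force described above can be illustrated with a minimal Python sketch. It is not taken from this paper: it assumes the form of the pressure on the lower masses that is commonly quoted for the simplified model in [7], namely P1 = Ps [1 - (a_min / a1)^2] while the glottis is open, P1 = Ps when only the upper part is closed, and P1 = 0 when the lower part is closed. The function names, the use of min(a1, a2) as the minimal area, and the default values (rest areas 0.05 cm^2, glottal length 1.4 cm, lower-mass thickness 0.25 cm, subglottal pressure 0.008 in cm-g-ms units) are illustrative assumptions.

```python
def glottal_areas(x1l, x1r, x2l, x2r, a01=0.05, a02=0.05, length=1.4):
    """Lower (a1) and upper (a2) glottal areas from the mass displacements.

    Assumption: each area is the rest area plus glottal length times the
    sum of the left and right displacements (positive = opening).
    """
    a1 = a01 + length * (x1l + x1r)
    a2 = a02 + length * (x2l + x2r)
    return a1, a2


def bernoulli_pressure(a1, a2, ps=0.008):
    """Pressure acting on the lower masses (assumed form, see lead-in)."""
    if a1 <= 0.0:
        # Lower part of the glottis is closed: no driving pressure.
        return 0.0
    a_min = min(a1, a2)  # simplification: narrowest of the two sections
    if a_min <= 0.0:
        # Upper part closed, lower part open: full subglottal pressure.
        return ps
    # Bernoulli flow below the narrowest part of the glottis.
    return ps * (1.0 - (a_min / a1) ** 2)


def bernoulli_force(x1l, x1r, x2l, x2r, ps=0.008, d1=0.25, length=1.4):
    """Force on each lower mass: pressure times the exposed area length * d1."""
    a1, a2 = glottal_areas(x1l, x1r, x2l, x2r, length=length)
    return length * d1 * bernoulli_pressure(a1, a2, ps)


if __name__ == "__main__":
    # Convergent glottis (lower part wider than upper part): positive force.
    print(bernoulli_force(0.1, 0.1, 0.0, 0.0))
    # Divergent glottis (upper part wider): the force vanishes.
    print(bernoulli_force(0.0, 0.0, 0.1, 0.1))
```

Under these assumptions the force is nonzero only when the glottis is convergent or the upper masses are in contact, which is the mechanism that feeds energy into the oscillation; the rest position itself experiences no net driving force.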
Additionally, nonlinear parts of the elastic forces are small enough to be negligible [7]. Moreover, the simplified model neglects viscous losses inside the glottis and assumes Bernoulli flow only below the narrowest part of the glottis [7].

According to these simplifications, the dynamics of the system can be described by a system of eight differential equations, as shown in (1) at the bottom of the next page. The state variables in (1) are the oscillation amplitudes of the masses relative to their rest positions and the corresponding velocities. The indexes distinguish lower (1) and upper (2) masses and left and right masses. The matrix in (1) contains the tissue properties of the vocal folds, i.e., masses, stiffness coefficients, and damping coefficients. The remaining terms are the Bernoulli force and the impact forces that act during collision of the left and right masses. A detailed description of the model and the definition of the parameter set can be found in [3], [7], [10], and [13]–[15].

0018-9294/02$17.00 © 2002 IEEE

Though the above-described version of the 2MM requires many simplifications concerning both the myoelastic and the