IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 49, NO. 8, AUGUST 2002
Vibration Parameter Extraction From Endoscopic
Image Series of the Vocal Folds
Michael Döllinger*, Ulrich Hoppe, Member, IEEE, Frank Hettlich, Jörg Lohscheller, Stefan Schuberth, and
Ulrich Eysholdt
Abstract—An approach is given to extract parameters affecting
phonation based upon digital high-speed recordings of vocal fold
vibrations and a biomechanical model. The main parameters
which affect oscillation are vibrating masses, vocal fold tension,
and subglottal air pressure. By combining digital high-speed
observations with the two-mass-model by Ishizaka and Flanagan
(1972) as modified by Steinecke and Herzel (1995), an inversion
procedure has been developed which allows the identification and
quantification of laryngeal asymmetries. The problem is regarded
as an optimization problem with a nonconvex objective function.
For this kind of problem, the choice of appropriate initial values
is important. This optimization procedure is based on spectral
features of vocal fold movements. The applicability of the inversion
procedure is first demonstrated on simulated vocal fold curves.
Then, inversion results are presented for a healthy voice and a
hoarse voice as a case of functional dysphonia caused by laryngeal
asymmetry.
Index Terms—Digital high-speed glottography, hoarseness,
inversion, optimization, two-mass-model, vocal fold vibration.
I. INTRODUCTION
HOARSENESS arises from irregular vibrations of the
vocal folds. In most cases, these irregularities
are caused by asymmetries between the left and right vocal folds
[1]. Most laryngeal asymmetries, such as unilateral
vocal fold polyps or paralysis, can be observed directly with
standard endoscopes. However, some patients suffer from
dysphonia for which no evidence of morphological laryngeal
asymmetry can be found. These cases of functional dysphonia
exhibit asymmetries only during phonation. Usually, these
asymmetries affect only the oscillations and not the visible
anatomy. Therefore, they can only be recognized by
means of digital high-speed recordings [2].
A useful biomechanical model of vocal fold vibrations was
presented by Ishizaka and Flanagan (two-mass-model) [3]. It
represents each vocal fold by a pair of coupled
oscillators. A large variety of modified versions of the
model of Ishizaka and Flanagan has been proposed [4]–[9].
Manuscript received July 30, 2001; revised April 5, 2002. This work was
supported in part by Deutsche Forschungsgemeinschaft under Grant DFG,
EY15/10-1 and in part by Sonderforschungsbereich 603 under Grant SFB 603
(sub project B5). Asterisk indicates corresponding author.
*M. Döllinger is with the Department of Phoniatrics and Pediatric Audiology,
University of Erlangen-Nürnberg, Bohlenplatz 21, D-91054 Erlangen, Germany
(e-mail: boert-rdm@gmx.de).
U. Hoppe, J. Lohscheller, S. Schuberth, and U. Eysholdt are with the
Department of Phoniatrics and Pediatric Audiology, University of Erlangen-Nürnberg,
D-91054 Erlangen, Germany.
F. Hettlich is with the Institute of Mathematics, University of Karlsruhe,
D-76128 Karlsruhe, Germany.
Publisher Item Identifier 10.1109/TBME.2002.800755.
Fig. 1. Schematic representation of the used 2MM of the vocal folds.
We used the simplified two-mass-model (2MM) by Steinecke
and Herzel [7], [10]. This model is sketched as a frontal section
in Fig. 1 and can be described as follows:
The 2MM consists of two parts describing the myoelastic
and the aerodynamic properties of the vocal folds. In the 2MM,
the masses on the left and on the right are set into vibration by
aerodynamic forces caused by the subglottal pressure, which can
be described by the Bernoulli law [11]. The interaction of the
glottal flow with the subglottal and supraglottal vocal tract is
neglected. This simplification is justified by excised larynx
experiments [12]. Additionally, the nonlinear parts of the elastic forces
are small enough to be neglected [7]. Moreover, the simplified
model neglects viscous losses inside the glottis and assumes
Bernoulli flow only below the narrowest part of the glottis [7].
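As a minimal sketch of the aerodynamic part, the Bernoulli assumption can be written as a pressure on the lower masses that depends on the glottal areas. The functional form below is the one commonly given for the formulation in [7]; it is an assumption here, since this section does not state it. A hard on/off switch replaces any smoothing used in the literature, and the names `bernoulli_pressure`, `a_lower`, `a_min`, and `p_sub` are illustrative.

```python
def bernoulli_pressure(a_lower: float, a_min: float, p_sub: float) -> float:
    """Pressure acting on the lower masses under a simplified Bernoulli law.

    Assumed form (not stated in this section):
        P1 = p_sub * (1 - (a_min / a_lower)**2)  if a_lower > 0 and a_min > 0
        P1 = p_sub                               if a_lower > 0 and a_min <= 0
        P1 = 0                                   if a_lower <= 0

    a_lower: glottal area at the lower masses
    a_min:   minimal glottal area (narrowest part of the glottis)
    p_sub:   subglottal pressure
    """
    if a_lower <= 0.0:
        # Lower part of the glottis is closed: no flow-induced pressure.
        return 0.0
    if a_min <= 0.0:
        # Glottis closed further up: the full subglottal pressure acts below.
        return p_sub
    # Open glottis: Bernoulli flow up to the narrowest part.
    return p_sub * (1.0 - (a_min / a_lower) ** 2)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from this paper.
    for a1, amin in [(0.05, 0.05), (0.10, 0.05), (0.10, -0.02), (-0.01, -0.01)]:
        print(a1, amin, bernoulli_pressure(a1, amin, p_sub=0.008))
```

Under this assumed form, a glottis whose narrowest area equals the lower area receives no driving pressure, so energy transfer from the airflow requires the lower and upper masses to move out of phase.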
According to these simplifications, the dynamics of the
system can be described by a system of eight differential equations,
as shown in (1), where
$x_{i\alpha}$ denote the oscillation amplitudes with regard to the rest position
of the masses and $v_{i\alpha}$ the corresponding velocities. The
indexes $(i, \alpha)$ represent lower ($i = 1$) and upper ($i = 2$) masses
and left ($\alpha = l$) and right ($\alpha = r$) masses. The matrix in (1) contains
tissue properties of the vocal folds, i.e., masses $m_{i\alpha}$, stiffness
coefficients $k_{i\alpha}$, and damping coefficients $r_{i\alpha}$. The Bernoulli force
enters (1) as a separate term, as do the impact forces
which act during collision of the left and right masses. A detailed
description of the model and the definition of the parameter set
can be found in [3], [7], [10], and [13]–[15].
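The structure of such a system can be sketched as follows: four masses, each obeying a second-order equation of motion, yield eight first-order equations in the displacements and velocities. This sketch is not the paper's equation (1). It assumes linear springs and dampers, one coupling spring between the lower and upper mass on each side, and a user-supplied `force` callable standing in for the Bernoulli and impact forces. All names and numerical values are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, str]  # (i, alpha): i in {1, 2}, alpha in {"l", "r"}
KEYS: List[Key] = [(1, "l"), (1, "r"), (2, "l"), (2, "r")]
Force = Callable[[Dict[Key, float]], Dict[Key, float]]


def no_force(x: Dict[Key, float]) -> Dict[Key, float]:
    """Placeholder for the Bernoulli and impact forces: no external force."""
    return {key: 0.0 for key in KEYS}


def rhs(state: List[float], m: Dict[Key, float], k: Dict[Key, float],
        r: Dict[Key, float], kc: Dict[str, float],
        force: Force = no_force) -> List[float]:
    """Time derivative of [x_1l, x_1r, x_2l, x_2r, v_1l, v_1r, v_2l, v_2r]."""
    x = dict(zip(KEYS, state[:4]))
    v = dict(zip(KEYS, state[4:]))
    f = force(x)
    acc = []
    for (i, a) in KEYS:
        j = 2 if i == 1 else 1  # partner mass on the same side
        elastic = k[(i, a)] * x[(i, a)] + kc[a] * (x[(i, a)] - x[(j, a)])
        damping = r[(i, a)] * v[(i, a)]
        acc.append((f[(i, a)] - elastic - damping) / m[(i, a)])
    # dx/dt = v for the first four entries, dv/dt = acceleration for the rest.
    return list(state[4:]) + acc


def rk4_step(state: List[float], dt: float,
             deriv: Callable[[List[float]], List[float]]) -> List[float]:
    """One classical fourth-order Runge-Kutta step."""
    def shift(s: List[float], d: List[float], h: float) -> List[float]:
        return [si + h * di for si, di in zip(s, d)]
    k1 = deriv(state)
    k2 = deriv(shift(state, k1, dt / 2))
    k3 = deriv(shift(state, k2, dt / 2))
    k4 = deriv(shift(state, k3, dt))
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def symmetric_params():
    """Illustrative symmetric tissue parameters (assumed, not from this paper)."""
    m = {(i, a): (0.125 if i == 1 else 0.025) for (i, a) in KEYS}
    k = {(i, a): (0.08 if i == 1 else 0.008) for (i, a) in KEYS}
    r = {key: 0.02 for key in KEYS}
    kc = {"l": 0.025, "r": 0.025}
    return m, k, r, kc


def simulate(state: List[float], steps: int, dt: float, m, k, r, kc,
             force: Force = no_force) -> List[float]:
    """Integrate the eight first-order equations for a number of RK4 steps."""
    def deriv(s: List[float]) -> List[float]:
        return rhs(s, m, k, r, kc, force)
    for _ in range(steps):
        state = rk4_step(state, dt, deriv)
    return state


if __name__ == "__main__":
    m, k, r, kc = symmetric_params()
    start = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # lower left mass displaced
    print(simulate(start, steps=2000, dt=0.05, m=m, k=k, r=r, kc=kc))
```

An asymmetry between the two sides can be expressed in this sketch by giving the left and right entries of `m`, `k`, `r`, or `kc` different values; without a driving force the displaced masses simply ring down.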
Though the above-described version of the 2MM requires a
lot of simplifications concerning both the myoelastic and the