96 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 47, NO. 1, JANUARY 2000
Adaptive Estimation of Residue Signal for Voice
Pathology Diagnosis
Marcelo de Oliveira Rosa*, José Carlos Pereira, and Marcos Grellet
Abstract—The use of noninvasive techniques to evaluate the
larynx and vocal tract helps speech specialists to make an
accurate diagnosis of diseases. In this study, a method to distinguish
among 21 different pathologies using speech signals was
developed. Through inverse filtering (Kalman and Wiener filters)
of the voice signal, the residue was estimated and seven acoustic
features were extracted from it to evaluate the laryngeal diseases.
As time-variant inverse filtering was used, the nonstationary
nature of dysphonic voices was also considered. Together with
the estimation of the acoustic features using a robust statistical
method, this technique also allowed us to discriminate among
pathologies with very close perceptual characteristics. The results
from a Mann–Whitney test indicated that the best measurement
for pathological discrimination was JITTER with 54.79% ability
to cluster the voice types and the worst one was spectral flatness
of residue (SFR) with 36.41%.
Index Terms—Acoustic measurements, adaptive filtering, laryn-
geal pathologies, voice signal analysis, residual signal.
I. INTRODUCTION
THE PATHOLOGICAL diagnosis of the vocal tract is a
field which still demands further investigation due to the
difficulty of standardizing the diagnoses of speech pathologists.
The currently available tools, such as indirect laryngoscopy,
videolaryngoscopy, stroboscopic light, and the professional’s
ear, require subjective identification of problems in the larynx
and vocal folds, resulting in a qualitative assessment of these
structures. In addition, some speakers may present a reflex
action in the supraglottal cavity when the above instruments
are used, which leads to incorrect assessments.
Due to its quantitative and noninvasive nature, acoustic analysis
of the human voice represents an important tool for clinicians
in the prediagnosis of larynx diseases, and also allows
organic growths located on the posterior part of the vocal folds
to be identified. The basis of such a method is the extraction of
features (or measurements) from speech signals and their correlation
with disease characteristics or with the pathologies themselves.
Manuscript received November 12, 1997; revised June 24, 1999. Asterisk
indicates corresponding author.
*M. de Oliveira Rosa is with the School of Engineering of São Carlos,
University of São Paulo, 13560-250 São Carlos, São Paulo, Brazil (e-mail:
marceloI@sel.eesc.sc.usp.br).
J. C. Pereira is with the School of Engineering of São Carlos, University of
São Paulo, 13560-250 São Carlos, São Paulo, Brazil.
M. Grellet is with the Faculty of Medicine of Ribeirão Preto, University of
São Paulo, 13560-250 São Carlos, São Paulo, Brazil.
Publisher Item Identifier S 0018-9294(00)00243-3.
Studies related to acoustic analysis are mainly based on the
periodicity of vocal fold vibration and on the volume of air that
escapes through the glottis during speech. The periodicity
perturbations are made up of measurements of jitter (variation
between successive fundamental periods) and shimmer (variation
between successive magnitudes in the fundamental periods).
Researchers have defined acoustic measurements and related them to
pathologies and to the acoustic perception of breathiness, hoarseness,
and harshness [1]–[10]. Using correlation coefficients of magnitude
measurements of fundamental frequency (or pitch), Koike
[11] and Iwata [12] defined a voice pattern for normal speakers
and for patients with unilateral paralysis and carcinoma. Pinto and
Titze [13] tried to unify the measurements of jitter and shimmer
by analysis of high-order differences using the median and standard
deviation. These statistical measurements have provided a
satisfactory pathological classification of voice signals. On the
other hand, Wendhal [14], [15] synthesized jitter and shimmer
in order to represent a harsh voice for comparison with a dysphonic
voice.
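As an illustration of the perturbation measurements described above, the following minimal sketch computes jitter and shimmer as the mean absolute difference between successive cycles, expressed as a percentage of the mean value. This is one common textbook formulation and is an assumption here; the cited studies, and the JITTER measurement used later in this paper, may define the quantities differently. The function name and the per-cycle values are hypothetical and are not data from this work.

```python
import numpy as np


def relative_perturbation(values):
    """Mean absolute difference between successive cycle values,
    as a percentage of the mean value (assumed definition)."""
    values = np.asarray(values, dtype=float)
    if values.size < 2:
        raise ValueError("at least two cycles are required")
    return 100.0 * np.mean(np.abs(np.diff(values))) / np.mean(values)


# Hypothetical per-cycle measurements of a sustained vowel.
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]            # fundamental periods
peak_amplitudes = [1.00, 0.95, 1.05, 0.98, 1.02]  # cycle magnitudes

jitter_percent = relative_perturbation(periods_ms)        # period variation
shimmer_percent = relative_perturbation(peak_amplitudes)  # magnitude variation

print(f"jitter  = {jitter_percent:.2f} %")
print(f"shimmer = {shimmer_percent:.2f} %")
```

The same routine serves both measurements because, under this assumed definition, they differ only in the per-cycle quantity that is tracked.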
The turbulence in glottal flow resulting from malfunction of
the vocal folds can be quantified by the noise in the spectral
components of speech. Diseases like polyps, cysts, and Reinke’s
edema reduce the energy of the harmonic components of the
spectrum and increase that of the nonharmonic components. This
distortion is related to the extent of the disease, and several spectral
measurements have been proposed in order to assess it. The
relationship between harmonic components and the remaining ones
[16]–[22], the difference among the low-frequency harmonics
[23], the amount of spectral energy starting from a specific
frequency [24], and the analysis of the least spectral magnitude at
specific frequency intervals [25]–[28] are some of the spectral
measurements proposed in the literature.
Using another approach, some authors have analyzed the
residues of inverse filtering of the speech signal [29]–[32],
which correspond to an estimate of the excitation signal obtained
from a mathematical model of the vocal tract. Features extracted
from the residue signal are used to distinguish between normal
speakers and dysphonic patients. The residue signal is used
instead of glottal pulse estimation because the former carries
information such as abnormal movement of the vocal folds and
turbulence noise [30].
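The idea of a residue obtained by inverse filtering can be sketched with a fixed all-pole (linear prediction) model of the vocal tract. This is a deliberately simplified, time-invariant stand-in and not the adaptive Kalman and Wiener scheme of this work; the model order, function names, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np


def lpc_coefficients(signal, order):
    """Autocorrelation-method linear prediction.
    Returns a[0..order-1] such that s[n] ~= sum_k a[k] * s[n-1-k]."""
    s = np.asarray(signal, dtype=float)
    # Autocorrelation lags 0..order.
    r = np.array([np.dot(s[:s.size - k], s[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def residue(signal, a):
    """Inverse filter: e[n] = s[n] - sum_k a[k] * s[n-1-k]."""
    s = np.asarray(signal, dtype=float)
    inverse_filter = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.convolve(s, inverse_filter)[:s.size]


# Synthetic example (hypothetical values): a known excitation is passed
# through a stable two-pole "vocal tract", then recovered as the residue.
rng = np.random.default_rng(0)
excitation = rng.standard_normal(4000)
a_true = np.array([1.3, -0.8])
speech = np.zeros_like(excitation)
for n in range(excitation.size):
    speech[n] = excitation[n]
    if n >= 1:
        speech[n] += a_true[0] * speech[n - 1]
    if n >= 2:
        speech[n] += a_true[1] * speech[n - 2]

a_hat = lpc_coefficients(speech, order=2)
estimated_residue = residue(speech, a_hat)
print("estimated coefficients:", a_hat)
```

Because the coefficients are estimated once for the whole signal, this sketch cannot follow changes in the vocal tract during phonation, which is the limitation that motivates an adaptive estimate.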
The present work follows the latter approach, incorporating
robust adaptive techniques which allow a better estimate of the
residue signal over all samples (approximately 5 s of sustained
speech), even in the nonstationary conditions which arise from
involuntary movements of the supraglottal apparatus.
The combined use of Wiener and Kalman filters allows an
optimum discrimination of diseases with similar acoustic
characteristics. Arranging these algorithms to determine acoustic
measurements over three Brazilian Portuguese phonemes
(/a/, /e/, and /i/) makes it possible to discriminate among
21 different pathologies of the larynx.
0018–9294/00$10.00 © 2000 IEEE