Comparison of Two Multiparameter Acoustic Indices of
Dysphonia Severity: The Acoustic Voice Quality Index
and Cepstral Spectral Index of Dysphonia
*Jeong Min Lee, *Nelson Roy, *Elizabeth Peterson, and †Ray M. Merrill, *Salt Lake City and †Provo, Utah
Summary: Objectives. The Acoustic Voice Quality Index (AVQI) and the Cepstral Spectral Index of Dysphonia
(CSID) are two multiparameter acoustic indices designed to objectively estimate dysphonia severity and track
treatment outcomes. This study compared the performance of these two indices using a common corpus of dysphonic speakers.
Method. Pre- and posttreatment samples of sustained vowel and connected speech were elicited from 112 patients
across six diagnostic categories: unilateral vocal fold paralysis (n = 20), adductor spasmodic dysphonia (n = 20), primary
muscle tension dysphonia (n = 20), benign vocal fold lesions (n = 20), presbylaryngis (n = 20), and mutational falsetto
(n = 12). Listener ratings of dysphonia severity were compared to acoustic estimates of severity derived from two
iterations of the AVQI (versions 2.02 and 3.01) as well as the CSID.
Results. The AVQI- and CSID-estimated severity for sustained vowels, connected speech, and a combined context
were strongly correlated and significantly associated with listener ratings pretreatment, posttreatment, and with the
change observed pre- to posttreatment. However, multiple regression analysis (adjusted for age, sex, and diagnostic category)
revealed that the CSID generally accounted for more variance in listener-perceived severity ratings, and the
contribution of the AVQI was small and not statistically significant when the CSID was already in a combined model.
Conclusions. The AVQI and the CSID were strongly correlated and both provided valid estimates of dysphonia
severity. However, associations observed between the CSID- and listener-estimated dysphonia were almost uniformly
stronger than those for either version of the AVQI, suggesting that the CSID outperformed the AVQI.
Key Words: Voice disorders–Cepstral analysis–Dysphonia severity–AVQI–CSID.
INTRODUCTION
Acoustic assessment of voice using cepstral analysis is a valuable tool for quantifying dysphonia
severity and tracking treatment outcomes in research and clinical settings [1–4]. The cepstrum is a
Fourier transform of the log power spectrum and may be used to determine the extent to which the
dominant rahmonic (an anagram of “harmonic” often associated with the vocal fundamental frequency)
is individualized and emerges out of the background noise [5]. This has also been referred to as the
cepstral peak prominence (CPP), and numerous studies have demonstrated that increased dysphonia
severity is often associated with a decrease in the amplitude of the cepstral peak (ie, lower harmonic
energy) and an increase in high-frequency spectral energy. Furthermore, Hillenbrand and Houde [6]
described a method of computing the normalized CPP by comparing the amplitude of the cepstral peak
with the expected amplitude as determined via linear regression. A smoothed version of the cepstral
peak prominence (CPPS) has been shown recently to be strongly associated with listener-estimated
dysphonia severity [1,2,5–20].
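The CPP computation described above can be illustrated with a minimal Python sketch. It is not the ADSV or Praat implementation, and it omits the smoothing step that distinguishes the CPPS. The function names (`fft`, `cpp_db`, `synth`), the 60–300 Hz peak search range, the 1 ms lower bound for the regression line, the dB scaling of the cepstrum, and the synthetic test signals are all assumptions made for illustration only.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cpp_db(frame, fs, f0_min=60.0, f0_max=300.0):
    """Return (CPP in dB, f0 estimate in Hz) for a single frame."""
    n = len(frame)
    # Hann window to limit spectral leakage.
    windowed = [s * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
                for i, s in enumerate(frame)]
    # Log power spectrum in dB (tiny floor avoids log of zero).
    log_spec = [10.0 * math.log10(abs(c) ** 2 + 1e-12) for c in fft(windowed)]
    # Cepstrum: Fourier transform of the log power spectrum, magnitude in dB.
    ceps = [20.0 * math.log10(abs(c) + 1e-12) for c in fft(log_spec)]

    # Search for the dominant rahmonic within the assumed f0 range.
    lo, hi = math.ceil(fs / f0_max), math.floor(fs / f0_min)
    peak = max(range(lo, hi + 1), key=lambda q: ceps[q])

    # Least-squares regression line through the cepstrum (quefrency >= 1 ms).
    qs = range(round(0.001 * fs), n // 2)
    mean_q = sum(qs) / len(qs)
    mean_c = sum(ceps[q] for q in qs) / len(qs)
    slope = (sum((q - mean_q) * (ceps[q] - mean_c) for q in qs)
             / sum((q - mean_q) ** 2 for q in qs))
    expected = mean_c + slope * (peak - mean_q)

    # CPP: peak amplitude minus the regression-line value at that quefrency.
    return ceps[peak] - expected, fs / peak


def synth(fs, n, f0, noise_sd, rng):
    """Hypothetical test signal: 20 harmonics of f0 plus Gaussian noise."""
    return [sum(math.sin(2.0 * math.pi * h * f0 * i / fs) / h
                for h in range(1, 21)) + rng.gauss(0.0, noise_sd)
            for i in range(n)]


rng = random.Random(0)
fs, n, f0 = 8000, 512, 125.0
cpp_clear, f0_clear = cpp_db(synth(fs, n, f0, 0.01, rng), fs)
cpp_noisy, _ = cpp_db(synth(fs, n, f0, 1.0, rng), fs)
print(f"CPP, low noise: {cpp_clear:.1f} dB (f0 estimate {f0_clear:.0f} Hz)")
print(f"CPP, high noise: {cpp_noisy:.1f} dB")
```

In this toy comparison, the noisier signal should yield the lower CPP, which mirrors the reported association between greater dysphonia severity and a less prominent cepstral peak.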
Unlike time-based measures of aperiodicity such as jitter and shimmer, the CPP does not require a
quasi-periodic signal to be valid and can be derived from both sustained vowel and connected speech
samples [4]. The limitations of traditional time-based analysis combined with the strong performance of
cepstral (as well as spectral-based) acoustic measures have led researchers to develop multiparameter
algorithms incorporating measures of the CPP along with other spectral- and time-based acoustic
parameters to optimize the quantification of dysphonia severity. One such example is the Cepstral
Spectral Index of Dysphonia (CSID), a commercially available index within the Analysis of Dysphonia
in Speech and Voice program (ADSV model 5109; KayPENTAX, Montvale, NJ) [9–11,16,18,19,21,22].
Another such index is the Acoustic Voice Quality Index (AVQI) [15,23], which is an application that
operates within Praat, a free-software program [24].
The AVQI is a single estimate of dysphonia severity based on a weighted algorithm incorporating six
acoustic parameters derived from an analysis of a concatenated sample combining both sustained vowel
and connected speech samples from the same speaker. Each of the six acoustic measures was identified
previously as uniquely accounting for variance explained in listener-perceived ratings of dysphonia
severity [15]. The AVQI algorithm includes the CPPS, harmonics-to-noise ratio (HNR), shimmer local
(SL, also known as percent shimmer), shimmer local dB (SLdB, also known as shimmer in dB), as well
as the slope and tilt of the regression line through the long-term average spectrum (SLOPE dB and
TILT dB). The analysis script, when incorporated into Praat, automatically estimates an AVQI score,
which ranges from 0 to 10, with increasing values reflecting a continuum of severity from normal to
profoundly abnormal voice.
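The general form of such a weighted multiparameter index can be sketched as follows. This is only an illustration of a linear combination of the six measures named above: the intercept, weights, and scale factor are hypothetical placeholders, not the published AVQI coefficients, and the two example speakers are invented values.

```python
from dataclasses import dataclass


@dataclass
class AcousticMeasures:
    cpps: float              # smoothed cepstral peak prominence (dB)
    hnr: float               # harmonics-to-noise ratio (dB)
    shimmer_local: float     # shimmer local (%)
    shimmer_local_db: float  # shimmer local (dB)
    ltas_slope_db: float     # slope of the long-term average spectrum (dB)
    ltas_tilt_db: float      # tilt of the LTAS regression line (dB)


# Hypothetical coefficients for illustration only; NOT the published AVQI values.
HYPOTHETICAL_INTERCEPT = 3.0
HYPOTHETICAL_SCALE = 2.0
HYPOTHETICAL_WEIGHTS = {
    "cpps": -0.1,
    "hnr": -0.05,
    "shimmer_local": 0.1,
    "shimmer_local_db": 1.0,
    "ltas_slope_db": -0.02,
    "ltas_tilt_db": 0.05,
}


def weighted_index(m, weights=HYPOTHETICAL_WEIGHTS,
                   intercept=HYPOTHETICAL_INTERCEPT, scale=HYPOTHETICAL_SCALE):
    """Combine six acoustic measures into a single severity estimate."""
    raw = intercept + sum(w * getattr(m, name) for name, w in weights.items())
    return raw * scale


# Invented example values: one near-normal voice, one more dysphonic voice.
near_normal = AcousticMeasures(16.0, 22.0, 2.0, 0.2, -25.0, -11.0)
dysphonic = AcousticMeasures(7.0, 10.0, 8.0, 0.9, -18.0, -9.0)

score_normal = weighted_index(near_normal)
score_dysphonic = weighted_index(dysphonic)
print(f"near-normal: {score_normal:.2f}, dysphonic: {score_dysphonic:.2f}")
```

With these placeholder weights, lower CPPS and HNR together with higher shimmer push the score upward, which is the direction of the severity continuum described above.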
Unlike other indices, the concatenation of both voice contexts
Accepted for publication June 20, 2017.
Conflict of interest: The authors have no conflicts of interest to disclose.
Disclosure: The authors have no financial relationships relevant to this article to
disclose.
From the *Department of Communication Sciences and Disorders, The University of
Utah, Salt Lake City, Utah 84112; and the †Department of Health Science, Brigham Young
University, Provo, Utah 84602.
Address correspondence and reprint requests to Jeong Min Lee, Department of
Communication Sciences and Disorders, The University of Utah, 390 South 1530 East, Suite
1201, BEH SCI, Salt Lake City, UT 84112. E-mail: jeongmin.lee@utah.edu
Journal of Voice, Vol. ■■, No. ■■, pp. ■■-■■
0892-1997
© 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jvoice.2017.06.012