Comparison of Two Multiparameter Acoustic Indices of Dysphonia Severity: The Acoustic Voice Quality Index and Cepstral Spectral Index of Dysphonia

*Jeong Min Lee, *Nelson Roy, *Elizabeth Peterson, and †Ray M. Merrill, *Salt Lake City and †Provo, Utah

Summary: Objectives. The Acoustic Voice Quality Index (AVQI) and the Cepstral Spectral Index of Dysphonia (CSID) are two multiparameter acoustic indices designed to objectively estimate dysphonia severity and track treatment outcomes. This study compared the performance of these two indices using a common corpus of dysphonic speakers.
Method. Pre- and posttreatment samples of sustained vowel and connected speech were elicited from 112 patients across six diagnostic categories: unilateral vocal fold paralysis (n = 12), adductor spasmodic dysphonia (n = 12), primary muscle tension dysphonia (n = 12), benign vocal fold lesions (n = 12), presbylaryngis (n = 12), and mutational falsetto (n = 12). Listener ratings of dysphonia severity were compared to acoustic estimates of severity derived from two iterations of the AVQI (versions 2.02 and 3.01) as well as the CSID.
Results. The AVQI- and CSID-estimated severity for sustained vowels, connected speech, and a combined context were strongly correlated and significantly associated with listener ratings at pretreatment, at posttreatment, and for the change observed from pre- to posttreatment. However, multiple regression analysis (adjusted for age, sex, and diagnostic category) revealed that the CSID generally accounted for more variance in listener-perceived severity ratings, and the contribution of the AVQI was small and not statistically significant when the CSID was already in a combined model.
Conclusions. The AVQI and the CSID were strongly correlated and both provided valid estimates of dysphonia severity.
However, associations observed between the CSID- and listener-estimated dysphonia were almost uniformly stronger than those for either version of the AVQI, suggesting that the CSID outperformed the AVQI.
Key Words: Voice disorders–Cepstral analysis–Dysphonia severity–AVQI–CSID.

INTRODUCTION

Acoustic assessment of voice using cepstral analysis is a valuable tool for quantifying dysphonia severity and tracking treatment outcomes in research and clinical settings. 1–4 The cepstrum is a Fourier transform of the log power spectrum and may be used to determine the extent to which the dominant rahmonic (an anagram of “harmonic” often associated with the vocal fundamental frequency) is individualized and emerges out of the background noise. 5 This measure has also been referred to as the cepstral peak prominence (CPP), and numerous studies have demonstrated that increased dysphonia severity is often associated with a decrease in the amplitude of the cepstral peak (ie, lower harmonic energy) and an increase in high-frequency spectral energy. Furthermore, Hillenbrand and Houde 6 described a method of computing the normalized CPP by comparing the amplitude of the cepstral peak with the expected amplitude as determined via linear regression. A smoothed version of the cepstral peak prominence (CPPS) has been shown recently to be strongly associated with listener-estimated dysphonia severity. 1,2,5–20

Unlike time-based measures of aperiodicity such as jitter and shimmer, the CPP does not require a quasi-periodic signal to be valid and can be derived from both sustained vowel and connected speech samples. 4 The limitations of traditional time-based analysis, combined with the strong performance of cepstral (as well as spectral-based) acoustic measures, have led researchers to develop multiparameter algorithms incorporating measures of the CPP along with other spectral- and time-based acoustic parameters to optimize the quantification of dysphonia severity.
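The cepstral peak prominence described above can be illustrated with a minimal sketch. It is not the implementation used in ADSV or Praat. It assumes a single unwindowed frame whose length is a power of two, a natural-log power spectrum, a cepstral magnitude expressed in dB, a peak search limited to pitch periods between hypothetical bounds `f0_min` and `f0_max`, and a regression line fitted from 1 ms upward. All function and parameter names are illustrative, and no smoothing (as in CPPS) is applied.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cepstrum_db(frame):
    """Magnitude in dB of the Fourier transform of the log power spectrum."""
    # The small constant guards against log(0); it is an implementation choice.
    log_power = [math.log(abs(s) ** 2 + 1e-12) for s in fft(frame)]
    return [20 * math.log10(abs(c) + 1e-12) for c in fft(log_power)]


def cepstral_peak_prominence(frame, sample_rate, f0_min=60.0, f0_max=300.0):
    """Return (CPP in dB, quefrency of the cepstral peak in seconds)."""
    ceps = cepstrum_db(frame)
    half = len(frame) // 2
    quefrency = [i / sample_rate for i in range(half)]

    # Locate the dominant rahmonic within the assumed pitch-period range.
    lo = int(sample_rate / f0_max)
    hi = min(int(sample_rate / f0_min), half - 1)
    peak = max(range(lo, hi + 1), key=lambda i: ceps[i])

    # Least-squares line through the cepstrum from 1 ms upward (assumed range).
    start = max(1, int(0.001 * sample_rate))
    xs, ys = quefrency[start:half], ceps[start:half]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x

    # CPP: peak amplitude minus the amplitude the regression line predicts there.
    expected = slope * quefrency[peak] + intercept
    return ceps[peak] - expected, quefrency[peak]
```

Under these assumptions, a strongly periodic frame (for example, a pulse train with a 10 ms period) should yield a peak quefrency near 0.010 s and a larger CPP than a frame of noise, which mirrors the association between lower CPP and greater dysphonia severity.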
One such multiparameter index is the Cepstral Spectral Index of Dysphonia (CSID), a commercially available index within the Analysis of Dysphonia in Speech and Voice program (ADSV model 5109; KayPENTAX, Montvale, NJ). 9–11,16,18,19,21,22 Another such index is the Acoustic Voice Quality Index (AVQI), 15,23 which is an application that operates within Praat, a free software program. 24

The AVQI is a single estimate of dysphonia severity based on a weighted algorithm incorporating six acoustic parameters derived from an analysis of a concatenated sample combining both sustained vowel and connected speech samples from the same speaker. Each of the six acoustic measures was identified previously as uniquely accounting for variance explained in listener-perceived ratings of dysphonia severity. 15 The AVQI algorithm includes the CPPS, harmonics-to-noise ratio (HNR), shimmer local (SL, also known as percent shimmer), shimmer local dB (SLdB, also known as shimmer in dB), as well as the slope and tilt of the regression line through the long-term average spectrum (SLOPE dB and TILT dB). The analysis script, when incorporated into Praat, automatically estimates an AVQI score, which ranges from 0 to 10, with increasing values reflecting a continuum of severity from normal to profoundly abnormal voice. Unlike other indices, the concatenation of both voice contexts

Accepted for publication June 20, 2017.
Conflict of interest: The authors have no conflicts of interest to disclose.
Disclosure: The authors have no financial relationships relevant to this article to disclose.
From the *Department of Communication Sciences and Disorders, The University of Utah, Salt Lake City, Utah 84112; and the †Department of Health Science, Brigham Young University, Provo, Utah 84602.
Address correspondence and reprint requests to Jeong Min Lee, Department of Communication Sciences and Disorders, The University of Utah, 390 South 1530 East, Suite 1201, BEH SCI, Salt Lake City, UT 84112.
E-mail: jeongmin.lee@utah.edu
Journal of Voice, Vol. ■■, No. ■■, pp. ■■-■■
0892-1997
© 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jvoice.2017.06.012