96 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 47, NO. 1, JANUARY 2000

Adaptive Estimation of Residue Signal for Voice Pathology Diagnosis

Marcelo de Oliveira Rosa*, José Carlos Pereira, and Marcos Grellet

Abstract—The use of noninvasive techniques to evaluate the larynx and vocal tract helps speech specialists diagnose diseases accurately. In this study, a method to distinguish among 21 different pathologies using speech signals was developed. Through inverse filtering of the voice signal (Kalman and Wiener filters), the residue was estimated, and seven acoustic features were extracted from it to evaluate laryngeal diseases. Because time-variant inverse filtering was used, the nonstationary nature of dysphonic voices was also taken into account. Together with the estimation of the acoustic features using a robust statistical method, this technique also allowed us to discriminate among pathologies with very similar perceptual characteristics. The results of a Mann–Whitney test indicated that the best measurement for pathological discrimination was JITTER, with a 54.79% ability to cluster the voice types, and the worst was the spectral flatness of the residue (SFR), with 36.41%.

Index Terms—Acoustic measurements, adaptive filtering, laryngeal pathologies, voice signal analysis, residual signal.

I. INTRODUCTION

THE PATHOLOGICAL diagnosis of the vocal tract is a field that still demands further investigation because of the difficulty of standardizing diagnoses among speech pathologists. The currently available tools, such as indirect laryngoscopy, videolaryngoscopy, stroboscopic light, and the professional’s ear, rely on subjective identification of problems in the larynx and vocal folds, resulting in a qualitative assessment of these structures. In addition, some speakers may present a reflex action in the supraglottal cavity when these instruments are used, leading to incorrect assessments.
Due to its quantitative and noninvasive nature, acoustic analysis of the human voice is an important tool for clinicians in the prediagnosis of laryngeal diseases; it also allows organic growths located on the posterior part of the vocal folds to be identified. The basis of such a method is the extraction of features (or measurements) from speech signals and their correlation with disease characteristics or with the pathologies themselves.

Studies related to acoustic analysis are mainly based on the periodicity of vocal fold vibration and on the volume of air that escapes through the glottis during speech. Periodicity perturbations are described by measurements of jitter (variation between successive fundamental periods) and shimmer (variation between successive magnitudes in the fundamental periods). Researchers have defined acoustic measurements and related them to pathologies and to the acoustic perception of breathiness, hoarseness, and harshness [1]–[10]. Using correlation coefficients of magnitude measurements of the fundamental frequency (or pitch), Koike [11] and Iwata [12] defined a voice pattern for normal speakers and for patients with unilateral paralysis and carcinoma. Pinto and Titze [13] tried to unify the measurements of jitter and shimmer by analyzing high-order differences using the median and standard deviation.

Manuscript received November 12, 1997; revised June 24, 1999. Asterisk indicates corresponding author.
*M. de Oliveira Rosa is with the School of Engineering of São Carlos, University of São Paulo, 13560-250 São Carlos, São Paulo, Brazil (e-mail: marceloI@sel.eesc.sc.usp.br).
J. C. Pereira is with the School of Engineering of São Carlos, University of São Paulo, 13560-250 São Carlos, São Paulo, Brazil.
M. Grellet is with the Faculty of Medicine of Ribeirão Preto, University of São Paulo, 13560-250 São Carlos, São Paulo, Brazil.
Publisher Item Identifier S 0018-9294(00)00243-3.
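To make the jitter and shimmer definitions concrete, here is a minimal sketch using the common "local" form of both measures: the mean absolute difference between consecutive cycles, expressed as a percentage of the mean. This is an assumption for illustration; the paper's exact JITTER formula is not given in this section, and the function names are hypothetical. The sketch also assumes the per-cycle periods (in seconds) and peak amplitudes have already been extracted from the voice signal.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive cycles,
    as a percentage of the mean value over all cycles."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def jitter_percent(periods):
    """Jitter: cycle-to-cycle variation of the fundamental period."""
    return local_perturbation(periods)


def shimmer_percent(amplitudes):
    """Shimmer: cycle-to-cycle variation of the peak magnitude of each cycle."""
    return local_perturbation(amplitudes)
```

For example, four cycles alternating between 10 ms and 11 ms, `jitter_percent([0.010, 0.011, 0.010, 0.011])`, give a jitter of about 9.52%, while a perfectly steady amplitude sequence gives a shimmer of 0%.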
These statistical measurements have provided satisfactory pathological classification of voice signals. On the other hand, Wendahl [14], [15] synthesized jitter and shimmer in order to represent a harsh voice for comparison with a dysphonic voice.

The turbulence in glottal flow resulting from malfunction of the vocal folds can be quantified by the noise in the spectral components of speech. Diseases such as polyps, cysts, and Reinke’s edema reduce the energy of the harmonic structure of the spectrum and increase that of the nonharmonic components. This distortion is related to the extent of the disease, and several spectral measurements have been proposed to assess it. The relationship between the harmonic components and the remaining ones [16]–[22], the difference among the low-frequency harmonics [23], the amount of spectral energy above a specific frequency [24], and the analysis of the minimum spectral magnitude within specific frequency intervals [25]–[28] are some of the spectral measurements proposed in the literature.

Using another approach, some authors have analyzed the residue of inverse filtering of the speech signal [29]–[32], which corresponds to an estimate of the excitation signal obtained from a mathematical model of the vocal tract. Features extracted from the residue are used to distinguish between normal speakers and dysphonic patients. The residue signal is used instead of an estimate of the glottal pulse because the former carries information such as abnormal movement of the vocal folds and turbulence noise [30].

The present work follows the latter approach, incorporating robust adaptive techniques that allow a better estimate of the residue signal over all samples (approximately 5 s of sustained speech), even under the nonstationary conditions caused by involuntary movements of the supraglottal apparatus. The combined use of Wiener and Kalman filters allows optimal discrimination of diseases with similar acoustic characteristics.
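The residue discussed here is the output of an inverse filter built from a linear (all-pole) model of the vocal tract. As a point of reference, the following is a minimal time-invariant sketch: the linear-prediction coefficients minimize the mean-square prediction error, which leads to the normal (Wiener–Hopf) equations, solved here with the Levinson–Durbin recursion. It then computes a spectral flatness value for the residue, assuming the usual definition (geometric mean over arithmetic mean of the power spectrum); the paper's exact SFR formula, model order, and windowing are not given in this section, and all function names are hypothetical.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] * x[n + k], k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal (Wiener-Hopf) equations for predictor coefficients
    a[1..p] such that x[n] is approximated by sum_j a[j] * x[n - j]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction-error power after stage i
    return a[1:], err


def inverse_filter_residue(x, coeffs):
    """Residue e[n] = x[n] - sum_j a[j] * x[n - j]; samples before n = 0 are zero."""
    p = len(coeffs)
    return [
        x[n] - sum(coeffs[j] * x[n - j - 1] for j in range(p) if n - j - 1 >= 0)
        for n in range(len(x))
    ]


def spectral_flatness(x, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum (between 0 and 1)."""
    n = len(x)
    power = []
    for k in range(n // 2 + 1):  # naive DFT over non-negative frequencies
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2 + eps)  # eps keeps log() finite
    geometric = math.exp(sum(math.log(v) for v in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic
```

A well-fitted inverse filter removes the spectral envelope, so the residue should have much lower energy and a flatter spectrum (SFR closer to 1) than the signal itself. Because the coefficients are computed once for the whole segment, this sketch cannot follow the nonstationary behavior mentioned in the text; that is the motivation for an adaptive estimator.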
Applying the Wiener and Kalman filters to determine acoustic measurements over three Brazilian Portuguese phonemes (/a/, /e/, and /i/) makes it possible to discriminate among 21 different pathologies of the larynx.

0018–9294/00$10.00 © 2000 IEEE
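The adaptive side of the residue estimation can be illustrated with a generic Kalman filter that tracks time-varying linear-prediction (AR) coefficients sample by sample. This is a minimal sketch under stated assumptions, not the authors' formulation: it assumes the coefficients follow a random walk with per-sample variance `q`, that each sample equals the prediction from the previous `order` samples plus white noise of variance `r`, and it takes the a priori prediction error (the innovation) as the residue. The function name and the default values of `order`, `q`, and `r` are hypothetical.

```python
def kalman_ar_residue(x, order=2, q=1e-6, r=1e-2):
    """Track time-varying AR (vocal-tract) coefficients with a Kalman filter.

    State: coefficient vector theta, modeled as a random walk (variance q).
    Observation: x[n] = h_n . theta + v_n, with h_n = [x[n-1], ..., x[n-order]]
    and measurement-noise variance r.
    Returns the per-sample residue (innovation) and the final coefficients.
    """
    p = order
    theta = [0.0] * p
    P = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    residue = []
    for n in range(len(x)):
        h = [x[n - k - 1] if n - k - 1 >= 0 else 0.0 for k in range(p)]
        for i in range(p):  # predict step: random walk inflates uncertainty
            P[i][i] += q
        e = x[n] - sum(h[i] * theta[i] for i in range(p))  # innovation = residue
        residue.append(e)
        Ph = [sum(P[i][j] * h[j] for j in range(p)) for i in range(p)]
        s = sum(h[i] * Ph[i] for i in range(p)) + r  # innovation variance
        K = [v / s for v in Ph]  # Kalman gain
        theta = [theta[i] + K[i] * e for i in range(p)]
        # Covariance update P = P - K h^T P (P is symmetric, so h^T P = Ph^T).
        P = [[P[i][j] - K[i] * Ph[j] for j in range(p)] for i in range(p)]
    return residue, theta
```

The ratio `q / r` sets the trade-off: a larger `q` lets the coefficients follow faster changes in the vocal tract (for instance, involuntary supraglottal movements) at the cost of noisier estimates, while `q = 0` reduces the filter to a recursive least-squares fit over the whole signal.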