Multidirectional Regression (MDR)-Based Features for Automatic Voice Disorder Detection

*Ghulam Muhammad, †,‡Tamer A. Mesallam, †Khalid H. Malki, †Mohamed Farahat, *Awais Mahmood, and *Mansour Alsulaiman, *†Riyadh, Saudi Arabia, and ‡Shibin El Kom, Egypt

Summary: Background and Objective. Objective assessment of voice pathology has attracted growing interest in recent years. Automatic speech/speaker recognition (ASR) systems are commonly deployed in voice pathology detection. The aim of this work was to develop a novel feature extraction method for ASR that incorporates the distributions of voiced and unvoiced parts, as well as voice onset and offset characteristics, in a time-frequency domain to detect voice pathology.

Materials and Methods. Speech samples from 70 dysphonic patients (covering six different types of voice disorders) and 50 normal subjects were analyzed. The Arabic spoken digits (1–10) were used as input. The proposed feature extraction method was embedded into an ASR system with a Gaussian mixture model (GMM) classifier to detect voice disorders.

Results. An accuracy of 97.48% was obtained in the text-independent case (training on all digits), and an accuracy of over 99% was obtained in the text-dependent case (separate training for each digit). The proposed method outperformed conventional Mel-frequency cepstral coefficient (MFCC) features.

Conclusion. The results of this study revealed that incorporating voice onset and offset information leads to efficient automatic voice disorder detection.

Key Words: Voice disorders detection–Multidirectional regression–Automatic speech recognition–Arabic digits.

INTRODUCTION

Assessment or diagnosis of voice disorders relies, among other parameters, on the correct measurement of voice. There are two types of measurement: subjective and objective. Subjective measurement of voice quality is based on individual experience.
1–3 In contrast, objective measurement, which includes acoustic analysis, is independent of human bias and can assess voice quality more reliably by relating certain parameters to vocal fold behavior. Current practice is therefore shifting toward developing new acoustic measurement techniques to improve the performance of automatic voice disorder detection (AVDD) systems.

Many types of acoustic measures have been reported in the literature to differentiate between disordered and normal voices. These measures can be divided into three main groups: temporal, frequency, and cepstral. Temporal features include amplitude perturbation (shimmer) and pitch perturbation (jitter); 4,5 frequency features include mean fundamental frequency, spectral centroid, standard deviation of frequency, spectral flatness, etc; 5,6 and cepstral features include cepstral peak prominence (CPP), 7 smoothed CPP (CPPS), 8 etc.

In most cases, acoustic measures are applied to sustained vowels, particularly the /ɑ/ vowel. 9 The temporal representation of the /ɑ/ vowel shows larger and sharper peaks than those of other vowels, and this feature is directly correlated with electroglottograph parameters. Acoustic measures from sustained vowels mainly include shimmer, jitter, and harmonic-to-noise ratio (HNR). Jitter and shimmer are perturbations of fundamental frequency and intensity, respectively. HNR quantifies the amount of glottal noise present in the vowel part. Although these three parameters are widely used in the literature on discriminating normal and pathological voices, different studies have reported different relationships. For example, a very poor correlation between the jitter parameter and breathiness ratings was found in Martin et al's study, 10 a medium correlation of 0.55 was reported in Eskenazi et al's study, 11 and a high correlation of 0.86 was found in another study by Shrivastav.
12 Little et al 13 conducted experiments on patients with vocal fold disorders and normal speakers using classical measures such as shimmer, jitter, and HNR, and nonlinear measures such as recurrence period density entropy, detrended fluctuation analysis (DFA), and correlation dimension. DFA captures the detailed pattern of change in the breath noise of the voice, whereas recurrence period density entropy quantifies any ambiguity in pitch. The authors found that the nonlinear measures were more stable and reliable than the classical measures for dysphonia quantification. Several group-delay-based analyses of the /ɑ/ vowel were investigated in a study by Drugman et al. 14 It was shown that phase information, which is the basis of these analyses, could provide significant information about disordered voice. The authors proposed a complex cepstrum-based decomposition to differentiate between pathological and normal voices, and achieved a 4.08% error rate.

Sustained vowels are useful for acoustic analysis under controlled conditions; however, they do not represent the way people actually talk in day-to-day life. Sustained vowels lack prominent attributes such as voice onset and offset, voice breaks, and pitch variation. These attributes are equally important for measuring voice quality in everyday speech. So far, only a few studies on voice pathology detection have used running speech rather than sustained vowels. In Umapathy

Accepted for publication May 4, 2012. From the *Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia; †ENT Department, College of Medicine, King Saud University, Riyadh, Saudi Arabia; and the ‡Department of Otolaryngology, College of Medicine, Al-Menoufiya University, Shibin El Kom, Egypt.
Address correspondence and reprint requests to Ghulam Muhammad, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia. E-mail: ghulam@ksu.edu.sa
Journal of Voice, Vol. 26, No. 6, pp. 817.e19-817.e27
0892-1997/$36.00
© 2012 The Voice Foundation
doi:10.1016/j.jvoice.2012.05.002