EFFECT OF NOISE AND MODEL COMPLEXITY ON DETECTION OF AMYOTROPHIC LATERAL SCLEROSIS AND PARKINSON'S DISEASE USING PITCH AND MFCC

Tanuka Bhattacharjee⋆, Jhansi Mallela⋆, Yamini Belur†, Nalini Atchayaram‡, Ravi Yadav‡, Pradeep Reddy‡, Dipanjan Gope⋆⋆, Prasanta Kumar Ghosh⋆

⋆EE Department, ⋆⋆ECE Department, Indian Institute of Science, Bengaluru 560012, India
†Department of Speech Pathology and Audiology, ‡Department of Neurology, National Institute of Mental Health and Neurosciences, Bengaluru 560029, India

ABSTRACT

Dysarthria due to Amyotrophic Lateral Sclerosis (ALS) and Parkinson's disease (PD) impacts both articulation and prosody in an individual's speech. Complex deep neural networks exploit these cues for the detection of ALS and PD, typically using recordings made under laboratory conditions. This study examines the robustness of these cues against background noise and model complexity, which has not been investigated before. We perform classification experiments with pitch and Mel-frequency cepstral coefficients (MFCC) using models of three different complexities and additive white Gaussian noise in four signal-to-noise-ratio (SNR) conditions. The findings are as follows: 1) In the clean condition, pitch performs similarly to MFCC across most model complexities considered, suggesting that the one-dimensional pitch pattern provides discriminative cues for the classification to an extent equal to that of multi-dimensional MFCC; 2) A similar trend is observed in noisy cases when classifiers are trained and tested in matched noise and SNR conditions; 3) When classifiers trained on clean data are applied in noisy cases, pitch-based average classification accuracies are found to be 20.09% and 24.73% higher than those using MFCC for ALS vs. healthy and PD vs. healthy, respectively, suggesting robustness of the pitch-based classifier against noise and model complexity.
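The noisy conditions described above are obtained by corrupting clean recordings with additive white Gaussian noise at fixed SNRs. The snippet below is a minimal sketch of one common way to do this; the function name `add_awgn`, its parameters, and the synthetic test signal are our illustrative choices, not the authors' implementation.

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise to `signal` so that the resulting SNR is `snr_db` dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    # Scale the noise power to achieve the requested SNR: SNR_dB = 10*log10(Ps/Pn)
    noise_power = signal_power / (10 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Example: a synthetic "clean" tone corrupted at 5 dB SNR
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 220.0 * t)
noisy = add_awgn(clean, snr_db=5.0, rng=np.random.default_rng(0))

# The empirically measured SNR should be close to the 5 dB target
measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

In practice the same clean utterance would be corrupted once per SNR condition (e.g. the four SNRs studied here) to build matched and mismatched train/test sets.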
Index Terms— Amyotrophic Lateral Sclerosis, Parkinson's disease, Pitch, Mel-frequency cepstral coefficients, Model complexity, Noise.

1. INTRODUCTION

Amyotrophic Lateral Sclerosis (ALS) [1] and Parkinson's disease (PD) [2] are incurable neurodegenerative disorders which affect muscle movements. Early detection is critical in both cases for timely commencement of therapeutic measures, which can prolong the life expectancy of the patients and enhance their quality of life. Unfortunately, there exists no single blood or laboratory test that can confirm ALS or PD. Diagnosis is based on subjective assessment of symptoms and medical histories, along with various neurological and physical examinations. The process is therefore highly time-consuming: diagnosis of ALS based on the El Escorial criteria [3] requires a median diagnosis time of 14 months [4], while the average life expectancy of these patients is only 2-5 years from the time of disease onset [5]. Moreover, since the clinicians' subjectivity and perception are involved, the diagnosis process may be susceptible to various sources of errors and biases. Thus, an accurate automated diagnostic tool is a need of the hour.

Dysarthria is experienced by almost all individuals suffering from ALS and PD, and is the early sign of ALS in about 30% of the patients [6, 7]. Different aspects of speech function, including articulation, respiration, phonation and prosody, are reported to be affected in these diseases [8, 9]. Various cues descriptive of these speech components have been studied in the literature for classifying healthy controls (HC) and patients with ALS/PD. Deep neural network (DNN) based classifiers can exploit the information present in these cues to perform the classification with a high degree of accuracy. Mel-frequency cepstral coefficients (MFCC), representative of spectral characteristics and articulation, have been widely used for this purpose [10, 11, 12]. Suhas et al.
[10] employed a dense neural network to perform the classification, whereas Mallela et al. [11] explored a 1D-convolutional neural network (CNN) and long short-term memory (LSTM) based classifier using a transfer learning approach. Log Mel spectrograms have been found to perform better than MFCC in the context of 2D-CNN based automatic classification and severity prediction of ALS and PD [13]. Cepstral separation difference (CSD), indicative of phonation characteristics and spectral dynamics, together with fundamental frequency variation as a marker of respiration and prosody, have been used in [12] for PD classification and its severity prediction; the authors employed a random forest classifier. Vashkevich et al. [14] have proposed novel features based on analysis of the envelope and formant structures of vowels for automatic diagnosis of bulbar ALS. In a recent work, Mallela et al. [15] have achieved very high classification performance by directly using the raw speech waveform in a CNN-Bidirectional LSTM based framework.

Although the DNN based algorithms described above are reported to achieve a high degree of classification accuracy, these models are very expensive in terms of both run-time and memory requirements. Hence, powerful computing resources are crucial for evaluating these models, which restricts the deployment of such models in practice. Low-complexity classification models, suitable for running on-device on mobile phones or general-purpose computers, might be more appropriate for such tools to be useful to the majority of the population. The behaviour of different speech cues under the constraint of low-complexity classifiers has not been well analyzed yet.

Experiments related to the existing classification methods have been mostly carried out on clean speech recorded in controlled and noise-free laboratory or hospital environments.
However, the presence of background noise in the speech data is inevitable while deploying these systems in practical scenarios like home-based monitoring. Noise often buries or alters the distinctive information present in the signal, thereby leading to misclassification, which may prove fatal in some cases. Robustness of the speech cues against different vari-