STATISTICAL ANALYSIS OF SPEECH DISORDER SPECIFIC FEATURES TO
CHARACTERISE DYSARTHRIA SEVERITY LEVEL
Amlu Anna Joshy (1,3), P. N. Parameswaran (1,3), Siddharth R. Nair (1,3), Rajeev Rajan (2,3)
(1) College of Engineering Trivandrum, Thiruvananthapuram
(2) Government Engineering College, Barton Hill
(3) APJ Abdul Kalam Technological University, India
ABSTRACT
Poor coordination of the speech production subsystems due to
any neurological injury or a neuro-degenerative disease leads
to dysarthria, a neuro-motor speech disorder. Dysarthric
speech impairments can be mapped to the deficits caused
in phonation, articulation, prosody, and glottal functioning.
With the aim of reducing the subjectivity in clinical evaluations,
many automated systems have been proposed in the literature
to assess the dysarthria severity level using these features.
This work analyses the suitability of these features
for determining the severity level. A detailed investigation is
carried out to rank these features by their efficacy in modelling
the pathological aspects of dysarthric speech, using the technique
of paraconsistent feature engineering. The study uses two
dysarthric speech databases, UA-Speech and TORGO. It shows
that the prosody and articulation features are the most useful
for dysarthria severity estimation, a finding supported by the
classification accuracies obtained with different machine
learning classifiers.
Index Terms— dysarthria severity estimation, paracon-
sistent feature engineering, statistical analysis
1. INTRODUCTION
The speech disorder arising from poor coordination of the
speech production subsystems is referred to as dysarthria.
The speech impairments exhibited by dysarthric patients are
described in the literature along different dimensions such as
articulation, phonation, prosody, nasality and intelligibility.
Imprecise articulation due to the slowing of lip, jaw and tongue
movements, and irregular glottal closure patterns resulting
in a breathy voice, are among the most evident dysarthria
symptoms [1], [2]. Phonation features can capture the
monotonicity and tempo perturbations exhibited by dysarthric
patients [3]. Dysarthric speech is often emotionless and lacks
rhythm due to abnormal speech rate and irregular loudness,
and prosodic measures can characterise these traits [3].
When associated with degenerative disorders of the central
nervous system and/or hereditary conditions, dysarthria
can be progressive in nature. This calls for frequent
monitoring of the severity level to guide medication
and voice treatment during rehabilitation. However, subjective
evaluation of severity by speech-language pathologists
(SLPs) can be biased, time-consuming, and expensive. Different
approaches for automating this severity estimation have been
adopted in the literature. While earlier works concentrated
on feature selection [4], [5] and handcrafted feature
generation [6], [7], more recent works focus on building
end-to-end systems or sophisticated deep learning models with
basic acoustic features [8], [9], [10], [11]. However, deep
learning models are prone to overfitting because the amount of
dysarthric data available is limited. This data scarcity stems
from the physical fatigue and vocal strain that dysarthric
speakers experience.
Our initial experiments using these speech disorder-specific
features with deep neural networks (DNN) [12] suggested
that a detailed statistical analysis is required to understand
the potential correlation within each class. This would
also enable the choice of an optimum feature descriptor that
could be used by a simple predictor to aid SLPs. Whereas
end-to-end systems aim to replace the SLP, at
the cost of data gathering requirements and computational
costs, simple predictors such as a random forest (RF) classifier
can aid SLPs once an optimum descriptor has been selected. We
implement the recently proposed technique of paraconsistent
feature engineering (PFE) [13] to capture the intra-class
similarities and the inter-class distinctions exhibited by these
features. PFE is not strictly a statistical tool, but a related data
analysis tool that helps to draw meaningful conclusions from
the features representing raw data. PFE lacks the descriptive
nature of statistical analysis, as it does not uncover
the structure behind the data. However, the exploratory
nature is implicitly present, as it helps in understanding the
potential correlation among the features and the way they are
mapped to the correct severity level. PFE has been shown to
be efficient in feature ranking for applications such as replay
attack detection [14] and speaker verification [15].
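
To make the idea of scoring a feature set by intra-class similarity and
inter-class distinction concrete, the following is a minimal sketch under
stated assumptions, not the implementation used in this work. It assumes
feature vectors already normalised to [0, 1], takes the intra-class
similarity (alpha) as the smallest per-class average of one minus the
per-dimension range, takes the inter-class overlap (beta) as the fraction of
feature values that fall inside the range of another class, and ranks a
feature set by its distance to the ideal point (1, 0) in the paraconsistent
plane. The function name pfe_point and the toy data are hypothetical; the
exact definitions should be taken from [13].

```python
import math

def pfe_point(classes):
    """Return (g1, g2, distance) for one candidate feature set.

    classes: list of classes; each class is a list of feature vectors
    (equal-length lists of floats, assumed normalised to [0, 1]).
    """
    dims = len(classes[0][0])
    # Per-class, per-dimension minimum and maximum of the feature values.
    lows = [[min(v[t] for v in c) for t in range(dims)] for c in classes]
    highs = [[max(v[t] for v in c) for t in range(dims)] for c in classes]

    # Intra-class similarity: 1 - range, averaged over dimensions.
    # alpha is taken as the worst (smallest) class average (assumption).
    alpha = min(
        sum(1.0 - (hi[t] - lo[t]) for t in range(dims)) / dims
        for lo, hi in zip(lows, highs)
    )

    # Inter-class overlap: share of feature values of one class that lie
    # inside the [min, max] range of a different class.
    overlaps = 0
    comparisons = 0
    for i, cls in enumerate(classes):
        for j in range(len(classes)):
            if i == j:
                continue
            for v in cls:
                for t in range(dims):
                    comparisons += 1
                    if lows[j][t] <= v[t] <= highs[j][t]:
                        overlaps += 1
    beta = overlaps / comparisons

    g1 = alpha - beta          # degree of certainty
    g2 = alpha + beta - 1.0    # degree of contradiction
    # Distance to the ideal corner (1, 0): smaller means a better feature set.
    distance = math.hypot(g1 - 1.0, g2)
    return g1, g2, distance

# Toy data (hypothetical): two severity classes, two-dimensional features.
compact = [[[0.10, 0.10], [0.12, 0.10]], [[0.90, 0.90], [0.90, 0.88]]]
spread = [[[0.10, 0.10], [0.90, 0.90]], [[0.20, 0.20], [0.80, 0.80]]]
ranking = sorted(
    [("compact", pfe_point(compact)[2]), ("spread", pfe_point(spread)[2])],
    key=lambda item: item[1],
)
print(ranking)
```

In this sketch, a feature set whose classes are compact and well separated
lands closer to (1, 0) than one whose classes are spread out and overlap, so
sorting candidate feature sets by the returned distance gives a ranking.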
The proposed approach is explained in Section 2, fol-
lowed by Section 3 describing the databases. The experimen-
tal framework and result analysis are given in Sections 4 and
5 respectively. Finally, the paper is concluded in Section 6.
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 978-1-7281-6327-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICASSP49357.2023.10095366