International Journal of Speech Technology
https://doi.org/10.1007/s10772-023-10046-9
Binary classifier for identification of stammering instances in Hindi
speech data
Shivam Dwivedi¹ · Sanjukta Ghosh¹ · Satyam Dwivedi¹
Received: 3 April 2023 / Accepted: 29 August 2023
© Springer Science+Business Media, LLC, part of Springer Nature 2023
Abstract
In this research paper, we present results from our experiments on building a binary classifier for identifying stammering
in Hindi speech data. We train several Sequential CNN models and tune their classification performance by adjusting
parameters such as color, image size, and training data shape. Our experimental pipeline converts speech samples into
spectrograms using Librosa and trains the Sequential CNN classifier on the resulting image data using TensorFlow Lite.
Our classification models achieve more than 95% accuracy on this task.
Keywords Stammering · Stuttering · Image classification · SLP · Hindi
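The first stage of the pipeline described in the abstract, turning a speech sample into a spectrogram image, can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' Librosa code: it approximates librosa.stft with its default n_fft=2048 and hop_length=512 and a periodic Hann window, but without centre padding; it converts magnitudes to decibels in the spirit of librosa.amplitude_to_db(S, ref=np.max) with an 80 dB floor; and it maps the result to an 8-bit grayscale image. The function names log_spectrogram and spectrogram_to_image, the 16 kHz sampling rate, and the synthetic test tone are hypothetical and serve only as illustration.

```python
import numpy as np


def log_spectrogram(signal, n_fft=2048, hop_length=512, amin=1e-5, top_db=80.0):
    """Log-magnitude (dB) spectrogram of shape (1 + n_fft // 2, n_frames).

    Simplified STFT (assumption): no centre padding; trailing samples that
    do not fill a complete frame are dropped.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim != 1 or signal.size < n_fft:
        raise ValueError("expected a 1-D signal with at least n_fft samples")
    # Periodic Hann window, the variant used for spectral analysis.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (signal.size - n_fft) // hop_length
    # Row i holds samples signal[i*hop_length : i*hop_length + n_fft].
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    # Magnitude of the one-sided FFT, transposed to (frequency bins, frames).
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # Decibels relative to the loudest bin (0 dB), floored at -top_db.
    ref = max(magnitude.max(), amin)
    db = 20.0 * np.log10(np.maximum(magnitude, amin) / ref)
    return np.maximum(db, -top_db)


def spectrogram_to_image(db, top_db=80.0):
    """Map dB values in [-top_db, 0] linearly to an 8-bit grayscale image."""
    scaled = (db + top_db) / top_db * 255.0
    return np.round(np.clip(scaled, 0.0, 255.0)).astype(np.uint8)


# Usage on a synthetic 1 s, 440 Hz tone sampled at an assumed 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
spec = log_spectrogram(tone)
img = spectrogram_to_image(spec)
print(spec.shape)  # (1025, 28)
```

The resulting grayscale array is the kind of image input a Sequential CNN could be trained on; choices such as color mapping and image size, which the abstract lists as tuning parameters, would be applied at this stage.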
1 Introduction
Fluent speech is an important factor in efficient verbal
communication. Stammering is a fluency-related disorder in
which a subject has difficulty articulating speech sounds,
which in turn affects their speech fluency.
Be it presentations, business talks or casual conversations,
fluency plays a major role in defining the success of the
communication (Morreale et al., 2000). According to available
statistics (Howell, 2011a), 1% of the adult population stammers,
and 5% of children stammer at some point in their childhood.
Stammering is typically diagnosed in childhood, although it is
found across all age groups. People who stammer (PWS)
often face negative peer pressure that takes a toll on
their mental health and their confidence to communicate verbally
(Craig & Tran, 2006).
With the advances in deep learning, many previously
intractable problems and complex tasks are being solved using
features extracted from large datasets. Processing
speech-related peculiarities has always been challenging, as
researchers often lack data and resources. Although Hindi is
the 4th largest language globally by number of native speakers
(Kachru, 2008), there is no computational work on stammering
in Hindi apart from a few theoretical studies. This leaves Hindi
PWS in a very difficult position when they want to use voice-based
technology but face fluency-related challenges.
Currently, speech language pathologists (SLPs) use
Computerized Language Analysis (CLAN) for accessing various
databases such as CHILDES, PhonBank, FluencyBank and
TalkBank. They use PRAAT and the Speech Filing System (SFS)
(Howell & Huckvale, 2004) for analyzing, recording and
annotating audio samples. None of these tools incorporates
AI or machine learning techniques that could enable SLPs
to leverage technical advances in their day-to-day workflows.
It is also worth mentioning that the use of virtual assistants
is increasing by leaps and bounds. A recent study on the
ever-growing demand for virtual assistants (Sayago et al., 2019)
states that by the end of 2023 there will be more than 8
billion devices that can act as virtual assistants. This leaves
the 1% of the world population who stammer (Howell, 2011a) at
the margins, since with current support they will face challenges
in incorporating virtual assistants into their lives. It is also
estimated (Howell, 2011b) that at least 5% of the population
experiences some form of stammering during their lifetime. Hence,
virtual assistants need to be made receptive to diverse speech
traits.
* Shivam Dwivedi
admin@shivamdwivedi.com
Sanjukta Ghosh
sanjukta.hss@iitbhu.ac.in
Satyam Dwivedi
admin@satyamdwivedi.com
¹ Indian Institute of Technology, BHU, Varanasi, India