International Journal of Speech Technology
https://doi.org/10.1007/s10772-023-10046-9

Binary classifier for identification of stammering instances in Hindi speech data

Shivam Dwivedi 1 · Sanjukta Ghosh 1 · Satyam Dwivedi 1

Received: 3 April 2023 / Accepted: 29 August 2023
© Springer Science+Business Media, LLC, part of Springer Nature 2023

Abstract
In this paper we present results from our experiments on building a binary classifier for stammering identification in Hindi speech data. We train several Sequential CNN models, adjusting parameters such as color, image size, and training data shape to tune classification performance. Our experimental pipeline converts speech samples into spectrograms using Librosa and trains the Sequential CNN classifier on the resulting image data using TensorFlow Lite. Our classification models achieve more than 95% accuracy on this classification task.

Keywords Stammering · Stuttering · Image classification · SLP · Hindi

1 Introduction

Fluent speech is an important factor in efficient verbal communication. Stammering is a fluency-related disorder in which a person has difficulty articulating speech sounds, which in turn affects their speech fluency. Be it presentations, business talks or casual conversations, fluency plays a major role in determining the success of the communication (Morreale et al., 2000). According to available statistics (Howell, 2011a), 1% of adults stammer, and 5% of children stammer at some point in their childhood. Stammering is typically diagnosed in childhood, although it is found across all age groups. People who stammer (PWS) face negative peer pressure that takes a toll on their mental health and their confidence to communicate verbally (Craig & Tran, 2006).

With the advances in deep learning, a plethora of once-unimaginable problems and complex tasks are being solved using features extracted from large datasets.
Processing of speech-related peculiarities has always been challenging, as researchers often face a lack of data and resources. Even though Hindi is the fourth-largest language globally by number of native speakers (Kachru, 2008), there is no computational work on stammering in Hindi apart from a few theoretical works. This leaves Hindi-speaking PWS in a very difficult position when they want to use voice-based technology but face fluency-related challenges.

Currently, speech language pathologists (SLPs) use Computerized Language Analysis (CLAN) for accessing various databases such as CHILDES, PhonBank, FluencyBank and TalkBank. They use Praat and the Speech Filing System (SFS) (Howell & Huckvale, 2004) for recording, annotating and analyzing audio samples. None of these tools incorporates AI or machine learning techniques that would enable SLPs to leverage technical advances in their day-to-day workflows.

It is also worth mentioning that the use of virtual assistants is increasing by leaps and bounds. A recent study on the ever-growing demand for virtual assistants (Sayago et al., 2019) mentions that by the end of 2023 there will be more than 8 billion devices that can act as virtual assistants. This leaves the 1% of the world population that stammers (Howell, 2011a) on the margins, since with the current level of support they will face challenges in incorporating virtual assistants into their lives. It is also estimated (Howell, 2011b) that at least 5% of the population goes through some kind of stammering in their lifetime. Hence, we need to solve the problem of making virtual assistants receptive to diverse speech traits.

* Shivam Dwivedi, admin@shivamdwivedi.com
Sanjukta Ghosh, sanjukta.hss@iitbhu.ac.in
Satyam Dwivedi, admin@satyamdwivedi.com
1 Indian Institute of Technology, BHU, Varanasi, India