Analysis of Glottal Stop in Assam Sora Language

Sishir Kalita 1, Luke Horo 2, Priyankoo Sarmah 2, S.R.M. Prasanna 1, S. Dandapat 1
1 Department of Electronics and Electrical Engineering
2 Department of Humanities and Social Sciences
Indian Institute of Technology Guwahati, Guwahati-781039, India
(sishir, luke, priyankoo, prasanna, samaren)@iitg.ernet.in

Abstract

The objective of this work is to characterize the intervocalic glottal stops in Assam Sora. Assam Sora is a low-resource language of the South Munda language family. Glottal stops are produced with gestures at the deep laryngeal level; hence, the estimated excitation source signal is used in this study to characterize the source dynamics during the production of Assam Sora glottal stops. From this signal, the temporal-domain voice source features Quasi-Open Quotient (QOQ) and Normalized Amplitude Quotient (NAQ) are extracted, along with spectral features such as the H1-H2 ratio and the Harmonic Richness Factor (HRF). One excitation source feature is extracted from the zero frequency filtered version of the speech signal to characterize the variations within the glottal cycles in the glottal stop region. A recently proposed wavelet-based voice source feature, the Maxima Dispersion Quotient (MDQ), is also used to characterize the abrupt glottal closure during glottal stop production. From the analysis, it is observed that the features are salient enough to distinguish glottal stops from the adjacent vowel sounds and may also be used in continuous speech. A Mann-Whitney U test confirmed the statistical significance of the differences between glottal stops and their adjacent vowels.

Index Terms: Assam Sora language, glottal stop, zero frequency filter, maxima dispersion quotient.

1. Introduction

Speech sounds are produced with gestures at the sub- or supra-laryngeal level. However, some sound units are articulated only in the larynx without any effective gestures in the vocal tract.
A glottal stop, defined as a stop made by the glottis, is an example of such a sound; it is produced by firmly adducting the vocal folds [1]. In the glottal continuum model [2], the glottal stop is considered to be an extreme form of glottal closure and is placed at the right edge of the continuum, while voiceless sounds are placed at the extreme left edge. However, it is suggested that a complete glottal stop is rare in continuous speech [3].

Apart from occurring as phonological units in a language, glottal stops can be produced as compensatory articulations. For example, Cleft Lip and Palate (CLP) patients produce glottal stops as compensatory articulations [4], while in English the glottal stop occurs as an allophone of the stop consonant /t/. At the same time, many Austro-Asiatic languages of the Mon-Khmer subfamily, such as Khmer, Chong, Kammu, Car, Khasi, Pnar, Katu, Dannu, Mon, Bunong, Sedang and Kui, as well as of the Munda subfamily, such as Santali, Mundari, Keraʔ, Ho, Korku, Juang, Kharia, Sora, Gorum, Remo, Gutob and Gtaʔ [5] [6], include glottal stops in their phoneme inventories. However, analysis and characterization of glottal stops with the help of a spectrogram is difficult [7] [8]. As there is no movement of the supralaryngeal articulators, information regarding articulation in the larynx cannot be obtained from the spectrogram. Hence, substantial voice (excitation) source analysis is needed to characterize this sound unit.

Production of glottal stops has a variety of realizations, ranging from a complete stop to a laryngealized realization. Similarly, the acoustic characteristics of a glottal stop differ significantly depending on the context in which it occurs [9].
For instance, while it is suggested that, intervocalically, dips in the pitch and amplitude contours are reliable cues for perceiving a glottal stop [10], it is argued that irregularity and aperiodicity of estimated source signals may also serve as dependable cues for identifying a glottal stop in the same region [7] [8] [11]. Moreover, since in the production of a glottal stop the effective gesture is primarily in the larynx and vocal fold vibration deviates significantly from that of the adjacent voiced region, analysis of glottal stops may also be conducted using aerodynamic parameters, EGG signals and the voice source estimated from speech signals. However, it is preferable to analyze glottal stops directly from the speech signal, since the estimated voice source may provide a better way of characterizing the acoustic qualities of a glottal stop.

A few attempts have been made to automatically characterize a glottal stop using speech signal processing. These studies have mostly used excitation source information. In one such study, the irregularity in the glottal stop region is quantified using a normalized cross-correlation between two adjacent glottal cycles in the linear prediction residual of the speech signal [8]. Also, in order to detect glottal stops in continuous Amharic speech, normalized jitter and the logarithm of peak normalized excitation strength (LPNES) are computed at each glottal closure instant (GCI) [7]. Additionally, the pitch-synchronous integrated linear prediction residual is used as a voice source representation to characterize the glottal stop in intervocalic context [11]. This has helped in capturing the variation in the abruptness of glottal pulses, using the ratio between the strengths of excitation (SoE) at two consecutive epoch locations, and the temporal energy distribution, using the waveform peak factor (WPF).
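As a rough illustration of the cycle-level measures mentioned above, the following sketch computes a zero-lag normalized cross-correlation between adjacent glottal cycles, a peak-to-RMS waveform peak factor, and a simple normalized jitter. It is not the implementation used in [7], [8] or [11]: the function names, the zero-lag truncation to a common cycle length, the peak-to-RMS definition of WPF, and the jitter normalization are all assumptions made for illustration, and the glottal closure instants are assumed to be already available as sample indices.

```python
import numpy as np


def split_cycles(signal, gci):
    """Cut a source-like signal into cycles between consecutive GCIs.

    `gci` is assumed to be a sorted list of sample indices (hypothetical input).
    """
    return [signal[s:e] for s, e in zip(gci[:-1], gci[1:])]


def adjacent_cycle_ncc(cycle_a, cycle_b):
    """Zero-lag normalized cross-correlation of two adjacent cycles.

    Assumption: both cycles are truncated to the shorter length before
    correlating. Values near 1 indicate regular, similar cycles.
    """
    n = min(len(cycle_a), len(cycle_b))
    a = np.asarray(cycle_a[:n], dtype=float)
    b = np.asarray(cycle_b[:n], dtype=float)
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0


def waveform_peak_factor(cycle):
    """Peak-to-RMS ratio of one cycle (assumed definition of WPF).

    Energy concentrated in a sharp pulse gives a large value; energy
    spread evenly over the cycle gives a value near 1.
    """
    x = np.asarray(cycle, dtype=float)
    rms = np.sqrt(np.mean(x * x))
    return float(np.max(np.abs(x)) / rms) if rms > 0 else 0.0


def normalized_jitter(gci):
    """Mean absolute change between consecutive periods over the mean period."""
    periods = np.diff(np.asarray(gci, dtype=float))
    if len(periods) < 2:
        return 0.0
    return float(np.mean(np.abs(np.diff(periods))) / np.mean(periods))
```

Under these assumptions, a regularly voiced vowel would be expected to show cross-correlation values close to 1 and low jitter, whereas an irregular, laryngealized region would show lower cross-correlation and higher jitter.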
The asymmetric behavior of each glottal cycle is extracted using higher order statistical (HOS) measures.

The current study proposes a characterization method for intervocalic glottal stops in a South Munda language called Assam Sora, spoken by approximately 5000 people in Assam in North East India. Assam Sora emerged from the migration of Sora speakers from Orissa to Assam in the 19th century. The presence of a glottal stop has been reported in Sora [12], and it is also observed in Assam Sora [13].

Copyright © 2016 ISCA. INTERSPEECH 2016, September 8–12, 2016, San Francisco, USA. http://dx.doi.org/10.21437/Interspeech.2016-877