www.tjprc.org editor@tjprc.org
International Journal of Computer Science Engineering
and Information Technology Research (IJCSEITR)
ISSN(P): 2249-6831; ISSN(E): 2249-7943
Vol. 4, Issue 2, Apr 2014, 285-290
© TJPRC Pvt. Ltd.
SPEECH RECOGNITION OF HINDI PHONEMES USING MFCC AND BHATTACHARYYA
HISTOGRAM DISTANCE
SANDEEP KAUR¹, MEENAKSHI SHARMA² & SUKHBEER SINGH³
¹M.Tech Student, Department of CSE, Sri Sai College of Engineering, Pathankot, Punjab, India
²Head of Department, Sri Sai College of Engineering, Pathankot, Punjab, India
³Assistant Professor, Sri Sai College of Engineering, Pathankot, Punjab, India
ABSTRACT
This paper describes an algorithm that uses a distance measure to find the similarity between the histogram profiles
of feature matrices built from audio signals (Hindi phonemes). Results were obtained with Swaranjali for tests
conducted on a vocabulary of Hindi digits spoken by different speakers. Many researchers have used the root mean square
(rms) distance, log spectral distance, cepstral distance, likelihood ratio (minimum residual principle or delta coding
(DELCO) algorithm), and a cosh measure (based upon two nonsymmetrical likelihood ratios); however, a measure based on
the feature matrix profile has not been used, although it has a distinct advantage when it comes to finding similar
features for voice profile recognition. Here, the Bhattacharyya histogram distance is used to measure the distance
between the histogram profiles of the feature matrices built from audio signals (Hindi phonemes).
KEYWORDS: MFCC, K-Means Algorithm, Framing, Windowing, Hamming Window, Fast Fourier Transform,
Mel-Scaled Filter Bank, Bhattacharyya Coefficient, ROC Curve
INTRODUCTION
Speech recognition is the process by which an algorithm identifies spoken words. In simple terms, it means talking to
an algorithm and having it correctly recognize what is being said. The basic terms needed to understand the subject
are utterance, speaker dependence, and vocabulary. The term phoneme was reportedly first used by
A. Dufriche-Desgenettes in 1873, but it referred only to a speech sound. The term phoneme as an abstraction was
developed by the Polish linguist Jan Niecislaw Baudouin de Courtenay and his student Mikolaj Kruszewski during
1875–1895[1]. The term used by these two was fonema, the basic unit of what they called psychophonetics. The concept of
the phoneme was then elaborated in the works of Nikolai Trubetzkoi and others of the Prague School (during the years
1926–1935), and in those of structuralists like Ferdinand de Saussure, Edward Sapir, and Leonard Bloomfield.
Some structuralists (though not Sapir) rejected the idea of a cognitive or psycholinguistic function for the phoneme[2][3].
Units of Speech
A phoneme is a basic unit of a language's phonology, which is combined with other phonemes to form meaningful
units such as words or morphemes. The phoneme can be described as "the smallest contrastive linguistic unit which may
bring about a change of meaning" [6]. In this way the difference in meaning between the English words kill and kiss is a
result of the exchange of the phoneme /l/ for the phoneme /s/. Two words that differ in meaning through a contrast of a
single phoneme are called minimal pairs. Some linguists (such as Roman Jakobson and Morris Halle) proposed that
phonemes may be further decomposable into features, such features being the true minimal constituents of language.[7]
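The notion of a minimal pair can be illustrated with a short sketch in Python. It is only an illustration under stated assumptions: each word is represented as a list of phoneme symbols, the transcriptions /kɪl/ and /kɪs/ are supplied by hand, and the helper name is_minimal_pair is hypothetical. The check covers the form only (equal length, exactly one differing phoneme); whether the two words also differ in meaning must be established separately.

```python
from typing import Sequence


def is_minimal_pair(a: Sequence[str], b: Sequence[str]) -> bool:
    """Return True if two phoneme sequences have the same length
    and differ in exactly one position."""
    if len(a) != len(b):
        return False
    # Count the positions at which the two sequences disagree.
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences == 1


# Hand-written transcriptions (an assumption of this sketch).
kill = ["k", "ɪ", "l"]
kiss = ["k", "ɪ", "s"]

print(is_minimal_pair(kill, kiss))
```

Representing a word as a sequence of phoneme symbols rather than as a spelled string keeps the comparison at the level of sounds, which is the level at which the contrast between /l/ and /s/ is defined.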