Assamese Numeral Speech Recognition using Multiple Features and Cooperative LVQ-Architectures
Manash Pratim Sarma and Kandarpa Kumar Sarma, Member, IEEE
Abstract—A set of Artificial Neural Network (ANN) based methods for the design of an effective speech recognition system for Assamese numerals captured under varied recording conditions and moods is presented here. The work involves the formulation of several ANN models configured to use Linear Predictive Code (LPC), Principal Component Analysis (PCA) and other features to tackle mood and gender variations in uttered numerals as part of an Automatic Speech Recognition (ASR) system in Assamese. The ANN models are designed using a combination of a Self Organizing Map (SOM) and a Multi Layer Perceptron (MLP) constituting a Learning Vector Quantization (LVQ) block, trained in a cooperative environment to handle male and female speech samples of numerals of Assamese, a language spoken by a sizable population in the North-Eastern part of India. The work provides a comparative evaluation of several such combinations when handling speech samples with gender-based differences, captured by a microphone under four different conditions, viz. noiseless, noise-mixed, stressed and stress-free.
Keywords—Assamese, Recognition, LPC, Spectral, ANN.
I. INTRODUCTION
Speech recognition is a method that uses audio input for data entry to a computer or digital system, allowing the system to derive meaning from it. Speech conveys emotions and feelings and is generated by precisely coordinated muscle actions in the head, neck, chest and abdomen. Speech results from a gradual process involving years of learning and practice [1]. Speech analysis and synthesis are among the major thrust areas of research, with applications in communication systems, process control, automation, etc. Various related applications are also possible, such as speech enhancement, speech synthesis, speech coding, storage and retrieval.
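The feature-to-classifier pipeline outlined in the abstract (LPC features fed to an LVQ-type classifier) can be illustrated with a minimal Python/NumPy sketch. It assumes autocorrelation-method LPC computed with the Levinson-Durbin recursion and the classic Kohonen LVQ1 update rule. It is not the authors' SOM + MLP LVQ block or its cooperative arrangement, and it omits the PCA and spectral features. The function names lpc, train_lvq1 and predict_lvq, the learning-rate schedule and the synthetic AR(1) signals in the demo are hypothetical stand-ins for real speech frames.

```python
import numpy as np


def lpc(frame, order):
    """LPC coefficients a[1..order] via autocorrelation + Levinson-Durbin.

    Assumed convention: x[n] is predicted as sum_k a[k] * x[n - k].
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation values r[0..order]
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        err *= 1.0 - k * k
    return a[1:]


def train_lvq1(X, y, prototypes, proto_labels, lr=0.1, epochs=30, seed=0):
    """Classic LVQ1: move only the winning prototype (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    W = np.array(prototypes, dtype=float)
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for idx in rng.permutation(len(X)):
            x = X[idx]
            j = np.argmin(np.sum((W - x) ** 2, axis=1))  # nearest prototype
            if proto_labels[j] == y[idx]:
                W[j] += alpha * (x - W[j])  # same class: attract
            else:
                W[j] -= alpha * (x - W[j])  # different class: repel
    return W


def predict_lvq(X, W, proto_labels):
    """Label each row of X with the class of its nearest prototype."""
    d = np.sum((X[:, None, :] - W[None, :, :]) ** 2, axis=2)
    return np.asarray(proto_labels)[np.argmin(d, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def ar1(phi, n=400):
        # Synthetic AR(1) signal standing in for a recorded utterance
        x = np.zeros(n)
        e = rng.standard_normal(n)
        for t in range(1, n):
            x[t] = phi * x[t - 1] + e[t]
        return x

    feats = np.array([lpc(ar1(phi), 4) for phi in [0.9] * 10 + [-0.5] * 10])
    labels = np.array([0] * 10 + [1] * 10)
    W = train_lvq1(feats, labels, feats[[0, 10]], [0, 1])
    print("training accuracy:", (predict_lvq(feats, W, [0, 1]) == labels).mean())
```

LVQ1 updates only the winning prototype, pulling it toward a correctly labelled sample and pushing it away from a mislabelled one; the linearly decaying learning rate is a common but, here, arbitrary choice.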
A speech corpus can be generated by extracting carefully chosen features from the speech signal [1]. Feature extraction involves transforming the input data into the set of values that best describes the input under consideration [2]. This work focuses on the design of a speech recognition system to handle numerals of the Assamese language. The work considers the extraction of Linear Predictive Code (LPC) and Principal Component Analysis (PCA) features of speech samples recorded under varied conditions with gender and mood variations. A hybrid feature set is formed using the LPC and PCA features. Further, the spectra of the captured speech samples are also used as features. These extracted feature types are applied to classifiers formed using Learning Vector Quantization (LVQ) blocks. Each LVQ block is formed by a combination of a Self Organizing Map (SOM) and a Multi Layer Perceptron (MLP). The LVQ blocks are further arranged in a cooperative architecture to minimize the classification error and maximize the prediction rate. The system has also been tested for speaker-dependent and speaker-independent cases. The success rates vary, but the cooperative architectures provide better results at the cost of increased training time.
Manash Pratim Sarma and Kandarpa Kumar Sarma are with the Department of Electronics and Communication Technology, Gauhati University, Guwahati - 781014, Assam, India. e-mail: (manashpelsc@gmail.com and kandarpaks@gmail.com)
Several works on LVQ-based speech recognition have been reported. Research in this area has nevertheless continued, and more works exploring innovative means of tackling speech recognition and related aspects are being reported. Some of the relevant works are enumerated below:
1) An extension of the self-organizing map, called the self-organizing multilayer perceptron (SOMLP), whose purpose is to achieve quantization of spaces of functions, has been presented by B. Gas [3].
The possible use of the commonly used vector quantization algorithms (LVQ algorithms) to build new algorithms, called functional quantization algorithms (LFQ algorithms), has also been demonstrated. The SOMLP algorithm allows quantization of functions with a high-dimensional input space; as a consequence, classical FDA methods can be outperformed by increasing the dimensionality of the input space of the functions under analysis.
2) A novel Learning Vector Quantization (LVQ) based speech recognition method using MFCC (Mel Frequency Cepstral Coefficients) and differential MFCC has been presented in [4]. The work used a standard Kohonen LVQ network followed by an improved LVQ scheme, and simulation results have been reported. The improved LVQ network is based on the two nearest neurons (the winning and the next-winning neuron), by virtue of which it can discriminate between the two classifier vectors nearest to an input.
3) A Hidden Markov Model (HMM) and LVQ based recognition scheme has been proposed in [5]. The work introduced MFCC, ∇MFCC and ∇∇MFCC extraction algorithms; these coefficients are then normalized by an HMM-based Viterbi method, and the resulting feature set is learned coarsely by the first LVQ network and then finely by the improved LVQ network.
World Academy of Science, Engineering and Technology International Journal of Electronics and Communication Engineering Vol:5, No:9, 2011 1258 International Scholarly and Scientific Research & Innovation 5(9) 2011 scholar.waset.org/1307-6892/12746 International Science Index, Electronics and Communication Engineering Vol:5, No:9, 2011 waset.org/Publication/12746
4) An interactive and incremental learning algorithm based