A Study of Speech Emotion and Speaker Identification System using VQ and GMM

Sushma Bahuguna 1, Y. P. Raiwani 2
1 BCIIT (Affiliated to GGSIPU), New Delhi, India
2 CSE, HNB Garhwal University, Srinagar, Uttarakhand, India

Abstract

This paper describes a text-independent, closed-set speaker identification system that identifies both the speaker and the emotional expression (Emo-voice model) of a given speech sentence. The system is evaluated on sample sentences recorded from native Hindi speakers in five basic emotions. Spectral features, namely Mel-frequency cepstral coefficients (MFCC), are used to implement Emo-voice models with Vector Quantization (VQ) and Gaussian Mixture Model (GMM) techniques for the selected sentences using MATLAB. The VQ model trained with the K-means algorithm achieves up to 82.7% speaker identification accuracy with correct emotion, while the GMM model trained with the EM algorithm achieves 87.9% speaker identification accuracy with correct emotion. The statistical approach of Emo-voice models could extend the application field of voiceprint recognition technology.

Keywords: Emo-voice model, EM, GMM, K-means, VQ

1. Introduction

The present work explores a real-time, text-independent, closed-set speaker identification system that compares a speech signal from an unknown speaker against a database of known speakers, classifying both the speaker and the speaker's emotion on the basis of individual information carried in the speech waves. The application rests on behavioral and physiological characteristics of the speaker's voice: unique features of speech are analyzed to identify the speaker when speaking in different emotions. Features extracted from the digitized speech signal are stored as a character template of the person in a computer database, and speaker-emotion classification is performed inside the identification system. The system operates in two modes, a training mode and an identification mode.
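As a concrete illustration of the GMM branch evaluated in this study, the sketch below shows a minimal diagonal-covariance Gaussian mixture trained with EM and scored by average log-likelihood. This is a simplified Python/NumPy illustration only; the paper's models were built in MATLAB, and the function names, component count, and smoothing constants here are assumptions, not the authors' code.

```python
import numpy as np

def train_gmm(X, k=4, iters=50, seed=0):
    """Fit a diagonal-covariance GMM to feature vectors X (rows) via EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)].astype(float)   # random init
    var = np.full((k, d), X.var(axis=0) + 1e-6)             # shared init variance
    pi = np.full(k, 1.0 / k)                                # uniform weights
    for _ in range(iters):
        # E-step: log-density of each point under each component
        logp = (-0.5 * (np.log(2 * np.pi * var)[None]
                        + (X[:, None, :] - mu[None]) ** 2 / var[None]).sum(-1)
                + np.log(pi)[None])
        logp -= logp.max(axis=1, keepdims=True)             # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)                   # responsibilities
        # M-step: re-estimate weights, means, variances
        nk = r.sum(axis=0) + 1e-10
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return pi, mu, var

def gmm_loglik(X, params):
    """Average per-frame log-likelihood of X under a trained GMM."""
    pi, mu, var = params
    logp = (-0.5 * (np.log(2 * np.pi * var)[None]
                    + (X[:, None, :] - mu[None]) ** 2 / var[None]).sum(-1)
            + np.log(pi)[None])
    m = logp.max(axis=1, keepdims=True)                     # log-sum-exp trick
    return (m.squeeze(1) + np.log(np.exp(logp - m).sum(axis=1))).mean()
```

In identification mode, one such model would be trained per (speaker, emotion) pair and the test utterance assigned to the model with the highest average log-likelihood.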
In training mode, a feature model of each voice is built; in identification mode, this information is used to isolate and identify the speaker. Figure 1 depicts the system overview of the Emo-voice model. The input speech passes through feature extraction and feature matching stages in order to classify the speaker along with the expressed emotion. Mel Frequency Cepstral Coefficients (MFCC) are used for feature extraction, and Vector Quantization (VQ) and Gaussian Mixture Model (GMM) techniques are used to explore the speaker identification application using MATLAB.

For the present study, 25 sample sentences of 8 native Hindi speakers (four males and four females) of different age groups were recorded in five basic emotions, namely Anger (A), Happiness (H), Neutral (N), Sadness (Sa), and Surprise (S). The chosen sentences are common in everyday communication, and recording was done with an electret microphone in a partially sound-treated room. To judge the emotional content of each speaker's rendition of each sentence, a listening test was conducted on the voice samples; the 850 sentences that listeners correctly assigned to their intended emotion category were selected for the study.

Fig 1: System overview of Emo-voice Model

2. Development of Speaker Identification Systems

A speaker identification machine using filter banks and correlation of two digital spectrograms, named voiceprint analysis, was invented in the 1960s at Bell Labs [1] and was improved using linear discriminators [2]. Formant analysis was introduced at Texas Instruments [3]. Various scholars have used different statistical parameters for extracting speaker features independent of phonetic context, including the instantaneous spectral covariance matrix, averaged autocorrelation, long-term averaged spectra, spectrum and fundamental-frequency histograms, and linear prediction coefficients.
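The VQ pipeline described above can be sketched as follows: a K-means codebook is trained per (speaker, emotion) pair from MFCC-like feature vectors, and a test utterance is assigned to the codebook with the lowest average quantization distortion. This is a simplified Python/NumPy illustration rather than the MATLAB implementation used in the study; the codebook size, iteration count, and Euclidean distortion measure are assumptions.

```python
import numpy as np

def train_codebook(features, k=16, iters=20, seed=0):
    """Train a K-means VQ codebook over feature vectors (one per row)."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each vector to its nearest codeword
        d = np.linalg.norm(features[:, None, :] - codebook[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each codeword to the centroid of its assigned vectors
        for j in range(k):
            pts = features[labels == j]
            if len(pts):
                codebook[j] = pts.mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average distance to the nearest codeword; lower means a better match."""
    d = np.linalg.norm(features[:, None, :] - codebook[None], axis=2)
    return d.min(axis=1).mean()

def identify_vq(features, codebooks):
    """codebooks: {(speaker, emotion): codebook}. Return the best-matching label."""
    return min(codebooks, key=lambda lab: distortion(features, codebooks[lab]))
```

The GMM variant replaces the distortion comparison with a likelihood comparison: the test utterance is assigned to the mixture model under which its frames are most probable.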
[Figure 1: the analogue speech waveform is digitized (ADC) and framed into feature vectors; in training mode these enroll per-speaker emotion models (Anger, Sad, Surprise, Neutral, Happy), and in identification mode they are matched against the stored models to output the identified speaker and emotion.]

IJCSI International Journal of Computer Science Issues, Volume 13, Issue 4, July 2016. ISSN (Print): 1694-0814 | ISSN (Online): 1694-0784. www.IJCSI.org, doi:10.20943/01201604.4146