(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 8, No. 4, 2017

SVM based Emotional Speaker Recognition using MFCC-SDC Features

Asma Mansour
University of Tunis El Manar, National School of Engineers of Tunis
Signal, Image and Information Technology Laboratory
BP. 37 Le Belvédère, 1002, Tunis, Tunisia

Zied Lachiri
University of Tunis El Manar, National School of Engineers of Tunis
BP. 37 Le Belvédère, 1002, Tunis, Tunisia

Abstract—Enhancing the performance of emotional speaker recognition has attracted increasing interest in recent years. This paper presents a methodology for speaker recognition under different emotional states based on the multiclass Support Vector Machine (SVM) classifier. We compare two feature extraction methods used to represent emotional speech utterances in order to obtain the best accuracies: the first is the traditional Mel-Frequency Cepstral Coefficients (MFCC), and the second is MFCC combined with Shifted Delta Cepstra (MFCC-SDC). Experiments are conducted on the IEMOCAP database using two multiclass SVM approaches: One-Against-One (OAO) and One-Against-All (OAA). The obtained results show that MFCC-SDC features outperform conventional MFCC.

Keywords—Emotion; Speaker recognition; Mel Frequency Cepstral Coefficients (MFCC); Shifted Delta Cepstra (SDC); SVM

I. INTRODUCTION

Emotional speaker recognition is one of the research fields in Human-Computer Interaction (HCI) and affective computing [1]. The main motivation comes from the desire to develop human-machine interfaces that are more intelligent, adaptive and credible. This could give computers the ability to recognize a person in an emotional context, which is useful in many real-world applications. Speaker recognition in an emotional context can be used in criminal or forensic investigations to identify a suspect from the emotional utterances he or she produces.
It can also be used in telecommunications, for example to improve the performance of telephone-based speech recognition.

Emotional speaker recognition systems are composed of two main components: feature extraction and classification [2]. In the literature, different classifiers have been used to model speakers under emotional states. I. Shahin [3] used Hidden Markov Models (HMM) and suprasegmental hidden Markov models (SPHMMs) to identify speakers using emotional cues. In the same context, Yingchun Yang et al. [4] used a GMM-UBM classifier. Support Vector Machines (SVM) have been used [5] to show the important influence of the emotional state on text-independent speaker identification.

In general, human emotions are a complicated phenomenon. Thus, choosing the most suitable features to represent emotional utterances is an important step in the emotional speaker recognition process. Research has demonstrated that features derived from the speech spectrum usually give the best performance for automatic recognition systems. Indeed, the spectrum reflects the geometry of the system that generates the speech signal. Therefore, spectral features such as Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Cepstral Coefficients (LPCC) are widely used for speaker recognition, in addition to other acoustic features [6].

Many features have been used to improve the performance of speaker recognition systems in an emotional context [7]. MFCC features are the most commonly used features for speaker recognition in this context [8] [9]. Linear Predictive Cepstral Coefficients (LPCC) have also been used frequently [10]. MFCC coefficients are based on the human auditory system [11]. However, these coefficients are most effective when the speech is of short duration.
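As background for the feature extraction discussed here, the standard MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log compression, DCT) can be sketched with NumPy alone. This is a minimal illustrative sketch: the frame length, hop size, and filterbank sizes below are common defaults, not the configuration used in this paper.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC: frame -> Hamming window -> |FFT|^2 -> mel filterbank -> log -> DCT-II."""
    # 1) Frame the signal and apply a Hamming window
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # 2) Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3) Triangular filters spaced evenly on the mel scale
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0.0), hz2mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    # 4) Log mel-band energies (small offset avoids log(0))
    logmel = np.log(power @ fbank.T + 1e-10)
    # 5) DCT-II decorrelates the log energies; keep the first n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n + 0.5)[None, :] * np.arange(n_ceps)[:, None])
    return logmel @ dct.T

# Toy usage: one second of a synthetic 440 Hz tone at 16 kHz
t = np.arange(16000) / 16000.0
feats = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(feats.shape)  # (number of frames, 13)
```

With a 512-sample window and a 160-sample (10 ms) hop, each row of the returned matrix is one frame's 13-dimensional cepstral vector; these per-frame vectors are what delta-based extensions such as SDC are computed from.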
For long-term speech signals, Shifted Delta Cepstral (SDC) features are more appropriate, since they capture the dynamic behavior of the speaker along with the prosodic characteristics of the speech signal. Kshirod Sarmah et al. [12] employed MFCC-SDC features for language identification. N. Murali Krishna et al. [13] used MFCC-SDC to recognize different human emotional states. Fred Richardson et al. [14] introduced SDC features for speaker and language recognition. However, this method has not been used in emotional speaker recognition applications.

In this work, we propose to investigate MFCC-SDC features to improve the performance of the speaker recognition system in an emotional talking environment. To evaluate the proposed recognition system, we use two multiclass SVM approaches in the classification step: One-Against-One (OAO) and One-Against-All (OAA). We also compare the results obtained with the different feature extraction methods.

This paper is organized as follows: Section II presents the proposed emotional speaker recognition system. Section III presents the process of MFCC and MFCC-SDC feature extraction, and Section IV deals with multiclass Support Vector Machine approaches. Results and experiments are given in Section V. Finally, the conclusion is given in Section VI.

II. SYSTEM DESIGN

The proposed emotional speaker recognition system is displayed in Figure 1. It can be divided into two main components: feature extraction and speaker classification. Firstly,