Auditory Wavelet Packet Filters for Multistyle Classification of Speech Under Stress
Nurul Aida Amira Bt Johari, M. Hariharan, Sazali Yaacob, Vikneswaran Vijean
School of Mechatronic Engineering, University Malaysia Perlis, Perlis, Malaysia
cintan.jerit@gmail.com

Abstract – Nowadays, people experience high stress levels due to heavy workloads, emergency phone calls and multitasking. The emotional/stress state of a person affects his/her performance in daily life and speech production. Research into understanding human emotional/stress states from speech has undergone considerable development over the past two decades. This paper presents a feature extraction method based on wavelet packet decomposition for detecting the emotional or stress state of a person. Two different wavelet packet filter bank structures are designed based on the Mel scale and the Equivalent Rectangular Bandwidth (ERB) scale. A Support Vector Machine (SVM) is employed as the classifier to identify the emotional/stressed states of a person. In this study, speech samples are taken from the Speech Under Simulated and Actual Stress (SUSAS) database. Experimental results show that the suggested method can be used to identify the stress and emotional state of a person.

Keywords – Emotional/stressed states, speech signal, wavelet packet transform, Support Vector Machine, stress classification.

I. INTRODUCTION

One important and challenging research problem today is the recognition of emotion and stress from a speaker's speech. It is an important application that helps in identifying a speaker's stress and emotional condition, and it falls within the fields of human-computer interaction and affective computing. Recognition of human affective states is rapidly gaining interest among researchers and industrial developers since it has a broad range of applications.
This study is useful in applications such as emotion-aware robots, metropolitan emergency telephone systems that direct emotionally distressed calls to a priority operator, and potentially in multimedia applications such as interactive voice response systems, aircraft voice communication monitoring and psychiatric diagnosis [1-3]. The stress and emotional state of a user have been analyzed using speech patterns. Vocal parameters and prosodic features such as fundamental frequency, intensity (energy) and speaking rate are strongly related to the emotion expressed in speech [4-12]. Many studies have shown distinctive differences in phonetic features between normal speech and speech produced under stress [4-12], using classifiers such as Hidden Markov Models [6-9] and Neural Networks [10-12]. Researchers have proposed different speech features; the most common are MFCC (Mel-Frequency Cepstral Coefficients), pitch, LPC (Linear Prediction Coefficients), autocorrelation coefficients and Teager Energy Operator (TEO) based features [5, 13]. Up to now, researchers have not identified a specific feature set for the recognition of emotional/stressed states through speech [12].

The wavelet transform is a promising tool for non-stationary speech analysis, as it is capable of analyzing the speech signal in both time and frequency. S. Datta and co-workers [18] developed a new filter structure using a Mel-like admissible wavelet packet structure, in which the filter frequency bands are spaced closely according to the Mel scale. In this study, we design wavelet packet filterbank structures based on the Mel scale and the ERB scale. The Mel scale, proposed in 1937 by Stevens et al., is a perceptual scale of pitches. In 1938, Fletcher [14, 15] introduced the critical band concept, which describes the bandwidth of the human auditory filter along the cochlea. He assumed that the auditory filters were rectangular, and several physiologically motivated formulas have since been derived for the ERB scale.
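For concreteness, the two auditory scales mentioned above can be sketched with their commonly used formulas. This is a minimal Python illustration under an assumption: the paper does not state which variants it uses, so the standard Stevens Mel formula and the Glasberg-Moore ERB formulas are taken as representative.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Stevens Mel scale: perceived pitch in mels for frequency f (Hz)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz) / 700.0)

def erb_bandwidth(f_hz):
    """Glasberg & Moore (1990) equivalent rectangular bandwidth (Hz)
    of the auditory filter centered at f (Hz)."""
    return 24.7 * (4.37 * np.asarray(f_hz) / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz):
    """ERB-rate scale: number of ERBs below frequency f (Hz)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz))

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate, for mapping band edges back to Hz."""
    return (10.0 ** (np.asarray(e) / 21.4) - 1.0) / 0.00437

# Example: 8 band edges spaced uniformly on the ERB-rate scale,
# the kind of spacing an ERB-based filterbank would approximate.
edges_erb = np.linspace(hz_to_erb_rate(100.0), hz_to_erb_rate(4000.0), 9)
edges_hz = erb_rate_to_hz(edges_erb)
```

Spacing filter band edges uniformly on the Mel or ERB-rate axis (rather than in Hz) is what makes a filterbank "auditory": bands are narrow at low frequencies and progressively wider at high frequencies, mirroring cochlear frequency resolution.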
The simulation results show that the suggested methods can be used to identify the emotional/stressed states of a person.
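The wavelet packet feature extraction underlying the method can be sketched in miniature as follows. This is an assumed illustration, not the paper's implementation: it uses Haar filters and a uniform full-tree split, whereas the paper's filterbanks use admissible tree structures whose splits follow the Mel and ERB scales; the log-energy feature is likewise one common choice of subband feature.

```python
import numpy as np

def haar_analysis(x):
    """One level of orthogonal Haar two-band analysis (lowpass, highpass)."""
    x = x[: len(x) - len(x) % 2]             # truncate to even length
    lo = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # approximation (low band)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # detail (high band)
    return lo, hi

def wavelet_packet(x, depth):
    """Full wavelet packet tree: split every node at every level,
    yielding 2**depth terminal subbands."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(depth):
        nxt = []
        for n in nodes:
            lo, hi = haar_analysis(n)
            nxt.extend([lo, hi])
        nodes = nxt
    return nodes

def log_energy_features(x, depth=3):
    """Feature vector: log energy of each terminal subband."""
    return np.array([np.log(np.sum(n ** 2) + 1e-12)
                     for n in wavelet_packet(x, depth)])
```

Because the Haar filters are orthonormal, the subband energies sum to the signal energy, which gives a quick sanity check on the decomposition; the resulting per-frame feature vectors would then be fed to a classifier such as an SVM.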