Int J Speech Technol (2007) 10: 143–152
DOI 10.1007/s10772-009-9021-0

Statistical feature evaluation for classification of stressed speech

H. Patro · G. Senthil Raja · S. Dandapat

Received: 22 June 2005 / Accepted: 9 February 2009 / Published online: 19 February 2009
© Springer Science+Business Media, LLC 2009

Abstract Variations in speech production due to stress have an adverse effect on the performance of speech and speaker recognition algorithms. In this work, different speech features, namely Sinusoidal Frequency Features (SFF), Sinusoidal Amplitude Features (SAF), Cepstral Coefficients (CC) and Mel Frequency Cepstral Coefficients (MFCC), are evaluated to determine their relative effectiveness in representing stressed speech. Several statistical feature evaluation techniques, namely probability density characteristics, the F-ratio test, the Kolmogorov-Smirnov test (KS test) and a Vector Quantization (VQ) classifier, are used to assess the performance of the speech features. Four stress conditions are tested: Neutral, Compassionate, Anger and Happy. The stressed speech database used in this work consists of 600 stressed speech files recorded from 30 speakers. With the VQ classifier, SAF gives the highest recognition rate, followed by SFF, MFCC and CC, in that order. For the SFF, MFCC and CC features, the classification results and the magnitudes of the F-ratio values follow the same relative order. The SFF and MFCC features show consistent relative performance across all three tests: the F-ratio, the KS test and the VQ classifier.

Keywords Feature evaluation · Probability density · Kolmogorov-Smirnov test

H. Patro · G. Senthil Raja · S. Dandapat
Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, Guwahati 781039, Assam, India

G. Senthil Raja
e-mail: graja@iitg.ernet.in

H. Patro
e-mail: patro@iitg.ernet.in

S.
Dandapat
e-mail: samaren@iitg.ernet.in

1 Introduction

Stress is defined as any condition that causes a speaker to vary his or her speech production from the normal condition (Jensen and Hansen 2001). Stress in speech can be induced by factors such as emotion, high workload, sleep deprivation and frustration. These conditions are known to affect the speech production mechanism; as a result, the characteristics of the speech signal deviate from those of the normal, or neutral, condition (Hansen and Womack 1996). These changes in speech production also lead to differences in how stressed speech is perceived. Because the speech characteristics change, the performance of speaker recognition and speech recognition systems may degrade under stressed speech conditions. Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information extracted from the speech input (Atal 1976; Campbell 1997). Speech recognition can be used for issuing computer commands, for speech-to-text conversion for the deaf, and in health care centres where common diseases are mapped to the appropriate medicines. Analysis of stressed speech can provide meaningful information for improving both speech recognition and speaker recognition.

Conventional speech features, which are widely used in various speech processing applications, are investigated for modelling stressed speech signals under different emotions. Linear Prediction (LP) and cepstral features are tested for the analysis of stressed speech signals (Bou-Ghazale and Hansen 2000; Hansen et al. 1994). Fast Fourier Transform (FFT) based linear short-time Log Frequency Power Coefficients (LFPC) and Teager Energy Operator (TEO) based nonlinear LFPC features are used for the recognition of stressed speech (Nwe et al. 2003). Pitch, energy and spectral contours are ranked by their quantitative contribution to the estimation of an emotion or stress