Int J Speech Technol (2007) 10: 143–152
DOI 10.1007/s10772-009-9021-0
Statistical feature evaluation for classification of stressed speech
H. Patro · G. Senthil Raja · S. Dandapat
Received: 22 June 2005 / Accepted: 9 February 2009 / Published online: 19 February 2009
© Springer Science+Business Media, LLC 2009
Abstract The variations in speech production due to stress
have an adverse effect on the performance of speech
and speaker recognition algorithms. In this work, differ-
ent speech features, such as Sinusoidal Frequency Features
(SFF), Sinusoidal Amplitude Features (SAF), Cepstral Co-
efficients (CC) and Mel Frequency Cepstral Coefficients
(MFCC), are evaluated to determine their relative effectiveness
in representing stressed speech. Different statistical feature
evaluation techniques, such as probability density characteristics,
the F-ratio test, the Kolmogorov-Smirnov test (KS test) and a
Vector Quantization (VQ) classifier, are used to assess the
performance of the speech features. Four stress
conditions (Neutral, Compassionate, Anger and Happy) are
tested. The stressed speech database used in this work con-
sists of 600 stressed speech files which are recorded from 30
speakers. SAF gives the best recognition result with the VQ
classifier, followed by SFF, MFCC and CC in that order. For the
SFF, MFCC and CC features, the relative classification results
and the relative magnitudes of the F-ratio values follow the
same order. The SFF and MFCC features show consistent relative
performance across all three tests: the F-ratio, the KS test and
the VQ classifier.
Keywords Feature evaluation · Probability density ·
Kolmogorov-Smirnov Test
H. Patro · G. Senthil Raja · S. Dandapat
Department of Electronics and Communication Engineering,
Indian Institute of Technology Guwahati, Guwahati 781039,
Assam, India
e-mail: graja@iitg.ernet.in
H. Patro
e-mail: patro@iitg.ernet.in
S. Dandapat
e-mail: samaren@iitg.ernet.in
1 Introduction
Stress is defined as any condition that causes a speaker to
vary his/her speech production from the normal conditions
(Jensen and Hansen 2001). Stress in speech is induced by
emotion, high workload, sleep deprivation and frustration.
These conditions are known to affect the speech production
mechanism. As a result, the characteristics of the speech
signal change from those of the normal or neutral condition
(Hansen and Womack 1996). These changes in speech
production lead to differences in the perception of stressed
speech. Due to changes in the speech characteristics, per-
formance of the speaker recognition system and the speech
recognition system may degrade under stressed speech con-
ditions. Speaker recognition is the process of automatically
recognizing who is speaking on the basis of individual
information extracted from the speech input (Atal 1976;
Campbell 1997). Speech recognition can be used for issuing
computer commands, for speech-to-text conversion for the deaf,
and in health care centres where some common diseases are
mapped to the appropriate medicine. Analysis of stressed
speech can provide meaningful information for improving
speech recognition and speaker recognition.
Conventional speech features, which are widely used
in various speech processing applications, are investigated
to model stressed speech signals under different emotions.
Speech features such as Linear Prediction (LP) and cep-
stral features are tested for analysis of stressed speech sig-
nals (Bou-Ghazale and Hansen 2000; Hansen et al. 1994).
Fast Fourier Transform (FFT) based linear short time Log
Frequency Power Coefficients (LFPC) and Teager Energy
Operator (TEO) based nonlinear LFPC features are used
for recognition of stressed speech (Nwe et al. 2003). Pitch,
energy and spectral contours are ranked by their quantita-
tive contribution to the estimation of an emotion or stress