Biomedical Signal Processing and Control 7 (2012) 79–87
Acoustic feature selection and classification of emotions in speech using a 3D
continuous emotion model
Humberto Pérez-Espinosa∗, Carlos A. Reyes-García, Luis Villaseñor-Pineda
Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro 1, Tonantzintla, Puebla 72840, Mexico
Article info
Article history:
Received 15 September 2010
Received in revised form 23 February 2011
Accepted 24 February 2011
Available online 3 April 2011
Keywords:
Automatic emotion recognition
Continuous emotion model
Feature selection
Abstract
In this paper we report the results of experiments with a database of emotional speech
in English, carried out to find the most important acoustic features for estimating the
Emotion Primitives that determine the emotional content of speech. We are interested in
exploiting the potential benefits of continuous emotion models; therefore, we demonstrate
the feasibility of applying this approach to the annotation of emotional speech, and we
explore ways to take advantage of this kind of annotation to improve the automatic
classification of basic emotions.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
Emotions are very important in our everyday lives; they are present in everything we do.
There is a continuous interaction between emotions, behavior, and thoughts, such that they
constantly influence each other. Emotions are a great source of information in
communication and interaction among people, and they are assimilated intuitively.
The applications of emotion recognition encompass many fields,
for example, as a supporting technology in medical areas such as
psychology, neurology, and the care of elderly and impaired people.
Automatic emotion recognition based on biomedical signals and facial
and vocal expressions has been applied to the diagnosis and follow-up
of progressive neurological disorders, specifically Huntington's
and Parkinson's diseases [1]. These pathologies are characterized
by a deficit in the emotional processing of fear and disgust; thus,
the system could be used to determine a subject's reaction, or absence
of reaction, to specific emotions, helping health professionals
gain a better understanding of these disorders.
Furthermore, the system could be used as a reference to evaluate
patients' response to certain medicines. Another important
medical application is remote medical support. These kinds of environments
enable communication between medical professionals
and patients for regular monitoring and for emergency situations.
In this scenario the system recognizes the patient's emotional
∗ Corresponding author.
E-mail addresses: humbertop@inaoep.mx (H. Pérez-Espinosa), kargaxxi@inaoep.mx (C.A. Reyes-García), villasen@inaoep.mx (L. Villaseñor-Pineda).
states and transmits the corresponding data; when these indicate that the patient
is experiencing depression or sadness, the health-care providers monitoring them
will be better prepared to respond. Such a system has the potential
to improve patient satisfaction and health [2–4]. It can also be
an asset for disabled people who have difficulties with communication.
Hearing-impaired people who are not profoundly deaf can
use residual hearing to communicate with other people, and learning to
speak with emotion makes communication more complete and
understandable. In these cases, emotion recognition engines can
be used as an important element of a computer-assisted emotional
speech training system [5]. For hearing-impaired people, such a system could
provide an easier way to learn how to speak with emotion more naturally,
or help speech therapists guide them to express
emotions in speech correctly. Emotion recognition also arouses great interest
in interface design, given that automatically recognizing and understanding
emotions is one of the key steps towards emotional
intelligence in Human–Computer Interaction (HCI). The need for
automatic emotion recognition has emerged from the tendency
towards more natural interaction between humans and computers.
Affective computing is a topic within HCI that encompasses this
research tendency, trying to endow computers with the ability to
detect, recognize, model, and take into account the user's emotional
state, which plays a role of paramount importance in the way humans
make decisions [6]. Emotions are essential to the human thought processes
that influence interactions between people and intelligent
systems.
In the area of automatic emotion recognition, two main
annotation schemes have been used to capture and describe the
emotional content of speech: the discrete and continuous approaches.
The discrete approach is based on the concept of basic emotions, such
as anger, joy, and sadness, which are the most intense forms of emo-
1746-8094/$ – see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.bspc.2011.02.008