Multimed Tools Appl
DOI 10.1007/s11042-014-2319-1
Relevance units machine based dimensional
and continuous speech emotion prediction
Fengna Wang · Hichem Sahli · Junbin Gao ·
Dongmei Jiang · Werner Verhelst
Received: 1 March 2014 / Revised: 22 August 2014 / Accepted: 10 October 2014
© Springer Science+Business Media New York 2014
Abstract Emotion plays a significant role in human-computer interaction. The continuing
improvements in speech technology have led to many new and fascinating applications in
human-computer interaction, context-aware computing and computer-mediated communication.
Such applications require reliable online recognition of the user's affect. However, most
emotion recognition systems operate on speech from an isolated short sentence or word. We
present a framework for online emotion recognition from speech. At the front end, a voice
activity detection algorithm is used to segment the input speech, and features are estimated
to model long-term properties. Then, dimensional and continuous emotion recognition is
performed via a Relevance Units Machine (RUM). The advantages of the proposed system
are: (i) its computational efficiency at run time (regression outputs can be produced
continuously in pseudo real-time), (ii) RUM offers superior sparsity to the well-known Support
F. Wang (✉) · H. Sahli · W. Verhelst
Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel (VUB),
VUB-NPU Joint AVSP Lab, Pleinlaan 2, B-1050 Brussels, Belgium
e-mail: fwang@etro.vub.ac.be
H. Sahli
Interuniversity Microelectronics Center (IMEC), Kapeldreef 75, Leuven, Belgium
e-mail: hsahli@vub.ac.be
J. Gao
School of Computing and Mathematics, Charles Sturt University,
Bathurst, NSW 2795, Australia
e-mail: jbgao@csu.edu.au
D. Jiang
School of Computer Science, Northwestern Polytechnical University (NPU),
VUB-NPU Joint AVSP Lab, Xi’an, China
e-mail: jiangdm@nwpu.edu.cn
W. Verhelst
iMinds, Gaston Crommenlaan 8, 9050 Ghent, Belgium
e-mail: wverhelst@etro.vub.ac.be