Vis Comput
DOI 10.1007/s00371-016-1243-y
ORIGINAL ARTICLE
A spatio-temporal wavelet-based descriptor for dynamic 3D facial
expression retrieval and recognition
Antonios Danelakis¹ · Theoharis Theoharis¹,² · Ioannis Pratikakis³
© Springer-Verlag Berlin Heidelberg 2016
Abstract Human emotions are often expressed by facial expressions and are generated by facial muscle movements. In recent years, the analysis of facial expressions has emerged as an active research area due to its various applications, such as human–computer interaction, human behavior understanding, biometrics, emotion recognition, computer graphics, driver fatigue detection, and psychology. A novel analysis of dynamic 3D facial expressions using the positional information of automatically detected facial landmarks and the wavelet transformation is presented, which results in the proposed spatio-temporal descriptor. This descriptor is employed within the current paper in a retrieval scheme for dynamic 3D facial expression datasets and is thoroughly evaluated. Experiments have been conducted using the six prototypical expressions of the publicly available BU-4DFE dataset, as well as the eight expressions included in the newly released, publicly available BP4D-Spontaneous dataset. The obtained retrieval results outperform those of state-of-the-art methodologies. Furthermore, the retrieval results are exploited to achieve unsupervised dynamic 3D facial expression recognition. This unsupervised procedure achieves better recognition accuracy than state-of-the-art supervised dynamic 3D facial expression recognition techniques.
✉ Antonios Danelakis
a.danelakis@gmail.com

1 Department of Informatics and Telecommunications, University of Athens, Athens, Greece
2 IDI, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
3 Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
Keywords Dynamic 3D mesh sequence · Object retrieval · Facial expressions · Positional information · Wavelet transformation
1 Introduction
Human emotions are often expressed by facial expressions rather than verbal communication. Facial expressions are generated by facial muscle movements, resulting in temporary deformation of the face. In recent years, the analysis of facial expressions has emerged as an active research area due to its various applications, such as human–computer interaction, human behavior understanding, biometrics, emotion recognition, computer graphics, driver fatigue detection, and psychology.
Ekman [13] was the first to systematically study human facial expressions. His study categorizes the prototypical facial expressions, apart from the neutral expression, into six classes representing anger, disgust, fear, happiness, sadness and surprise. This categorization is consistent across different ethnicities and cultures. Furthermore, each of the six aforementioned expressions is mapped to specific movements of facial muscles, called action units (AUs). This led to the facial action coding system (FACS), where facial changes are described in terms of AUs [13].
A lot of research has been dedicated to the problem of facial expression recognition in dynamic sequences of 3D face scans. In contrast, to the best of our knowledge, little research on facial expression retrieval using dynamic 3D face scans appears in the literature. This strengthens the motivation for the current work. This paper introduces a new scheme for dynamic 3D facial expression retrieval. The retrieval scheme consists of three steps: (1) detection of specific facial landmarks, (2) creation of a