Vis Comput
DOI 10.1007/s00371-016-1243-y

ORIGINAL ARTICLE

A spatio-temporal wavelet-based descriptor for dynamic 3D facial expression retrieval and recognition

Antonios Danelakis 1 · Theoharis Theoharis 1,2 · Ioannis Pratikakis 3

© Springer-Verlag Berlin Heidelberg 2016

Abstract  Human emotions are often expressed by facial expressions and are generated by facial muscle movements. In recent years, the analysis of facial expressions has emerged as an active research area due to its various applications, such as human–computer interaction, human behavior understanding, biometrics, emotion recognition, computer graphics, driver fatigue detection, and psychology. A novel analysis of dynamic 3D facial expressions using the positional information of automatically detected facial landmarks and the wavelet transformation is presented, which results in the proposed spatio-temporal descriptor. This descriptor is employed within the current paper in a retrieval scheme for dynamic 3D facial expression datasets and is thoroughly evaluated. Experiments have been conducted using the six prototypical expressions of the publicly available BU-4DFE dataset as well as the eight expressions included in the newly released, publicly available BP4D-Spontaneous dataset. The obtained retrieval results outperform those of the state-of-the-art methodologies. Furthermore, the retrieval results are exploited to achieve unsupervised dynamic 3D facial expression recognition. This unsupervised procedure achieves better recognition accuracy than supervised dynamic 3D facial expression recognition state-of-the-art techniques.
Corresponding author: Antonios Danelakis, a.danelakis@gmail.com

1 Department of Informatics and Telecommunications, University of Athens, Athens, Greece
2 IDI, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
3 Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece

Keywords  Dynamic 3D mesh sequence · Object retrieval · Facial expressions · Positional information · Wavelet transformation

1 Introduction

Human emotions are often expressed by facial expressions rather than by verbal communication. Facial expressions are generated by facial muscle movements, resulting in temporary deformation of the face. In recent years, the analysis of facial expressions has emerged as an active research area due to its various applications, such as human–computer interaction, human behavior understanding, biometrics, emotion recognition, computer graphics, driver fatigue detection, and psychology.

Ekman [13] was the first to systematically study human facial expressions. His study categorizes the prototypical facial expressions, apart from the neutral expression, into six classes representing anger, disgust, fear, happiness, sadness and surprise. This categorization is consistent across different ethnicities and cultures. Furthermore, each of the six aforementioned expressions is mapped to specific movements of facial muscles, called action units (AUs). This led to the Facial Action Coding System (FACS), in which facial changes are described in terms of AUs [13].

Considerable research has been dedicated to the problem of facial expression recognition in dynamic sequences of 3D face scans. In contrast, to the best of our knowledge, no sufficient research on facial expression retrieval using dynamic 3D face scans appears in the literature. This strengthens the motivation for the current work. This paper introduces a new scheme for dynamic 3D facial expression retrieval.
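To make the general idea of a wavelet-based spatio-temporal descriptor concrete, the following sketch wavelet-transforms the coordinate trajectories of tracked 3D facial landmarks and concatenates the resulting coefficients into one feature vector. This is a minimal illustration, not the paper's exact descriptor: the single-level Haar transform, the `haar_dwt` and `wavelet_descriptor` names, and the toy input shape are all assumptions made for the example.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.
    Returns (approximation, detail) coefficients.
    Assumed here for illustration; any wavelet basis could be used."""
    s = np.asarray(signal, dtype=float)
    if len(s) % 2:                       # pad odd-length signals
        s = np.append(s, s[-1])
    approx = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return approx, detail

def wavelet_descriptor(landmarks):
    """landmarks: array of shape (frames, n_landmarks, 3) holding the 3D
    positions of detected facial landmarks across a mesh sequence.
    Each coordinate trajectory is wavelet-transformed over time and the
    coefficients are concatenated into one spatio-temporal feature vector."""
    frames, n_lm, _ = landmarks.shape
    feats = []
    for lm in range(n_lm):
        for axis in range(3):
            approx, detail = haar_dwt(landmarks[:, lm, axis])
            feats.extend(approx)
            feats.extend(detail)
    return np.array(feats)

# Toy mesh sequence: 8 frames, 2 landmarks, 3D positions.
seq = np.random.rand(8, 2, 3)
desc = wavelet_descriptor(seq)
print(desc.shape)  # (2 landmarks * 3 axes * 8 coefficients,) -> (48,)
```

Because the Haar transform is orthonormal, the descriptor preserves the energy of the input trajectories; descriptors of different sequences can then be compared directly, e.g., with a Euclidean or cosine distance, for retrieval.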
The retrieval scheme consists of three steps: (1) detection of specific facial landmarks, (2) creation of a