Spontaneous Facial Expression Recognition: A Part-Based Approach
Nazil Perveen, Dinesh Singh and C. Krishna Mohan
Visual Intelligence and Learning Group (VIGIL),
Department of Computer Science and Engineering,
Indian Institute of Technology Hyderabad, Kandi, Sangareddy-502285, India.
email: {cs14resch11006, cs14resch11003, ckm}@iith.ac.in
Abstract—A part-based approach for spontaneous expression
recognition using audio-visual features and a deep convolutional
neural network (DCNN) is proposed. The ability of convolutional
neural networks to handle variations in translation and scale is
exploited for extracting visual features. The sub-regions, namely,
the eye and mouth parts extracted from the video faces, are given as
input to the DCNN in order to extract convnet
features. The audio features, namely, voice report, voice intensity,
and other prosodic features, are used to obtain complementary
information useful for classification. The confidence scores of the
classifiers trained on the different facial parts and the audio information
are combined using different fusion rules for recognizing expressions.
The effectiveness of the proposed approach is demonstrated
on the acted facial expressions in the wild (AFEW) dataset.
Keywords—Isotropic smoothing, expression recognition, and
convolutional neural network.
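The score-level fusion described in the abstract, where classifiers trained on different facial parts and on audio are combined by fusion rules, can be illustrated with a minimal sketch. The sum and product rules below are standard combination rules; the score values and class labels are synthetic placeholders, not results from this paper.

```python
import numpy as np

# Posterior (confidence) scores from three hypothetical classifiers
# (eye region, mouth region, audio) over the 7 expression classes.
# Rows: modalities, columns: classes. Values are illustrative only.
scores = np.array([
    [0.10, 0.05, 0.05, 0.60, 0.05, 0.05, 0.10],  # eye-region CNN
    [0.05, 0.05, 0.10, 0.55, 0.10, 0.05, 0.10],  # mouth-region CNN
    [0.10, 0.10, 0.10, 0.40, 0.10, 0.10, 0.10],  # audio classifier
])

# Sum rule: average the per-class scores across modalities.
sum_fused = scores.mean(axis=0)

# Product rule: multiply per-class scores, then renormalize.
prod_fused = scores.prod(axis=0)
prod_fused /= prod_fused.sum()

classes = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]
print("sum rule     ->", classes[int(sum_fused.argmax())])   # happy
print("product rule ->", classes[int(prod_fused.argmax())])  # happy
```

The sum rule is robust to a single noisy modality, while the product rule rewards agreement across all modalities; both select the class on which all three hypothetical classifiers concur here.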
I. INTRODUCTION
Emotion reflects the mental status of the human mind.
Mehrabian [1] indicated that the verbal part (i.e., spoken words)
of a message contributes only 7% of its effect; the vocal part
(i.e., voice information) contributes 38%, while facial expression
contributes 55%. Therefore, facial expression plays an important
role in the recognition of human emotions such as anger, disgust,
fear, happiness, neutral, sadness, and surprise. Expression
recognition in an unconstrained environment is termed
spontaneous expression recognition, which is a very
difficult task due to various real-world issues such as illumination,
posed faces, scaling, and occlusion. Handling these issues
while maintaining reasonable classification accuracy is one
of the biggest challenges today. Being an active research area,
spontaneous expression recognition has immense applications.
It can be used to make smart devices smarter using emotional
intelligence [2], to perform surveys on products and services,
and in engagement systems, mood recognition, psychology, real-time
gaming, animated movies, etc. [3], [4], [5], [6], [7], [8], [9],
[10], [11], [12], [13]. Spontaneous expression recognition uses
data science technologies such as machine learning, artificial
intelligence, big data, and bio-sensors to recognize expressions.
Expression analysts and data scientists are trying to synchronize
stimuli to expressions, for example to detect micro-expressions,
in order to enhance the recognition rate of primary emotions [14].
In 1978, Ekman [15] defined the human facial expressions
that can be classified into seven basic classes, namely,
anger, disgust, fear, happiness, neutral, sadness, and surprise, also
known as the universal expressions. Several exhaustive research
works have been carried out in the literature for the automatic
recognition of expressions in static images with high recognition
rates. Recent advances in expression recognition from
2013 to 2015 have changed the perception of the recognition
system. In 2014, vision- and attention-theory-based sampling for
continuous facial expression recognition by Bir Bhanu et al. [16]
modeled the way in which humans visualize expressions. In
their approach, the dataset is divided into two categories based
on frame rate, namely, low and high frame rate. In the former,
the person is idle and expresses no emotion, while in the latter,
the person changes expressions frequently. The basic
contribution of Bir Bhanu et al. is a video-based temporal
sampling in which appearance-based features are extracted
and then classified using a support vector machine classifier.
The recognition rate is 75% on the standard datasets AVEC 2011/2012,
CK, CK+, and MMI.
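The appearance-features-plus-SVM pipeline described above can be sketched as follows. The feature vectors here are synthetic stand-ins for appearance-based descriptors, and the dimensions and parameters are illustrative assumptions rather than the setup of [16].

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for appearance-based feature vectors sampled
# from video frames: two well-separated expression classes.
X_happy = rng.normal(loc=0.0, scale=0.5, size=(50, 64))
X_sad = rng.normal(loc=3.0, scale=0.5, size=(50, 64))
X = np.vstack([X_happy, X_sad])
y = np.array([0] * 50 + [1] * 50)  # 0 = happy, 1 = sad

# Train an SVM classifier on the frame-level appearance features.
clf = SVC(kernel="rbf").fit(X, y)

# Classify a new frame-level feature vector drawn near the "sad" cluster.
probe = rng.normal(loc=3.0, scale=0.5, size=(1, 64))
print(clf.predict(probe))  # expected class: 1 (sad)
```

In the actual temporal-sampling approach, the frame rate would first determine which frames are sampled before features are extracted; the sketch covers only the final classification stage.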
An automatic framework for textured 3-D video-based facial
expression recognition by Munawar and Bennamoun [17]
proposes a texture-based dynamic approach for recognizing
expressions. Initially, small patches are extracted from the
sample videos, and these patches are represented as points
lying on a Grassmannian manifold; clusters are then formed
using Grassmannian kernelization with a graph-based spectral
clustering mechanism. All cluster centers are embedded into a
reproducing kernel Hilbert space in which a support vector
machine (SVM) is learned for each expression. The recognition
accuracy is 93%-94% on BU-4DFE (Binghamton University
4D facial expression database).
A different approach to 4-D facial expression recognition
by learning geometric deformations, proposed by Ben Amor
et al. [18] in 2014, represents the face as a combination of
radial curves lying on a Riemannian manifold and measures the
deformation induced by each facial expression. The features
obtained are of very high dimension, and hence a linear
discriminant analysis (LDA) transformation is applied to project
them into a low-dimensional space. Two approaches are implemented
for classification: one is a temporal (dynamic) HMM, and the other
applies mean deformation patches to random forest classification.
The recognition rate is 93% on average across different
datasets, namely, BU-4DFE, Bosphorus, D3DFACS, and
Hi4D-ADSIP.
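The LDA projection step described for [18], reducing high-dimensional deformation features before classification, can be sketched with scikit-learn. The random features, dimensions, and forest size below are illustrative assumptions, not the values used by the authors.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic high-dimensional stand-ins for the radial-curve
# deformation features, for 3 expression classes.
n_per_class, n_dims = 40, 200
X = np.vstack([rng.normal(loc=c * 2.0, scale=1.0, size=(n_per_class, n_dims))
               for c in range(3)])
y = np.repeat(np.arange(3), n_per_class)

# LDA projects the features to at most (n_classes - 1) = 2 dimensions.
lda = LinearDiscriminantAnalysis(n_components=2)
X_low = lda.fit_transform(X, y)
print(X_low.shape)  # (120, 2)

# A random forest is then trained on the low-dimensional features.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_low, y)
print(rf.score(X_low, y))
```

LDA is a natural fit here because its projection dimensionality is bounded by the number of classes minus one, giving a drastic reduction from hundreds of deformation coefficients to a handful of discriminative directions.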
Earlier, the topic of spontaneous expression recognition, i.e.,
expression recognition in an unconstrained environment, was not
a focus in the literature. J. F. Cohn et al. introduce sponta-
2016 15th IEEE International Conference on Machine Learning and Applications
978-1-5090-6167-9/16 $31.00 © 2016 IEEE
DOI 10.1109/ICMLA.2016.162