EXPERIMENTS WITH FACIAL EXPRESSION RECOGNITION USING SPATIOTEMPORAL
LOCAL BINARY PATTERNS
Guoying Zhao and Matti Pietikäinen
Machine Vision Group, Infotech Oulu and Department of Electrical and Information Engineering,
P. O. Box 4500 FI-90014 University of Oulu, Finland
E-mail: {gyzhao, mkp}@ee.oulu.fi
ABSTRACT
In this paper, the recently introduced method for facial
expression recognition using spatiotemporal local binary
patterns is reviewed and experiments are carried out to
investigate the robustness of the approach. In experiments
with the Cohn-Kanade facial expression database, our
results from the cross-validation with low resolutions and
low frame rates are promising. Advantages of our approach
include local processing, robustness to low-quality video,
and simple computation.
1. INTRODUCTION
A goal of facial expression recognition is to determine the
emotional state of the face, e.g. happiness, sadness, surprise,
neutral, anger, fear, and disgust, regardless of the identity of
the face.
The face can express emotions sooner than people
verbalize or even realize their feelings [1], and research in
social psychology has shown that facial expressions form
the major modality in human communication [2]. Facial
expression is thus one of the most powerful, natural and
immediate means for human beings to communicate their
emotions and intentions [3]. Even though much work has
been done, recognizing facial expressions with high
accuracy remains difficult due to the complexity and
variety of facial expressions.
Pantic and Rothkrantz [4] surveyed the work done in
automating facial expression analysis in facial images and
image sequences. In another survey by Fasel and Luettin
[5], the most prominent automatic facial expression analysis
methods and systems were introduced. They also discussed
some facial motion and deformation extraction approaches
as well as classification methods.
According to psychologists [6], analyzing sequences of
images yields more accurate and robust recognition of
facial expressions than using single frames alone.
Psychological studies have suggested that facial motion
is fundamental to the recognition of facial expressions.
Experiments conducted by Bassili [6] demonstrate that
humans do a better job of recognizing expressions from
dynamic images than from static mug shots.
To exploit dynamic information in facial expression
analysis, several systems, for instance [1, 8-10], attempt to
recognize fine-grained changes in facial expression based on
the Facial Action Coding System (FACS), which was developed
by Ekman and Friesen [7] for describing facial expressions in
terms of action units (AUs). Other papers
attempt to recognize a small set of prototypic emotional
expressions, i.e. joy, surprise, anger, sadness, fear, and
disgust. Our work focuses on the latter task. Yeasin et al.
[11] applied the horizontal and vertical components of the
optical flow as features. At the frame level, a k-NN rule was
used to derive a characteristic temporal signature for every
video sequence. At the sequence level, discrete HMMs were
trained to recognize the temporal signatures associated with
each basic expression. However, this method cannot cope with
illumination variation. Manglik et al. [12]
presented a method for extracting the positions of the eyes,
eyebrows and mouth, and then determining the cheek and
forehead regions. An optical flow procedure was applied to
these regions, and the resulting vertical optical flow values
were fed to a discrete Hopfield network. Their dataset
included only 20 samples, on which a result of 79.8% was obtained.
Aleksic and Katsaggelos [13] exploited Facial Animation
Parameters as features describing facial expressions, and
utilized multi-stream Hidden Markov Models for
recognition. The system is complex and thus difficult to
run in real time. Cohen et al. [14] introduced a
Tree-Augmented-Naive Bayes classifier for recognition, but they
experimented on a set of only five people, and the accuracy was
only around 65% for person-independent evaluation.
Recently, the block-based approach based on local
binary patterns (LBP), originally developed for single face
images [16], was extended for the recognition of specific
dynamic events such as facial expressions [17]. In the
present paper, we review this approach and carry out
additional experiments with low-resolution images. We also
perform cross-validation between different resolutions and frame
rates to evaluate the performance of the proposed approach.
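For readers unfamiliar with local binary patterns, the basic
single-image operator that the block-based approach builds on can
be sketched as follows. This is only an illustrative sketch of the
standard 3x3 LBP operator, not the spatiotemporal extension reviewed
in this paper. The function names, the bit ordering of the
neighbours, and the use of nested lists for images are assumptions
made for the example.

```python
from typing import List

# Offsets of the 8 neighbours, walked clockwise from the top-left pixel.
# The starting point and direction are an assumption of this sketch.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: List[List[int]], row: int, col: int) -> int:
    """Return the 8-bit LBP code of the interior pixel at (row, col)."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        # A neighbour sets its bit when it is at least as bright as the center.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(block: List[List[int]]) -> List[int]:
    """Histogram of LBP codes over all interior pixels of one image block."""
    hist = [0] * 256
    for r in range(1, len(block) - 1):
        for c in range(1, len(block[0]) - 1):
            hist[lbp_code(block, r, c)] += 1
    return hist


if __name__ == "__main__":
    patch = [[1, 2, 3],
             [8, 5, 4],
             [7, 6, 9]]
    print(lbp_code(patch, 1, 1))
```

Because each code depends only on the sign of the difference
between a neighbour and the center pixel, adding a constant to all
gray values of a patch leaves its code unchanged.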
2. SPATIOTEMPORAL LOCAL BINARY PATTERNS
Local texture descriptors have gained increasing attention in
facial image analysis due to their robustness to challenges
1091 1-4244-1017-7/07/$25.00 ©2007 IEEE ICME 2007