SIViP
DOI 10.1007/s11760-016-1052-9
ORIGINAL PAPER
Facial expression recognition based on image pyramid
and single-branch decision tree
Abubakar M. Ashir¹ · Alaa Eleyan²
Received: 4 May 2016 / Revised: 4 October 2016 / Accepted: 26 December 2016
© Springer-Verlag London 2017
Abstract In this paper, a new approach is proposed for
improved facial expression recognition. The approach is
inspired by compressive sensing theory and by multiresolution
approaches to facial expression problems. Initially, each image
sample is decomposed into the desired number of pyramid
levels at different sizes and resolutions. The pyramid features
at all levels are concatenated to form a pyramid feature vector.
The vectors are then reinforced and reduced in dimension
using a measurement matrix based on compressive sensing
theory. For classification, a multilevel approach based on a
single-branch decision tree is proposed. It trains a number
of binary support vector machines equal to the number of
classes in the dataset. The class of a test sample is evaluated
through the nodes of the tree, from the root to its apex. The
results obtained with the approach outperform most of its
counterparts in the literature under the same databases and
settings.
Keywords Facial expression recognition · Compressive
sensing · Image pyramid
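The feature-construction steps summarized in the abstract (pyramid decomposition, concatenation, and projection with a measurement matrix) can be illustrated with a minimal sketch. Several choices here are assumptions made only for illustration and are not taken from the paper: the pyramid is built by 2x2 block averaging, raw pixel values stand in for the per-level features, the measurement matrix is a random Gaussian matrix, and the function names and the sizes (3 levels, 256 measurements) are hypothetical.

```python
import numpy as np

def build_pyramid(image, levels=3):
    """Return a list of images; each level halves both sides of the previous one.

    Assumption: downsampling is done by averaging non-overlapping 2x2 blocks.
    """
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        # Crop to an even size so the image splits cleanly into 2x2 blocks.
        h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2
        blocks = prev[:h, :w].reshape(h // 2, 2, w // 2, 2)
        pyramid.append(blocks.mean(axis=(1, 3)))
    return pyramid

def pyramid_feature_vector(image, levels=3):
    """Concatenate the flattened pixels of all pyramid levels into one vector."""
    return np.concatenate([lvl.ravel() for lvl in build_pyramid(image, levels)])

def measurement_matrix(m, n, seed=0):
    """Random Gaussian m x n matrix (an assumed choice of measurement matrix).

    Scaling by 1/sqrt(m) keeps the expected squared norm of the projection
    equal to the squared norm of the input vector.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, n)) / np.sqrt(m)

# Stand-in for a 64x64 grayscale face image.
image = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)

x = pyramid_feature_vector(image, levels=3)  # 64*64 + 32*32 + 16*16 values
phi = measurement_matrix(256, x.size)
y = phi @ x                                  # reduced-dimension feature vector
```

Here `y` has 256 entries instead of the 5376 in `x`; in a full system, vectors like `y` would be the inputs to the classification stage.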
✉ Abubakar M. Ashir
ashir4real@yahoo.com
Alaa Eleyan
aeleyan@avrasya.edu.tr
¹ Department of Electric and Electronic Engineering,
Selçuk University, Konya, Turkey
² Department of Electric and Electronic Engineering,
Avrasya University, Trabzon, Turkey
1 Introduction
Facial expression recognition (FER) is one of the branches
of pattern recognition (PR) that has enjoyed increasing patronage
from many walks of life in recent times. This could be
attributed to developments in technology and the human
need for information and intelligence gathering. Some of the
emerging applications of FER are in marketing, security,
psychology, medical diagnosis, human–machine interaction and
entertainment [1]. The algorithm flow for FER is not much
different from that of other PR algorithms. The steps
are preprocessing, feature extraction, classification and
decision. Generally, two major approaches are adopted for
feature extraction in FER: the component-based (holistic)
approach and the feature-based (local) approach.
In the former, the entire face image is used as input to extract
features, while in the latter only some key points within the
face image (e.g., eyes, nose, mouth) are used to take
geometrical measurements and extract localized information
around them [2, 3].
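The difference between the two feature-extraction approaches can be made concrete with a small sketch. It is only an illustration under stated assumptions: raw pixels stand in for the extracted features, the key-point coordinates are hypothetical, the geometrical measurements are taken to be pairwise distances between key points, and each key point is assumed to lie at least `half` pixels away from the image border.

```python
import numpy as np

def holistic_features(face):
    """Holistic approach: the entire face image is the input (here, all pixels)."""
    return face.astype(np.float64).ravel()

def local_features(face, keypoints, half=4):
    """Local approach: geometry between key points plus small patches around them."""
    pts = np.asarray(keypoints, dtype=np.float64)
    # Geometrical measurements: distances between every pair of key points.
    diffs = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    geom = dist[np.triu_indices(len(pts), k=1)]
    # Localized information: a (2*half+1) x (2*half+1) patch around each point.
    patches = [
        face[r - half:r + half + 1, c - half:c + half + 1].astype(np.float64).ravel()
        for r, c in keypoints
    ]
    return np.concatenate([geom] + patches)

face = np.zeros((64, 64))  # stand-in for a grayscale face image
# Hypothetical (row, column) positions for two eyes, the nose and the mouth.
keypoints = [(20, 20), (20, 44), (36, 32), (48, 32)]

holistic = holistic_features(face)
local = local_features(face, keypoints)
```

With these sizes, the holistic vector has 4096 entries, while the local vector has 330 (6 pairwise distances plus four 9x9 patches), which shows how the local approach concentrates on a few regions of the face.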
The use of multiresolution algorithms such as the Gabor wavelet
transform (GWT) and the discrete wavelet transform (DWT), to
mention but a few, is very common in FER and appears to
have an edge over other feature extractors such as local binary
pattern (LBP), principal component analysis (PCA) and linear
discriminant analysis (LDA) [2]. The authors in [2] used a
multiresolution transform called the curvelet transform (CT) at
different orientations and scales to form curvelet products,
which were wrapped around their origin. The products were
then used to extract curvelet coefficients using the inverse CT.
The coefficients were subsequently used as feature vectors.
Although improved performance was reported, intensive
computation is required to achieve it. Similarly, the
authors of [3–8] used GWT in one form or another
to encode features for FER. For instance, in [3] the face
images were subjected to local, multiscale Gabor filter operations,
and the resulting Gabor decompositions were then encoded
using radial grids, imitating the topographical map structure
of the human visual cortex (HVC). Due to the similarity of