Gender Recognition from Facial Images using Local
Gradient Feature Descriptors
Olarik Surinta
Multi-agent Intelligent Simulation Laboratory (MISL)
Faculty of Informatics, Mahasarakham University
Maha Sarakham, Thailand
olarik.s@msu.ac.th
Thananchai Khamket
Applied Informatics Group, Department of Information
Technology, Faculty of Informatics, Mahasarakham University
Maha Sarakham, Thailand
thananchai.k@msu.ac.th
Abstract—Local gradient feature descriptors have been proposed to compute invariant feature vectors. These methods compute the feature vector very quickly and achieve very high recognition accuracy when combined with the support vector machine (SVM) classifier. Hence, they have been applied to many image recognition problems, such as human face, object, plant, and animal recognition. In this paper, we propose the use of the Haar-cascade classifier for face detection and local gradient feature descriptors combined with the SVM classifier to solve the gender recognition problem. We detected 4,624 face images in the ColorFERET dataset; the face images used for gender recognition comprised 2,854 male and 1,770 female images. We divided the dataset into training and test sets using 2-fold and 10-fold cross-validation. First, in the 2-fold cross-validation experiment, the results showed that the histogram of oriented gradients (HOG) descriptor outperforms the scale-invariant feature transform (SIFT) descriptor when combined with the SVM algorithm: the accuracies of HOG+SVM and SIFT+SVM were 96.50% and 95.98%, respectively. Second, in the 10-fold cross-validation experiment, SIFT+SVM showed high performance with an accuracy of 99.20%. We discovered that the SIFT+SVM method needs more training data when creating the model, whereas the HOG+SVM method provides better accuracy when the training data is insufficient.
Keywords—gender recognition, face detection, local
gradient feature descriptor, support vector machine
I. INTRODUCTION
Gender recognition can be used to improve the efficiency
of surveillance and security systems, authentication systems,
and face recognition systems [1]. Moreover, it can also be
developed into a variety of applications. Research in gender
recognition involves three major tasks: face detection, feature extraction (also called face encoding), and recognition [2]–[6].
Related work. In [7], deep convolutional neural networks and support vector machines were proposed for gender recognition and tested on the ColorFERET dataset. The pre-processing step consists of detecting and cropping the face image. After the detection stage, 8,364 face images remained, stored at 256×256-pixel resolution. A data augmentation technique was then applied to generate new face images. A pre-trained model of the AlexNet architecture was used to train on the face images, with a linear support vector machine attached to the last fully connected layer. With this method, the best accuracy was 97.3%.
In [3], a local feature descriptor called the pyramid histogram of oriented gradients (PHOG) was proposed to represent the local gradients of an image. For the HOG descriptor, the feature vector is calculated according to Equation (1). Additionally, the PHOG descriptor divides an image into small blocks at several pyramid levels [8]. The gradient orientations at every level are stored in orientation bins, and the orientation bins from all pyramid levels are then combined. The resulting feature vector is classified using the SVM classifier with the RBF kernel. The proposed method achieved an accuracy of 88.5% on the labeled faces in the wild (LFW) database.
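The pyramid scheme described above can be sketched as follows. This is a simplified PHOG-style descriptor, not the exact formulation of [3]: the bin count, level count, and use of unsigned gradients here are illustrative assumptions.

```python
import numpy as np

def phog(image, levels=3, bins=8):
    """Concatenate gradient-orientation histograms over pyramid levels.

    Level l splits the image into a 2**l x 2**l grid of cells; each cell
    contributes one magnitude-weighted orientation histogram.
    """
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned gradient orientations in [0, pi) -- an assumption here.
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    features = []
    for level in range(levels):
        cells = 2 ** level                     # 1x1, 2x2, 4x4, ... grid
        for rows in np.array_split(np.arange(image.shape[0]), cells):
            for cols in np.array_split(np.arange(image.shape[1]), cells):
                cell_ori = orientation[np.ix_(rows, cols)].ravel()
                cell_mag = magnitude[np.ix_(rows, cols)].ravel()
                hist, _ = np.histogram(cell_ori, bins=bins,
                                       range=(0, np.pi), weights=cell_mag)
                features.append(hist)
    return np.concatenate(features)
```

With three levels and eight bins, the descriptor has (1 + 4 + 16) × 8 = 168 dimensions; the final vector would then be fed to an SVM classifier as in [3].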
In [5], a multiscale facial fusion feature was proposed; the multiscale method is related to the pyramid technique [3], [8]. The fusion features used in the experiments include the local phase quantization (LPQ) and local binary pattern (LBP) descriptors. The feature vectors extracted by the two descriptor methods are combined and sent to the SVM classifier to classify the face image. The multiscale facial fusion method obtained an accuracy of 86.11% on the images of groups (IoG) dataset.
In this paper, we first apply the well-known Haar-cascade classifier, invented by Viola and Jones [9], [10] for object and pedestrian detection, to find the exact location of the face in the complete image. Note that we focus only on the frontal face and ignore profile faces in which the head is turned to the left or right. Due to the challenges of the ColorFERET dataset [11], [12], we could extract only 4,624 face images from the 11,119 images. After that, all face images were resized to the same size; the resolution used in the experiments was 88×80 pixels.
Secondly, two local gradient feature descriptors, the histogram of oriented gradients (HOG) [13] and the scale-invariant feature transform (SIFT) [14], are proposed to extract gradient features from the face image. We evaluated the performance of the local gradient descriptors with several parameter settings: the orientations, pixels per cell, and cells per block of the HOG descriptor, and the patch size of the SIFT descriptor.
Finally, the support vector machine (SVM) [15] with the radial basis function (RBF) kernel is proposed to create a model of the gender feature vector from the training data. We implemented the grid-search method over the hyper-parameters (C and γ) [16] until the best parameters were obtained. Also, the average accuracies and
346 iSAI-2019
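The classification stage described above can be sketched with scikit-learn's cross-validated grid search over an RBF-kernel SVM. The grid values and fold count here are illustrative assumptions, not the ranges used in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_gender_svm(features, labels, folds=2):
    """Fit an RBF SVM, choosing C and gamma by cross-validated grid search.

    The candidate C and gamma values below are placeholders; the paper's
    actual search ranges are not specified here.
    """
    grid = {"C": [1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=folds)
    search.fit(features, labels)
    return search.best_estimator_, search.best_params_
```

Setting `folds` to 2 or 10 mirrors the two cross-validation protocols compared in the abstract.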