Gender Recognition from Facial Images using Local
Gradient Feature Descriptors
Olarik Surinta
Multi-agent Intelligent Simulation Laboratory (MISL)
Faculty of Informatics, Mahasarakham University
Maha Sarakham, Thailand
olarik.s@msu.ac.th
Thananchai Khamket
Applied Informatics Group, Department of Information
Technology, Faculty of Informatics, Mahasarakham University
Maha Sarakham, Thailand
thananchai.k@msu.ac.th
Abstract—Local gradient feature descriptors have been proposed to compute invariant feature vectors. These methods compute the feature vector very quickly and achieve very high recognition accuracy when combined with the support vector machine (SVM) classifier. Hence, they have been applied to many image recognition problems, such as human face, object, plant, and animal recognition. In this paper, we propose the use of the Haar-cascade classifier for face detection and local gradient feature descriptors combined with the SVM classifier to solve the gender recognition problem. We detected 4,624 face images in the ColorFERET dataset; the face images used for gender recognition comprised 2,854 male and 1,770 female images. We divided the dataset into training and test sets using 2-fold and 10-fold cross-validation. First, in the 2-fold cross-validation experiment, the results showed that the histogram of oriented gradients (HOG) descriptor outperforms the scale-invariant feature transform (SIFT) descriptor when combined with the SVM algorithm: the accuracies of HOG+SVM and SIFT+SVM were 96.50% and 95.98%, respectively. Second, in the 10-fold cross-validation experiment, SIFT+SVM showed high performance with an accuracy of 99.20%. We discovered that the SIFT+SVM method needs more training data when creating the model, whereas the HOG+SVM method provides better accuracy when the training data is insufficient.
Keywords—gender recognition, face detection, local
gradient feature descriptor, support vector machine
I. INTRODUCTION
Gender recognition can be used to improve the efficiency
of surveillance and security systems, authentication systems,
and face recognition systems [1]. Moreover, it can also be
developed into a variety of applications. Research in gender
recognition involves three major tasks: face detection, feature extraction (also called face encoding), and recognition [2]–[6].
Related work. In [7], deep convolutional neural networks and support vector machines were proposed for gender recognition and tested on the ColorFERET dataset. The pre-processing step consists of detecting and cropping the face image. After the detection stage, 8,364 face images remained, stored at 256×256-pixel resolution. A data augmentation technique was then applied to generate new face images. A pre-trained model of the AlexNet architecture was used to train on the face images, with a linear support vector machine attached to the last fully connected layer. With this method, the best accuracy was 97.3%.
In [3], a local feature descriptor called the pyramid histogram of oriented gradients (PHOG) was proposed to represent the local gradients of an image. For the HOG descriptor, the feature vector is calculated according to Equation (1). Additionally, the PHOG descriptor divides an image into small blocks at several pyramid levels [8]. The gradient orientations at every level are stored in orientation bins, and the orientation bins from all pyramid levels are then combined. The resulting feature vector is classified using the SVM classifier with the RBF kernel. The proposed method achieved an accuracy of 88.5% on the labeled faces in the wild (LFW) database.
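The pyramid scheme described above can be sketched as follows. This is a simplified PHOG-style descriptor, not the exact formulation of [3]: the bin count, level count, and use of unsigned gradients here are illustrative assumptions.

```python
import numpy as np

def phog(image, levels=3, bins=8):
    """Concatenate gradient-orientation histograms over pyramid levels.

    Level l splits the image into a 2**l x 2**l grid of cells; each cell
    contributes one magnitude-weighted orientation histogram.
    """
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned gradient orientations in [0, pi) -- an assumption here.
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    features = []
    for level in range(levels):
        cells = 2 ** level                     # 1x1, 2x2, 4x4, ... grid
        for rows in np.array_split(np.arange(image.shape[0]), cells):
            for cols in np.array_split(np.arange(image.shape[1]), cells):
                cell_ori = orientation[np.ix_(rows, cols)].ravel()
                cell_mag = magnitude[np.ix_(rows, cols)].ravel()
                hist, _ = np.histogram(cell_ori, bins=bins,
                                       range=(0, np.pi), weights=cell_mag)
                features.append(hist)
    return np.concatenate(features)
```

With three levels and eight bins, the descriptor has (1 + 4 + 16) × 8 = 168 dimensions; the final vector would then be fed to an SVM classifier as in [3].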
In [5], a multiscale facial fusion feature was proposed; the multiscale method is related to the pyramid technique [3], [8]. The fusion features used in the experiments include the local phase quantization (LPQ) and local binary pattern (LBP) descriptors. The feature vectors extracted by the two descriptor methods are combined and sent to the SVM classifier to classify the face image. The multiscale facial fusion method obtained an accuracy of 86.11% on the images of groups (IoG) dataset.
In this paper, we first apply the well-known Haar-cascade classifier, invented by Viola and Jones [9], [10] for object and pedestrian detection, to find the exact location of the face in the complete image. Note that we focus only on the frontal face and ignore profile faces in which the head is turned to the left or right. Due to the challenges of the ColorFERET dataset [11], [12], we could extract only 4,624 face images from the 11,119 images. After that, all face images were resized to the same size; the resolution used in the experiments was 88×80 pixels.
Secondly, two local gradient feature descriptors, the histogram of oriented gradients (HOG) [13] and the scale-invariant feature transform (SIFT) [14], are proposed to extract gradient features from the face image. We evaluated the performance of the local gradient descriptors with several parameter settings: the orientations, pixels per cell, and cells per block of the HOG descriptor, and the patch size of the SIFT descriptor.
Finally, the support vector machine (SVM) [15] with the radial basis function (RBF) kernel is proposed to create a model of the gender feature vector from the training data. We implemented the grid-search method over the hyper-parameters (C and γ) [16] until the best parameters were obtained. Also, the average accuracies and
346 iSAI-2019
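The classification stage described above can be sketched with scikit-learn's cross-validated grid search over an RBF-kernel SVM. The grid values and fold count here are illustrative assumptions, not the ranges used in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_gender_svm(features, labels, folds=2):
    """Fit an RBF SVM, choosing C and gamma by cross-validated grid search.

    The candidate C and gamma values below are placeholders; the paper's
    actual search ranges are not specified here.
    """
    grid = {"C": [1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=folds)
    search.fit(features, labels)
    return search.best_estimator_, search.best_params_
```

Setting `folds` to 2 or 10 mirrors the two cross-validation protocols compared in the abstract.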