Arsirii O. О., Petrosiuk D. V. / Herald of Advanced Information Technology
2023;Vol.6 No.3: 203–214
ISSN 2663-0176 (Print)
ISSN 2663-7731 (Online)
Methodological principles of
information technology
DOI: https://doi.org/10.15276/hait.06.2023.13
UDC 004.8
Pseudo-labeling of transfer learning convolutional neural
network data for human facial emotion recognition
Olena O. Arsirii 1)
ORCID: https://orcid.org/0000-0001-8130-9613; e.arsiriy@gmail.com. Scopus Author ID: 54419480900
Denys V. Petrosiuk 1)
ORCID: https://orcid.org/0000-0003-4644-3678; d.petrosyuk1994@gmail.com. Scopus Author ID: 54419479400
1) Odessa Polytechnic National University, 1, Shevchenko Ave. Odessa, 65044, Ukraine
ABSTRACT
Facial emotion recognition in images of people is relevant to the creation of modern intelligent systems for computer vision
and human-machine interaction, online learning and emotional marketing, health care and forensics, computer graphics and game
intelligence. Successful technological solutions to the problem of facial emotion recognition using transfer learning of deep
convolutional neural networks have been demonstrated. However, training convolutional neural networks on popular datasets such as
DISFA, CelebA, and AffectNet does not yield good emotion recognition accuracy, because almost all training sets have fundamental
flaws introduced during their creation, such as missing data for certain classes, class imbalance, subjective and ambiguous
labeling, and an insufficient amount of data for deep learning. It is proposed to overcome these shortcomings of popular emotion
recognition datasets by adding to the training set additional pseudo-labeled images of human emotions for which recognition is
highly accurate. The aim of the research is to increase the accuracy of facial emotion recognition in images of people by
developing a pseudo-labeling method for transfer learning of a deep neural network. To achieve this aim, the following tasks were
solved: a convolutional neural network model pre-trained on the ImageNet dataset was fine-tuned on the RAF-DB dataset by transfer
learning to solve emotion recognition tasks; a method for pseudo-labeling RAF-DB data was developed for semi-supervised learning
of a convolutional neural network model for facial emotion recognition; and the accuracy of facial emotion recognition was analyzed
for the developed convolutional neural network model and the method of pseudo-labeling RAF-DB data used to refine it. It is shown
that the developed pseudo-labeling method combined with transfer learning of the MobileNet V1 convolutional neural network model
increased the accuracy of facial emotion recognition on images of the RAF-DB dataset by 2 percentage points (from 76 % to 78 %)
according to the F1 score. Given the significant class imbalance among the 7 basic emotions in the training set, recognition
accuracy increased notably for the under-represented emotions: surprise (from 71 % to 77 %), fearful (from 64 % to 69 %), sad
(from 72 % to 76 %), angry (from 64 % to 74 %), and neutral (from 66 % to 71 %). The accuracy of recognizing the happy emotion,
which is the most common, decreased (from 91 % to 86 %). Thus, it can be concluded that the developed pseudo-labeling method gives
good results in overcoming such shortcomings of datasets for deep learning of convolutional neural networks as missing data of a
certain type, class imbalance, and an insufficient amount of data for deep learning.
Keywords: pseudo-labeling data; semi-supervised learning; transfer learning; convolutional neural networks; facial emotion
recognition
For citation: Arsirii O. O., Petrosiuk D. V. “Pseudo-labeling of transfer learning convolutional neural network data for human facial emotion
recognition”. Herald of Advanced Information Technology. 2023; Vol. 6 No. 3: 203–214. DOI: https://doi.org/10.15276/hait.06.2023.13
© Arsirii O., Petrosiuk D., 2023
INTRODUCTION
Modern intelligent systems of computer vision
and human-machine interaction, online learning and
emotional marketing, health care and forensics,
computer graphics and game intelligence are built on
a model of social interaction.
Developing such a model requires information
about a person's emotional state, which is obtained
by solving the problem of Facial Emotion
Recognition (FER) on images of human faces [1, 2].
Successful technological solutions to the FER
problem based on transfer learning of deep
Convolutional Neural Networks (CNN) have been
reported [3, 4], [5, 6]. However, training deep
CNNs on popular datasets such as DISFA, CelebA,
and AffectNet [7] does not yield good FER accuracy,
because these training sets have fundamental
shortcomings: missing data for certain classes,
class imbalance, subjective and ambiguous labeling,
an insufficient volume of data for deep learning,
etc. Therefore, the
topic related to increasing the accuracy of FER due
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/deed.uk)