Optimal deep transfer learning based ethnicity recognition on face images

Marwa Obayya a, Saud S. Alotaibi b, Sami Dhahb c,d, Rana Alabdan e, Mesfer Al Duhayyim f, Manar Ahmed Hamza g,⁎, Mohammed Rizwanullah g, Abdelwahed Motwakel g

a Department of Biomedical Engineering, College of Engineering, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
b Department of Information Systems, College of Computing and Information System, Umm Al-Qura University, Saudi Arabia
c Department of Computer Science, College of Science & Art at Mahayil, King Khalid University, Saudi Arabia
d University of Tunis EL Manar, Higher Institute of Computer, Research Team on Intelligent Systems in Imaging and Artificial Vision (SIIVA) – Lab LIMTIC, Aryanah 2036, Tunisia
e Department of Information Technology, College of Computer and Information Science, Majmaah University, Al-Majmaah 11952, Saudi Arabia
f Department of Computer Science, College of Sciences and Humanities-Aflaj, Prince Sattam bin Abdulaziz University, Saudi Arabia
g Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, AlKharj, Saudi Arabia

Article info
Article history: Received 31 March 2022; Received in revised form 30 September 2022; Accepted 23 October 2022; Available online 29 October 2022

Abstract
In recent times, deep learning driven face image analysis has gained significant interest in several application areas such as surveillance, security, and biometrics. Facial analysis aims to compute facial soft biometrics such as ethnicity, expression, identification, age, and gender. Among these biometrics, ethnicity recognition remains an active research area. Recent advancements in computer vision (CV) and artificial intelligence (AI) models form the basis for the effective design of ethnicity recognition models.
With this motivation, this paper introduces a novel Harris Hawks optimization with deep transfer learning based fusion model for face ethnicity recognition (HHODTLF-FER). The proposed HHODTLF-FER model aims to determine the ethnicity category of the input facial images. A fusion of three pre-trained DL models, namely VGG16, Inception v3, and capsule network (CapsNet), is employed. In addition, a bidirectional long short-term memory (BiLSTM) model is applied for ethnicity recognition and classification. Finally, the HHO algorithm is utilized to fine-tune the hyperparameters of the BiLSTM model, which constitutes the novelty of the work. To verify the improved recognition performance of the HHODTLF-FER model, a wide-ranging experimental analysis is performed using benchmark databases. The comprehensive comparative study highlighted the promising performance of the HHODTLF-FER model over the other approaches.
© 2022 Elsevier B.V. All rights reserved.

Keywords: Ethnicity recognition; Face images; Deep learning; Fusion model; Hyperparameter tuning; Face recognition

1. Introduction

The face is one of the parts of the human body that carries the most semantic information about a person. The so-called facial soft biometrics, such as ethnicity, gender, age, identity, and expression, have recently attracted the interest of the pattern recognition community, thanks both to the large number of potential applications in retail and video surveillance and to the inherent difficulty of designing efficient and reliable algorithms for challenging real-world scenarios [1]. Surveillance systems are currently widely deployed to secure the public. The advancement of artificial intelligence, specifically AI for computer vision (CV), has made it much easier to analyze the resulting videos [2].
Several studies have addressed the problem of event detection in video surveillance, which requires the ability to identify and localize specified spatiotemporal patterns. Another major issue in surveillance video analysis, which attracts much research attention, is the person re-identification problem [3]. Person re-identification is the task of recognizing an individual across multiple images taken by several cameras or by a single camera [4]. In contrast, ethnicity recognition (ER), that is, the capacity of a system to determine whether a person belongs to one of the ethnicity categories based on observed facial characteristics such as skin color, morphology, and other distinctive patterns, has not received the same level of interest from the scientific community [5]. Interest in ER is nevertheless growing: new methodologies and datasets have recently been proposed, either to improve the accuracy of real applications whose performance is currently biased by ethnicity (face detection and recognition, gender classification, age estimation) or to give a decisive push to applications in forensics (ethnicity) [6].

⁎ Corresponding author. E-mail address: ma.hamza@psau.edu.sa (M.A. Hamza).
Image and Vision Computing 128 (2022) 104584
https://doi.org/10.1016/j.imavis.2022.104584
0262-8856/© 2022 Elsevier B.V. All rights reserved.

The shortage of ethnicity data is primarily because of innate