Optimal deep transfer learning based ethnicity recognition on face images

Marwa Obayya a, Saud S. Alotaibi b, Sami Dhahb c,d, Rana Alabdan e, Mesfer Al Duhayyim f, Manar Ahmed Hamza g,*, Mohammed Rizwanullah g, Abdelwahed Motwakel g

a Department of Biomedical Engineering, College of Engineering, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
b Department of Information Systems, College of Computing and Information System, Umm Al-Qura University, Saudi Arabia
c Department of Computer Science, College of Science & Art at Mahayil, King Khalid University, Saudi Arabia
d University of Tunis EL Manar, Higher Institute of Computer, Research Team on Intelligent Systems in Imaging and Artificial Vision (SIIVA) Lab LIMTIC, Aryanah 2036, Tunisia
e Department of Information Technology, College of Computer and Information Science, Majmaah University, Al-Majmaah 11952, Saudi Arabia
f Department of Computer Science, College of Sciences and Humanities- Aaj, Prince Sattam bin Abdulaziz University, Saudi Arabia
g Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, AlKharj, Saudi Arabia

Article info
Article history: Received 31 March 2022; Received in revised form 30 September 2022; Accepted 23 October 2022; Available online 29 October 2022

Abstract
In recent times, deep learning driven face image analysis has gained significant interest among several application areas like surveillance, security, biometrics, etc. The facial analysis intends to compute facial soft biometrics like ethnicity, expression, identification, age, gender, and so on. Among several biometrics, ethnicity recognition remains a hot research area. Recent advancements in computer vision (CV) and artificial intelligence (AI) models form the basis of an effective design of ethnicity recognition models.
With this motivation, this paper introduces a novel Harris Hawks optimization with deep transfer learning based fusion model for face ethnicity recognition (HHODTLF-FER). The proposed HHODTLF-FER model aims to determine the ethnicity class of input facial images. A fusion of three pre-trained DL models, namely VGG16, Inception v3, and capsule networks (CapsNet), is employed. In addition, a bidirectional long short-term memory (BiLSTM) model is applied for ethnicity recognition and classification. Finally, the HHO algorithm is utilized to fine-tune the hyperparameters of the BiLSTM model, which constitutes the novelty of the work. To verify the improved recognition performance of the HHODTLF-FER model, a wide-ranging experimental analysis is performed using benchmark databases. The comprehensive comparative study highlighted the promising performance of the HHODTLF-FER model over the other approaches.
© 2022 Elsevier B.V. All rights reserved.

Keywords: Ethnicity recognition; Face images; Deep learning; Fusion model; Hyperparameter tuning; Face recognition

Image and Vision Computing 128 (2022) 104584
https://doi.org/10.1016/j.imavis.2022.104584
0262-8856
Journal homepage: www.elsevier.com/locate/imavis
* Corresponding author. E-mail address: ma.hamza@psau.edu.sa (M.A. Hamza).

1. Introduction

The face is the part of the human body that carries most of the semantic information about a person. The so-called facial soft biometrics, such as ethnicity, gender, age, identity, and expression, have recently attracted the interest of the pattern recognition community, owing both to the large number of potential applications in retail and video surveillance and to the inherent difficulty of designing efficient and reliable algorithms for challenging real-world scenarios [1]. Surveillance systems are now commonly used to protect the public. Advances in artificial intelligence (AI), specifically AI for computer vision (CV), have made it much easier to analyze the resulting videos [2]. Several recent studies have addressed event detection in video surveillance, which requires the ability to identify and localize specific spatiotemporal patterns. Another major problem in surveillance video analysis that attracts much research attention is person re-identification [3]. Person re-identification is the task of recognizing an individual across different images taken by multiple cameras or by a single camera [4]. By contrast, ethnicity recognition (ER), that is, the ability of a system to assign a person to one of several ethnicity categories based on observed facial appearance such as skin color, morphology, and other distinctive patterns, has not received the same level of interest from the scientific community [5]. Interest in ER is nevertheless growing: new methods and datasets have recently been proposed either to improve the accuracy of real applications whose performance is currently biased by ethnicity (face detection and recognition, gender classification, age estimation) or to give a decisive push to forensic applications (ethnicity) [6]. The shortage of ethnicity data is primarily because of innate