(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 14, No. 7, 2023 570 | Page www.ijacsa.thesai.org

Enhancing Facemask Detection using Deep Learning Models

Abdullahi Ahmed Abdirahman 1*, Abdirahman Osman Hashi 2*, Ubaid Mohamed Dahir 3*, Mohamed Abdirahman Elmi 4*, Octavio Ernest Romo Rodriguez 5
1, 2, 3, 4 Faculty Member, SIMAD University, Department of Computing, Mogadishu, Somalia
5 Department of Computer Science-Faculty of Informatics, İstanbul Teknik Üniversitesi, İstanbul, Turkey

Abstract—Face detection and mask detection are critical tasks for public safety and for compliance with mask-wearing protocols. It is therefore important to identify individuals who violate these rules and regulations. To this end, this paper implements four deep learning models for face detection and face-with-mask detection: MobileNet, ResNet50, InceptionV3, and VGG19. The models are evaluated using precision and recall for both the face detection and the face-with-mask detection tasks. The results indicate that the proposed model based on ResNet50 achieves superior performance in face detection, with high precision (99.4%) and recall (98.6%). The proposed model also shows commendable accuracy in mask detection. MobileNet and InceptionV3 provide satisfactory results, while the proposed model based on VGG19 excels in face detection but performs slightly worse in mask detection. These findings contribute to the development of effective face mask detection systems, with implications for public safety.

Keywords—Object detection; deep learning; detection; face detection; mask detection; convolutional neural network

I. INTRODUCTION

Computer vision is a rapidly advancing field that encompasses a wide range of technologies aimed at enabling machines to perceive and interpret visual information, similar to how humans do.
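The abstract reports results as precision and recall. For readers less familiar with these metrics, the following minimal Python sketch shows how both are computed for a single positive class. It assumes per-sample predicted labels and uses a hypothetical "mask" label. The helper names `confusion_counts` and `precision_recall` are illustrative and are not the authors' evaluation code.

```python
from typing import Hashable, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable
) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # correctly flagged as the positive class
        elif pred == positive:
            fp += 1  # flagged as positive, but the sample is not
        elif truth == positive:
            fn += 1  # positive sample that the model missed
    return tp, fp, fn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels, for illustration only (not the paper's data).
y_true = ["mask", "mask", "mask", "mask", "no_mask"]
y_pred = ["mask", "mask", "mask", "no_mask", "no_mask"]
tp, fp, fn = confusion_counts(y_true, y_pred, positive="mask")
print(precision_recall(tp, fp, fn))  # (1.0, 0.75)
```

For face detection, the same formulas apply once each predicted box has been matched to a ground-truth face, typically with an intersection-over-union threshold. This sketch does not perform that matching step.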
One crucial task within computer vision is face detection, which involves locating and identifying human faces in digital images or video streams. Face detection has gained significant attention due to its wide-ranging applications in domains such as surveillance systems, biometric authentication, facial recognition, human-computer interaction, and social media analysis [1]. Over the years, researchers have made remarkable progress in developing face detection algorithms with high accuracy and robustness. Yet despite the progress of general object detection algorithms across various domains, the efficacy of face mask detection techniques remains limited [2]. In response, researchers have used the "You Only Look Once v2" (YOLOv2) algorithm to build detection models. Later work has leveraged the YOLOv3 algorithm, which offers improved feature extraction [3]. Face detection nonetheless remains difficult because of variations in lighting, occlusions, pose variations, complex backgrounds, and scale variations. Lighting variations can significantly change facial appearance, which makes it hard to detect faces consistently. Occlusions, such as glasses, facial hair, or partial obstructions of the face, further complicate the task by hiding crucial facial features. Face detection algorithms must also handle pose variations, where faces may be rotated, tilted, or viewed from different angles. Complex, cluttered backgrounds pose another challenge, because it becomes difficult to differentiate faces from the surrounding environment [4]. Similarly, scale variations, caused by the varying distances between the camera and the subjects, require robust face detection algorithms that can handle faces of different sizes [5].
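The scale-variation point above can be made concrete with an idealised pinhole camera model. This is a simplifying assumption, since real lenses add distortion. Under that model, the height h of a face in the image (in pixels) depends on the focal length f (in pixels), the physical face height H, and the distance Z between the camera and the subject:

$$h = \frac{f\,H}{Z}$$

As an example, take hypothetical values of f = 1000 px and H ≈ 0.2 m. A face 2 m from the camera then spans about 1000 × 0.2 / 2 = 100 px, while a face 10 m away spans only about 20 px. A single detector must therefore cope with a fivefold difference in face size within the same scene.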
Over the years, researchers have proposed various face detection techniques, each aiming to address the challenges mentioned above and to improve the accuracy and efficiency of face detection algorithms. Early approaches relied on handcrafted features and traditional machine learning algorithms to detect faces. Examples include Haar cascades, as used in the Viola-Jones framework, and Histogram of Oriented Gradients (HOG). These methods achieved reasonable results but struggled with pose variations and complex backgrounds. In recent years, the advent of deep learning, particularly convolutional neural networks (CNNs), has revolutionized the field of face detection. CNN-based architectures, such as the Single Shot MultiBox Detector (SSD) and Faster R-CNN, have demonstrated superior performance in face detection tasks. These models automatically learn discriminative features from large-scale datasets, which enables them to handle the various challenges of face detection. Notably, region-based convolutional neural networks (R-CNN) have greatly improved accuracy by combining region proposals with convolutional networks, allowing more precise localization of faces [6]. Other researchers improved on YOLO, in which the network divides an image into a G×G grid and predicts bounding boxes for each grid cell. However, the network has difficulty detecting small objects, because the bounding boxes predicted in a grid cell can only be assigned a single class. The primary limitation of YOLO is that it struggles to localize objects accurately, particularly objects whose bounding boxes have unusual aspect ratios [7]. In face mask detection, on the other hand, various transfer learning approaches have been employed to address the challenges encountered in real-world scenarios. One such method involves utilizing a pre-trained InceptionV3 model as a transfer learning technique to discern individuals wearing or