Training dataset construction for anomaly detection in face anti-spoofing

L. Abduh and I. Ivrissimtzis

Durham University, Department of Computer Science, UK

Abstract

Anomaly detection, which approaches face anti-spoofing as a one-class classification problem, is emerging as an increasingly popular alternative to the traditional approach of training binary classifiers on specialised anti-spoofing databases that contain both client and imposter samples. In this paper, we discuss the training protocols used in existing work on anomaly detection for face anti-spoofing, and note that they use images exclusively from specialised anti-spoofing databases, even though only common images of real faces are needed. In a proof-of-concept experiment, we demonstrate the potential benefits of adding to the anomaly detection training sets images from general face recognition databases, rather than specialised face anti-spoofing databases, or in-the-wild images. We train a convolutional autoencoder on real faces and compare the reconstruction error against a threshold to classify a face image as either client or imposter. Our results show that including in-the-wild images in the training set increases the discriminating power of the classifier on an unseen database, as evidenced by an increase in the value of the Area Under the Curve.

CCS Concepts

• Computing methodologies → Computer vision tasks; Image manipulation;

1. Introduction

Face liveness tests authenticate users of face recognition systems by processing input images and deciding whether they come from a human face or, for example, from printed photos held in front of the system’s camera by an imposter. The main challenge in developing a robust face anti-spoofing system is the large number of different types of presentation attacks the system must learn to recognize.
For example, an imposter could present to the face recognition system a printed photo, a screen displaying a still image, or a screen replaying a video. A multitude of other factors, such as the quality of the printed photo, the resolution and type of the displaying screen, the illumination conditions of the scene, and the characteristics of the system’s camera, may also have a significant effect on the performance of any anti-spoofing algorithm. Moreover, a robust anti-spoofing algorithm should be able to cope with previously unseen attack methods, which were not anticipated prior to its deployment.

Traditionally, face anti-spoofing is approached as a binary classification problem, and classifiers are trained on specialised datasets containing both client and imposter images and videos. The main limitation of this approach is the high cost of creating such databases. As a consequence of this cost, only a limited number of attacks is simulated, on a limited number of subjects, while the variability of important environmental factors such as illumination conditions and background is also limited. As a result, the classifiers do not always generalize well to previously unseen attacks.

In this context, anomaly detection, using classifiers trained on a one-class dataset of client images only, is becoming an increasingly popular approach to face anti-spoofing [AKC17][AK18]. The present work is motivated by the observation that training with client images only can also use in-the-wild face images, that is, a set of face images harvested online, as well as face images from databases that do not specialise in face anti-spoofing.

After giving a brief overview of the general literature on face anti-spoofing, in Section 2.2 we review the relevant literature on the use of anomaly detection for face anti-spoofing and establish our main observation.
That is, in the existing literature, the training data are drawn from specialised face anti-spoofing databases, even though they are just common face images.

In Sections 3 and 4, we describe a proof-of-concept experiment on the feasibility of an alternative approach to the creation of one-class training sets. In particular, we augment an initial training set of client images from specialised face anti-spoofing databases, first with images from non-specialised databases, namely SCFace [GDG11] and CASIA-WebFace [YLLL14], and then with in-the-wild images, which were semi-automatically harvested from online sources.

EG UK Computer Graphics & Visual Computing (2021), K. Xu and M. Turner (Editors)
© 2021 The Author(s). Eurographics Proceedings © 2021 The Eurographics Association.
DOI: 10.2312/cgvc.20211312
https://diglib.eg.org https://www.eg.org
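
The abstract describes a classification rule that thresholds an autoencoder's reconstruction error and is evaluated by the Area Under the Curve. The following is a minimal sketch of that rule, not the authors' implementation. The autoencoder itself is not shown, and its reconstructions are assumed to be given as arrays. The function names, the use of mean squared error as the reconstruction error, and the toy values are all illustrative assumptions.

```python
import numpy as np


def reconstruction_error(images, reconstructions):
    """Per-image mean squared error (an assumed choice of error measure)."""
    images = np.asarray(images, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    diff = images - reconstructions
    # Flatten each image so the mean is taken over all of its pixels.
    return (diff.reshape(len(diff), -1) ** 2).mean(axis=1)


def classify(errors, threshold):
    """Return True (imposter) where the error exceeds the threshold."""
    return np.asarray(errors, dtype=float) > threshold


def auc(errors, is_imposter):
    """Area under the ROC curve, using the error as the imposter score.

    Computed as the fraction of (imposter, client) pairs in which the
    imposter has the larger error, with ties counted as one half.
    """
    errors = np.asarray(errors, dtype=float)
    is_imposter = np.asarray(is_imposter, dtype=bool)
    pos = errors[is_imposter]
    neg = errors[~is_imposter]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Hypothetical reconstruction errors for three client and three imposter images.
toy_errors = np.array([0.10, 0.20, 0.15, 0.50, 0.18, 0.60])
toy_labels = np.array([False, False, False, True, True, True])

print(classify(toy_errors, threshold=0.19))
print(auc(toy_errors, toy_labels))
```

Because the AUC depends only on how the errors rank client against imposter images, it summarises the discriminating power of the classifier without committing to any particular threshold.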