Training dataset construction for anomaly detection in face anti-spoofing

L. Abduh and I. Ivrissimtzis
Durham University, Department of Computer Science, UK

Abstract

Anomaly detection, which approaches face anti-spoofing as a one-class classification problem, is emerging as an increasingly popular alternative to the traditional approach of training binary classifiers on specialized anti-spoofing databases that contain both client and imposter samples. In this paper, we discuss the training protocols in the existing work on anomaly detection for face anti-spoofing, and note that they use images exclusively from specialized anti-spoofing databases, even though only common images of real faces are needed. In a proof-of-concept experiment, we demonstrate the potential benefits of adding to the anomaly detection training sets either images from general face recognition databases, rather than specialised face anti-spoofing databases, or in-the-wild images. We train a convolutional autoencoder on real faces and compare the reconstruction error against a threshold to classify a face image as either client or imposter. Our results show that including in-the-wild images in the training set increases the discriminating power of the classifier on an unseen database, as evidenced by an increase in the value of the Area Under the Curve.

CCS Concepts
• Computing methodologies → Computer vision tasks; Image manipulation;

1. Introduction

Face liveness tests authenticate users of face recognition systems by processing input images and deciding whether they come from a human face or, for example, from printed photos held in front of the system's camera by an imposter. The main challenge in developing a robust face anti-spoofing system is the large number of different types of presentation attacks the system must learn to recognize.
For example, an imposter could present to the face recognition system a printed photo, a screen displaying a still image, or a screen replaying a video. A multitude of other factors, such as the quality of the printed photo, the resolution and type of the displaying screen, the illumination conditions of the scene, and the characteristics of the system's camera, may also have a significant effect on the performance of any anti-spoofing algorithm. Moreover, a robust anti-spoofing algorithm should be able to cope with previously unseen attack methods, which were not anticipated prior to its deployment.

Traditionally, face anti-spoofing is approached as a binary classification problem, and classifiers are trained on specialised datasets containing both client and imposter images and videos. The main limitation of this approach is the high cost of creating such databases. As a consequence of that cost, only a limited number of attacks is simulated, on a limited number of subjects, while the variability of important environmental factors, such as illumination conditions and background, is also limited. As a result, the classifiers do not always generalize well to previously unseen attacks.

In this context, anomaly detection, using classifiers trained on a one-class dataset of client images only, is becoming an increasingly popular approach to face anti-spoofing [AKC17][AK18]. The present work is motivated by the observation that training with client images only can also use in-the-wild face images, that is, a set of face images harvested online, as well as face images from databases that do not specialize in face anti-spoofing.

After giving a brief overview of the general literature on face anti-spoofing, in Section 2.2 we review the relevant literature on the use of anomaly detection for face anti-spoofing and establish our main observation.
That is, in the existing literature, the training data are drawn from specialised face anti-spoofing databases, even though they are just common face images.

In Sections 3 and 4, we describe a proof-of-concept experiment on the feasibility of an alternative approach to the creation of one-class training sets. In particular, we augment an initial training set of client images from specialised face anti-spoofing databases, first with images from non-specialised databases, specifically the SCFace [GDG11] and the CASIA-Web Face [YLLL14], and then with in-the-wild images, which were semi-automatically harvested from online sources.

© 2021 The Author(s)
Eurographics Proceedings © 2021 The Eurographics Association.
DOI: 10.2312/cgvc.20211312
https://diglib.eg.org https://www.eg.org
EG UK Computer Graphics & Visual Computing (2021)
K. Xu and M. Turner (Editors)
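
The decision rule summarised in the abstract, where a reconstruction error is compared against a threshold and the classifier is evaluated by the Area Under the Curve, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `classify` and `auc`, the threshold value, and all error values are hypothetical and chosen only for illustration, and the autoencoder that would produce the errors is omitted. The AUC is computed here through its rank interpretation, that is, the probability that a randomly chosen imposter sample has a larger error than a randomly chosen client sample.

```python
from typing import List, Sequence


def classify(errors: Sequence[float], threshold: float) -> List[str]:
    """Label a sample 'imposter' when its reconstruction error exceeds the threshold."""
    return ["imposter" if e > threshold else "client" for e in errors]


def auc(client_errors: Sequence[float], imposter_errors: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (client, imposter) pairs in which the imposter
    has the larger error; ties contribute one half.
    """
    wins = 0.0
    for c in client_errors:
        for i in imposter_errors:
            if i > c:
                wins += 1.0
            elif i == c:
                wins += 0.5
    return wins / (len(client_errors) * len(imposter_errors))


# Hypothetical reconstruction errors, made up for illustration only.
client_errors = [0.10, 0.12, 0.15, 0.30]
imposter_errors = [0.25, 0.40, 0.45, 0.50]

# The threshold 0.2 is an arbitrary assumption, not a value from the paper.
labels = classify(client_errors + imposter_errors, threshold=0.2)
score = auc(client_errors, imposter_errors)
print(labels)
print(score)
```

Because the AUC aggregates over all possible thresholds, it does not depend on the particular threshold passed to `classify`, which is one reason it is a convenient summary when comparing training sets.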