Independent Performance Evaluation of Biometric Systems

Davrondzhon Gafurov, Bian Yang, Patrick Bours and Christoph Busch
Gjøvik University College, {Firstname.Lastname}@hig.no

I. INTRODUCTION

Often in the development of a biometric product the evaluator of the system is the same institution that developed the algorithm. Furthermore, the test data set is usually collected by the same developer/evaluator, and in most cases such a database is not public. Consequently, test results cannot be verified by independent institutions. Although this can be justifiable (e.g. in the optimization phase of an algorithm), from the perspective of a potential customer it reduces the trustworthiness of the developed system and of the reported performance. Therefore, the availability of independent databases and, desirably, of independent evaluators is very important for performance evaluation. It is also essential that algorithm developers do not have access to the testing database, so that the risk of tuned algorithms is minimized.

Pseudonymous identifiers (PI) are complementary to image- or minutiae-based references and provide a level of protection that is both more privacy-preserving and more efficient than symmetric or asymmetric encryption of a biometric reference image or minutiae template record [1]. With PI an individual retains complete control of his or her biometric data, as multiple PIs can be generated from a single biometric characteristic without any risk that these can be linked together. At the same time, any of these identifiers can be revoked and replaced by a new one if needed. Research on such PIs is a core objective of the TURBINE project [2].

This work presents the biometric performance test report on a fingerprint performance evaluation that has been generated in the context of the TURBINE project [2]. It is worth emphasizing that this report presents the results of the first testing round, which are not the final results of the project.
For the second and final testing round, project partners will submit their improved algorithms, and the results will be available in 2011. Furthermore, here we report only the "biometric performance" per se of the algorithms; the "security performance" of the PI algorithms is evaluated by others in TURBINE.

II. PERFORMANCE METRICS AND DATA SETS

The main error types associated with any biometric performance evaluation are FMR versus FNMR, and FAR versus FRR. In this work we refer to the former and the latter pairs as algorithm and system performances (or errors), respectively. They are related to each other according to the formulas below:

FAR = FMR * (1 - FTA)
FRR = FTA + FNMR * (1 - FTA)

In our tests, we define FTA using the formula below:

FTA = FTC + FTX * (1 - FTC)

where FTC (Failure To Capture) and FTX (Failure To eXtract) are estimated as follows:

FTC = (# terminated capture attempts + # insufficient-quality images) / (total # capture attempts)
FTX = (# not encoded templates) / (total # images submitted to the template encoder)

("#" stands for "number of"). It is worth noting that the FTA computation incorporates both hardware-related (in FTC) and software-related (in FTX) failures.

The (binary) fingerprint verification algorithms were provided by project partners, in particular Sagem Securite, Precise Biometrics, Philips Research Europe and the University of Twente. An external fingerprint verification package by Neurotechnology (VeriFinger [3]) was purchased and also included in the testing. The PI software submitted for the first benchmark simulates the effect of a physical protection layer obtained when implementing encoding and comparison within a smartcard (on-card-comparison techniques). As the test database we use the GUC100 multi-scanner fingerprint database, which consists of fingerprint images of all 10 fingers from 100 subjects (almost 72000 fingerprint images in total) [4].
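The error-rate relations defined above can be sketched in a few lines of Python. This is a minimal illustration, not part of the TURBINE test software; all function names and the numeric counts in the usage example are invented for illustration and are not taken from the GUC100 tests.

```python
def fta_rate(terminated, low_quality, total_captures, not_encoded, total_images):
    """Failure To Acquire, combining capture- and extraction-stage failures.

    FTC = (terminated attempts + insufficient-quality images) / total capture attempts
    FTX = (templates that could not be encoded) / images submitted to the encoder
    FTA = FTC + FTX * (1 - FTC)
    """
    ftc = (terminated + low_quality) / total_captures   # Failure To Capture (hardware)
    ftx = not_encoded / total_images                    # Failure To eXtract (software)
    return ftc + ftx * (1 - ftc)


def system_errors(fmr, fnmr, fta):
    """Map algorithm-level errors (FMR, FNMR) to system-level errors (FAR, FRR)."""
    far = fmr * (1 - fta)
    frr = fta + fnmr * (1 - fta)
    return far, frr


# Illustrative counts and rates (hypothetical, not measured values):
fta = fta_rate(terminated=3, low_quality=7, total_captures=1000,
               not_encoded=5, total_images=990)
far, frr = system_errors(fmr=0.001, fnmr=0.02, fta=fta)
```

Note that a transaction lost to FTA can never produce a false match, which is why FAR scales FMR down by (1 - FTA), while every acquisition failure counts directly as a rejection in FRR.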
Neither project partners nor external parties had access to the GUC100 database or were involved in the testing activity. Performance evaluations were carried out solely by the GUC research team as an independent and neutral academic party in the project.

III. PERFORMANCE RESULTS

The focus of this work is not on comparing the individual performances of algorithms or scanners, but rather on emphasizing characteristics of biometric performance evaluation and observing the potential performance degradation in the transition to the PI level. Therefore, the names of the scanners (denoted S1, ..., S6) and of the algorithm suppliers (except Neurotechnology) are anonymized. Test results are given in terms of DET curves. For the minutiae-level curves, the x-axis is plotted on a logarithmic scale.