Information content of biometric features

Andy Adler, Richard Youmaran, Sergey Loyka
University of Ottawa, Ontario, Canada
{adler,youmaran,loyka}@site.uOttawa.ca

1. Introduction

How much information is there in a face, or a fingerprint? In this paper we elaborate an approach to this question based on information-theoretic reasoning. To motivate our approach, we first consider the properties that such a measure should have. Consider a soft biometric system that measures height and weight; furthermore, assume all humans are uniformly and independently distributed in height between 100–200 cm and in weight between 100–200 lb. If a person's features were completely stable and could be measured with infinite accuracy, people could be uniquely identified from these measurements, and the biometric features could be considered to yield infinite information. In reality, however, repeated biometric measurements give different results, due both to measurement inaccuracies and to short- and long-term changes in the biometric features themselves. If this variability results in an uncertainty of ±5 cm and ±5 lb, one simple model is to round each measurement to the nearest bin centre: 105, 115, ..., 195. In this case there are 10 × 10 equiprobable outcomes, and an information content of log₂(100) ≈ 6.6 bits.

Such an analysis is intrinsically tied to a choice of biometric features. Thus, it does not appear possible to answer "how much information is in a fingerprint?", but only "how much information is in the position and angle data of fingerprint minutiae?". Furthermore, for many biometrics it is not clear what the underlying features are. Face images, for example, can be described by image-basis features or by landmark-based features. To overcome this, we may choose to calculate the information in all possible features. In the example, we may provide height in inches as well as in cm; however, in this case, a good measure of information must not increase with such redundant data.
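The counting argument above can be sketched numerically (a minimal illustration in Python; the bin sizes and ranges are those of the toy example, not of any real biometric):

```python
import math

# Toy soft-biometric model from the text: height in 100-200 cm and
# weight in 100-200 lb, each uniform and independent, with +/- 5 unit
# measurement uncertainty. Rounding each measurement to the bin
# centres 105, 115, ..., 195 gives 10 bins per feature.
bins_per_feature = (200 - 100) // 10        # 10 equiprobable bins
outcomes = bins_per_feature ** 2            # 10 x 10 = 100 joint outcomes
info_bits = math.log2(outcomes)             # log2(100) ~ 6.6 bits

print(f"{outcomes} outcomes, {info_bits:.1f} bits")
```

Halving the uncertainty to ±2.5 units would double the bin count per feature and add two bits, which shows how the measure depends on measurement stability rather than on the features alone.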
Additionally, the following issues associated with biometric features must be considered:

• Feature values are correlated. In the example given, taller people tend to be heavier.
• Feature distributions vary. Some features, such as minutiae ridge angles, may be uniformly distributed over 0–2π, while others may be better modeled as Gaussian.
• Raw sample images must be processed by alignment and scaling before features can be measured.
• Feature dimensionality may not be constant. For example, the number of available minutiae points varies.
• The feature space may not be bounded, linear or metric.

2. Relative Entropy of Biometric Measures

Given these considerations, the most appropriate information-theoretic measure for the information content of a biometric feature appears to be the relative entropy D(p‖q) [3] between the intra-person (p(x)) and inter-person (q(x)) biometric feature distributions, defined as

    D(p‖q) = ∫ p(x) log₂( p(x)/q(x) ) dx.

This measure can be motivated as follows: the entropy, H(p), is the amount of information required on average to describe features x distributed as p(x). However, H is not in itself an appropriate measure, since it includes the information needed to describe a feature measurement more accurately than the intra-person distribution. The relative entropy, D, on the other hand, is the extra information required to describe a distribution p given an assumed distribution q [3]. This corresponds nicely to our requirements: given knowledge of the general, inter-person distribution, the information in a biometric feature set allows us to describe a particular individual. If p and q are modeled as Gaussian with means μ_p, μ_q and covariances Σ_p, Σ_q, then the relative entropy in bits is

    D(p‖q) = (1/2) log₂( (|Σ_q| / |Σ_p|) · e^{trace( [Σ_p + (μ_p − μ_q)(μ_p − μ_q)ᵀ] Σ_q⁻¹ − I )} )    (1)

Using this measure, we can calculate the entropy of a feature representation of face images.
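Equation (1) can be evaluated directly. The sketch below (assuming Python with NumPy; the means and covariances are invented purely for illustration) computes D(p‖q) in bits for a narrow intra-person distribution p inside a broader inter-person distribution q:

```python
import numpy as np

def relative_entropy_bits(mu_p, cov_p, mu_q, cov_q):
    """D(p||q) in bits for Gaussians p and q, following equation (1):
    0.5 * log2( (|cov_q|/|cov_p|) * e^trace([cov_p + d d^T] cov_q^-1 - I) )
    with d = mu_p - mu_q."""
    k = len(mu_p)
    d = (mu_p - mu_q).reshape(-1, 1)
    cov_q_inv = np.linalg.inv(cov_q)
    log_det_term = np.log2(np.linalg.det(cov_q) / np.linalg.det(cov_p))
    # The exponent is in nats; log2(e) converts it to bits.
    trace_term = np.trace((cov_p + d @ d.T) @ cov_q_inv - np.eye(k))
    return 0.5 * (log_det_term + trace_term * np.log2(np.e))

# Hypothetical 2-D feature model: broad inter-person distribution q,
# narrow intra-person distribution p for one individual.
mu_q, cov_q = np.zeros(2), np.array([[1.0, 0.3], [0.3, 1.0]])
mu_p, cov_p = np.array([0.8, -0.5]), 0.1 * np.eye(2)

print(f"D(p||q) = {relative_entropy_bits(mu_p, cov_p, mu_q, cov_q):.2f} bits")
```

As expected, D(p‖q) is zero when p and q coincide, and it grows as the intra-person distribution becomes narrower or more displaced relative to the inter-person distribution.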