Chapter 9: Creating and Maintaining Databases
Presented by Zhiming Liu
Instructor: Dr. Bebis
Outline
- Introduction
- Enrollment Policies
- The Zoo
- Biometric Sample Quality Control
- Training
- Enrollment Is System Training
Introduction
- Biometric enrollment asks an individual to hand over private information.
- Enrollment is a process directed by an enrollment policy, which needs to be acceptable to the public.
- Positive enrollment: under enrollment policy E_M, select trusted individuals and store machine representations of these m enrolled members in a verification database M.
Introduction
- Negative enrollment: for criminal identification systems, under enrollment policy E_N, determine the undesirable individuals and store machine representations of the n selected individuals in the screening database N.
- Because of error and fraud, legacy databases contain fake and duplicate identities.
Introduction
- A fake identity is either a created or a stolen identity:
  1. Created identity: some subject d enrolls in M as d'_k using documents for a nonexistent identity (fake documents or a fake ID).
  2. Stolen identity: a subject d'_k is falsely enrolled as an existing subject d_k, whose identity is thereby stolen.
- A duplicate identity: subject A is enrolled under two identities, I_A and I_B, one a duplicate of the other.
Enrollment policies
- Positive enrollment: the registration of m trusted subjects d_m in database M. The enrollment could be based on some already enrolled population W.
- Negative enrollment: the registration of n questionable subjects d_n by storing machine descriptions of these subjects in database N, which contains much more specific and detailed descriptions.
Enrollment policies
Social issues
- How can biometric authentication work without creating additional security loopholes and without damaging civil liberties?
- Who will administer and maintain the databases of authorized subjects?
- How will the data integrity of these databases be protected?
The zoo
Animal names label categories of subjects, depending on how easy a subject is to authenticate.
- Sheep: the subjects who dominate the population and are easy to authenticate because their real-world biometric is very distinctive and stable.
- Goats: the subjects who are particularly difficult to authenticate because of a poor real-world biometric that is not distinctive, perhaps due to physical damage to body parts or to large spurious variability in the biometric measurements over time. This portion of the population generates the majority of False Rejects.
The zoo
- Lambs: the enrolled subjects who are easy to imitate. Lambs are the cause of most False Accepts because they are imitated by wolves.
- Wolves: the subjects who are particularly good at imitating, impersonating, or forging a particular biometric.
- Chameleons: the subjects who are both easy to imitate and good at imitating others. They are a source of passive False Accepts when enrolled and of active False Accepts when being authenticated.
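The five categories above can be illustrated with a small labeling routine. This is only a sketch: the per-subject statistics, the threshold values, and the order in which the categories are checked are all assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class SubjectStats:
    """Hypothetical per-subject error statistics (names are illustrative)."""
    frr: float              # share of the subject's own genuine attempts that were rejected
    far_as_target: float    # share of impostor attempts against this subject that were accepted
    far_as_attacker: float  # share of this subject's impostor attempts that were accepted


def zoo_label(s: SubjectStats, frr_hi: float = 0.10, far_hi: float = 0.05) -> str:
    """Assign a zoo category; thresholds and precedence are assumptions."""
    easy_to_imitate = s.far_as_target >= far_hi
    good_imitator = s.far_as_attacker >= far_hi
    if easy_to_imitate and good_imitator:
        return "chameleon"
    if good_imitator:
        return "wolf"
    if easy_to_imitate:
        return "lamb"
    if s.frr >= frr_hi:
        return "goat"
    return "sheep"


labels = [
    zoo_label(SubjectStats(frr=0.01, far_as_target=0.00, far_as_attacker=0.00)),
    zoo_label(SubjectStats(frr=0.30, far_as_target=0.00, far_as_attacker=0.00)),
    zoo_label(SubjectStats(frr=0.01, far_as_target=0.20, far_as_attacker=0.20)),
]
```

A subject can satisfy more than one condition (for example, both hard to authenticate and easy to imitate); the fixed precedence used here is one arbitrary way to resolve that.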
Biometric sample quality control
- Many random False Rejects and False Accepts occur because of adverse signal acquisition situations.
- Two solutions
Biometric sample quality control
- For example, apply image enhancement, or suggest that subjects present the biometric in a different, "better" way.
- Failure to Enroll (FTE):
  - Stricter input quality control leads to higher FTE rates.
  - Accepting low-quality samples leads to lower FTE rates.
- Relationship with the ROC: a lower FTE rate leads to higher FAR and FRR.
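The trade-off between input quality control and the FTE rate can be sketched numerically. The quality scores, the threshold values, and the rule that a subject fails to enroll when the best sample falls below the threshold are all assumptions chosen for illustration.

```python
def fte_rate(best_quality_per_subject, min_quality):
    """Fraction of subjects whose best sample is below the quality threshold.

    Assumption: a subject fails to enroll exactly when no sample reaches
    min_quality, so only the best quality score per subject matters.
    """
    if not best_quality_per_subject:
        raise ValueError("need at least one subject")
    failed = sum(1 for q in best_quality_per_subject if q < min_quality)
    return failed / len(best_quality_per_subject)


# Made-up quality scores in [0, 1], one per enrollment candidate.
qualities = [0.9, 0.8, 0.35, 0.6, 0.2]

lenient = fte_rate(qualities, min_quality=0.3)  # one of five subjects fails
strict = fte_rate(qualities, min_quality=0.7)   # three of five subjects fail
```

The stricter threshold rejects more candidates at enrollment, which is the higher FTE rate named above; the samples that do get enrolled are of better quality, which is why the matching error rates move in the opposite direction.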
Training
Why does a biometric system need to be trained?
- The system computes a match score s(B', B).
- The goal is to make the average difference between match scores and mismatch scores as large as possible.
There are two aspects to training: enrollment policies and authentication protocols.
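The training goal stated above can be written as a single number to maximize. The sketch below assumes that scores are similarity values (higher means a better match) and uses made-up score lists; the function name is hypothetical.

```python
from statistics import mean


def score_separation(match_scores, mismatch_scores):
    """Average match score minus average mismatch score.

    Assumption: higher scores mean greater similarity, so a larger
    separation means genuine and impostor comparisons are easier to tell apart.
    """
    return mean(match_scores) - mean(mismatch_scores)


# Made-up scores before and after a hypothetical training step.
before = score_separation([0.7, 0.6, 0.5], [0.4, 0.3, 0.5])
after = score_separation([0.9, 0.8, 0.7], [0.2, 0.3, 0.1])
```

Comparing `before` and `after` is one simple way to check whether a change to the system moved the two score populations further apart.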
Training
1. Enrollment of subjects: during enrollment, one or more samples B of a subject's biometric β are acquired, and the samples, or templates derived from them, are stored in some database M.
2. Protocols: the biometric authentication system itself needs to be trained, by refining and enhancing the signal or image to match the characteristics of the user population and by incrementally improving the match engine.
Enrollment is system training
- Build database M by selecting subjects d from the world population W and assigning an identifier ID to each subject.
Enrollment is system training
Three possibilities for an enrolled subject d_k:
1. Correctly "linked": ID = k.
2. Subject d_k is in reality a subject d_j with j < k, i.e., d_k is a "duplicate" of subject d_j. As a result, ID_j and ID_k are duplicates that represent the same individual.
3. Subject d_k is in reality a subject d_j with j > k, i.e., d_k is faking the unenrolled subject d_j. As a result, ID_k corresponds to a "fake" identity.
Enrollment is system training
We have non-zero probabilities:
- P_D is the probability that some subject d ∈ M is also enrolled under a different ID number.
- P_F is the probability that a subject d ∈ M is a fake identity.
Database integrity
- Integrity: how well the database reflects the truth data of the seed documents (birth certificates, proofs of citizenship, and passports) used for enrollment.
Enrollment is system training
The database integrity with respect to duplicates is determined by P_D, the probability of duplicates.
- P_DEA (Double Enroll Attack) is the probability that an already enrolled subject d_j attempts to re-enroll in the database under a different identity d_k.
- FNMR_E is the probability that a match between two samples of the same biometric is not detected during enrollment, i.e., is missed.
- The number of duplicates in M is P_D * m, with m the number of entities in M.
Enrollment is system training
The enrollment integrity is further determined by P_F, the probability of a fake enrollment as d_k.
- FMR_E is the probability that a match between two different biometric samples is falsely declared during enrollment.
- P_IA is the probability of an impersonation attack.
- The number of fake identities in M equals P_F * m.
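The two counts above, P_D * m and P_F * m, are simple expected values. The sketch below computes them for made-up numbers. The slides name P_DEA and FNMR_E as the ingredients of P_D but the transcript does not show how they combine; the product used in `duplicate_probability` is an assumption (an attack must be attempted and then missed by the matcher), not a formula taken from the source.

```python
def duplicate_probability(p_dea: float, fnmr_e: float) -> float:
    """ASSUMPTION, not from the slides: a duplicate arises when a double
    enroll attack is attempted AND the enrollment-time match is missed."""
    return p_dea * fnmr_e


def expected_count(probability: float, m: int) -> float:
    """Expected number of affected entries: probability times database size m."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if m < 0:
        raise ValueError("m must be non-negative")
    return probability * m


# All numbers below are made up for illustration.
m = 1_000_000
p_d = duplicate_probability(p_dea=0.001, fnmr_e=0.05)
expected_duplicates = expected_count(p_d, m)   # roughly 50 entries
expected_fakes = expected_count(0.0002, m)     # roughly 200 entries for P_F = 0.0002
```

Even small per-enrollment probabilities translate into a noticeable number of bad entries once m is large, which is why database integrity is treated as a measurable quantity.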
Enrollment is system training
Probabilistic enrollment
- Build an access control list of subjects d_i, i = 1,…,m, of some database M.
- Each entry is an association between d_i and the corresponding biometric β_i.
- Compute a likelihood that expresses how well a subject's biometric β_i matches his template B_i.
- This probability can only be computed if there exists some machine representation of the real-world biometrics β_i. Let these representations be another set of templates and write Prob(d_i | B_i) = s_i, with s_i the match score for subject d_i,
Enrollment is system training
where, for simplicity, we assume that the match score is the likelihood that d_i is the true subject, given B_i.
Modeling the world
- Prob(d_i | B_i) can be approximated by the match score s_i only under very unrealistic circumstances.
- More realistic approximations have to involve modeling the other subjects d_k enrolled in M. More generally, compute Prob(d_i | O), the likelihood of subject d_i given the biometric data O collected at enrollment time.
Enrollment is system training
- Prob(O) is the prior probability that this particular observation will occur (which cannot be computed exactly).
- Assume Prob(d_i) = P_d is constant.
- Evaluating Prob(O | d_i) is a matter of fitting model d_i to the data O and determining how well this can be done.
- Evaluating the rest of this expression, Prob(O | d_k) for k = j+1,…,m, is impossible, because these subjects are not yet available when d_j enrolls.
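The bullets above read as the terms of Bayes' rule. The following is the standard form written with the symbols used here; expanding Prob(O) as a sum over the m enrolled subjects is an assumption that is consistent with the last bullet but is not shown in the transcript.

```latex
\[
\mathrm{Prob}(d_i \mid O)
  = \frac{\mathrm{Prob}(O \mid d_i)\,\mathrm{Prob}(d_i)}{\mathrm{Prob}(O)},
\qquad
\mathrm{Prob}(O) = \sum_{k=1}^{m} \mathrm{Prob}(O \mid d_k)\,\mathrm{Prob}(d_k).
\]
% With the constant prior Prob(d_k) = P_d, the prior cancels:
\[
\mathrm{Prob}(d_i \mid O)
  = \frac{\mathrm{Prob}(O \mid d_i)}{\sum_{k=1}^{m} \mathrm{Prob}(O \mid d_k)}.
\]
```

Under this reading, the terms of the sum that belong to subjects who have not yet enrolled are the ones that cannot be evaluated at enrollment time.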
Enrollment is system training
Modeling the rest of the world: cohorts
- The most difficult issue in training a biometric authentication system is modeling data from unknown people.
- Voice verification methods use not only a model of the speaker's biometric machine representation but also a model of all other speakers.
- Two techniques approximate the denominator of (9.7):
Enrollment is system training
1. World modeling
- Reduce the set M to one fictitious model subject D, trained on a pool of data from many different speakers who represent the "world" W of possible speakers.
- A factor is chosen so that the denominator reflects the whole population D + d_i.
Enrollment is system training
2. Cohort modeling
- Approximate the set M by a subset M_i of subjects that resemble subject d_i. For each subject d_i, a set of approximate forgeries is computed and stored. We denote this set by D_i; it is called the set of cohorts of speaker i.
- factor_i = c_i, the number of cohorts for d_i.
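The two approximations of the denominator can be sketched side by side. The exact expressions are not visible in this transcript, so both formulas below are assumptions: the world model contributes one weighted likelihood term, and the cohort model contributes c_i times the average cohort likelihood. Function and parameter names are hypothetical.

```python
def world_normalized(lik_subject: float, lik_world: float, factor: float = 1.0) -> float:
    """World modeling (assumed form): the denominator is the subject's own
    likelihood plus a weighted likelihood under the single world model D."""
    denom = lik_subject + factor * lik_world
    if denom <= 0.0:
        raise ValueError("likelihoods must not all be zero")
    return lik_subject / denom


def cohort_normalized(lik_subject: float, cohort_liks: list) -> float:
    """Cohort modeling (assumed form): the denominator is the subject's own
    likelihood plus c_i times the mean likelihood over the cohort set D_i."""
    c_i = len(cohort_liks)
    mean_cohort = sum(cohort_liks) / c_i if c_i else 0.0
    denom = lik_subject + c_i * mean_cohort
    if denom <= 0.0:
        raise ValueError("likelihoods must not all be zero")
    return lik_subject / denom


# Made-up likelihood values for one observation O.
p_world = world_normalized(lik_subject=0.8, lik_world=0.2)
p_cohort = cohort_normalized(lik_subject=0.8, cohort_liks=[0.1, 0.1])
```

The world model needs only one extra model for the whole population, while the cohort approach stores a separate set D_i per subject and concentrates on the speakers most likely to be confused with d_i.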
Enrollment is system training
Updating the probabilities
- Denote Prob(d_i | O) by P_i.
- During operation of the authentication system, data from subjects is collected, and the likelihood P_i can be updated.
- Upon authentication of subject d_i, a new biometric sample is acquired, denoted here as Ô.
- Compute Prob(d_i | O, Ô).
Enrollment is system training
- What needs to be evaluated is the denominator Prob(Ô).
- Set Prob(d_i) = P_i.
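The update step can be sketched as a repeated application of Bayes' rule in which the previous value P_i serves as the prior. How the denominator is evaluated is not shown in this transcript; the two-hypothesis expansion below (the sample comes either from d_i or from someone else) is an assumption, and all names and numbers are illustrative.

```python
def update_probability(p_i: float, lik_subject: float, lik_others: float) -> float:
    """Return the updated P_i after one new authentication sample.

    p_i         : current value of Prob(d_i | O), used as the prior
    lik_subject : likelihood of the new sample if it comes from d_i
    lik_others  : likelihood of the new sample if it comes from anyone else
    ASSUMPTION: the denominator is expanded over these two hypotheses only.
    """
    if not 0.0 <= p_i <= 1.0:
        raise ValueError("p_i must lie in [0, 1]")
    denom = lik_subject * p_i + lik_others * (1.0 - p_i)
    if denom <= 0.0:
        raise ValueError("new sample has zero likelihood under both hypotheses")
    return lik_subject * p_i / denom


# Two consecutive authentications with made-up likelihoods.
p_after_first = update_probability(0.5, lik_subject=0.9, lik_others=0.1)
p_after_second = update_probability(p_after_first, lik_subject=0.9, lik_others=0.1)
```

Each sample that fits d_i better than the alternative raises P_i, so evidence collected during operation keeps refining what was estimated at enrollment.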