
1 Designing multiple biometric systems: Measure of ensemble effectiveness Allen Tang OPLab @ NTUIM

2 Agenda
 Introduction
 Measures of performance
 Measures of ensemble effectiveness
 Combination Rules
 Experimental Results
 Conclusion

3 INTRODUCTION

4 Introduction
 Multimodal biometrics performs better than unimodal biometrics
 It fuses the results of multiple biometric experts
 Fusion at the matching-score level is the easiest to perform

5 Introduction
 Which biometric experts should we choose?
 How do we evaluate ensemble effectiveness?
 Which measure gives the best result?

6 MEASURES OF PERFORMANCE

7 Measures of performance
 Notation
 E = {E_1, …, E_j, …, E_N}: a set of N experts
 U = {u_i}: the set of users
 s_j: the set of all scores produced by E_j over all users
 s_ij: the score produced by E_j for user u_i
 f_j(u_i): the function by which E_j produces s_ij for u_i
 th: threshold; gen: genuine; imp: impostor

8 Measures of performance: Basic
 False Rejection Rate (FRR) for expert E_j
 False Acceptance Rate (FAR) for expert E_j (see the formulas below)
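These rates can be written in terms of the score distributions introduced on the next slide; a standard formulation, assuming genuine users score above the threshold:

```latex
FRR_j(th) = \Pr(s_j \le th \mid gen) = \int_{-\infty}^{th} p(s_j \mid gen)\, ds_j
\qquad
FAR_j(th) = \Pr(s_j > th \mid imp) = \int_{th}^{+\infty} p(s_j \mid imp)\, ds_j
```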

9 Measures of performance: Basic
 p(s_j | gen): score probability distribution of E_j for genuine users
 p(s_j | imp): score probability distribution of E_j for impostor users
 The threshold th changes with the requirements of the application at hand

10 Measures of performance
 Area under the ROC curve (AUC)
 Equal error rate (EER)
 The “decidability” index d'

11 Measures of performance [figure]

12 Measures of performance: AUC
 Estimate the AUC with the Wilcoxon-Mann-Whitney (WMW) statistic
 This formulation of the AUC is also called the “probability of correct pair-wise ranking”, as it computes the probability P(s_j^+ > s_j^-)

13 Measures of performance: AUC
 n^+ / n^-: number of genuine/impostor users
 s_j^+: the set of scores produced by E_j for genuine users
 s_j^-: the set of scores produced by E_j for impostor users (the estimator is written out below)
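With this notation, the WMW estimate referenced above can be written out; a standard formulation, counting ties as 1/2:

```latex
\widehat{AUC}_j = \frac{1}{n^{+}\,n^{-}} \sum_{i=1}^{n^{+}} \sum_{k=1}^{n^{-}} I\bigl(s_{ij}^{+},\, s_{kj}^{-}\bigr),
\qquad
I(a, b) = \begin{cases} 1 & a > b \\ 1/2 & a = b \\ 0 & a < b \end{cases}
```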

14 Measures of performance: AUC
 Features of the AUC estimated by the WMW statistic:
 Theoretically equivalent to the value obtained by integrating the ROC curve
 Attains a more reliable estimate of the AUC in real cases (finite samples)
 Divides all scores s_ij into 2 sets: s_j^+ and s_j^-
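A minimal Python sketch of this estimator (the scores are toy values, made up for illustration):

```python
import numpy as np

def auc_wmw(genuine, impostor):
    """Estimate AUC with the Wilcoxon-Mann-Whitney statistic: the
    fraction of (genuine, impostor) score pairs ranked correctly,
    counting ties as 1/2."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Compare every genuine score with every impostor score
    diff = genuine[:, None] - impostor[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return correct / (genuine.size * impostor.size)

print(auc_wmw([0.9, 0.8, 0.7], [0.4, 0.6, 0.8]))  # 0.8333...
```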

15 Measures of performance: EER
 The EER is the point of the ROC curve where FAR and FRR are equal
 The lower the value of the EER, the better the performance of a biometric system

16 Measures of performance: d'
 In biometrics, d' measures the separability of the distributions of genuine and impostor scores

17 Measures of performance: d'
 μ_gen / μ_imp: mean of the genuine/impostor score distribution
 σ_gen / σ_imp: standard deviation of the genuine/impostor score distribution
 The larger the d', the better the performance of a biometric system (a common formulation is given below)
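In terms of these symbols, d' is commonly defined as Daugman's decidability index (the factor 1/2 inside the square root varies across authors):

```latex
d' = \frac{\lvert \mu_{gen} - \mu_{imp} \rvert}{\sqrt{\tfrac{1}{2}\left(\sigma_{gen}^{2} + \sigma_{imp}^{2}\right)}}
```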

18 MEASURES OF ENSEMBLE EFFECTIVENESS

19 Measures of ensemble effectiveness
 4 measures for estimating the effectiveness of an ensemble of biometric experts: AUC, EER, d', and the Score Dissimilarity (SD) index
 But we must take the differences in performance among the experts into consideration

20 Measures of ensemble effectiveness
 Generic, weighted and normalized performance measure (pm) formulation: pm_δ = μ_pm · (1 − tanh(σ_pm))
 For AUC: AUC_δ = μ_AUC · (1 − tanh(σ_AUC))
 The higher the AUC average, the better the performance of an ensemble of experts

21 Measures of ensemble effectiveness
 For EER: EER_δ = μ_EER · (1 − tanh(σ_EER))
 The lower the EER average, the better the performance of an ensemble of experts
 For d', since its value can be much larger than 1, use the normalized D' = log_b(1 + d') instead of d', with base b = 10 according to the values of d' observed in the experiments
 Thus D'_δ = μ_D' · (1 − tanh(σ_D')) is used
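A small Python sketch of these weighted measures (the per-expert values are made up for illustration):

```python
import numpy as np

def weighted_measure(per_expert_values):
    """pm_delta = mean * (1 - tanh(std)): the tanh term penalizes
    ensembles whose experts differ widely in the underlying measure."""
    v = np.asarray(per_expert_values, dtype=float)
    return v.mean() * (1.0 - np.tanh(v.std()))

# Illustrative values for a pair of experts
auc_delta = weighted_measure([0.97, 0.93])
eer_delta = weighted_measure([0.03, 0.07])
D = np.log10(1.0 + np.array([2.1, 3.4]))   # D' = log10(1 + d')
D_delta = weighted_measure(D)
```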

22 Measures of ensemble effectiveness: SD index
 The SD index is based on the WMW formulation of the AUC, and is designed to measure the improvement in AUC attainable by combining an ensemble of experts
 The SD index measures the amount of AUC that can be “recovered” by exploiting the complementarity of the experts

23 Measures of ensemble effectiveness: SD index
 Consider 2 experts E_1 & E_2 and all possible score pairs, and divide these pairs into 4 subsets S_00, S_10, S_01, S_11 (see below)
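A reconstruction consistent with the WMW pairwise-ranking view, where u and v flag whether E_1 and E_2, respectively, rank a (genuine, impostor) pair correctly:

```latex
S_{uv} = \bigl\{ (i,k) \;:\; I\bigl(s_{i1}^{+} > s_{k1}^{-}\bigr) = u,\;\; I\bigl(s_{i2}^{+} > s_{k2}^{-}\bigr) = v \bigr\}, \qquad u, v \in \{0, 1\}
```

So S_11 contains the pairs both experts rank correctly, S_10 those only E_1 ranks correctly, S_01 those only E_2 ranks correctly, and S_00 those neither does.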

24 Measures of ensemble effectiveness: SD index
 The AUCs of E_1 & E_2, where card(S_uv) is the cardinality of the subset S_uv, and the definition of the SD index are given below
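With these subsets, the formulas can be reconstructed as (each of the n^+ n^- pairs counted once):

```latex
AUC_1 = \frac{card(S_{11}) + card(S_{10})}{n^{+}\,n^{-}}, \qquad
AUC_2 = \frac{card(S_{11}) + card(S_{01})}{n^{+}\,n^{-}}, \qquad
SD = \frac{card(S_{10}) + card(S_{01})}{n^{+}\,n^{-}}
```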

25 Measures of ensemble effectiveness: SD index
 The higher the value of SD, the higher the maximum AUC that could be obtained from the combined scores
 But the actual increment in AUC depends on the combination method, and high SD values are usually related to low-performance experts
 Performance measure formulation for SD: SD_δ = μ_SD · (1 − tanh(σ_SD))

26 COMBINATION RULES

27 Combination Rules
 Combination (fusion) in this work is performed at the score level, as it is the most widely used and most flexible combination level
 We investigate the performance of 4 combination methods: the mean rule, the product rule, linear combination by LDA, and DSS
 LDA & DSS require a training phase to estimate the parameters needed to perform the combination

28 Combination Rules: Mean Rule
 The mean rule is applied directly to the matching scores produced by the set of N experts (formula below)
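In the notation of slide 7, the mean rule reads:

```latex
s_i^{mean} = \frac{1}{N} \sum_{j=1}^{N} s_{ij}
```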

29 Combination Rules: Product Rule
 The product rule is applied directly to the matching scores produced by the set of N experts (formula below)
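Analogously, the product rule reads:

```latex
s_i^{prod} = \prod_{j=1}^{N} s_{ij}
```

Both fixed rules are one-liners on an n_users × N score matrix; a minimal sketch with toy values:

```python
import numpy as np

# Toy score matrix: rows = users, columns = N experts (values made up)
scores = np.array([[0.9, 0.7],
                   [0.2, 0.4]])

mean_fused = scores.mean(axis=1)   # mean rule: average over experts
prod_fused = scores.prod(axis=1)   # product rule: product over experts
```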

30 Combination Rules: Linear Combination by LDA
 Linear discriminant analysis (LDA) can be used to compute the weights of a linear combination of the scores
 This rule attains a fused score with minimum within-class variation and maximum between-class variation

31 Combination Rules: Linear Combination by LDA
 W^t (W): transformation vector computed using a training set
 S_i: vector of the scores assigned to user u_i by all the experts
 μ_gen / μ_imp: mean of the genuine/impostor score distribution
 S_w: within-class scatter matrix
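A standard Fisher/LDA formulation consistent with these symbols, in which the fused score is the projection of the score vector onto W:

```latex
W = S_w^{-1}\,(\mu_{gen} - \mu_{imp}), \qquad s_i^{lda} = W^{t} S_i
```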

32 Combination Rules: DSS
 Dynamic score selection (DSS) selects one of the scores s_ij available for each user u_i, instead of fusing them into a new score
 The ideal selector is based on knowledge of the state of nature of each user (see below)
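A reconstruction of the ideal selector, the rule later referenced as (5): pick the highest available score for a genuine user and the lowest for an impostor:

```latex
s_i^{*} = \begin{cases} \max_j s_{ij} & \text{if } u_i \text{ is genuine} \\ \min_j s_{ij} & \text{if } u_i \text{ is impostor} \end{cases}
```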

33 Combination Rules: DSS
 DSS selects the scores according to an estimate of the state of nature of each user; the algorithm is based on a quadratic discriminant classifier (QDC)
 For the estimation, a vector space is built whose components are the scores assigned to the user by the N experts

34 Combination Rules: DSS
 Train a classifier on this vector space using a training set of genuine and impostor users
 Use the classifier to estimate the state of nature of each user
 Once the state of nature of the user is estimated, select the user's score according to (5), as in the sketch below
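A minimal sketch of this procedure, assuming scikit-learn's QuadraticDiscriminantAnalysis as the QDC; the score arrays have shape (n_users, N):

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def dss_select(train_scores, train_labels, test_scores):
    """Dynamic score selection: estimate each user's state of nature
    with a QDC trained on the N-dimensional score vectors, then apply
    the selector rule (5): max score if estimated genuine, min score
    if estimated impostor."""
    qdc = QuadraticDiscriminantAnalysis()
    qdc.fit(train_scores, train_labels)       # labels: 1 genuine, 0 impostor
    is_genuine = qdc.predict(test_scores) == 1
    return np.where(is_genuine,
                    test_scores.max(axis=1),  # estimated genuine -> max
                    test_scores.min(axis=1))  # estimated impostor -> min
```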

35 EXPERIMENTAL RESULTS

36 Experimental Results: Goal
 Investigate the correlation between the measures of ensemble effectiveness and the final performance achieved by the combined experts
 Identify which measures predict that performance best

37 Experimental Results: Preparation
 Score source: 41 experts and 4 DBs from the open category of the 3rd Fingerprint Verification Competition (FVC2004)
 No. of scores: for each sensor and for each expert, a total of 7,750 scores; 2,800 from genuine attempts and 4,950 from impostor attempts
 For LDA & DSS training, the scores are divided into 4 subsets, each with 700 genuine and 1,238 impostor scores

38 Experimental Results: Process
 No. of expert pairs: 13,120 (41 × 40 × 2 × 4)
 For each pair, compute the measures of effectiveness based on AUC, EER, d' and the SD index
 Combine the pairs using the 4 combination rules, then compute the resulting values of AUC and EER to assess performance
 Use a graphical representation of the results of the experiments

39 Experimental Results: AUC_δ plotted against AUC [figure]

40 Experimental Results: AUC_δ plotted against AUC [figure]

41 Experimental Results: AUC_δ plotted against AUC
 According to the graphs, AUC_δ is not useful, because it shows no clear relationship with the AUC of the combination rules
 A high AUC_δ attains a high AUC, but a lower AUC_δ yields values over a wide range
 A high AUC_δ corresponds to a pair of experts with high performance and similar behavior
 The mean rule gives the best AUC

42 Experimental Results: AUC_δ plotted against EER [figure]

43 Experimental Results: AUC_δ plotted against EER [figure]

44 Experimental Results: AUC_δ plotted against EER
 AUC_δ is uncorrelated with the EER too
 For any value of AUC_δ, the EER spans a wide range of values
 The performance of the combination in terms of EER cannot be predicted from AUC_δ

45 Experimental Results: EER_δ plotted against AUC [figure]

46 Experimental Results: EER_δ plotted against AUC [figure]

47 Experimental Results: EER_δ plotted against AUC
 The behavior is better than that of AUC_δ, but there is still no clear relationship between EER_δ and AUC
 The mean rule gives the best result here too

48 Experimental Results: EER_δ plotted against EER [figure]

49 Experimental Results: EER_δ plotted against EER [figure]

50 Experimental Results: EER_δ plotted against EER
 There is no correlation between EER_δ and EER
 The graphs of AUC_δ against EER and EER_δ against EER show similar results
 So AUC and EER are not suitable for evaluating combinations of experts, despite being widely used for unimodal biometric systems

51 Experimental Results: D'_δ plotted against AUC [figure]

52 Experimental Results: D'_δ plotted against AUC [figure]

53 Experimental Results: D'_δ plotted against AUC
 Higher values of D'_δ guarantee smaller ranges of values of the performance of the combination
 D'_δ shows a higher and clearer correlation with the performance of the combination
 The mean rule gives the best result, and the product rule the worst

54 Experimental Results: D'_δ plotted against EER [figure]

55 Experimental Results: D'_δ plotted against EER [figure]

56 Experimental Results: D'_δ plotted against EER
 D'_δ shows a better correlation with the EER too
 D'_δ is much better than AUC_δ and EER_δ
 D'_δ is a good measure for evaluating the effectiveness of candidate ensembles of biometric experts

57 Experimental Results: SD_δ plotted against AUC [figure]

58 Experimental Results: SD_δ plotted against AUC [figure]

59 Experimental Results: SD_δ plotted against AUC
 SD_δ does show some correlation with AUC, because SD is designed to predict the maximum improvement in AUC obtainable by combining experts, but the relationship is still not clear enough
 Small SD_δ values guarantee high performance, especially for pairs of high-performance experts, because the higher the AUC of the individual experts, the smaller their complementarity

60 Experimental Results: SD_δ plotted against EER [figure]

61 Experimental Results: SD_δ plotted against EER [figure]

62 Experimental Results: SD_δ plotted against EER
 SD_δ against EER is not as good as SD_δ against AUC
 The result from the product rule is still poor

63 CONCLUSION

64 Conclusion
 In predicting performance improvement, the product rule behaves worst, the mean rule best, and LDA & DSS are not far from the mean rule
 LDA & DSS give results similar to each other
 In general, the performance of the combined experts is not highly correlated with that of the single experts

65 Conclusion
 The best measure of ensemble effectiveness is D'_δ; AUC_δ and EER_δ are not good enough, and SD_δ performs like AUC_δ
 Based on the above results, D'_δ with the mean rule tops every other pair of measure and combination rule, and is the most suitable measure of ensemble effectiveness

66 THANKS FOR LISTENING! It’s Q&A time!

