Performance Evaluation in Computer Vision
Kyungnam Kim, Computer Vision Lab, University of Maryland, College Park


1 Performance Evaluation in Computer Vision Kyungnam Kim Computer Vision Lab, University of Maryland, College Park

2 Contents
- Error Estimation in Pattern Recognition: Jain et al., “Statistical Pattern Recognition: A Review”, IEEE PAMI 2000 (Section 7, Error Estimation).
- Assessing and Comparing Algorithms: Adrian Clark and Christine Clark, “Performance Characterization in Computer Vision: A Tutorial”.
  - Receiver Operating Characteristic (ROC) curve
  - Detection Error Trade-off (DET) curve
  - Confusion matrix
  - McNemar’s test
- http://peipa.essex.ac.uk/benchmark/

3 Error Estimation in Pattern Recognition
Reference: Jain et al., “Statistical Pattern Recognition: A Review”, IEEE PAMI 2000 (Section 7, Error Estimation).
- It is very difficult to obtain a closed-form expression for the error rate P_e.
- In practice, the error rate must be estimated from the available samples, split into training and test sets.
- Error estimate = percentage of misclassified test samples.
- A reliable error estimate requires (1) a large sample size and (2) independent training and test samples.

4 Error Estimation in Pattern Recognition
- The error estimate (a function of the specific training and test sets used) is a random variable.
- Given a classifier, let t be the number of misclassified test samples out of n; t then follows a binomial distribution.
- The maximum-likelihood estimate P̂_e of P_e is P̂_e = t / n, with E(P̂_e) = P_e and Var(P̂_e) = P_e(1 - P_e)/n.
- Because P̂_e is a random variable, it should be reported with a confidence interval, which shrinks as n increases.
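As an illustration (not part of the original slides), here is a minimal Python sketch of this estimate and its normal-approximation confidence interval; the function name and the example counts are made up.

```python
import numpy as np
from scipy import stats

def error_estimate(t, n, confidence=0.95):
    """ML error estimate t/n with a normal-approximation confidence interval."""
    pe_hat = t / n
    z = stats.norm.ppf(0.5 + confidence / 2.0)            # e.g. 1.96 for 95%
    half_width = z * np.sqrt(pe_hat * (1.0 - pe_hat) / n)
    return pe_hat, (max(0.0, pe_hat - half_width), min(1.0, pe_hat + half_width))

# Hypothetical example: 30 misclassified samples out of a 500-sample test set.
pe, (low, high) = error_estimate(30, 500)
print(f"Pe_hat = {pe:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")  # the CI shrinks as n grows
```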

5 Error estimates can also be obtained by resampling: versions of the cross-validation approach, and the bootstrap, which resamples based on the analogy population : sample :: sample : resample.
Further reading on bootstrapping:
http://www.uvm.edu/~dhowell/StatPages/Resampling/Bootstrapping.html
http://www.childrens-mercy.org/stats/ask/bootstrap.asp
http://www.cnr.colostate.edu/class_info/fw663/bootstrap.pdf
http://www.maths.unsw.edu.au/ForStudents/courses/math3811/lecture9.pdf
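A rough sketch of the out-of-bag bootstrap variant of this idea, assuming a hypothetical `classify_fn` that trains on the resample and returns a prediction function; it is only meant to illustrate the resampling analogy, not any specific method from the references above.

```python
import numpy as np

def bootstrap_error(classify_fn, X, y, n_boot=200, seed=0):
    """Average the error rate over bootstrap resamples: train on a sample
    drawn with replacement, test on the samples left out of that draw."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample with replacement
        oob = np.setdiff1d(np.arange(n), idx)      # left-out ("out-of-bag") samples
        if oob.size == 0:
            continue
        predict = classify_fn(X[idx], y[idx])      # hypothetical: returns a predictor
        errors.append(np.mean(predict(X[oob]) != y[oob]))
    return float(np.mean(errors))
```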

6 Error Estimation in Pattern Recognition
- Receiver Operating Characteristic (ROC) curve: detailed later.
- 'Reject rate': reject doubtful patterns near the decision boundary (low confidence). A well-known reject option is to reject a pattern if its maximum a posteriori probability is below a threshold.
- There is a trade-off between the reject rate and the error rate.
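A small sketch of the reject option described above, assuming the classifier outputs a matrix of posterior probabilities; the threshold value is arbitrary.

```python
import numpy as np

def classify_with_reject(posteriors, threshold=0.9):
    """Pick the class with the highest posterior, but reject (label -1)
    any pattern whose maximum posterior falls below the threshold."""
    posteriors = np.asarray(posteriors)            # shape: (n_samples, n_classes)
    best = posteriors.argmax(axis=1)
    confident = posteriors.max(axis=1) >= threshold
    return np.where(confident, best, -1)

# Raising the threshold increases the reject rate but lowers the error rate
# on the patterns that are actually classified (the trade-off noted above).
```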

7 Next seminar: Dimensionality Reduction / Manifold Learning?

8 Classification method (figure)

9

10

11 Assessing and Comparing Algorithms
Reference: Adrian Clark and Christine Clark, “Performance Characterization in Computer Vision: A Tutorial”. http://peipa.essex.ac.uk/benchmark/tutorials/essex/tutorial.pdf
- Algorithms should be compared on the same training and test sets; some standard sets are FERET and PETS.
- Is it enough simply to see which has the better success rate? Not enough: a standard statistical test, McNemar’s test, is required.
- Two types of testing:
  - Technology evaluation: the response of an underlying generic algorithm to factors such as adjustment of its tuning parameters, noisy input data, etc.
  - Application evaluation: how well an algorithm performs a particular task.

12 Assessing and Comparing Algorithms Receiver Operating Characteristic (ROC) curve
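An ROC curve plots the true positive rate against the false positive rate as the decision threshold is swept. A minimal sketch of how the points could be computed from raw detection scores (illustrative only, not the tutorial's code):

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep the threshold over the sorted scores; labels are 1 (positive)
    or 0 (negative). Returns arrays of false and true positive rates."""
    order = np.argsort(-np.asarray(scores))        # descending by score
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                         # positives accepted at each cut
    fp = np.cumsum(1 - labels)                     # negatives accepted at each cut
    return fp / (len(labels) - labels.sum()), tp / labels.sum()
```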

13 Assessing and Comparing Algorithms
Detection Error Trade-off (DET) curve:
- logarithmic scales on both axes
- curves are more spread out, making algorithms easier to distinguish
- curves are close to linear
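A sketch of how such a plot could be produced from the ROC quantities computed above, using the log-log axes the slide describes (matplotlib assumed):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_det(fpr, tpr):
    """DET curve: miss rate (1 - TPR) against false alarm rate,
    with logarithmic scales on both axes."""
    fnr = 1.0 - np.asarray(tpr)
    plt.loglog(fpr, fnr)
    plt.xlabel("false alarm (false positive) rate")
    plt.ylabel("miss (false negative) rate")
    plt.show()
```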

14 Assessing and Comparing Algorithms
Detection Error Trade-off (DET) curve:
- Forensic applications (e.g. tracking down a suspect) favour a low miss rate.
- High-security applications (e.g. ATM machines) favour a low false-alarm rate.
- EER (equal error rate): the operating point where the two error rates are equal.
- Comparisons of algorithms tend to be performed with a specific set of tuning parameter values; running them with settings that correspond to the EER is probably the most sensible.
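A small sketch of one way to read off the EER from sampled ROC points (an approximation at the threshold closest to the diagonal; illustrative only):

```python
import numpy as np

def equal_error_rate(fpr, tpr):
    """Return the rate at the sampled operating point where the false alarm
    rate and the miss rate (1 - TPR) are closest to being equal."""
    fpr = np.asarray(fpr)
    fnr = 1.0 - np.asarray(tpr)
    i = np.argmin(np.abs(fpr - fnr))               # point closest to the diagonal
    return (fpr[i] + fnr[i]) / 2.0
```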

15 Assessing and Comparing Algorithms
Crossing ROC curves: when two algorithms' ROC curves cross, neither dominates the other over all operating points.
Comparisons of algorithms tend to be performed with a specific set of tuning parameter values; running them with settings that correspond to the EER is probably the most sensible.

16 Assessing and Comparing Algorithms Confusion Matrices
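A confusion matrix tabulates, for each true class, how often the algorithm assigned each predicted class. A minimal sketch of how one could be built, assuming integer class labels (illustrative only):

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Entry (i, j) counts test samples of true class i labelled as class j;
    the diagonal holds the correct classifications."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm
```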

17 Assessing and Comparing Algorithms
McNemar’s test (a form of chi-square test): an appropriate statistical test must take into account not only the numbers of false positives, false negatives, etc., but also the number of tests performed.
http://www.zephryus.demon.co.uk/geography/resources/fieldwork/stats/chi.html
http://www.isixsigma.com/dictionary/Chi_Square_Test-67.htm

18 Assessing and Comparing Algorithms
McNemar’s test: if the number of tests exceeds about 30, the central limit theorem applies and the normal approximation can be used.
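A sketch of the continuity-corrected normal form of McNemar's test for comparing two algorithms on the same test cases; the counts in the example call are made up.

```python
import math

def mcnemar_z(n_only_a_correct, n_only_b_correct):
    """Continuity-corrected z statistic from the two discordant counts:
    cases where only algorithm A succeeded and where only B succeeded.
    The normal approximation is reasonable once their sum exceeds ~30."""
    return (abs(n_only_a_correct - n_only_b_correct) - 1) / math.sqrt(
        n_only_a_correct + n_only_b_correct)

# Hypothetical counts: |z| > 1.96 would indicate a difference significant
# at the 5% level (two-sided).
print(mcnemar_z(25, 10))
```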

