# Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing.

## Presentation transcript

Assessing and Comparing Classification Algorithms

- Introduction
- Resampling and Cross Validation
- Measuring Error
- Interval Estimation and Hypothesis Testing
- Assessing and Comparing Performance

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

### Introduction

Questions:
- Assessment of the expected error of a learning algorithm: Is the error rate of 1-NN less than 2%?
- Comparing the expected errors of two algorithms: Is k-NN more accurate than MLP?

Training/validation/test sets

Resampling methods: K-fold cross-validation

### Algorithm Preference

Criteria (application-dependent):
- Misclassification error, or risk (loss functions)
- Training time/space complexity
- Testing time/space complexity
- Interpretability
- Easy programmability

Cost-sensitive learning

### Resampling and K-Fold Cross-Validation

- The need for multiple training/validation sets
- $\{\mathcal{T}_i, \mathcal{V}_i\}_i$: training/validation sets of fold $i$
- K-fold cross-validation: divide $\mathcal{X}$ into $K$ parts, $\mathcal{X}_i$, $i = 1, \dots, K$; fold $i$ validates on $\mathcal{V}_i = \mathcal{X}_i$ and trains on $\mathcal{T}_i$, the union of the other $K-1$ parts
- Any two training sets $\mathcal{T}_i$ share $K-2$ parts
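
The K-fold construction can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture notes: the function name `k_fold_splits`, the shuffling step, and the `seed` parameter are assumptions made for the example.

```python
import random

def k_fold_splits(n_instances, k, seed=0):
    """Return a list of (training_indices, validation_indices), one pair per fold."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # assumed: shuffle before partitioning
    # Part i takes every k-th shuffled index, so part sizes differ by at most one.
    parts = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = parts[i]
        # The training set is the union of the other K-1 parts.
        training = [idx for j, part in enumerate(parts) if j != i for idx in part]
        splits.append((training, validation))
    return splits

splits = k_fold_splits(n_instances=10, k=5)
```

With 10 instances and K = 5, each part holds 2 instances, so any two training sets overlap in K-2 = 3 parts, that is, 6 instances.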

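### 5×2 Cross-Validation

5 times 2-fold cross-validation (Dietterich, 1998): the data is randomly split into two equal halves five times, and in each replication each half serves once as the training set and once as the validation set, giving ten training/validation pairs.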
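
A rough sketch of how the ten training/validation pairs of 5×2 cross-validation might be generated is given here. The function name `five_by_two_splits` and the `seed` parameter are hypothetical, and the split assumes an even number of instances for equal halves.

```python
import random

def five_by_two_splits(n_instances, seed=0):
    """Return 10 (training_indices, validation_indices) pairs: 5 replications x 2 folds."""
    rng = random.Random(seed)
    half = n_instances // 2
    splits = []
    for _ in range(5):
        indices = list(range(n_instances))
        rng.shuffle(indices)  # a fresh random halving for every replication
        first, second = indices[:half], indices[half:]
        splits.append((first, second))   # train on first half, validate on second
        splits.append((second, first))   # then swap the roles
    return splits

pairs = five_by_two_splits(n_instances=20)
```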

### Bootstrapping

- Draw instances from a dataset with replacement
- Probability that we do not pick a given instance after $N$ draws:

$$\left(1 - \frac{1}{N}\right)^N \approx e^{-1} = 0.368$$

that is, only 36.8% of the instances are new (not seen in the bootstrap sample).
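
The 36.8% figure can be checked numerically. The sketch below compares the closed-form probability $(1 - 1/N)^N$ with the fraction of instances left out of one simulated bootstrap sample; the function names and the seed are illustrative assumptions.

```python
import math
import random

def prob_never_picked(n):
    """Probability that a given instance is missed by all n draws with replacement."""
    return (1.0 - 1.0 / n) ** n

def left_out_fraction(n, seed=0):
    """Fraction of the n instances that never appear in one bootstrap sample of size n."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    return 1.0 - len(set(sample)) / n

exact = prob_never_picked(1000)        # close to exp(-1)
simulated = left_out_fraction(10000)   # varies slightly with the seed
limit = math.exp(-1)
```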

### Measuring Error

- Error rate = # of errors / # of instances = (FN + FP) / N
- Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
- Precision = # of found positives / # of found = TP / (TP + FP)
- Specificity = TN / (TN + FP)
- False alarm rate = FP / (FP + TN) = 1 - Specificity
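
The measures above follow directly from the four confusion-matrix counts. A minimal sketch, with a hypothetical function name and made-up counts, could look like this; it does not guard against empty classes, where a denominator would be zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the error measures listed above from confusion-matrix counts."""
    n = tp + fp + tn + fn
    return {
        "error_rate": (fn + fp) / n,
        "recall": tp / (tp + fn),            # sensitivity, hit rate
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn),  # equals 1 - specificity
    }

# Made-up counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```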

### Methods for Performance Evaluation

How to obtain a reliable estimate of performance?

Performance of a model may depend on other factors besides the learning algorithm:
- Class distribution
- Cost of misclassification
- Size of training and test sets

### Learning Curve

- A learning curve shows how accuracy changes with varying sample size
- Requires a sampling schedule for creating the learning curve:
  - Arithmetic sampling (Langley et al.)
  - Geometric sampling (Provost et al.)

Effect of small sample size:
- Bias in the estimate
- Variance of the estimate

### ROC (Receiver Operating Characteristic)

- Developed in the 1950s for signal detection theory to analyze noisy signals
- Characterizes the trade-off between positive hits and false alarms
- The ROC curve plots the TP rate (on the y-axis) against the FP rate (on the x-axis)
- The performance of each classifier is represented as a point on the ROC curve
- Changing the threshold of the algorithm, the sample distribution, or the cost matrix changes the location of the point

References:
- http://en.wikipedia.org/wiki/Receiver_operating_characteristic
- http://www.childrensmercy.org/stats/ask/roc.asp

### ROC Curve

- 1-dimensional data set containing 2 classes (positive and negative)
- Any point located at x > t is classified as positive

At threshold t: TP = 0.5, FN = 0.5, FP = 0.12, TN = 0.88

### ROC Curve

(TP, FP):
- (0, 0): declare everything to be negative class
- (1, 1): declare everything to be positive class
- (1, 0): ideal

Diagonal line:
- Random guessing
- Below diagonal line: prediction is opposite of the true class

### Using ROC for Model Comparison

- No model consistently outperforms the other
  - M1 is better for small FPR
  - M2 is better for large FPR
- Area under the ROC curve:
  - Ideal: Area = 1
  - Random guess: Area = 0.5

### How to Construct an ROC Curve

| Instance | P(+\|A) | True Class |
|---|---|---|
| 1 | 0.95 | + |
| 2 | 0.93 | + |
| 3 | 0.87 | - |
| 4 | 0.85 | - |
| 5 |  | - |
| 6 |  | + |
| 7 | 0.76 | - |
| 8 | 0.53 | + |
| 9 | 0.43 | - |
| 10 | 0.25 | + |

- Use a classifier that produces a posterior probability P(+|A) for each test instance
- Sort the instances according to P(+|A) in decreasing order
- Apply a threshold at each unique value of P(+|A)
- Count the number of TP, FP, TN, FN at each threshold
- TP rate, TPR = TP / (TP + FN)
- FP rate, FPR = FP / (FP + TN)
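
The steps above can be sketched as follows. The scores and labels are hypothetical values chosen for the example, not the table from the slide, and the trapezoidal `area_under_curve` helper is an addition that illustrates the area-under-the-curve comparison. The sketch assumes both classes are present.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, one per unique threshold, starting from (0, 0)."""
    positives = sum(1 for y in labels if y == "+")
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Visit thresholds in decreasing order; an instance is predicted "+" if score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == "+")
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == "-")
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under (FPR, TPR) points ordered by non-decreasing FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical posterior probabilities P(+|A) and true classes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = ["+", "+", "-", "+", "-", "-"]
points = roc_points(scores, labels)
auc = area_under_curve(points)  # 8/9 for this data
```

For this toy data, 8 of the 9 positive/negative pairs are ranked correctly, which matches the area of 8/9.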

### How to Construct an ROC Curve

[Table: counts at each threshold (row label "Threshold >="), with the instances listed in the reverse of the above order]

[Figure: ROC Curve]

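### Interval Estimation

$\mathcal{X} = \{x^t\}_t$ where $x^t \sim \mathcal{N}(\mu, \sigma^2)$

The sample average satisfies $m \sim \mathcal{N}(\mu, \sigma^2/N)$

100(1 - α) percent confidence interval:

$$P\left\{ m - z_{\alpha/2}\frac{\sigma}{\sqrt{N}} < \mu < m + z_{\alpha/2}\frac{\sigma}{\sqrt{N}} \right\} = 1 - \alpha$$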
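
For known $\sigma$, the two-sided interval is conventionally $m \pm z_{\alpha/2}\,\sigma/\sqrt{N}$. A small sketch using only the Python standard library is given here; the function name and the sample values are illustrative assumptions.

```python
import math
from statistics import NormalDist, mean

def z_confidence_interval(sample, sigma, alpha=0.05):
    """Two-sided 100(1 - alpha)% interval for the mean when sigma is known."""
    n = len(sample)
    m = mean(sample)
    # z_{alpha/2}: the point with alpha/2 of the standard normal mass to its right.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / math.sqrt(n)
    return m - half_width, m + half_width

# Made-up sample with an assumed known sigma of 1.
low, high = z_confidence_interval([1.0, 2.0, 3.0, 4.0], sigma=1.0)
```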

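When $\sigma^2$ is not known, use the sample variance $S^2 = \sum_t (x^t - m)^2 / (N-1)$, so that $\sqrt{N}(m - \mu)/S \sim t_{N-1}$:

$$P\left\{ m - t_{\alpha/2,N-1}\frac{S}{\sqrt{N}} < \mu < m + t_{\alpha/2,N-1}\frac{S}{\sqrt{N}} \right\} = 1 - \alpha$$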

### Hypothesis Testing

Reject a null hypothesis if it is not supported by the sample with enough confidence.

$\mathcal{X} = \{x^t\}_t$ where $x^t \sim \mathcal{N}(\mu, \sigma^2)$

$H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$

Accept $H_0$ with level of significance $\alpha$ if $\mu_0$ is in the 100(1 - α) percent confidence interval:

$$\frac{\sqrt{N}(m - \mu_0)}{\sigma} \in (-z_{\alpha/2}, z_{\alpha/2})$$

This is a two-sided test.

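One-sided test: $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$

Accept if

$$\frac{\sqrt{N}(m - \mu_0)}{\sigma} \in (-\infty, z_{\alpha})$$

Variance unknown: use $t$ instead of $z$. With $S$ the sample standard deviation, accept $H_0: \mu = \mu_0$ if

$$\frac{\sqrt{N}(m - \mu_0)}{S} \in (-t_{\alpha/2,N-1}, t_{\alpha/2,N-1})$$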

### Assessing Error

$H_0: p \le p_0$ vs. $H_1: p > p_0$

Single training/validation set: Binomial Test

If the error probability is $p_0$, the probability that there are $e$ errors or less in $N$ validation trials is

$$P\{X \le e\} = \sum_{j=0}^{e} \binom{N}{j} p_0^j (1 - p_0)^{N-j}$$

Accept if this probability is less than $1 - \alpha$.

[Figure: example with N = 100, e = 20; region labeled 1 - α]
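
The binomial test can be sketched as follows. The values N = 100 and e = 20 come from the slide; the two values of $p_0$ and the function names are assumptions made for illustration.

```python
import math

def binomial_cdf(e, n, p0):
    """P{X <= e} for X ~ Binomial(n, p0): probability of e errors or less."""
    return sum(math.comb(n, j) * p0**j * (1.0 - p0) ** (n - j) for j in range(e + 1))

def accept_binomial(e, n, p0, alpha=0.05):
    """Accept H0: p <= p0 if P{X <= e} is less than 1 - alpha."""
    return binomial_cdf(e, n, p0) < 1.0 - alpha

# N = 100 and e = 20 as on the slide; the p0 values are hypothetical.
accept_loose = accept_binomial(e=20, n=100, p0=0.5)  # 20 errors is well below N * p0 = 50
accept_tight = accept_binomial(e=20, n=100, p0=0.1)  # 20 errors is well above N * p0 = 10
```

Under this rule, the claim of at most 50% error is accepted, and the claim of at most 10% error is rejected.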

### Normal Approximation to the Binomial

The number of errors $X$ is approximately normal with mean $Np_0$ and variance $Np_0(1 - p_0)$:

$$\frac{X - Np_0}{\sqrt{Np_0(1 - p_0)}} \sim \mathcal{Z}$$

Accept if this value for $X = e$ is less than $z_{1-\alpha}$.
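
A minimal sketch of the normal approximation follows. It reads $z_{1-\alpha}$ as the point with $1 - \alpha$ of the standard normal mass to its left, which is an assumption about the slide's notation; the function names and the $p_0$ values are also hypothetical.

```python
import math
from statistics import NormalDist

def normal_approx_statistic(e, n, p0):
    """Standardized number of errors: (e - N p0) / sqrt(N p0 (1 - p0))."""
    return (e - n * p0) / math.sqrt(n * p0 * (1.0 - p0))

def accept_normal_approx(e, n, p0, alpha=0.05):
    """Accept H0: p <= p0 if the statistic is below the (1 - alpha) normal quantile."""
    critical = NormalDist().inv_cdf(1.0 - alpha)  # assumed reading of z_{1-alpha}
    return normal_approx_statistic(e, n, p0) < critical

z_tight = normal_approx_statistic(e=20, n=100, p0=0.1)   # 10 / 3
accept_tight = accept_normal_approx(e=20, n=100, p0=0.1)
accept_loose = accept_normal_approx(e=20, n=100, p0=0.25)
```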

### Paired t Test

Multiple training/validation sets

$x_i^t = 1$ if instance $t$ is misclassified on fold $i$

Error rate of fold $i$:

$$p_i = \frac{\sum_{t=1}^{N} x_i^t}{N}$$

With $m$ and $s^2$ the average and variance of $p_i$, we accept $p_0$ or less error if

$$\frac{\sqrt{K}(m - p_0)}{s} \sim t_{K-1}$$

is less than $t_{\alpha,K-1}$.
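
The fold error rates and the resulting statistic $\sqrt{K}(m - p_0)/s$ can be computed as in this sketch. The misclassification indicators are made up, the function names are hypothetical, and the critical value 2.353 for $t_{0.05,3}$ is taken from a standard t table because the Python standard library has no t quantile function. The statistic is undefined when all folds have the same error rate (s = 0).

```python
import math
from statistics import mean, stdev

def fold_error_rates(misclassified):
    """misclassified[i][t] is 1 if instance t of fold i was misclassified, else 0."""
    return [sum(fold) / len(fold) for fold in misclassified]

def one_sided_t_statistic(error_rates, p0):
    """sqrt(K) * (m - p0) / s over the K fold error rates."""
    k = len(error_rates)
    m = mean(error_rates)
    s = stdev(error_rates)  # sample standard deviation, divides by K - 1
    return math.sqrt(k) * (m - p0) / s

# Made-up indicators: K = 4 folds of 10 validation instances with 1, 2, 2, 3 errors.
misclassified = [[1] * e + [0] * (10 - e) for e in (1, 2, 2, 3)]
rates = fold_error_rates(misclassified)
statistic = one_sided_t_statistic(rates, p0=0.25)
T_CRITICAL = 2.353  # t_{0.05, 3} from a standard t table
accept = statistic < T_CRITICAL
```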

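### K-Fold CV Paired t Test

Use K-fold CV to get $K$ training/validation folds

$p_i^1$, $p_i^2$: errors of classifiers 1 and 2 on fold $i$

$p_i = p_i^1 - p_i^2$: paired difference on fold $i$

The null hypothesis is that $p_i$ has mean 0:

$H_0: \mu = 0$ vs. $H_1: \mu \neq 0$

$$m = \frac{\sum_{i=1}^{K} p_i}{K}, \qquad s^2 = \frac{\sum_{i=1}^{K} (p_i - m)^2}{K - 1}$$

$$\frac{\sqrt{K}\, m}{s} \sim t_{K-1}$$

Accept $H_0$ if this value is in $(-t_{\alpha/2,K-1}, t_{\alpha/2,K-1})$.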
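
A sketch of the paired test on fold-wise differences follows. The per-fold error rates are invented for illustration, the function name is hypothetical, and the critical value 2.776 for $t_{0.025,4}$ comes from a standard t table. The statistic is undefined if all differences are identical (s = 0).

```python
import math
from statistics import mean, stdev

def paired_t_statistic(errors_1, errors_2):
    """sqrt(K) * m / s for the paired differences p_i = p_i^1 - p_i^2."""
    diffs = [a - b for a, b in zip(errors_1, errors_2)]
    k = len(diffs)
    m = mean(diffs)
    s = stdev(diffs)  # sample standard deviation, divides by K - 1
    return math.sqrt(k) * m / s

# Invented error rates of two classifiers on the same K = 5 folds.
errors_1 = [0.20, 0.22, 0.18, 0.25, 0.21]
errors_2 = [0.15, 0.20, 0.17, 0.19, 0.18]
statistic = paired_t_statistic(errors_1, errors_2)
T_CRITICAL = 2.776  # t_{0.025, 4} from a standard t table
accept_equal_error = -T_CRITICAL < statistic < T_CRITICAL
```

For these made-up numbers the statistic falls outside the acceptance interval, so the hypothesis of equal expected error would be rejected at the 5% level.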
