Assessing and Comparing Classification Algorithms: Introduction, Resampling and Cross-Validation, Measuring Error, Interval Estimation and Hypothesis Testing

Assessing and Comparing Classification Algorithms
- Introduction
- Resampling and Cross Validation
- Measuring Error
- Interval Estimation and Hypothesis Testing
- Assessing and Comparing Performance
(Slides based on Lecture Notes for E. Alpaydın, 2004, Introduction to Machine Learning, © The MIT Press, V1.1.)

Introduction
Questions:
- Assessment of the expected error of a learning algorithm: is the error rate of 1-NN less than 2%?
- Comparing the expected errors of two algorithms: is k-NN more accurate than MLP?
Training/validation/test sets
Resampling methods: K-fold cross-validation

Algorithm Preference
Criteria (application-dependent):
- Misclassification error, or risk (loss functions)
- Training time/space complexity
- Testing time/space complexity
- Interpretability
- Easy programmability
Cost-sensitive learning

Resampling and K-Fold Cross-Validation
The need for multiple training/validation sets: {X_i, V_i}_i denotes the training/validation set pair of fold i.
K-fold cross-validation: divide X into K equal parts X_i, i = 1, ..., K. In fold i the validation set is V_i = X_i and the training set T_i is the union of the remaining K-1 parts, so any two training sets T_i and T_j share K-2 of the K parts.
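A minimal Python sketch of generating such K-fold splits (not from the slides); the fold count, dataset size, and random seed are illustrative.

```python
import numpy as np

def kfold_indices(n_instances, K, seed=0):
    """Split instance indices into K parts; fold i uses part i as the
    validation set V_i and the union of the other K-1 parts as T_i."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(n_instances), K)
    folds = []
    for i in range(K):
        V_i = parts[i]
        T_i = np.concatenate([parts[j] for j in range(K) if j != i])
        folds.append((T_i, V_i))
    return folds

# Example: K = 5 folds over 100 instances.
for T_i, V_i in kfold_indices(100, K=5):
    print(len(T_i), len(V_i))   # 80 20 on every fold
```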

5×2 Cross-Validation
5 times 2-fold cross-validation (Dietterich, 1998): in each of 5 replications the data are shuffled and split into two halves, and each half is used once for training and once for validation, giving 10 training/validation pairs in total.
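A sketch of the 5×2 splitting scheme just described, assuming only a dataset size; the random seed is illustrative.

```python
import numpy as np

def five_by_two_cv(n_instances, seed=0):
    """Dietterich's 5x2 cv: 5 replications of a 2-fold split, yielding
    10 (train_idx, valid_idx) pairs in which each half of a replication
    serves once for training and once for validation."""
    rng = np.random.default_rng(seed)
    for _ in range(5):
        perm = rng.permutation(n_instances)
        half = n_instances // 2
        a, b = perm[:half], perm[half:]
        yield a, b   # train on the first half, validate on the second
        yield b, a   # then swap the roles

for train_idx, valid_idx in five_by_two_cv(100):
    print(len(train_idx), len(valid_idx))   # 50 50, ten times
```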

Bootstrapping
Draw N instances from a dataset of size N with replacement. The probability that a particular instance is not picked after N draws is (1 - 1/N)^N ≈ e^(-1) ≈ 0.368, so a bootstrap sample contains only about 63.2% of the distinct instances; roughly 36.8% are never drawn and can serve as validation data.
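A quick empirical check of the (1 - 1/N)^N ≈ 1/e result (not from the slides); the dataset size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
sample = rng.integers(0, N, size=N)          # bootstrap sample of indices
never_picked = N - np.unique(sample).size    # instances that never appear
print(never_picked / N)                      # ~0.368
print((1 - 1 / N) ** N)                      # analytic value, tends to 1/e
```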

Measuring Error
Error rate = # of errors / # of instances = (FN + FP) / N
Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
Precision = # of found positives / # of found = TP / (TP + FP)
Specificity = TN / (TN + FP)
False alarm rate = FP / (FP + TN) = 1 - specificity
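The same measures as a small Python helper (not from the slides); the confusion-matrix counts in the example call are made up.

```python
def confusion_metrics(TP, FP, TN, FN):
    """Performance measures computed from the 2x2 confusion-matrix counts."""
    N = TP + FP + TN + FN
    return {
        "error_rate":  (FP + FN) / N,
        "recall":      TP / (TP + FN),   # sensitivity, hit rate
        "precision":   TP / (TP + FP),
        "specificity": TN / (TN + FP),
        "false_alarm": FP / (FP + TN),   # = 1 - specificity
    }

# Hypothetical counts, just to exercise the formulas.
print(confusion_metrics(TP=40, FP=10, TN=45, FN=5))
```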

Methods for Performance Evaluation
How to obtain a reliable estimate of performance? The performance of a model may depend on factors other than the learning algorithm:
- Class distribution
- Cost of misclassification
- Size of training and test sets

Learning Curve
A learning curve shows how accuracy changes with varying training-set size. Creating one requires a sampling schedule:
- Arithmetic sampling (Langley et al.)
- Geometric sampling (Provost et al.)
Effect of small sample size:
- Bias in the estimate
- Variance of the estimate
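A tiny sketch contrasting the two sampling schedules (not from the slides); the maximum size, step, and number of points are arbitrary choices.

```python
import numpy as np

n_max = 10_000
arithmetic = np.arange(500, n_max + 1, 500)               # 500, 1000, 1500, ...
geometric = np.geomspace(100, n_max, num=8).astype(int)   # 100, 193, ..., 10000
print(arithmetic)
print(geometric)
```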

ROC (Receiver Operating Characteristic)
Developed in the 1950s in signal detection theory to analyze noisy signals; it characterizes the trade-off between positive hits and false alarms.
An ROC curve plots the true positive rate (y-axis) against the false positive rate (x-axis).
The performance of each classifier is represented as a point on the ROC curve; changing the algorithm's threshold, the sample distribution, or the cost matrix changes the location of the point.

ROC Curve
Example: a 1-dimensional data set containing two classes (positive and negative); any point located at x > t is classified as positive.
At threshold t: TP = 0.5, FN = 0.5, FP = 0.12, TN = 0.88 (rates within each true class).

ROC Curve
Reference points (TP rate, FP rate):
- (0, 0): declare everything to be the negative class
- (1, 1): declare everything to be the positive class
- (1, 0): ideal
Diagonal line: random guessing. Below the diagonal line: the prediction is the opposite of the true class.

Using ROC for Model Comparison
No model consistently outperforms the other: M1 is better for small FPR, M2 is better for large FPR.
Area under the ROC curve (AUC) summarizes overall performance: ideal classifier, area = 1; random guessing, area = 0.5.
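A small sketch (not from the slides) computing AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, which equals the area under the ROC curve; the scores and labels are made up.

```python
import numpy as np

def auc_by_rank(scores, labels):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

print(auc_by_rank([0.9, 0.8, 0.7, 0.3], [1, 1, 0, 0]))  # 1.0: ideal ordering
print(auc_by_rank([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # 0.5: no information
```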

How to Construct an ROC Curve
Use a classifier that produces a posterior probability P(+|A) for each test instance A. (The slide's example table of instances with their P(+|A) values and true classes is not reproduced here.)
- Sort the instances according to P(+|A) in decreasing order.
- Apply a threshold at each unique value of P(+|A).
- Count the number of TP, FP, TN, FN at each threshold.
- TP rate, TPR = TP / (TP + FN)
- FP rate, FPR = FP / (FP + TN)

How to Construct an ROC Curve (continued)
(Worked example not reproduced: the slide tabulates TP, FP, TN, FN and the resulting TPR/FPR at each threshold "P(+|A) >= t" and plots these points to trace the ROC curve.)
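A sketch of the threshold sweep described above (not from the slides); the posterior probabilities and true labels below are hypothetical and do not reproduce the slide's table.

```python
import numpy as np

def roc_points(posteriors, labels):
    """Sweep a threshold over the sorted posteriors P(+|x) and record the
    (FPR, TPR) point obtained at each one."""
    order = np.argsort(posteriors)[::-1]        # decreasing P(+|x)
    labels = np.asarray(labels)[order]
    n_pos, n_neg = (labels == 1).sum(), (labels == 0).sum()
    fpr, tpr = [0.0], [0.0]                     # threshold above the top score
    tp = fp = 0
    for y in labels:                            # lower the threshold past one
        if y == 1:                              # instance at a time
            tp += 1
        else:
            fp += 1
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return np.array(fpr), np.array(tpr)

scores = [0.90, 0.80, 0.70, 0.60, 0.55, 0.50, 0.40, 0.30]   # hypothetical
labels = [1, 1, 0, 1, 0, 1, 0, 0]                            # hypothetical
fpr, tpr = roc_points(scores, labels)
print(list(zip(fpr, tpr)))
print(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))       # trapezoidal AUC
```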

Interval Estimation
X = {x_t}_t where x_t ~ N(μ, σ²); the sample mean m = (1/N) Σ_t x_t is distributed as N(μ, σ²/N).
With σ known, a 100(1 - α) percent confidence interval for μ is
m ± z_{α/2} σ/√N,
where z_{α/2} is the standard normal value with tail probability α/2 above it (1.96 for α = 0.05).

When σ² is not known, replace σ by the sample standard deviation S, where S² = Σ_t (x_t - m)² / (N - 1), and use the t distribution with N - 1 degrees of freedom:
m ± t_{α/2, N-1} S/√N is a 100(1 - α) percent confidence interval for μ.
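A numerical sketch of both intervals (not from the slides); the sample is synthetic, and the true σ is treated as known only in the first case.

```python
import numpy as np
from scipy import stats

x = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=30)  # synthetic sample
N, alpha = len(x), 0.05
m = x.mean()

# Known variance: m +/- z_{alpha/2} * sigma / sqrt(N)
sigma = 2.0
z = stats.norm.ppf(1 - alpha / 2)
print(m - z * sigma / np.sqrt(N), m + z * sigma / np.sqrt(N))

# Unknown variance: sample std S and t_{alpha/2, N-1} instead of z
S = x.std(ddof=1)
t = stats.t.ppf(1 - alpha / 2, df=N - 1)
print(m - t * S / np.sqrt(N), m + t * S / np.sqrt(N))
```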

Hypothesis Testing
Reject a null hypothesis if it is not supported by the sample with enough confidence.
X = {x_t}_t where x_t ~ N(μ, σ²).
Two-sided test: H0: μ = μ0 vs. H1: μ ≠ μ0. Accept H0 with level of significance α if μ0 lies in the 100(1 - α)% confidence interval, i.e., if √N (m - μ0)/σ falls in (-z_{α/2}, z_{α/2}).

One-sided test: H0: μ ≤ μ0 vs. H1: μ > μ0. Accept H0 if √N (m - μ0)/σ falls in (-∞, z_{1-α}), where z_{1-α} is the standard normal value below which lies probability 1 - α (1.64 for α = 0.05).
Variance unknown: use t instead of z. Accept H0: μ = μ0 if √N (m - μ0)/S falls in (-t_{α/2, N-1}, t_{α/2, N-1}).
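A sketch of the two-sided tests (not from the slides); the sample, μ0, σ, and α are illustrative.

```python
import numpy as np
from scipy import stats

def z_test_two_sided(x, mu0, sigma, alpha=0.05):
    """Accept H0: mu = mu0 if sqrt(N)*(m - mu0)/sigma is in (-z_{alpha/2}, z_{alpha/2})."""
    N, m = len(x), np.mean(x)
    stat = np.sqrt(N) * (m - mu0) / sigma
    return abs(stat) < stats.norm.ppf(1 - alpha / 2)

def t_test_two_sided(x, mu0, alpha=0.05):
    """Variance unknown: use the sample std and t_{alpha/2, N-1} instead."""
    N, m, S = len(x), np.mean(x), np.std(x, ddof=1)
    stat = np.sqrt(N) * (m - mu0) / S
    return abs(stat) < stats.t.ppf(1 - alpha / 2, df=N - 1)

x = np.random.default_rng(1).normal(loc=0.2, scale=1.0, size=25)  # synthetic
print(z_test_two_sided(x, mu0=0.0, sigma=1.0))   # True = accept H0, False = reject
print(t_test_two_sided(x, mu0=0.0))
```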

Assessing Error: H0: p ≤ p0 vs. H1: p > p0
Single training/validation set: binomial test. If the error probability is p0, the probability of observing e errors or fewer in N validation trials is
P{X ≤ e} = Σ_{j=0}^{e} (N choose j) p0^j (1 - p0)^{N-j}.
Accept H0 if this probability is less than 1 - α.
Example: N = 100, e = 20.
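A sketch of the binomial test using the slide's N and e; the claimed error rate p0 and the significance level α below are assumed values for illustration.

```python
from scipy import stats

N, e = 100, 20            # from the slide's example
p0, alpha = 0.25, 0.05    # assumed claimed error rate and significance level

prob = stats.binom.cdf(e, N, p0)   # P{X <= e} when the true error prob is p0
accept_H0 = prob < 1 - alpha
print(prob, accept_H0)
```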

Normal Approximation to the Binomial
The number of errors X is approximately normal with mean N p0 and variance N p0 (1 - p0), so
z = (e - N p0) / √(N p0 (1 - p0))
is approximately standard normal. Accept H0 if z is less than z_{1-α} (1.64 for α = 0.05).
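The same test under the normal approximation, reusing the assumed numbers above (not from the slides as code).

```python
import numpy as np
from scipy import stats

N, e, p0, alpha = 100, 20, 0.25, 0.05   # same assumed numbers as above
z = (e - N * p0) / np.sqrt(N * p0 * (1 - p0))
accept_H0 = z < stats.norm.ppf(1 - alpha)   # compare against z_{1-alpha}
print(z, accept_H0)
```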

Paired t Test
Multiple training/validation sets. Let x_t^i = 1 if instance t is misclassified on fold i, and 0 otherwise. The error rate of fold i is p_i = (1/N) Σ_t x_t^i.
With m and s² the average and variance of the p_i, the statistic √K (m - p0) / s is t-distributed with K - 1 degrees of freedom; we accept that the error rate is p0 or less if this statistic is less than t_{α, K-1}.
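A sketch of this test (not from the slides); the fold error rates, p0, and α are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical K = 10 fold error rates, claimed error p0, and significance alpha.
p = np.array([0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.13, 0.14])
p0, alpha, K = 0.15, 0.05, len(p)

m, s = p.mean(), p.std(ddof=1)
stat = np.sqrt(K) * (m - p0) / s
accept_H0 = stat < stats.t.ppf(1 - alpha, df=K - 1)   # H0: error rate <= p0
print(stat, accept_H0)
```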

K-Fold CV Paired t Test
Use K-fold cross-validation to get K training/validation folds.
p_i^1, p_i^2: errors of classifiers 1 and 2 on fold i; p_i = p_i^1 - p_i^2 is the paired difference on fold i.
The null hypothesis is that p_i has mean 0: H0: μ = 0 vs. H1: μ ≠ 0. With m and s² the mean and variance of the p_i, √K m / s is t-distributed with K - 1 degrees of freedom under H0; accept H0 if it falls in (-t_{α/2, K-1}, t_{α/2, K-1}).
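A sketch of the comparison (not from the slides); the per-fold errors of the two classifiers and α are hypothetical, and scipy's paired t test is included only as a cross-check.

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold errors of two classifiers over the same K = 10 folds.
p1 = np.array([0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.13, 0.14])
p2 = np.array([0.10, 0.14, 0.11, 0.12, 0.11, 0.10, 0.15, 0.11, 0.12, 0.12])
d = p1 - p2                      # paired differences p_i
K, alpha = len(d), 0.05

stat = np.sqrt(K) * d.mean() / d.std(ddof=1)   # ~ t with K-1 df under H0
crit = stats.t.ppf(1 - alpha / 2, df=K - 1)
print(stat, abs(stat) < crit)    # True = accept H0 (no difference), False = reject

print(stats.ttest_rel(p1, p2))   # scipy's paired t test gives the same statistic
```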