Data Mining: Practical Machine Learning Tools and Techniques, by I. H. Witten, E. Frank and M. A. Hall. Chapter 5: Credibility: Evaluating What’s Been Learned.

Presentation transcript:

Data Mining: Practical Machine Learning Tools and Techniques, by I. H. Witten, E. Frank and M. A. Hall. Chapter 5: Credibility: Evaluating What’s Been Learned. Rodney Nielsen. Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall.

Credibility: Evaluating What’s Been Learned
● Issues: training, testing, tuning
● Predicting performance
● Holdout, cross-validation, bootstrap
● Comparing schemes: the t-test
● Predicting probabilities: loss functions
● Cost-sensitive measures
● Evaluating numeric prediction
● The Minimum Description Length principle

Evaluation: The Key to Success
● How predictive is the model we learned?
● Error on the training data is not a good indicator of performance on future data
  ○ Otherwise 1-NN would be the optimum classifier!
● Simple solution that can be used if lots of (labeled) data is available:
  ○ Split data into training and test set
● However: (labeled) data is usually limited
  ○ More sophisticated techniques need to be used
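A minimal sketch of the simple holdout solution described above. It assumes scikit-learn and NumPy are available; the synthetic data and the choice of a decision tree are illustrative placeholders, not part of the original slides.

```python
# Minimal holdout-evaluation sketch.  Assumes scikit-learn and NumPy are
# installed; the synthetic data and the decision tree are illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(42)
X = rng.rand(1000, 10)                                     # 1000 instances, 10 numeric attributes
y = (X[:, 0] + 0.1 * rng.randn(1000) > 0.5).astype(int)    # noisy binary class labels

# Split the labeled data into a training set and an independent test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("Training (resubstitution) accuracy:", clf.score(X_train, y_train))
print("Test accuracy:                     ", clf.score(X_test, y_test))
```

The gap between the two printed accuracies is the point of the next two slides: accuracy measured on the training data is usually far too optimistic.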

Issues in Evaluation
● Statistical reliability of estimated differences in performance (→ significance tests)
● Choice of performance measure:
  ○ Number of correct classifications
  ○ Accuracy of probability estimates
  ○ Error in numeric predictions
● Costs assigned to different types of errors
  ○ Many practical applications involve costs

Training and Testing I
● Natural performance measure for classification problems: error rate
  ○ Error: instance’s class is predicted incorrectly
  ○ Error rate: proportion of errors made over the whole set of instances
Under what conditions might this not be a very good indicator of performance?

Training and Testing II
● Resubstitution error: error rate obtained from the training data
● Resubstitution error is (hopelessly) optimistic!
● Why?

Training and Testing III
● Test set: independent instances that have played no part in the formation of the classifier
● Assumption: both training data and test data are representative samples of the underlying problem: i.i.d. data (independent and identically distributed)
● Test and training data may differ in nature
  ○ Example: classifiers built using customer data from two different towns A and B
  ○ To estimate the performance of the classifier from town A in a completely new town, test it on data from B

Note on Parameter Tuning
● It is important that the test data is not used in any way to create the classifier
● Some learning schemes operate in two stages:
  ○ Stage 1: build the basic structure
  ○ Stage 2: optimize parameter settings
● The test data can’t be used for parameter tuning!
● Proper procedure uses three sets: training data, validation data, and test data
  ○ Validation data is used to optimize parameters
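A sketch of the three-set procedure above, again assuming scikit-learn; the tuned parameter (tree depth), the split proportions, and the data are hypothetical.

```python
# Three-way split sketch: tune on validation data, report on test data only once.
# Assumes scikit-learn; X, y, and the max_depth grid are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(1)
X = rng.rand(1500, 10)
y = (X[:, 0] + 0.1 * rng.randn(1500) > 0.5).astype(int)

# Carve off the test set first; it is never touched during tuning.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Split the remainder into training and validation data.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

best_depth, best_val_acc = None, -1.0
for depth in [1, 2, 4, 8, None]:          # stage 2: optimize a parameter setting
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    val_acc = clf.score(X_val, y_val)
    if val_acc > best_val_acc:
        best_depth, best_val_acc = depth, val_acc

# Retrain on training + validation data with the chosen parameter,
# then report performance on the untouched test set.
final = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
final.fit(np.vstack([X_train, X_val]), np.concatenate([y_train, y_val]))
print("Chosen max_depth:", best_depth, " test accuracy:", final.score(X_test, y_test))
```

The key design point is that the test set is split off first and used exactly once, after the validation data has been used to pick the parameter setting.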

Making the Most of the Data
● Once evaluation is complete, all the data can be used to build the final classifier
● Generally, the more training data, the better the classifier (but returns diminish)
● The more test data, the more accurate the error estimate
● Holdout procedure: method of splitting the original data into a training set and a test set
  ○ Dilemma: ideally both training set and test set should be large!

Credibility: Evaluating What’s Been Learned
● Issues: training, testing, tuning
● Predicting performance
● Holdout, cross-validation, bootstrap
● Comparing schemes: the t-test
● Predicting probabilities: loss functions
● Cost-sensitive measures
● Evaluating numeric prediction
● The Minimum Description Length principle

Predicting Performance
● Assume the estimated error rate is 25%. How close is this to the true error rate?
  ○ Depends on the amount of test data
● Prediction is just like tossing a (biased!) coin
  ○ “Heads” is a “success”, “tails” is an “error”
● In statistics, a succession of independent events like this is called a Bernoulli process
  ○ Statistical theory provides us with confidence intervals for the true underlying proportion
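A small simulation of the coin-tossing analogy, assuming only NumPy; the true error rate of 25% and the test-set sizes are illustrative.

```python
# Bernoulli-process sketch: each test prediction behaves like a biased coin toss.
# Assumes NumPy; the true error rate (25%) and the test-set sizes are illustrative.
import numpy as np

rng = np.random.RandomState(0)
true_error = 0.25

for n_test in (100, 1000, 10000):
    # Simulate 1000 hypothetical test sets of size n_test and look at how much
    # the estimated error rate scatters around the true value.
    estimates = rng.binomial(n=n_test, p=true_error, size=1000) / n_test
    print(f"N={n_test:5d}: estimates range [{estimates.min():.3f}, {estimates.max():.3f}],"
          f" std={estimates.std():.4f}")
```

The spread shrinks roughly like 1/√N, which is exactly what the confidence intervals on the following slides quantify.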

Confidence Intervals
● We can say: p lies within a certain specified interval with a certain specified confidence
● Example: S = 750 successes in N = 1000 trials
  ○ Estimated success rate: 75%
  ○ How close is this to the true success rate p?
  ○ Answer: with 80% confidence, p lies in [73.2%, 76.7%]
● Another example: S = 75 successes in N = 100 trials
  ○ Estimated success rate: 75%
  ○ How close is this to the true success rate p?
  ○ With 80% confidence, p lies in [69.1%, 80.1%]

Mean and Variance
● Mean and variance for a Bernoulli trial: p, p (1–p)
● Expected success rate f = S/N
● Mean and variance for f: p, p (1–p)/N
● For large enough N, f follows an approximate Normal distribution
● The c% confidence interval [–z ≤ X ≤ z] for a random variable X with 0 mean is given by:
    Pr[–z ≤ X ≤ z] = c
● With a symmetric distribution:
    Pr[–z ≤ X ≤ z] = 1 – 2 × Pr[X ≥ z]
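A quick empirical check of these formulas, assuming NumPy; p = 0.75 and N = 1000 are placeholders chosen to match the running example.

```python
# Empirical check that f = S/N has mean p and variance p(1-p)/N and looks
# approximately Normal for large N.  Assumes NumPy; p and N are placeholders.
import numpy as np

rng = np.random.RandomState(1)
p, N, runs = 0.75, 1000, 100_000

f = rng.binomial(N, p, size=runs) / N     # success rate over many simulated test sets

print("mean(f) =", round(f.mean(), 4), " theory:", p)
print("var(f)  =", round(f.var(), 6),  " theory:", round(p * (1 - p) / N, 6))

# Under the Normal approximation, about 68% of the f values should fall
# within one standard deviation of p.
sd = (p * (1 - p) / N) ** 0.5
print("fraction within 1 sd:", np.mean(np.abs(f - p) <= sd))
```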

Confidence Limits
● Confidence limits for the normal distribution with 0 mean and a standard deviation of 1:

    Pr[X ≥ z]      z
    0.1%        3.09
    0.5%        2.58
    1%          2.33
    5%          1.65
    10%         1.28
    20%         0.84
    40%         0.25

● Thus: Pr[–1.65 ≤ X ≤ 1.65] = 90%
● To use this, we have to reduce our random variable f to have 0 mean and unit standard deviation
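The transformation the last bullet refers to, and the interval it leads to, can be sketched as follows; this is a reconstruction based on the mean and variance of f given two slides back, with the quadratic in p solved to get explicit limits.

```latex
% Standardize f using its mean p and variance p(1-p)/N:
\Pr\!\left[ -z \le \frac{f - p}{\sqrt{p(1-p)/N}} \le z \right] = c

% Solving the resulting quadratic inequality for p gives the two confidence limits
% (the \pm sign yields the lower and upper bound):
p = \left( f + \frac{z^2}{2N} \pm z \sqrt{\frac{f}{N} - \frac{f^2}{N} + \frac{z^2}{4N^2}} \right)
    \Bigg/ \left( 1 + \frac{z^2}{N} \right)
```

Reading z from the table (z = 1.28 for Pr[X ≥ z] = 10%, i.e. 80% two-sided confidence) and plugging in f and N gives the intervals on the next slide.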

Examples
● f = 75%, N = 1000, c = 80% (i.e., z = 1.28): p in [0.732, 0.767]
● f = 75%, N = 100, c = 80%: p in [0.691, 0.801]
● Note that the normal distribution assumption is only valid for large N (say, N > 100)
● f = 75%, N = 10, c = 80%: p in [0.549, 0.881] (should be taken with a grain of salt)
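A sketch that plugs the three examples into the interval formula above, assuming SciPy for the normal quantile; the helper function name is hypothetical and the printed intervals should roughly match the values quoted on this slide.

```python
# Confidence-interval sketch for the three examples, using the formula above.
# Assumes SciPy; proportion_interval is a hypothetical helper, not a library call.
from math import sqrt
from scipy.stats import norm

def proportion_interval(f, N, c):
    """Confidence interval for the true success rate p given an observed rate f."""
    z = norm.ppf(1 - (1 - c) / 2)          # e.g. c = 0.80  ->  z ~ 1.28
    centre = f + z * z / (2 * N)
    spread = z * sqrt(f / N - f * f / N + z * z / (4 * N * N))
    denom = 1 + z * z / N
    return (centre - spread) / denom, (centre + spread) / denom

for N in (1000, 100, 10):
    lo, hi = proportion_interval(0.75, N, 0.80)
    print(f"f = 75%, N = {N:4d}:  p in [{lo:.3f}, {hi:.3f}]")
# Roughly reproduces [0.732, 0.767], [0.691, 0.801] and, for N = 10,
# about [0.55, 0.88] (the small-N case should be taken with a grain of salt).
```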

Credibility: Evaluating What’s Been Learned
● Issues: training, testing, tuning
● Predicting performance
● Holdout, cross-validation, bootstrap
● Comparing schemes: the t-test
● Predicting probabilities: loss functions
● Cost-sensitive measures
● Evaluating numeric prediction
● The Minimum Description Length principle