Lecture Slides for Introduction to Machine Learning 2e, ETHEM ALPAYDIN © The MIT Press, 2010


Introduction
Questions:
- Assessment of the expected error of a learning algorithm: Is the error rate of 1-NN less than 2%?
- Comparing the expected errors of two algorithms: Is k-NN more accurate than MLP?
Training/validation/test sets
Resampling methods: K-fold cross-validation
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)

Algorithm Preference
Criteria (application-dependent):
- Misclassification error, or risk (loss functions)
- Training time/space complexity
- Testing time/space complexity
- Interpretability
- Easy programmability
Cost-sensitive learning

Factors and Response

Strategies of Experimentation
Response surface design for approximating and maximizing the response function in terms of the controllable factors.

Guidelines for ML experiments
A. Aim of the study
B. Selection of the response variable
C. Choice of factors and levels
D. Choice of experimental design
E. Performing the experiment
F. Statistical analysis of the data
G. Conclusions and recommendations

Resampling and K-Fold Cross-Validation
The need for multiple training/validation sets: {X_i, V_i}_i are the training/validation sets of fold i.
K-fold cross-validation: divide X into K parts X_i, i = 1, ..., K; fold i uses X_i for validation and the remaining K-1 parts for training.
Any two training sets T_i share K-2 of the K parts.
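As an illustration, a minimal sketch of generating K-fold index splits; the `k_fold_indices` helper and its parameters are hypothetical, not from the slides:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint validation folds (hypothetical helper)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # fold i validates on folds[i] and trains on the other k-1 parts
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(20, 5)
all_idx = sorted(i for f in folds for i in f)   # every instance is in exactly one fold
```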

5×2 Cross-Validation
5 times 2-fold cross-validation (Dietterich, 1998): repeat a 2-fold split of the data five times, giving ten training/validation pairs.
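A sketch of how the ten training/validation pairs could be generated at the index level; `five_by_two_splits` is a hypothetical helper, not from the slides:

```python
import random

def five_by_two_splits(n, seed=0):
    """5 replications of a 2-fold split (Dietterich, 1998): 10 train/val pairs."""
    rng = random.Random(seed)
    splits = []
    for _ in range(5):
        idx = list(range(n))
        rng.shuffle(idx)
        half = n // 2
        # within a replication, each half serves once for training, once for validation
        splits.append((idx[:half], idx[half:]))
    return splits

splits = five_by_two_splits(10)
```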

Bootstrapping
Draw N instances from a dataset of size N with replacement.
The probability that we do not pick a given instance after N draws is
    (1 - 1/N)^N ≈ e^{-1} ≈ 0.368
that is, only 36.8% of the instances are left out (new); the bootstrap sample contains roughly 63.2% of them.
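The 36.8% figure can be checked numerically; N = 1000 is an arbitrary choice for illustration:

```python
import math

N = 1000
p_never_picked = (1 - 1 / N) ** N   # P(a given instance is never drawn in N draws)
coverage = 1 - p_never_picked       # fraction of instances in the bootstrap sample
# p_never_picked approaches e^{-1} ≈ 0.368 as N grows
```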

Measuring Error
Error rate = # of errors / # of instances = (FN + FP) / N
Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
Precision = # of found positives / # of found = TP / (TP + FP)
Specificity = TN / (TN + FP)
False alarm rate = FP / (FP + TN) = 1 - specificity
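These definitions computed from a hypothetical confusion matrix; the counts are made up for illustration:

```python
TP, FP, TN, FN = 40, 10, 45, 5    # hypothetical confusion-matrix counts
N = TP + FP + TN + FN
error_rate = (FN + FP) / N
recall = TP / (TP + FN)           # sensitivity, hit rate
precision = TP / (TP + FP)
specificity = TN / (TN + FP)
false_alarm = FP / (FP + TN)      # = 1 - specificity
```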

ROC Curve


Precision and Recall

Interval Estimation
X = {x^t}_t where x^t ~ N(μ, σ²); the sample mean m ~ N(μ, σ²/N).
A 100(1-α) percent confidence interval for μ:
    P(m - z_{α/2} σ/√N < μ < m + z_{α/2} σ/√N) = 1 - α
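A sketch of the z-based interval with σ assumed known; the sample values and σ are hypothetical:

```python
from statistics import NormalDist, mean

x = [2.1, 1.9, 2.4, 2.0, 1.8, 2.2]        # hypothetical sample, x^t ~ N(mu, sigma^2)
sigma, alpha = 0.2, 0.05                  # sigma assumed known
m, n = mean(x), len(x)
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2} ≈ 1.96
half = z * sigma / n ** 0.5
ci = (m - half, m + half)                 # 95% confidence interval for mu
```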

When σ² is not known, use the sample standard deviation S and the t distribution:
    √N (m - μ)/S ~ t_{N-1}
giving the interval m ± t_{α/2,N-1} S/√N.
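The same kind of interval with σ unknown; the sample is hypothetical and the critical value t_{0.025,5} = 2.571 is taken from a standard t table:

```python
from statistics import mean, stdev

x = [2.1, 1.9, 2.4, 2.0, 1.8, 2.2]   # hypothetical sample
m, s, n = mean(x), stdev(x), len(x)  # stdev: unbiased sample standard deviation
t_crit = 2.571                       # t_{0.025,5}: two-sided, alpha = 0.05, N-1 = 5 d.o.f.
half = t_crit * s / n ** 0.5
ci = (m - half, m + half)
```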

Hypothesis Testing
Reject a null hypothesis if it is not supported by the sample with enough confidence.
X = {x^t}_t where x^t ~ N(μ, σ²)
H0: μ = μ0 vs. H1: μ ≠ μ0
Accept H0 with level of significance α if μ0 is in the 100(1-α) percent confidence interval, i.e. if √N (m - μ0)/σ ∈ (-z_{α/2}, z_{α/2}). This is a two-sided test.
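A minimal two-sided z test on a hypothetical sample; the values, μ0, and σ are all made up for illustration:

```python
from statistics import NormalDist, mean

x = [2.1, 1.9, 2.4, 2.0, 1.8, 2.2]            # hypothetical sample
mu0, sigma, alpha = 2.0, 0.2, 0.05            # sigma assumed known
z = (mean(x) - mu0) * len(x) ** 0.5 / sigma   # test statistic sqrt(N)(m - mu0)/sigma
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value ≈ 1.96
accept_h0 = abs(z) < z_crit
```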

One-sided test: H0: μ ≤ μ0 vs. H1: μ > μ0. Accept H0 if √N (m - μ0)/σ < z_α.
Variance unknown: use t instead of z. Accept H0: μ = μ0 if √N (m - μ0)/S ∈ (-t_{α/2,N-1}, t_{α/2,N-1}).

Assessing Error: H0: p ≤ p0 vs. H1: p > p0
Single training/validation set: binomial test.
If the error probability is p0, the probability that there are e errors or fewer in N validation trials is
    P(X ≤ e) = Σ_{j=0}^{e} C(N,j) p0^j (1 - p0)^{N-j}
Accept H0 if this probability is less than 1 - α.
Example: 1 - α = 0.95, N = 100, e = 20.
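A sketch of the binomial test using `math.comb`; p0 = 0.25 is a made-up null error rate, while N = 100 and e = 20 follow the slide's example:

```python
from math import comb

def binom_cdf(e, N, p):
    """P(X <= e) for X ~ Binomial(N, p)."""
    return sum(comb(N, j) * p ** j * (1 - p) ** (N - j) for j in range(e + 1))

N, e, p0, alpha = 100, 20, 0.25, 0.05   # p0 hypothetical; N, e from the slide
prob = binom_cdf(e, N, p0)
accept_h0 = prob < 1 - alpha            # reject only if e lands in the upper tail
```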

Normal Approximation to the Binomial
The number of errors X is approximately normal with mean Np0 and variance Np0(1 - p0), so
    (X - Np0)/√(Np0(1 - p0))
is approximately standard normal. Accept H0 if this statistic for X = e is less than z_{1-α}.
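The same decision sketched with the normal approximation; p0 = 0.25 is again a made-up null error rate, with N = 100 and e = 20 as in the slide example:

```python
from statistics import NormalDist

N, e, p0, alpha = 100, 20, 0.25, 0.05
z = (e - N * p0) / (N * p0 * (1 - p0)) ** 0.5   # X approx N(Np0, Np0(1 - p0))
z_crit = NormalDist().inv_cdf(1 - alpha)        # one-sided z_{1-alpha} ≈ 1.645
accept_h0 = z < z_crit                          # H0: p <= p0
```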

t Test
Multiple training/validation sets: x_i^t = 1 if instance t is misclassified on fold i.
Error rate of fold i: p_i = Σ_t x_i^t / N.
With m and s² the average and variance of the p_i, we accept that the error is p0 or less if
    √K (m - p0)/s
is less than t_{α,K-1}; this statistic is t-distributed with K-1 degrees of freedom under H0.
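A sketch with hypothetical fold error rates; the critical value t_{0.05,4} = 2.132 is from a standard table:

```python
from statistics import mean, stdev

p = [0.22, 0.18, 0.25, 0.20, 0.19]   # hypothetical error rates p_i on K = 5 folds
p0 = 0.30                            # claimed error bound, H0: error <= p0
K = len(p)
m, s = mean(p), stdev(p)
t = K ** 0.5 * (m - p0) / s          # ~ t_{K-1} under H0
t_crit = 2.132                       # t_{0.05,4}: one-sided, alpha = 0.05
accept_h0 = t < t_crit
```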

Comparing Classifiers: H0: μ0 = μ1 vs. H1: μ0 ≠ μ1
Single training/validation set: McNemar's test.
With e01 the number of instances misclassified by 1 but not by 2, and e10 the number misclassified by 2 but not by 1: under H0 we expect e01 = e10 = (e01 + e10)/2.
Accept H0 if
    (|e01 - e10| - 1)² / (e01 + e10) < χ²_{α,1}
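McNemar's test in a few lines; the disagreement counts e01 and e10 are hypothetical, and χ²_{0.05,1} = 3.841 is from a table:

```python
e01, e10 = 12, 5   # hypothetical: misclassified by 1 only, and by 2 only
chi2 = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)   # continuity-corrected statistic
chi2_crit = 3.841                                # chi^2_{0.05,1} from a table
accept_h0 = chi2 < chi2_crit
```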

K-Fold CV Paired t Test
Use K-fold cv to get K training/validation folds.
p_i^1, p_i^2: errors of classifiers 1 and 2 on fold i.
p_i = p_i^1 - p_i^2: paired difference on fold i.
The null hypothesis is that the p_i have mean 0: with m and s the mean and standard deviation of the p_i, √K · m/s ~ t_{K-1} under H0.
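A sketch with hypothetical fold errors for the two classifiers; t_{0.025,4} = 2.776 is from a table, and with these made-up numbers the paired test rejects H0:

```python
from statistics import mean, stdev

p1 = [0.22, 0.18, 0.25, 0.20, 0.19]   # hypothetical fold errors, classifier 1
p2 = [0.24, 0.21, 0.26, 0.22, 0.20]   # hypothetical fold errors, classifier 2
d = [a - b for a, b in zip(p1, p2)]   # paired differences p_i
K = len(d)
t = K ** 0.5 * mean(d) / stdev(d)     # ~ t_{K-1} under H0: mean difference is 0
t_crit = 2.776                        # t_{0.025,4}: two-sided, alpha = 0.05
accept_h0 = abs(t) < t_crit
```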

5×2 cv Paired t Test
Use 5×2 cv to get 2 folds of 5 training/validation replications (Dietterich, 1998).
p_i^(j): difference between the errors of 1 and 2 on fold j = 1, 2 of replication i = 1, ..., 5.
With p̄_i = (p_i^(1) + p_i^(2))/2 and s_i² = (p_i^(1) - p̄_i)² + (p_i^(2) - p̄_i)², the statistic
    t = p_1^(1) / √(Σ_i s_i² / 5) ~ t_5 under H0.
Two-sided test: accept H0: μ0 = μ1 if t ∈ (-t_{α/2,5}, t_{α/2,5}).
One-sided test: accept H0: μ0 ≤ μ1 if t < t_{α,5}.
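The statistic computed on hypothetical per-fold differences; t_{0.025,5} = 2.571 is from a table:

```python
# p[i][j]: hypothetical difference p_i^(j) on fold j of replication i
p = [[0.02, 0.01], [0.03, 0.00], [0.01, 0.02], [0.02, 0.02], [0.00, 0.03]]
s2 = []
for pa, pb in p:
    m = (pa + pb) / 2                 # mean difference within replication i
    s2.append((pa - m) ** 2 + (pb - m) ** 2)
t = p[0][0] / (sum(s2) / 5) ** 0.5    # ~ t_5 under H0
t_crit = 2.571                        # t_{0.025,5}: two-sided, alpha = 0.05
accept_h0 = abs(t) < t_crit
```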

5×2 cv Paired F Test
    f = (Σ_i Σ_j (p_i^(j))²) / (2 Σ_i s_i²) ~ F_{10,5} under H0.
Two-sided test: accept H0: μ0 = μ1 if f < F_{α,10,5}.
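The F version on the same kind of hypothetical differences; F_{0.05,10,5} ≈ 4.74 is from a table:

```python
# p[i][j]: hypothetical difference p_i^(j) on fold j of replication i
p = [[0.02, 0.01], [0.03, 0.00], [0.01, 0.02], [0.02, 0.02], [0.00, 0.03]]
s2 = []
for pa, pb in p:
    m = (pa + pb) / 2
    s2.append((pa - m) ** 2 + (pb - m) ** 2)
f = sum(pij ** 2 for pi in p for pij in pi) / (2 * sum(s2))   # ~ F_{10,5} under H0
f_crit = 4.74                                                 # F_{0.05,10,5} from a table
accept_h0 = f < f_crit
```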

Comparing L > 2 Algorithms: Analysis of Variance (ANOVA)
H0: μ1 = μ2 = ... = μL, for the errors of L algorithms on K folds.
We construct two estimators of σ². One is valid only if H0 is true, the other is always valid. We reject H0 if the two estimators disagree.



ANOVA table
If ANOVA rejects, we do pairwise posthoc tests.
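A one-way ANOVA sketch over L = 3 hypothetical algorithms and K = 5 folds; the error values are made up, and F_{0.05,2,12} ≈ 3.89 is from a table:

```python
from statistics import mean

# errors[l][k]: hypothetical error of algorithm l on fold k
errors = [[0.20, 0.22, 0.19, 0.21, 0.20],
          [0.24, 0.23, 0.25, 0.22, 0.24],
          [0.21, 0.20, 0.22, 0.21, 0.19]]
L, K = len(errors), len(errors[0])
grand = mean(v for row in errors for v in row)
ssb = K * sum((mean(row) - grand) ** 2 for row in errors)        # between algorithms
ssw = sum((v - mean(row)) ** 2 for row in errors for v in row)   # within algorithms
f = (ssb / (L - 1)) / (ssw / (L * (K - 1)))                      # ~ F_{L-1, L(K-1)} under H0
f_crit = 3.89                                                    # F_{0.05,2,12} from a table
reject_h0 = f > f_crit                                           # if so: pairwise posthoc tests
```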

Comparison over Multiple Datasets
Comparing two algorithms:
Sign test: count how many times A beats B over N datasets, and check if this could have been by chance if A and B did have the same error rate.
Comparing multiple algorithms:
Kruskal-Wallis test: calculate the average rank of all algorithms on N datasets, and check if these could have been by chance if they all had equal error.
If KW rejects, we do pairwise posthoc tests to find which ones have significant rank difference.
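The sign test is the simplest case: under H0 the number of wins over N datasets is Binomial(N, 1/2). A sketch with hypothetical counts:

```python
from math import comb

N, wins = 10, 9   # hypothetical: A beats B on 9 of 10 datasets
# one-sided P(X >= wins) under X ~ Binomial(N, 0.5)
p_value = sum(comb(N, k) for k in range(wins, N + 1)) / 2 ** N
significant = p_value < 0.05
```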