Statistical Comparison of Two Learning Algorithms Presented by: Payam Refaeilzadeh.


Overview
• How can we tell if one algorithm learns better than another?
  – Design an experiment to measure the accuracy of the two algorithms.
  – Run multiple trials.
  – Compare the samples, not just their means: run a statistically sound test on the two samples.
  – Is any observed difference significant? Is it due to a true difference between the algorithms or to natural variation in the measurements?

Statistical Hypothesis Testing
• Statistical hypothesis: a statement about the parameters of one or more populations.
• Hypothesis testing: a procedure for deciding whether to accept or reject the hypothesis.
  – Identify the parameter of interest
  – State a null hypothesis, H0
  – Specify an alternate hypothesis, H1
  – Choose a significance level α
  – State an appropriate test statistic

Statistical Hypothesis Testing (cont.)
• Null hypothesis (H0): a statement presumed to be true until statistical evidence shows otherwise.
  – Usually specifies an exact value for a parameter
  – Example: H0: µ = 30 kg
• Alternate hypothesis (H1): accepted if the null hypothesis is rejected.
• Test statistic: a particular statistic calculated from the measurements of a random sample or experiment.
  – A test statistic is assumed to follow a particular distribution (normal, t, chi-square, etc.)
  – That distribution can be used to test the significance of the calculated test statistic.

Error in Hypothesis Testing
• A Type I error occurs when H0 is rejected but is in fact true.
  – P(Type I error) = α, the significance level
• A Type II error occurs when we fail to reject H0 but it is in fact false.
  – P(Type II error) = β
  – power = 1 − β = probability of correctly rejecting H0
  – power = ability to distinguish between the two populations
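
The two error rates can be estimated by simulation. The sketch below is only an illustration under assumptions that are not in the slides: it uses a two-sided z-test with a known standard deviation, reuses the example null hypothesis µ = 30, and picks σ = 2, n = 25 and a true mean of 31 for the power case arbitrarily. The names rejects_h0 and rejection_rate are hypothetical helpers.

```python
import random
from statistics import NormalDist, mean


def rejects_h0(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0, assuming sigma is known."""
    z0 = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z0) > z_crit


def rejection_rate(true_mu, mu0=30.0, sigma=2.0, n=25, trials=2000, seed=0):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        rejections += rejects_h0(sample, mu0, sigma)
    return rejections / trials


# H0 is true here, so every rejection is a Type I error (rate should be near alpha).
type1_rate = rejection_rate(true_mu=30.0)
# H0 is false here, so every rejection is correct (rate estimates the power, 1 - beta).
power = rejection_rate(true_mu=31.0)
print(f"estimated alpha = {type1_rate:.3f}, estimated power = {power:.3f}")
```

With H0 true, the rejection rate should land close to the chosen α; with H0 false, it estimates the power for that particular true mean and sample size.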

Paired t-Test
• Collect data in pairs:
  – Example: given a training set D_Train and a test set D_Test, train both learning algorithms on D_Train and then test their accuracies on D_Test.
• Suppose n paired measurements have been made.
• Assume:
  – The measurements are independent
  – The measurements for each algorithm follow a normal distribution
• The test statistic T0 will then follow a t-distribution with n−1 degrees of freedom.

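Paired t-Test (cont.)

Trial #   Algorithm 1 accuracy (X_1)   Algorithm 2 accuracy (X_2)
1         X_{11}                       X_{21}
2         X_{12}                       X_{22}
…         …                            …
n         X_{1n}                       X_{2n}

• Assume:
  – X_1 follows N(µ_1, σ_1)
  – X_2 follows N(µ_2, σ_2)
• Let:
  – µ_D = µ_1 − µ_2
  – D_i = X_{1i} − X_{2i}, i = 1, 2, ..., n
• Null hypothesis: H_0: µ_D = Δ_0
• Test statistic (the standard paired t statistic, where \bar{D} and S_D are the sample mean and sample standard deviation of the differences D_i):
  $t_0 = \frac{\bar{D} - \Delta_0}{S_D / \sqrt{n}}$
• Rejection criteria:
  – H_1: µ_D ≠ Δ_0: reject H_0 if |t_0| > t_{α/2, n−1}
  – H_1: µ_D > Δ_0: reject H_0 if t_0 > t_{α, n−1}
  – H_1: µ_D < Δ_0: reject H_0 if t_0 < −t_{α, n−1}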
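
As a worked illustration of the paired test, the sketch below computes the standard paired t statistic, t_0 = (mean(D) − Δ_0) / (S_D / sqrt(n)), for two lists of accuracies and applies the two-sided rejection rule. The accuracy values are made up for the example, paired_t_statistic is a hypothetical helper name, and the critical value 2.262 is the tabulated t_{0.025, 9} for α = 0.05 with n = 10.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(x1, x2, delta0=0.0):
    """Paired t statistic for H0: mu_D = delta0, with D_i = x1[i] - x2[i]."""
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    d = [a - b for a, b in zip(x1, x2)]
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return (mean(d) - delta0) / (stdev(d) / sqrt(len(d)))


# Made-up accuracies of two algorithms over n = 10 paired trials.
acc1 = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]
acc2 = [0.78, 0.77, 0.80, 0.79, 0.80, 0.77, 0.79, 0.81, 0.78, 0.80]

T_CRIT = 2.262  # tabulated t_{0.025, 9}: two-sided test, alpha = 0.05, n - 1 = 9

t0 = paired_t_statistic(acc1, acc2)
reject_h0 = abs(t0) > T_CRIT
print(f"t0 = {t0:.3f}, reject H0: {reject_h0}")
```

If all differences are identical, S_D is zero and the statistic is undefined; the sketch does not guard against that case.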

Cross-Validated t-Test
• Paired t-test on the 10 paired accuracies obtained from 10-fold cross-validation
• Advantages
  – Large training set size
  – Most powerful (Dietterich, 98)
• Disadvantages
  – Accuracy results are not independent (the training sets overlap)
  – Somewhat elevated probability of Type I error (Dietterich, 98)
…
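
One way the 10 paired accuracies might be collected is sketched below. Everything here is an assumption made for illustration: the helper names kfold_indices and cv_paired_accuracies, the two toy learners, and the synthetic one-dimensional data set. A learner is modelled as a function that takes training rows (x, y) and returns a predict function.

```python
import random
from statistics import mean


def kfold_indices(n_items, k, seed=0):
    """Shuffle 0..n_items-1 and deal the indices into k disjoint folds."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cv_paired_accuracies(data, learner_a, learner_b, k=10, seed=0):
    """Return one (accuracy_a, accuracy_b) pair per fold."""
    pairs = []
    for held_out in kfold_indices(len(data), k, seed):
        held = set(held_out)
        train = [row for i, row in enumerate(data) if i not in held]
        test = [data[i] for i in held_out]
        accs = []
        for learner in (learner_a, learner_b):
            predict = learner(train)  # both learners see the same training fold
            correct = sum(predict(x) == y for x, y in test)
            accs.append(correct / len(test))
        pairs.append(tuple(accs))
    return pairs


def majority_learner(train):
    """Toy learner: always predict the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def nearest_mean_learner(train):
    """Toy learner: predict the label whose class mean is closest to x."""
    means = {label: mean(x for x, y in train if y == label)
             for label in {y for _, y in train}}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


# Synthetic data: 100 points per class, class means at 0 and 2.
rng = random.Random(1)
data = [(rng.gauss(2.0 * label, 1.0), label) for label in (0, 1) for _ in range(100)]

pairs = cv_paired_accuracies(data, nearest_mean_learner, majority_learner)
diffs = [a - b for a, b in pairs]  # the D_i values fed to the paired t-test
print(diffs)
```

Each training set here holds 180 of the 200 rows, and any two training sets share 160 of them, which is the overlap that makes the fold results dependent.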

5x2 Cross-Validated t-Test
• Run 2-fold cross-validation 5 times
• Use results from the first of the five replications to estimate the mean difference
• Use results from all folds to estimate the variance
• Advantage:
  – Lowest Type I error (Dietterich, 98)
• Disadvantage:
  – Not as powerful as the 10-fold cross-validated t-test (Dietterich, 98)
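
The statistic can be sketched as follows, using the form given in Dietterich's paper: with p_i^(j) the difference between the two algorithms on fold j of replication i, the numerator is p_1^(1), each replication contributes a variance estimate s_i^2 = (p_i^(1) − mean_i)^2 + (p_i^(2) − mean_i)^2, and the statistic is compared against a t-distribution with 5 degrees of freedom. The function name five_by_two_cv_t and the input differences are made up for illustration; 2.571 is the tabulated t_{0.025, 5}.

```python
from math import sqrt


def five_by_two_cv_t(diffs):
    """5x2cv t statistic from five (fold-1, fold-2) difference pairs."""
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected five (p1, p2) pairs, one per replication")
    variances = []
    for p1, p2 in diffs:
        p_bar = (p1 + p2) / 2
        variances.append((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2)
    # Numerator: the difference on the first fold of the first replication.
    # Denominator: square root of the variance estimate averaged over all five.
    return diffs[0][0] / sqrt(sum(variances) / 5)


# Made-up differences between the two algorithms, one pair per replication.
example_diffs = [(0.03, 0.01), (0.02, 0.04), (0.00, 0.02), (0.03, 0.03), (0.01, 0.05)]

T_CRIT_5 = 2.571  # tabulated t_{0.025, 5}: two-sided test, alpha = 0.05

t_stat = five_by_two_cv_t(example_diffs)
reject_h0 = abs(t_stat) > T_CRIT_5
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

If every replication gives identical differences on its two folds, the averaged variance is zero and the statistic is undefined; the sketch does not handle that case.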

Resampled t-Test
• Randomly divide the data into train/test sets (usually 2/3 – 1/3)
• Run multiple trials (usually 30)
• Perform a paired t-test between the trial accuracies
• This test has a very high probability of Type I error and should never be used.

Calibrated Tests
• Bouckaert, ICML 2003:
  – It is very difficult to estimate the true degrees of freedom because the independence assumptions are violated
  – Instead of correcting for the mean difference, calibrate on the degrees of freedom
  – Recommendation: use 10-times-repeated 10-fold cross-validation with 10 degrees of freedom

References
• R. R. Bouckaert. Choosing between two learning algorithms based on calibrated tests. ICML'03.
• T. G. Dietterich. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10:1895–1924, 1998.
• D. C. Montgomery et al. Engineering Statistics. 2nd Edition. Wiley Press. 2001.