Statistics: Unlocking the Power of Data Lock 5 Synthesis STAT 250 Dr. Kari Lock Morgan SECTIONS 4.4, 4.5 Connecting bootstrapping and randomization (4.4)

Presentation transcript:

Statistics: Unlocking the Power of Data (Lock5)
Synthesis
STAT 250, Dr. Kari Lock Morgan
Sections 4.4, 4.5
- Connecting bootstrapping and randomization (4.4)
- Connecting intervals and tests (4.5)

Connections
Today we’ll make connections between…
- Chapter 1: Data collection (random sampling? random assignment?)
- Chapter 2: Which statistic is appropriate, based on the variable(s)?
- Chapter 3: Bootstrapping and confidence intervals
- Chapter 4: Randomization distributions and hypothesis tests


Exercise and Gender
H0: μ_m = μ_f, Ha: μ_m > μ_f
How might we make the null true? One way (of many): shift the groups so that their sample means are equal, then bootstrap from this modified sample.
In StatKey, the default randomization method is “Reallocate Groups”, but “Shift Groups” is also an option, and will do this.
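
The "shift, then bootstrap" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exercise hours below are made-up numbers rather than the class data, the function name `shift_groups_pvalue` is hypothetical, and StatKey's own "Shift Groups" implementation may differ in its details.

```python
import random
from statistics import mean

def shift_groups_pvalue(group_a, group_b, n_sims=5000, seed=0):
    """One-sided p-value for Ha: mean(a) > mean(b) via 'shift, then bootstrap'."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    # Make H0 true: shift group b so both samples have the same mean.
    shifted_b = [x + observed for x in group_b]
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # Bootstrap each group separately (resample with replacement).
        boot_a = rng.choices(group_a, k=len(group_a))
        boot_b = rng.choices(shifted_b, k=len(shifted_b))
        if mean(boot_a) - mean(boot_b) >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_sims

# Hypothetical weekly exercise hours (NOT the data from the slides).
males = [12, 3, 15, 8, 10, 20, 6, 9, 14, 7]
females = [9, 4, 11, 6, 8, 13, 5, 10, 7, 3]

observed, p_value = shift_groups_pvalue(males, females)
print(f"observed difference = {observed:.2f}, p-value = {p_value:.3f}")
```

Shifting only changes where each group is centered. The spread within each group is left alone, which is why the simulated differences center at 0 while still reflecting the variability in the observed data.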

Exercise and Gender
p-value = 0.095

Exercise and Gender
The p-value is 0.095. Using α = 0.05, we conclude…
a) Males exercise more than females, on average
b) Males do not exercise more than females, on average
c) Nothing

Blood Pressure and Heart Rate
H0: ρ = 0, Ha: ρ < 0
Two variables have correlation 0 if they are not associated. We can “break the association” by randomly permuting/scrambling/shuffling one of the variables. Each time we do this, we get a sample we might observe just by random chance, if there really is no correlation.

Blood Pressure and Heart Rate
p-value ≈ 0.22
Even if blood pressure and heart rate are not correlated, we would see correlations this extreme about 22% of the time, just by random chance.
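
A permutation test for a correlation might look roughly like the sketch below. The measurements are invented for illustration (they are not the blood pressure and heart rate data from the slides), and the helper names `pearson_r` and `permutation_pvalue_lower` are hypothetical.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_pvalue_lower(xs, ys, n_sims=5000, seed=0):
    """Lower-tail p-value for H0: rho = 0 vs Ha: rho < 0."""
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)  # break any association between xs and ys
        if pearson_r(xs, shuffled) <= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_sims

# Hypothetical measurements (NOT the data from the slides).
systolic_bp = [120, 135, 110, 142, 128, 150, 118, 125, 138, 131]
heart_rate = [78, 70, 85, 66, 90, 72, 80, 75, 68, 88]

r_obs, p_value = permutation_pvalue_lower(systolic_bp, heart_rate)
print(f"r = {r_obs:.3f}, p-value = {p_value:.3f}")
```

Only one variable is shuffled. Each variable keeps exactly the values that were observed, and only the pairing between them is randomized.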

Randomization Distribution
- Paul the Octopus (single proportion): Flip a coin or roll a die
- Cocaine Addiction (randomized experiment): Rerandomize cases to treatment groups, keeping response values fixed
- Body Temperature (single mean): Shift to make H0 true, then bootstrap
- Exercise and Gender (observational study): Shift to make H0 true, then bootstrap
- Blood Pressure and Heart Rate (correlation): Randomly permute/scramble/shuffle one variable


Body Temperature

Body Temperature
We also created a randomization distribution to see if average body temperature differs from 98.6°F by adding 0.34 to every value to make the null true, and then resampling with replacement from this modified sample.

Body Temperature
These two distributions are identical (up to random variation from simulation to simulation) except for the center. The bootstrap distribution is centered around the sample statistic, 98.26, while the randomization distribution is centered around the null hypothesized value, 98.6. The randomization distribution is equivalent to the bootstrap distribution, but shifted over.
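
The "same shape, different center" claim can be checked with a small simulation. The sketch below uses a made-up sample of ten temperatures (not the class data), and `bootstrap_means` is a hypothetical helper. Reusing one seed for both runs is a deliberate choice so that the same resampled positions are drawn each time, which makes the shift exact instead of only approximate.

```python
import random
from statistics import mean

def bootstrap_means(sample, n_sims, seed):
    """Means of n_sims resamples (with replacement) of the given sample."""
    rng = random.Random(seed)
    return [mean(rng.choices(sample, k=len(sample))) for _ in range(n_sims)]

# Hypothetical body temperatures in °F (NOT the data from the slides).
temps = [97.8, 98.2, 98.6, 98.0, 98.4, 97.9, 98.7, 98.3, 98.1, 98.5]
null_value = 98.6

# Shift every value so the sample mean equals the null value (H0 true).
shift = null_value - mean(temps)
shifted = [t + shift for t in temps]

# Same seed for both runs, so both use the same resampled positions.
boot = bootstrap_means(temps, 2000, seed=1)    # bootstrap distribution
rand = bootstrap_means(shifted, 2000, seed=1)  # randomization distribution

print(f"bootstrap center:     {mean(boot):.3f}")
print(f"randomization center: {mean(rand):.3f}")
print(f"shift applied:        {shift:.3f}")
```

With different seeds the two distributions would agree only up to simulation noise, which is the situation the slide describes.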

Bootstrap and Randomization Distributions

| Bootstrap Distribution | Randomization Distribution |
| --- | --- |
| Our best guess at the distribution of sample statistics | Our best guess at the distribution of sample statistics, if H0 were true |
| Centered around the observed sample statistic | Centered around the null hypothesized value |
| Simulate sampling from the population by resampling from the original sample | Simulate samples assuming H0 were true |

Big difference: a randomization distribution assumes H0 is true, while a bootstrap distribution does not.

Which Distribution?

Which Distribution?
Intro stat students are surveyed, and we find that 152 out of 218 are female. Let p be the proportion of intro stat students at that university who are female. A bootstrap distribution is generated for a confidence interval for p, and a randomization distribution is generated to see if the data provide evidence that p > 1/2. Which distribution is the randomization distribution?
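
One way to see the difference is to simulate both distributions from the counts given above (152 out of 218). The sketch below is an illustration, not the StatKey procedure. It relies on the fact that resampling with replacement from a yes/no sample is equivalent to drawing each case as a "yes" with probability equal to the sample proportion, and `simulate_proportions` is a hypothetical helper name.

```python
import random
from statistics import mean

def simulate_proportions(p, n, n_sims, seed):
    """Simulated sample proportions for samples of size n with success chance p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(n_sims)]

n, successes = 218, 152
p_hat = successes / n

# Bootstrap distribution: resample from the observed sample (center near p_hat).
boot = simulate_proportions(p_hat, n, 2000, seed=0)
# Randomization distribution: simulate as if H0: p = 1/2 were true (center near 0.5).
rand = simulate_proportions(0.5, n, 2000, seed=1)

# Upper-tail p-value for Ha: p > 1/2.
p_value = sum(x >= p_hat for x in rand) / len(rand)

print(f"sample proportion:    {p_hat:.3f}")
print(f"bootstrap center:     {mean(boot):.3f}")
print(f"randomization center: {mean(rand):.3f}")
```

The center is the giveaway: whichever plot is centered at 1/2 is the randomization distribution.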


Statistics: Unlocking the Power of Data Lock 5 Body Temperature Bootstrap Distribution Randomization Distribution H 0 :  = 98.6 H a :  ≠

Statistics: Unlocking the Power of Data Lock 5 Body Temperature Bootstrap Distribution Randomization Distribution H 0 :  = 98.4 H a :  ≠ 98.4

Intervals and Tests
A confidence interval represents the range of plausible values for the population parameter.
- If the null hypothesized value IS NOT within the CI, it is not a plausible value and should be rejected.
- If the null hypothesized value IS within the CI, it is a plausible value and should not be rejected.

Intervals and Tests
- If a 95% CI misses the parameter value in H0, then a two-tailed test should reject H0 at a 5% significance level.
- If a 95% CI contains the parameter value in H0, then a two-tailed test should not reject H0 at a 5% significance level.

Body Temperatures
Using bootstrapping, we found a 95% confidence interval for the mean body temperature to be (98.05°, …). This does not contain 98.6°, so at α = 0.05 we would reject H0 for the hypotheses H0: μ = 98.6° vs Ha: μ ≠ 98.6°.

Both Father and Mother
“Does a child need both a father and a mother to grow up happily?” Let p be the proportion of adults aged in 2010 who say yes. A 95% CI for p is (0.487, 0.573). Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, we
a) Reject H0
b) Do not reject H0
c) Reject Ha
d) Do not reject Ha
millennials-parenthood-trumps-marriage/#fn

Both Father and Mother
“Does a child need both a father and a mother to grow up happily?” Let p be the proportion of adults aged in 1997 who say yes. A 95% CI for p is (0.533, 0.607). Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, we
a) Reject H0
b) Do not reject H0
c) Reject Ha
d) Do not reject Ha
millennials-parenthood-trumps-marriage/#fn
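
The interval-and-test rule reduces to a single comparison. As a minimal sketch, with the hypothetical helper name `two_tailed_decision`, it can be applied to the two intervals quoted above. It assumes the CI level and the significance level match (95% and α = 0.05) and that the test is two-tailed.

```python
def two_tailed_decision(ci, null_value):
    """Decision of a two-tailed test at level alpha, read off a (1 - alpha) CI."""
    lower, upper = ci
    if lower <= null_value <= upper:
        return "do not reject H0"  # null value is plausible
    return "reject H0"             # null value is not plausible

# 95% intervals from the two survey years, tested against H0: p = 0.5.
print(two_tailed_decision((0.487, 0.573), 0.5))  # 2010 interval
print(two_tailed_decision((0.533, 0.607), 0.5))  # 1997 interval
```

Note that the function never returns "reject Ha" or "do not reject Ha": conclusions of a test are always stated about H0.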

Intervals and Tests
- Confidence intervals are most useful when you want to estimate population parameters.
- Hypothesis tests and p-values are most useful when you want to test hypotheses about population parameters.
- Confidence intervals give you a range of plausible values; p-values quantify the strength of evidence against the null hypothesis.

Interval, Test, or Neither?
Is the following question best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant?
On average, how much more do adults who played sports in high school exercise than adults who did not play sports in high school?
a) Confidence interval
b) Hypothesis test
c) Statistical inference not relevant

Interval, Test, or Neither?
Is the following question best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant?
Do a majority of adults take a multivitamin each day?
a) Confidence interval
b) Hypothesis test
c) Statistical inference not relevant

Interval, Test, or Neither?
Is the following question best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant?
Did the Penn State football team score more points in 2014 or 2013?
a) Confidence interval
b) Hypothesis test
c) Statistical inference not relevant

Summary
- Using α = 0.05, about 5% of hypothesis tests will reject the null even when every null hypothesis is true.
- Randomization samples should be generated
  - consistent with the null hypothesis,
  - using the observed data, and
  - reflecting the way the data were collected.
- If a null hypothesized value lies inside a 95% CI, a two-tailed test using α = 0.05 would not reject H0.
- If a null hypothesized value lies outside a 95% CI, a two-tailed test using α = 0.05 would reject H0.
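
The first summary point can be illustrated by running many tests in a world where the null is true and counting rejections. To keep the sketch short it swaps in a simple two-tailed z-test with known standard deviation instead of a randomization test. The function name `false_rejection_rate`, the sample size, and the number of simulated studies are all arbitrary choices for illustration.

```python
import random
from math import sqrt

def false_rejection_rate(n_tests=4000, n=30, z_crit=1.96, seed=0):
    """Share of two-tailed z-tests (alpha = 0.05) that reject a TRUE null."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_tests):
        # Each simulated study samples from a population where H0: mu = 0 holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * sqrt(n)  # population sd is known to be 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_tests

rate = false_rejection_rate()
print(f"share of true nulls rejected: {rate:.3f}")
```

The printed share should land close to 0.05, varying a little with the seed and the number of simulated studies.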

To Do
- Read Sections 4.4, 4.5
- Do HW 4.5 (due Friday, 3/27)