
1 Statistics: Unlocking the Power of Data Lock 5 Synthesis STAT 250 Dr. Kari Lock Morgan SECTIONS 4.4, 4.5 Connecting bootstrapping and randomization (4.4) Connecting intervals and tests (4.5)

2 Connections
Today we’ll make connections between…
• Chapter 1: Data collection (random sampling? random assignment?)
• Chapter 2: Which statistic is appropriate, based on the variable(s)?
• Chapter 3: Bootstrapping and confidence intervals
• Chapter 4: Randomization distributions and hypothesis tests


4 Exercise and Gender
H0: μ_m = μ_f, Ha: μ_m > μ_f
How might we make the null true? One way (of many): shift the groups so that their sample means are equal, then bootstrap from this modified sample.
In StatKey, the default randomization method is “Reallocate Groups”, but “Shift Groups” is also an option, and will do this.
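The shift-then-bootstrap idea for two groups can be sketched as follows. This is only an illustration under stated assumptions: the exercise hours are made-up values (not the class data), the function name `shift_groups_p_value` is hypothetical, and shifting both groups to the pooled mean is one of several ways to make the sample means equal. It is not a description of how StatKey implements “Shift Groups”.

```python
import random
from statistics import mean

def shift_groups_p_value(males, females, n_sims=5000, seed=0):
    """Upper-tail p-value for Ha: mu_m > mu_f via shift-then-bootstrap."""
    rng = random.Random(seed)
    mean_m, mean_f = mean(males), mean(females)
    observed = mean_m - mean_f

    # Shift each group to the pooled mean so that H0 (equal means)
    # holds exactly in the modified sample.
    pooled_mean = mean(males + females)
    males0 = [x - mean_m + pooled_mean for x in males]
    females0 = [x - mean_f + pooled_mean for x in females]

    count = 0
    for _ in range(n_sims):
        # Bootstrap: resample each shifted group with replacement.
        m_star = rng.choices(males0, k=len(males0))
        f_star = rng.choices(females0, k=len(females0))
        if mean(m_star) - mean(f_star) >= observed:
            count += 1
    return observed, count / n_sims

# Hypothetical hours of exercise per week (assumed values).
males = [12, 3, 8, 10, 5, 14, 7, 9, 2, 11]
females = [6, 4, 9, 3, 8, 5, 10, 2, 7, 6]
observed, p_value = shift_groups_p_value(males, females)
print(f"observed difference = {observed:.2f}, p-value = {p_value:.3f}")
```

The proportion of simulated differences at least as large as the observed one estimates the p-value for the one-sided alternative.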

5 Exercise and Gender
p-value = 0.095

6 Exercise and Gender
The p-value is 0.095. Using α = 0.05, we conclude…
a) Males exercise more than females, on average
b) Males do not exercise more than females, on average
c) Nothing

7 Blood Pressure and Heart Rate
H0: ρ = 0, Ha: ρ < 0
Two variables have correlation 0 if they are not associated. We can “break the association” by randomly permuting/scrambling/shuffling one of the variables. Each time we do this, we get a sample we might observe just by random chance, if there really is no correlation.

8 Blood Pressure and Heart Rate
p-value = 0.219
Even if blood pressure and heart rate are not correlated, we would see correlations this extreme about 22% of the time, just by random chance.
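The shuffling procedure described for the blood pressure and heart rate example can be sketched as a small permutation test. The data values and the names `pearson_r` and `permutation_p_value` are assumptions made for illustration; the class dataset is not reproduced here, so the result will not match the 0.219 reported on the slide.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Sample correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(x, y, n_sims=5000, seed=0):
    """Lower-tail p-value for Ha: rho < 0 by shuffling one variable."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    y_shuffled = list(y)
    count = 0
    for _ in range(n_sims):
        # Shuffling y breaks any association with x, as H0 assumes.
        rng.shuffle(y_shuffled)
        if pearson_r(x, y_shuffled) <= observed:
            count += 1
    return observed, count / n_sims

# Hypothetical systolic blood pressures and heart rates (assumed values).
blood_pressure = [120, 135, 110, 150, 128, 142, 118, 160, 125, 138]
heart_rate = [80, 72, 88, 70, 76, 90, 84, 68, 79, 75]
observed, p_value = permutation_p_value(blood_pressure, heart_rate)
print(f"r = {observed:.3f}, p-value = {p_value:.3f}")
```

Only one variable is shuffled; the other stays fixed, so each simulated sample keeps the same individual values but loses the pairing.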

9 Randomization Distribution
Paul the Octopus (single proportion):
• Flip a coin or roll a die
Cocaine Addiction (randomized experiment):
• Rerandomize cases to treatment groups, keeping response values fixed
Body Temperature (single mean):
• Shift to make H0 true, then bootstrap
Exercise and Gender (observational study):
• Shift to make H0 true, then bootstrap
Blood Pressure and Heart Rate (correlation):
• Randomly permute/scramble/shuffle one variable


11 Body Temperature

12 Body Temperature
We also created a randomization distribution to see if average body temperature differs from 98.6 °F by adding 0.34 to every value to make the null true, and then resampling with replacement from this modified sample.

13 Body Temperature
These two distributions are identical (up to random variation from simulation to simulation) except for the center.
The bootstrap distribution is centered around the sample statistic, 98.26, while the randomization distribution is centered around the null hypothesized value, 98.6.
The randomization distribution is equivalent to the bootstrap distribution, but shifted over.
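A short sketch can make the “same distribution, shifted over” claim concrete. The ten temperatures below are made-up values chosen so that their mean is 98.26, matching the sample statistic on the slide; they are not the class data, and `resampled_means` is a hypothetical helper. Using the same random seed for both simulations removes the simulation-to-simulation variation, so the two sets of means differ by exactly the shift of 0.34 (up to floating-point rounding).

```python
import random
from statistics import mean

def resampled_means(sample, n_sims, seed):
    """Means of n_sims resamples drawn with replacement from sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_sims)]

# Hypothetical body temperatures in °F with mean 98.26 (assumed values).
temps = [98.0, 98.4, 97.9, 98.6, 98.2, 98.8, 97.6, 98.3, 98.1, 98.7]
null_value = 98.6

# Add the same amount to every value so the sample mean equals the null value.
shift = null_value - mean(temps)
shifted = [t + shift for t in temps]

bootstrap = resampled_means(temps, 2000, seed=1)
randomization = resampled_means(shifted, 2000, seed=1)

print(f"shift = {shift:.2f}")
print(f"bootstrap center     = {mean(bootstrap):.2f}")
print(f"randomization center = {mean(randomization):.2f}")
```

With different seeds the two distributions would still have the same shape and spread, but would match only approximately.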

14 Bootstrap and Randomization Distributions
Bootstrap distribution | Randomization distribution
Our best guess at the distribution of sample statistics | Our best guess at the distribution of sample statistics, if H0 were true
Centered around the observed sample statistic | Centered around the null hypothesized value
Simulate sampling from the population by resampling from the original sample | Simulate samples assuming H0 were true
Big difference: a randomization distribution assumes H0 is true, while a bootstrap distribution does not.

15 Which Distribution?

16 Which Distribution?
Intro stat students are surveyed, and we find that 152 out of 218 are female. Let p be the proportion of intro stat students at that university who are female.
A bootstrap distribution is generated for a confidence interval for p, and a randomization distribution is generated to see if the data provide evidence that p > 1/2.
Which distribution is the randomization distribution?
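One way to see which distribution is which is to simulate both and compare their centers. The sketch below uses the counts from the survey (152 of 218); the helper name `simulated_proportions`, the number of simulations, and the seeds are assumptions. Resampling with replacement from a sample of 152 “female” and 66 “not female” responses is simulated here by drawing each response as female with probability 152/218, and the randomization samples are drawn with probability 1/2, as H0 specifies.

```python
import random

def simulated_proportions(p, n, n_sims, seed):
    """Sample proportions from n_sims simulated samples of size n."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n
            for _ in range(n_sims)]

n, females = 218, 152
p_hat = females / n

# Bootstrap: resample from the observed sample (success rate p_hat).
bootstrap = simulated_proportions(p_hat, n, 2000, seed=1)
# Randomization: simulate samples assuming H0: p = 1/2 is true.
randomization = simulated_proportions(0.5, n, 2000, seed=2)

center_boot = sum(bootstrap) / len(bootstrap)
center_rand = sum(randomization) / len(randomization)

# Upper-tail p-value for Ha: p > 1/2.
p_value = sum(s >= p_hat for s in randomization) / len(randomization)

print(f"bootstrap center     = {center_boot:.3f}")
print(f"randomization center = {center_rand:.3f}")
print(f"p-value              = {p_value:.4f}")
```

The distribution centered near 1/2 is the randomization distribution; the one centered near the sample proportion is the bootstrap distribution.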


18 Body Temperature
[Figure: bootstrap distribution centered at 98.26, and randomization distribution centered at 98.6 for H0: μ = 98.6 vs Ha: μ ≠ 98.6]

19 Body Temperature
[Figure: bootstrap distribution centered at 98.26, and randomization distribution centered at 98.4 for H0: μ = 98.4 vs Ha: μ ≠ 98.4]

20 Intervals and Tests
A confidence interval represents the range of plausible values for the population parameter.
If the null hypothesized value IS NOT within the CI, it is not a plausible value and should be rejected.
If the null hypothesized value IS within the CI, it is a plausible value and should not be rejected.

21 Intervals and Tests
If a 95% CI misses the parameter in H0, then a two-tailed test should reject H0 at a 5% significance level.
If a 95% CI contains the parameter in H0, then a two-tailed test should not reject H0 at a 5% significance level.

22 Body Temperatures
Using bootstrapping, we found a 95% confidence interval for the mean body temperature to be (98.05 °F, 98.47 °F).
This does not contain 98.6 °F, so at α = 0.05 we would reject H0 for the hypotheses
H0: μ = 98.6 °F
Ha: μ ≠ 98.6 °F

23 Both Father and Mother
“Does a child need both a father and a mother to grow up happily?”
Let p be the proportion of adults aged 18-29 in 2010 who say yes. A 95% CI for p is (0.487, 0.573).
Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, we
a) Reject H0
b) Do not reject H0
c) Reject Ha
d) Do not reject Ha
http://www.pewsocialtrends.org/2011/03/09/for-millennials-parenthood-trumps-marriage/#fn-7199-1

24 Both Father and Mother
“Does a child need both a father and a mother to grow up happily?”
Let p be the proportion of adults aged 18-29 in 1997 who say yes. A 95% CI for p is (0.533, 0.607).
Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, we
a) Reject H0
b) Do not reject H0
c) Reject Ha
d) Do not reject Ha
http://www.pewsocialtrends.org/2011/03/09/for-millennials-parenthood-trumps-marriage/#fn-7199-1
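The interval-and-test rule from the preceding slides reduces to a containment check, sketched below with the three 95% intervals quoted in the slides. The function name `two_tailed_decision` and the dictionary labels are hypothetical. The rule applies only when the confidence level and significance level match (here 95% and α = 0.05) and the test is two-tailed.

```python
def two_tailed_decision(ci, null_value):
    """Decision at the significance level matching the CI's confidence level."""
    lower, upper = ci
    if lower <= null_value <= upper:
        # The null value is a plausible value for the parameter.
        return "do not reject H0"
    return "reject H0"

# (95% CI, null hypothesized value) pairs taken from the slides.
examples = {
    "body temperature": ((98.05, 98.47), 98.6),
    "both parents, 2010": ((0.487, 0.573), 0.5),
    "both parents, 1997": ((0.533, 0.607), 0.5),
}

decisions = {name: two_tailed_decision(ci, null)
             for name, (ci, null) in examples.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```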

25 Intervals and Tests
Confidence intervals are most useful when you want to estimate population parameters.
Hypothesis tests and p-values are most useful when you want to test hypotheses about population parameters.
Confidence intervals give you a range of plausible values; p-values quantify the strength of evidence against the null hypothesis.

26 Interval, Test, or Neither?
Is the following question best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant?
On average, how much more do adults who played sports in high school exercise than adults who did not play sports in high school?
a) Confidence interval
b) Hypothesis test
c) Statistical inference not relevant

27 Interval, Test, or Neither?
Is the following question best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant?
Do a majority of adults take a multivitamin each day?
a) Confidence interval
b) Hypothesis test
c) Statistical inference not relevant

28 Interval, Test, or Neither?
Is the following question best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant?
Did the Penn State football team score more points in 2014 or 2013?
a) Confidence interval
b) Hypothesis test
c) Statistical inference not relevant

29 Summary
Using α = 0.05, about 5% of hypothesis tests will reject the null even when every null hypothesis tested is true.
Randomization samples should be generated
• Consistent with the null hypothesis
• Using the observed data
• Reflecting the way the data were collected
If a null hypothesized value lies inside a 95% CI, a two-tailed test using α = 0.05 would not reject H0.
If a null hypothesized value lies outside a 95% CI, a two-tailed test using α = 0.05 would reject H0.
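The first point in the summary can be checked by simulation. The sketch below repeatedly draws samples for which the null hypothesis is true and counts how often a two-tailed test at α = 0.05 rejects. To keep it short it uses a z-test for a mean with known standard deviation rather than a randomization test; that substitution, the function name `rejection_rate`, and the sample sizes are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(n_tests=4000, n=30, alpha=0.05, seed=0):
    """Fraction of two-tailed z-tests that reject H0: mu = 0 when H0 is true."""
    rng = random.Random(seed)
    # Two-tailed critical value, about 1.96 for alpha = 0.05.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_tests):
        # Data generated with mean 0 and sd 1, so H0 really is true.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) * sqrt(n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_tests

rate = rejection_rate()
print(f"rejection rate under a true null: {rate:.3f}")
```

The rate should land close to 0.05, which is why a single significant result among many tests calls for caution.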

30 To Do
Read Sections 4.4, 4.5
Do HW 4.5 (due Friday, 3/27)

