Presentation on theme: "Hypothesis Testing, Synthesis"— Presentation transcript:
1 Hypothesis Testing, Synthesis. STAT 101, Dr. Kari Lock Morgan, 10/4/12. Section 4.5, Essential Synthesis B. Topics: connecting intervals and tests (4.5); statistical versus practical significance (4.5); multiple testing (4.5); synthesis activities.
2 Exam 1: Thursday 10/11. Open only to a calculator and one double-sided page of notes prepared by you. Emphasis on conceptual understanding.
3 Practice: Last year’s midterm, with solutions, is available on the course website (under documents). Review problems are posted for you to work through. Doing problems is the key to success!
4 Keys to In-Class Exam Success: Work lots of practice problems! Take last year’s exams under realistic conditions (time yourself, do it all before looking at the solutions, etc.). Prepare a good cheat sheet and use it when working problems. Read the corresponding sections in the book if there are concepts you are still confused about.
5 Office Hours Next Week. Monday: Heather 4 – 6 pm, Old Chem 211A; Sam 6 – 9 pm, Old Chem 211A. Tuesday: Kari 1:30 – 2:30 pm, Old Chem 216; Tracy 5 – 7 pm, Old Chem 211A. Wednesday: Kari 1 – 3 pm, Old Chem 216; Tracy 4:30 – 5:30 pm, Old Chem 211A; Heather 8 – 9 pm, Old Chem 211A. Thursday: Kari 1 – 2:30 pm, Old Chem 216.
6 Clickers: Reminder: sharing clickers is a case of academic dishonesty and will be treated as such. If caught clicking in with two clickers, everyone involved will (1) receive a 0 for their entire clicker grade (10% of the final grade) and (2) be reported to the dean to follow up regarding academic dishonesty.
7 Body Temperature: We created a bootstrap distribution for average body temperature by resampling with replacement from the original sample (x̄ = 98.26).
8 Body Temperature: We also created a randomization distribution to see whether average body temperature differs from 98.6 °F. We added 0.34 to every value to make the null true (moving the sample mean from 98.26 to 98.6), and then resampled with replacement from this modified sample.
9 Body Temperature: These two distributions are identical (up to random variation from simulation to simulation) except for the center. The bootstrap distribution is centered around the sample statistic, 98.26, while the randomization distribution is centered around the null hypothesized value, 98.6. The randomization distribution is equivalent to the bootstrap distribution, but shifted over.
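The shift described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample below is a hypothetical stand-in generated at random (it is not the class data), and the helper name `resample_means` is made up for this sketch.

```python
import random
import statistics


def resample_means(values, n_resamples, rng):
    """Means of resamples drawn with replacement, each the size of the original sample."""
    n = len(values)
    return [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)]


rng = random.Random(0)

# Hypothetical stand-in sample (NOT the class data): 50 temperatures near 98.26 F.
sample = [round(rng.gauss(98.26, 0.7), 1) for _ in range(50)]
x_bar = statistics.fmean(sample)
null_value = 98.6

# Bootstrap distribution: resample the sample as it is, so it centers near x_bar.
bootstrap = resample_means(sample, 2000, rng)

# Randomization distribution: shift every value so the sample mean equals the
# null value, then resample, so it centers near null_value.
shift = null_value - x_bar
shifted_sample = [v + shift for v in sample]
randomization = resample_means(shifted_sample, 2000, rng)

boot_center = statistics.fmean(bootstrap)
rand_center = statistics.fmean(randomization)
boot_spread = statistics.stdev(bootstrap)
rand_spread = statistics.stdev(randomization)

print(f"sample mean          = {x_bar:.2f}")
print(f"bootstrap center     = {boot_center:.2f}, spread = {boot_spread:.3f}")
print(f"randomization center = {rand_center:.2f}, spread = {rand_spread:.3f}")
```

The two printed spreads should be close to each other while the centers differ by roughly the shift, which is the sense in which the randomization distribution is the bootstrap distribution "shifted over".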
10 Body Temperature: [Figure: bootstrap distribution, centered at 98.26, next to the randomization distribution, centered at 98.6, for H0: μ = 98.6 vs Ha: μ ≠ 98.6.] Talk about the fact that the null hypothesized value is in the extremes of the bootstrap distribution, so the sample statistic is in the extremes of the randomization distribution.
11 Body Temperature: [Figure: bootstrap distribution, centered at 98.26, next to the randomization distribution, centered at 98.4, for H0: μ = 98.4 vs Ha: μ ≠ 98.4.] Talk about the fact that the null hypothesized value is not in the extremes of the bootstrap distribution, so the sample statistic is not in the extremes of the randomization distribution.
12 Intervals and Tests: If a 95% CI contains the parameter value in H0, then a two-tailed test should not reject H0 at a 5% significance level. If a 95% CI misses the parameter value in H0, then a two-tailed test should reject H0 at a 5% significance level.
13 Intervals and Tests: A confidence interval represents the range of plausible values for the population parameter. If the null hypothesized value IS NOT within the CI, it is not a plausible value and should be rejected. If the null hypothesized value IS within the CI, it is a plausible value and should not be rejected.
14 Body Temperatures: Using bootstrapping, we found a 95% confidence interval for the mean body temperature to be (98.05, 98.47). This does not contain 98.6, so at α = 0.05 we would reject H0 for the hypotheses H0: μ = 98.6 vs Ha: μ ≠ 98.6.
15 Both Father and Mother: “Does a child need both a father and a mother to grow up happily?” Let p be the proportion of adults aged in who say yes. A 95% CI for p is (0.487, 0.573). Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, we (a) reject H0 (b) do not reject H0 (c) reject Ha (d) do not reject Ha. 0.5 is within the CI, so it is a plausible value for p.
16 Both Father and Mother: “Does a child need both a father and a mother to grow up happily?” Let p be the proportion of adults aged in who say yes. A 95% CI for p is (0.533, 0.607). Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, we (a) reject H0 (b) do not reject H0 (c) reject Ha (d) do not reject Ha. 0.5 is not within the CI, so it is not a plausible value for p.
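The interval-to-test rule used in the last few slides can be written as a tiny Python function. This is a sketch, not a general procedure: it assumes the interval and the two-tailed test are matched (a 95% CI with α = 0.05), and the function name `two_tailed_decision` is made up for illustration. The intervals are the ones quoted in the slides.

```python
def two_tailed_decision(ci, null_value):
    """Decision for a two-tailed test at the significance level matching the CI.

    A null value inside the interval is plausible, so H0 is not rejected;
    a null value outside the interval is not plausible, so H0 is rejected.
    """
    low, high = ci
    if low <= null_value <= high:
        return "do not reject H0"
    return "reject H0"


# Intervals and null values taken from the slides above.
examples = [
    ("mean body temperature", (98.05, 98.47), 98.6),
    ("both father and mother, first interval", (0.487, 0.573), 0.5),
    ("both father and mother, second interval", (0.533, 0.607), 0.5),
]

for label, ci, null_value in examples:
    print(f"{label}: CI = {ci}, null = {null_value} -> {two_tailed_decision(ci, null_value)}")
```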
17 Intervals and Tests: Confidence intervals are most useful when you want to estimate population parameters. Hypothesis tests and p-values are most useful when you want to test hypotheses about population parameters. Confidence intervals give you a range of plausible values; p-values quantify the strength of evidence against the null hypothesis.
18 Interval, Test, or Neither? Is the following question best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant? On average, how much more do adults who played sports in high school exercise than adults who did not play sports in high school? (a) Confidence interval (b) Hypothesis test (c) Statistical inference not relevant
19 Interval, Test, or Neither? Is the following question best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant? Do a majority of adults riding a bicycle wear a helmet? (a) Confidence interval (b) Hypothesis test (c) Statistical inference not relevant
20 Interval, Test, or Neither? Is the following question best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant? On average, were the 23 players on the 2010 Canadian Olympic hockey team older than the 23 players on the 2010 US Olympic hockey team? (a) Confidence interval (b) Hypothesis test (c) Statistical inference not relevant
21 Statistical vs Practical Significance: With small sample sizes, even large differences or effects may not be significant. With large sample sizes, even a very small difference or effect can be significant. A statistically significant result is not always practically significant, especially with large sample sizes.
22 Statistical vs Practical Significance: Example: Suppose a weight loss program recruits 10,000 people for a randomized experiment. A difference in average weight loss of only 0.5 lbs could be found to be statistically significant. Suppose the experiment lasted for a year. Is a loss of ½ a pound practically significant?
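To see how sample size alone can push a 0.5 lb difference across the significance line, here is a rough Python sketch. It swaps in a normal-approximation (z) test rather than the randomization methods used in this class, and it assumes a standard deviation of 10 lbs in each group and equal group sizes; both numbers are hypothetical, chosen only for illustration.

```python
import math


def two_sample_z_p_value(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means under a normal approximation.

    Assumes two groups of equal size with the same standard deviation.
    """
    se = sd * math.sqrt(2 / n_per_group)
    z = diff / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail area.
    return math.erfc(abs(z) / math.sqrt(2))


assumed_sd = 10.0  # hypothetical SD of weight loss in lbs, not from the slides
diff = 0.5         # difference in average weight loss from the slide, in lbs

p_large = two_sample_z_p_value(diff, assumed_sd, n_per_group=5000)  # 10,000 people
p_small = two_sample_z_p_value(diff, assumed_sd, n_per_group=50)    # 100 people

print(f"n = 10,000 total: p-value = {p_large:.4f}")
print(f"n = 100 total:    p-value = {p_small:.4f}")
```

The difference is the same half pound in both runs; only the sample size changes, which is why a small p-value by itself says nothing about whether the effect matters in practice.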
23 Diet and Sex of Baby: Are certain foods in your diet associated with whether or not you conceive a boy or a girl? To study this, researchers asked women about their eating habits, including asking whether or not they ate 133 different foods regularly. A significant difference was found for breakfast cereal (mothers of boys eat more), prompting the headline “Breakfast Cereal Boosts Chances of Conceiving Boys”.
24 “Breakfast Cereal Boosts Chances of Conceiving Boys”: I’m pregnant (with identical twins!), and am very curious about whether I’m going to have boys or girls! I eat breakfast cereal every morning. Do you think this boosts my chances of having boys? (a) yes (b) no (c) impossible to tell
25 Hypothesis Tests: For each of the 133 foods studied, a hypothesis test was conducted for a difference between mothers who conceived boys and girls in the proportion who consume each food. (1) State the null and alternative hypotheses. (2) If there are NO differences (all null hypotheses are true), about how many significant differences would be found using α = 0.05? (3) A significant difference was found for breakfast cereal (mothers of boys eat more), prompting the headline “Breakfast Cereal Boosts Chances of Conceiving Boys”. How might you explain this?
26 Hypothesis Tests: (1) State the null and alternative hypotheses. pb: proportion of mothers who have boys that consume the food regularly; pg: proportion of mothers who have girls that consume the food regularly. H0: pb = pg vs Ha: pb ≠ pg. (2) If there are NO differences (all null hypotheses are true), about how many significant differences would be found using α = 0.05? 133 × 0.05 = 6.65. (3) A significant difference was found for breakfast cereal (mothers of boys eat more), prompting the headline “Breakfast Cereal Boosts Chances of Conceiving Boys”. How might you explain this? Random chance; several tests (about 6 or 7) are going to be significant, even if no differences exist.
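The arithmetic in answer (2) is short enough to check in Python. The second quantity below, the chance of at least one false positive, is an addition to the slide and rests on an assumption the slide does not make: that the 133 tests are independent.

```python
n_tests = 133
alpha = 0.05

# Expected number of significant results if every null hypothesis is true.
expected_false_positives = n_tests * alpha

# Chance that at least one test comes out significant, ASSUMING the tests are
# independent (an assumption made for this sketch, not stated in the slides).
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"P(at least one false positive), independent tests: {p_at_least_one:.4f}")
```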
27 Multiple Testing: When multiple hypothesis tests are conducted, the chance that at least one test incorrectly rejects a true null hypothesis increases with the number of tests. If the null hypotheses are all true, a proportion of about α of the tests will yield statistically significant results just by random chance.
29 Multiple Comparisons: Consider a topic that is being investigated by research teams all over the world. Using α = 0.05, about 5% of teams are going to find something significant, even if the null hypothesis is true.
30 Multiple Comparisons: Consider a research team/company doing many hypothesis tests. Using α = 0.05, about 5% of tests are going to be significant, even if the null hypotheses are all true.
31 Multiple Comparisons: This is a serious problem. The most important thing is to be aware of this issue, and not to trust claims that are obviously one of many tests (unless they specifically mention an adjustment for multiple testing). There are ways to account for this (e.g. Bonferroni’s correction), but these are beyond the scope of this class.
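For the curious, Bonferroni’s correction itself is a one-line rule: compare each p-value to α divided by the number of tests. The Python sketch below is optional background rather than course material; the function names and the three p-values are hypothetical, chosen only to show the rule in action.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test cutoff: the overall significance level split evenly across the tests."""
    return alpha / n_tests


def bonferroni_reject(p_values, alpha=0.05):
    """True for each test whose p-value is at or below the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [p <= cutoff for p in p_values]


# Cutoff for the 133 food tests from the earlier slides.
cutoff_133 = bonferroni_threshold(0.05, 133)
print(f"per-test cutoff for 133 tests: {cutoff_133:.6f}")

# Hypothetical p-values, for illustration only.
hypothetical_p_values = [0.03, 0.0001, 0.2]
decisions = bonferroni_reject(hypothetical_p_values)
print(decisions)
```

Note that a p-value of 0.03 would count as significant on its own at α = 0.05 but not once it is treated as one of three tests.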
32 Publication Bias: Publication bias refers to the fact that usually only the significant results get published. The one study that turns out significant gets published, and no one knows about all the insignificant results. This, combined with the problem of multiple comparisons, can yield very misleading results.
33 Jelly Beans Cause Acne! http://xkcd.com/882/ Consider having your students act this out in class, each reading aloud a different part. It’s very fun!
37 Summary: If a null hypothesized value lies inside a 95% CI, a two-tailed test using α = 0.05 would not reject H0. If a null hypothesized value lies outside a 95% CI, a two-tailed test using α = 0.05 would reject H0. Statistical significance is not always the same as practical significance. Using α = 0.05, about 5% of hypothesis tests will lead to rejecting the null, even if all the null hypotheses are true.
38 Synthesis: You’ve now learned how to successfully collect and analyze data to answer a question! Let’s put that to use…
39 Exercise and Pulse: Does just 5 seconds of exercise increase pulse rate? (1) What are the cases and variables? Are they categorical or quantitative? Identify explanatory and response. (2) Does the question imply causality? How would you collect data to answer it? (3) Merge with 3 other groups to collect data (check pulse rate). (4) Visualize and summarize your data. Before doing any formal inference, take a guess at answering the question. (5) Conduct a hypothesis test to answer the question. State your hypotheses, calculate the p-value, and make a conclusion in context. (6) By how much does 5 seconds of exercise increase pulse rate? State the parameter of interest, and give and interpret a confidence interval.
40 Tongue Curling: What proportion of people can roll their tongue? Can you roll your tongue? (a) Yes (b) No. Visualize and summarize the data. What is your point estimate? Give and interpret a confidence interval. Tongue rolling has been said to be a dominant trait, in which case theoretically 75% of all people should be able to roll their tongues. Do our data provide evidence otherwise?
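Once the class counts are in, the last question (H0: p = 0.75 vs Ha: p ≠ 0.75) could be checked with a simulation along these lines. This is one possible sketch, not the prescribed method: the function name is made up, and the counts in the example (60 tongue rollers out of 100) are hypothetical placeholders for the real class data.

```python
import random


def randomization_p_value(successes, n, null_p, n_sims=5000, seed=0):
    """Two-sided p-value for one proportion, simulating samples with H0 true.

    Counts how often a simulated sample proportion lands at least as far
    from null_p as the observed proportion does.
    """
    rng = random.Random(seed)
    observed_gap = abs(successes / n - null_p)
    extreme = 0
    for _ in range(n_sims):
        sim_successes = sum(rng.random() < null_p for _ in range(n))
        # Small tolerance guards against floating-point ties.
        if abs(sim_successes / n - null_p) >= observed_gap - 1e-12:
            extreme += 1
    return extreme / n_sims


# Hypothetical counts for illustration: 60 of 100 people can roll their tongue.
p_value = randomization_p_value(successes=60, n=100, null_p=0.75)
print(f"p-value = {p_value:.4f}")
```

Replace the hypothetical counts with the class totals; a small p-value would be evidence that the true proportion is not 75%.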
41 Tuesday: Tuesday’s class will be a review session. There will be no clicker questions and no new material, so attendance is optional. I’ll spend the first half reviewing the key topics we’ve covered so far, and then will have open Q and A.
42 To Do: Read Essential Synthesis A, B. Prepare for Exam 1 (Thursday, 10/11): study; make page of notes for Exam 1; do review problems; take practice exam. Solutions under documents on course webpage.