Hypothesis Testing, Synthesis

Hypothesis Testing, Synthesis. STAT 101, Dr. Kari Lock Morgan, 10/4/12. SECTION 4.5, Essential Synthesis B: Connecting intervals and tests (4.5); Statistical versus practical significance (4.5); Multiple testing (4.5); Synthesis activities.

Exam 1: Thursday 10/11. You may use only a calculator and one double-sided page of notes prepared by you. Emphasis on conceptual understanding.

Practice: Last year’s midterm, with solutions, is available on the course website (under documents). Review problems are posted for you to work through. Doing problems is the key to success!

Keys to In-Class Exam Success: Work lots of practice problems! Take last year’s exams under realistic conditions (time yourself, do it all before looking at the solutions, etc.). Prepare a good cheat sheet and use it when working problems. Read the corresponding sections in the book if there are concepts you are still confused about.

Office Hours Next Week
Monday: Heather 4 – 6 pm, Old Chem 211A; Sam 6 – 9 pm, Old Chem 211A
Tuesday: Kari 1:30 – 2:30 pm, Old Chem 216; Tracy 5 – 7 pm, Old Chem 211A
Wednesday: Kari 1 – 3 pm, Old Chem 216; Tracy 4:30 – 5:30 pm, Old Chem 211A; Heather 8 – 9 pm, Old Chem 211A
Thursday: Kari 1 – 2:30 pm, Old Chem 216

Clickers: Reminder: sharing clickers is a case of academic dishonesty and will be treated as such. If caught clicking in with two clickers, everyone involved will receive a 0 for their entire clicker grade (10% of the final grade) and be reported to the dean to follow up regarding academic dishonesty.

Body Temperature: We created a bootstrap distribution for average body temperature by resampling with replacement from the original sample (x̄ = 98.26).

Body Temperature: We also created a randomization distribution to see if average body temperature differs from 98.6°F by adding 0.34 (= 98.6 − 98.26) to every value to make the null true, and then resampling with replacement from this modified sample.

Body Temperature: These two distributions are identical (up to random variation from simulation to simulation) except for the center. The bootstrap distribution is centered around the sample statistic, 98.26, while the randomization distribution is centered around the null hypothesized value, 98.6. The randomization distribution is equivalent to the bootstrap distribution, but shifted over.
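
The shift relationship can be sketched in Python. This is a minimal illustration rather than the software used in the course: the slides do not reproduce the class data, so the sample below is simulated (its center of 98.26 comes from the slides; the spread of 0.7 and the size of 50 are assumptions), and the function names are hypothetical.

```python
import random
import statistics

def bootstrap_means(sample, reps, rng):
    """Means of `reps` resamples drawn with replacement from `sample`."""
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps)]

def randomization_means(sample, null_mean, reps, rng):
    """Shift the sample so its mean equals `null_mean`, then resample it."""
    shift = null_mean - statistics.fmean(sample)
    shifted = [x + shift for x in sample]
    return bootstrap_means(shifted, reps, rng)

rng = random.Random(0)
# Simulated stand-in for the body temperature sample (assumed values).
sample = [rng.gauss(98.26, 0.7) for _ in range(50)]

boot = bootstrap_means(sample, 2000, rng)
rand = randomization_means(sample, 98.6, 2000, rng)

boot_center = statistics.fmean(boot)
rand_center = statistics.fmean(rand)
print(f"bootstrap center:     {boot_center:.2f}")
print(f"randomization center: {rand_center:.2f}")
print(f"spreads: {statistics.stdev(boot):.3f} vs {statistics.stdev(rand):.3f}")
```

The two centers should land near the sample mean and near 98.6 respectively, while the two spreads should agree up to simulation noise.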

Body Temperature: [Figure: bootstrap distribution centered at 98.26 and randomization distribution centered at 98.6] H0: μ = 98.6, Ha: μ ≠ 98.6. Talk about the fact that the null hypothesized value is in the extremes of the bootstrap distribution, so the sample statistic is in the extremes of the randomization distribution.

Body Temperature: [Figure: bootstrap distribution centered at 98.26 and randomization distribution centered at 98.4] H0: μ = 98.4, Ha: μ ≠ 98.4. Talk about the fact that the null hypothesized value is not in the extremes of the bootstrap distribution, so the sample statistic is not in the extremes of the randomization distribution.

Intervals and Tests If a 95% CI contains the parameter in H0, then a two-tailed test should not reject H0 at a 5% significance level. If a 95% CI misses the parameter in H0, then a two-tailed test should reject H0 at a 5% significance level.

Intervals and Tests: A confidence interval represents the range of plausible values for the population parameter. If the null hypothesized value IS NOT within the CI, it is not a plausible value and should be rejected. If the null hypothesized value IS within the CI, it is a plausible value and should not be rejected.

Body Temperatures: Using bootstrapping, we found a 95% confidence interval for the mean body temperature to be (98.05, 98.47). This does not contain 98.6, so at α = 0.05 we would reject H0 for the hypotheses H0: μ = 98.6, Ha: μ ≠ 98.6.

Both Father and Mother: “Does a child need both a father and a mother to grow up happily?” Let p be the proportion of adults aged 18-29 in 2010 who say yes. A 95% CI for p is (0.487, 0.573). Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, we (a) Reject H0 (b) Do not reject H0 (c) Reject Ha (d) Do not reject Ha. Answer: (b). 0.5 is within the CI, so it is a plausible value for p. Source: http://www.pewsocialtrends.org/2011/03/09/for-millennials-parenthood-trumps-marriage/#fn-7199-1

Both Father and Mother: “Does a child need both a father and a mother to grow up happily?” Let p be the proportion of adults aged 18-29 in 1997 who say yes. A 95% CI for p is (0.533, 0.607). Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, we (a) Reject H0 (b) Do not reject H0 (c) Reject Ha (d) Do not reject Ha. Answer: (a). 0.5 is not within the CI, so it is not a plausible value for p. Source: http://www.pewsocialtrends.org/2011/03/09/for-millennials-parenthood-trumps-marriage/#fn-7199-1
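
The interval-based decision rule used in the three examples so far can be written as a one-line check. This is a sketch only; the function name `rejects_null` is hypothetical, and the rule matches a two-tailed test only when the interval's confidence level and the test's significance level correspond (95% and α = 0.05 here).

```python
def rejects_null(ci, null_value):
    """Two-tailed decision at alpha = 1 - (confidence level of `ci`).

    Reject H0 exactly when the null value falls outside the interval.
    """
    lower, upper = ci
    return not (lower <= null_value <= upper)

# Intervals and null values quoted on the slides.
body_temp = rejects_null((98.05, 98.47), 98.6)  # mean body temperature
pew_2010 = rejects_null((0.487, 0.573), 0.5)    # adults aged 18-29 in 2010
pew_1997 = rejects_null((0.533, 0.607), 0.5)    # adults aged 18-29 in 1997
print(body_temp, pew_2010, pew_1997)  # True False True
```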

Intervals and Tests: Confidence intervals are most useful when you want to estimate population parameters. Hypothesis tests and p-values are most useful when you want to test hypotheses about population parameters. Confidence intervals give you a range of plausible values; p-values quantify the strength of evidence against the null hypothesis.

Interval, Test, or Neither? Is the following question best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant? On average, how much more do adults who played sports in high school exercise than adults who did not play sports in high school? (a) Confidence interval (b) Hypothesis test (c) Statistical inference not relevant

Interval, Test, or Neither? Is the following question best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant? Do a majority of adults riding a bicycle wear a helmet? (a) Confidence interval (b) Hypothesis test (c) Statistical inference not relevant

Interval, Test, or Neither? Is the following question best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant? On average, were the 23 players on the 2010 Canadian Olympic hockey team older than the 23 players on the 2010 US Olympic hockey team? (a) Confidence interval (b) Hypothesis test (c) Statistical inference not relevant

Statistical vs Practical Significance: With small sample sizes, even large differences or effects may not be significant. With large sample sizes, even a very small difference or effect can be significant. A statistically significant result is not always practically significant, especially with large sample sizes.

Statistical vs Practical Significance: Example: Suppose a weight loss program recruits 10,000 people for a randomized experiment. A difference in average weight loss of only 0.5 lbs could be found to be statistically significant. Suppose the experiment lasted for a year. Is a loss of ½ a pound practically significant?
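
To see how sample size alone can drive significance, here is a rough sketch using a normal-approximation two-sample test. The slide gives only the 0.5 lb difference and the 10,000 participants; the standard deviation of 10 lbs, the even split into two groups, and the function name are assumptions made for illustration.

```python
import math

def two_sample_z_p_value(diff, sd, n_per_group):
    """Two-tailed p-value for a difference in means, normal approximation.

    Assumes both groups share the same known standard deviation `sd`.
    """
    se = sd * math.sqrt(2 / n_per_group)
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2))

ASSUMED_SD = 10.0  # lbs; hypothetical, not given on the slide

p_large = two_sample_z_p_value(0.5, ASSUMED_SD, 5000)  # 10,000 people total
p_small = two_sample_z_p_value(0.5, ASSUMED_SD, 50)    # 100 people total
print(f"n = 5000 per group: p = {p_large:.4f}")
print(f"n = 50 per group:   p = {p_small:.4f}")
```

Under these assumptions the same 0.5 lb difference falls below α = 0.05 with 10,000 participants but not with 100, even though its practical importance is unchanged.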

Diet and Sex of Baby: Are certain foods in your diet associated with whether you conceive a boy or a girl? To study this, researchers asked women about their eating habits, including asking whether or not they ate 133 different foods regularly. A significant difference was found for breakfast cereal (mothers of boys eat more), prompting the headline “Breakfast Cereal Boosts Chances of Conceiving Boys”. http://www.newscientist.com/article/dn13754-breakfast-cereals-boost-chances-of-conceiving-boys.html

“Breakfast Cereal Boosts Chances of Conceiving Boys”: I’m pregnant (with identical twins!), and am very curious about whether I’m going to have boys or girls! I eat breakfast cereal every morning. Do you think this boosts my chances of having boys? (a) yes (b) no (c) impossible to tell

Hypothesis Tests: For each of the 133 foods studied, a hypothesis test was conducted for a difference between mothers who conceived boys and girls in the proportion who consume each food. State the null and alternative hypotheses. If there are NO differences (all null hypotheses are true), about how many significant differences would be found using α = 0.05? A significant difference was found for breakfast cereal (mothers of boys eat more), prompting the headline “Breakfast Cereal Boosts Chances of Conceiving Boys”. How might you explain this?

Hypothesis Tests (answers): State the null and alternative hypotheses: with pb the proportion of mothers who have boys that consume the food regularly and pg the proportion of mothers who have girls that consume the food regularly, H0: pb = pg, Ha: pb ≠ pg. If there are NO differences (all null hypotheses are true), about how many significant differences would be found using α = 0.05? 133 × 0.05 = 6.65. How might you explain the significant difference found for breakfast cereal? Random chance; several tests (about 6 or 7) are going to be significant, even if no differences exist.

Multiple Testing: When multiple hypothesis tests are conducted, the chance that at least one test incorrectly rejects a true null hypothesis increases with the number of tests. If the null hypotheses are all true, about a proportion α of the tests will yield statistically significant results just by random chance.
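
The arithmetic for the 133 food tests can be checked directly. This is a small sketch with hypothetical function names; the "at least one" calculation additionally assumes the tests are independent, which the slides do not state.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of significant results when every null is true."""
    return num_tests * alpha

def prob_at_least_one(num_tests, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests

expected = expected_false_positives(133, 0.05)
at_least_one = prob_at_least_one(133, 0.05)
print(f"expected false positives: {expected:.2f}")
print(f"P(at least one):          {at_least_one:.4f}")
```

With 133 tests at α = 0.05, the expected count is the 6.65 quoted above, and under independence at least one false positive is close to certain.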

www.causeweb.org Author: JB Landers

Multiple Comparisons: Consider a topic that is being investigated by research teams all over the world. Using α = 0.05, about 5% of teams are going to find something significant, even if the null hypothesis is true.

Multiple Comparisons: Consider a research team/company doing many hypothesis tests. Using α = 0.05, about 5% of tests are going to be significant, even if the null hypotheses are all true.

Multiple Comparisons: This is a serious problem. The most important thing is to be aware of this issue, and not to trust claims that are obviously one of many tests (unless they specifically mention an adjustment for multiple testing). There are ways to account for this (e.g. Bonferroni’s Correction), but these are beyond the scope of this class.
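
For readers curious about the adjustment named above, Bonferroni's correction compares each p-value with α divided by the number of tests. The sketch below is illustrative only: the function name is hypothetical and the five p-values are made up, not taken from any study mentioned in the slides.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that stay significant after Bonferroni's correction.

    Each p-value is compared with alpha / (number of tests).
    """
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# Hypothetical p-values from five tests (made up for illustration).
p_values = [0.003, 0.020, 0.040, 0.300, 0.700]
flags = bonferroni_significant(p_values)
print(flags)  # [True, False, False, False, False]
```

Three of these p-values are below 0.05 on their own, but only one survives the adjusted threshold of 0.05 / 5 = 0.01.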

Publication Bias: Publication bias refers to the fact that usually only the significant results get published. The one study that turns out significant gets published, and no one knows about all the insignificant results. This, combined with the problem of multiple comparisons, can yield very misleading results.

Jelly Beans Cause Acne! http://xkcd.com/882/ Consider having your students act this out in class, each reading aloud a different part. It’s very fun!

Summary: If a null hypothesized value lies inside a 95% CI, a two-tailed test using α = 0.05 would not reject H0. If a null hypothesized value lies outside a 95% CI, a two-tailed test using α = 0.05 would reject H0. Statistical significance is not always the same as practical significance. Using α = 0.05, about 5% of hypothesis tests will lead to rejecting the null when all the null hypotheses are true.

Synthesis You’ve now learned how to successfully collect and analyze data to answer a question! Let’s put that to use…

Exercise and Pulse Does just 5 seconds of exercise increase pulse rate? What are the cases and variables? Are they categorical or quantitative? Identify explanatory and response. Does the question imply causality? How would you collect data to answer it? Merge with 3 other groups to collect data. (check pulse rate) Visualize and summarize your data. Before doing any formal inference, take a guess at answering the question. Conduct a hypothesis test to answer the question. State your hypotheses, calculate the p-value, make a conclusion in context. How much does 5 seconds of exercise increase pulse rate by? State the parameter of interest and give and interpret a confidence interval.

Tongue Curling: What proportion of people can roll their tongue? Can you roll your tongue? (a) Yes (b) No. Visualize and summarize the data. What is your point estimate? Give and interpret a confidence interval. Tongue rolling has been said to be a dominant trait, in which case theoretically 75% of all people should be able to roll their tongues. Do our data provide evidence otherwise?
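
One way to carry out the test of H0: p = 0.75 vs Ha: p ≠ 0.75 is a simulation under the null. The sketch below is a minimal version with a hypothetical function name; the class counts (68 of 100) are invented for illustration, since the slides do not record the actual responses.

```python
import random

def proportion_randomization_p_value(successes, n, null_p, reps, rng):
    """Two-tailed randomization p-value for H0: p = null_p.

    Simulates samples of size n under the null and counts how often the
    simulated proportion is at least as far from null_p as the observed one.
    """
    observed_gap = abs(successes / n - null_p)
    extreme = 0
    for _ in range(reps):
        sim = sum(rng.random() < null_p for _ in range(n))
        # Small tolerance guards against floating-point ties.
        if abs(sim / n - null_p) >= observed_gap - 1e-12:
            extreme += 1
    return extreme / reps

rng = random.Random(1)
# Hypothetical class results: 68 of 100 students can roll their tongue.
p_value = proportion_randomization_p_value(68, 100, 0.75, 5000, rng)
print(f"p-value: {p_value:.3f}")
```

A small p-value would count as evidence against the 75% claim; a large one means the class data are consistent with it.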

Tuesday: Tuesday’s class will be a review session. There will be no clicker questions and no new material, so attendance is optional. I’ll spend the first half reviewing the key topics we’ve covered so far, and then will have open Q and A.

To Do: Read Essential Synthesis A, B. Prepare for Exam 1 (Thursday, 10/11): study; make a page of notes for Exam 1; do review problems; take the practice exam. Solutions are under documents on the course webpage.