Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: Hypotheses STAT 250 Dr. Kari Lock Morgan SECTION 4.1 Hypothesis test Null and alternative.

Slides:



Advertisements
Similar presentations
Introducing Hypothesis Tests
Advertisements

Our goal is to assess the evidence provided by the data in favor of some claim about the population. Section 6.2Tests of Significance.
1 COMM 301: Empirical Research in Communication Lecture 15 – Hypothesis Testing Kwan M Lee.
Hypothesis Testing: Intervals and Tests
Inference Sampling distributions Hypothesis testing.
Hypothesis Testing I 2/8/12 More on bootstrapping Random chance
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: p-value STAT 250 Dr. Kari Lock Morgan SECTION 4.2 Randomization distribution p-value.
Hypothesis Testing: Hypotheses
Introduction to Hypothesis Testing AP Statistics Chap 11-1.
Experiments and Observational Studies
Cal State Northridge  320 Ainsworth Sampling Distributions and Hypothesis Testing.
Section 4.4 Creating Randomization Distributions.
BCOR 1020 Business Statistics Lecture 20 – April 3, 2008.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: Significance STAT 250 Dr. Kari Lock Morgan SECTION 4.3 Significance level (4.3) Statistical.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: p-value STAT 101 Dr. Kari Lock Morgan 9/25/12 SECTION 4.2 Randomization distribution.
Confidence Intervals and Hypothesis Tests
Statistics: Unlocking the Power of Data Lock 5 Inference for Proportions STAT 250 Dr. Kari Lock Morgan Chapter 6.1, 6.2, 6.3, 6.7, 6.8, 6.9 Formulas for.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: Hypotheses STAT 101 Dr. Kari Lock Morgan SECTION 4.1 Statistical test Null and alternative.
Randomized Experiments STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University.
Randomization Tests Dr. Kari Lock Morgan PSU /5/14.
1. Statistics: Learning from Samples about Populations Inference 1: Confidence Intervals What does the 95% CI really mean? Inference 2: Hypothesis Tests.
More Randomization Distributions, Connections
Let’s flip a coin. Making Data-Based Decisions We’re going to flip a coin 10 times. What results do you think we will get?
Statistics: Unlocking the Power of Data Lock 5 Synthesis STAT 250 Dr. Kari Lock Morgan SECTIONS 4.4, 4.5 Connecting bootstrapping and randomization (4.4)
Using Lock5 Statistics: Unlocking the Power of Data
Hypothesis Testing: p-value
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 9/11/12 Describing Data: Two Variables SECTIONS 2.1, 2.4, 2.5 Two categorical.
Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 101 Dr. Kari Lock Morgan 10/18/12 Chapter 5 Normal distribution Central limit theorem.
The Practice of Statistics Third Edition Chapter 13: Comparing Two Population Parameters Copyright © 2008 by W. H. Freeman & Company Daniel S. Yates.
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Unit 5: Hypothesis Testing.
Statistics: Unlocking the Power of Data Lock 5 Exam 2 Review STAT 101 Dr. Kari Lock Morgan 11/13/12 Review of Chapters 5-9.
CHAPTER 9 Testing a Claim
Hypothesis Testing An understanding of the method of hypothesis testing is essential for understanding how both the natural and social sciences advance.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: Conclusions STAT 250 Dr. Kari Lock Morgan SECTION 4.3 Significance level (4.3) Statistical.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 10 Comparing Two Groups Section 10.1 Categorical Response: Comparing Two Proportions.
Statistics: Unlocking the Power of Data Lock 5 Inference for Means STAT 250 Dr. Kari Lock Morgan Sections 6.4, 6.5, 6.6, 6.10, 6.11, 6.12, 6.13 t-distribution.
Introduction to Hypothesis Testing
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Unit 5: Hypothesis Testing.
Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution (5.1) Central limit theorem.
Statistics: Unlocking the Power of Data Lock 5 Inference for Means STAT 250 Dr. Kari Lock Morgan Sections 6.4, 6.5, 6.6, 6.10, 6.11, 6.12, 6.13 t-distribution.
Synthesis and Review 2/20/12 Hypothesis Tests: the big picture Randomization distributions Connecting intervals and tests Review of major topics Open Q+A.
+ Unit 6: Comparing Two Populations or Groups Section 10.2 Comparing Two Means.
Today: Hypothesis testing. Example: Am I Cheating? If each of you pick a card from the four, and I make a guess of the card that you picked. What proportion.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 9 Testing a Claim 9.1 Significance Tests:
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: p-value STAT 250 Dr. Kari Lock Morgan SECTION 4.2 p-value.
Statistics: Unlocking the Power of Data Lock 5 Section 4.1 Introducing Hypothesis Tests.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 9 Testing a Claim 9.1 Significance Tests:
+ Homework 9.1:1-8, 21 & 22 Reading Guide 9.2 Section 9.1 Significance Tests: The Basics.
9.3 Hypothesis Tests for Population Proportions
Introducing Hypothesis Tests
Unit 5: Hypothesis Testing
Warm Up Check your understanding p. 541
Measuring Evidence with p-values
Statistical inference: distribution, hypothesis testing
CHAPTER 9 Testing a Claim
Hypothesis Testing: Hypotheses
Week 11 Chapter 17. Testing Hypotheses about Proportions
Measuring Evidence with p-values
Making Data-Based Decisions
CHAPTER 9 Testing a Claim
Boxplots and Quantitative/Categorical Relationships
Introducing Hypothesis Tests
CHAPTER 9 Testing a Claim
Significance Tests: The Basics
Significance Tests: The Basics
Introduction to Hypothesis Testing
CHAPTER 9 Testing a Claim
CHAPTER 9 Testing a Claim
Statistical Test A test of significance is a formal procedure for comparing observed data with a claim (also called a hypothesis) whose truth we want to.
CHAPTER 9 Testing a Claim
Presentation transcript:

Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: Hypotheses STAT 250 Dr. Kari Lock Morgan SECTION 4.1 Hypothesis test Null and alternative hypotheses Randomization distribution (Section 4.2)

Statistics: Unlocking the Power of Data Lock 5 Question of the Day Does drinking tea boost your immune system?

Statistics: Unlocking the Power of Data Lock 5 Tea and the Immune System L-theanine is an amino acid found in tea Black tea: about 20mg per cup Green tea (standard): varies, as low as 5mg per cup Green tea (shade grown): varies, up to 46mg per cup (Shade grown green tea examples: Gyokuro, Matcha) Gamma delta T cells are important for helping the immune system fend off infection It is thought that L-theanine primes T cells, activating them to a state of readiness and making them better able to respond to future antigens. Does drinking tea actually boost your immunity? Antigens in tea-Beverage Prime Human Vγ2Vδ2 T Cells in vitro and in vivo for Memory and Non- memory Antibacterial Cytokine ResponsesAntigens in tea-Beverage Prime Human Vγ2Vδ2 T Cells in vitro and in vivo for Memory and Non- memory Antibacterial Cytokine Responses, Kamath et.al., Proceedings of the National Academy of Sciences, May 13, 2003.

Statistics: Unlocking the Power of Data Lock 5 Tea and the Immune System Antigens in tea-Beverage Prime Human Vγ2Vδ2 T Cells in vitro and in vivo for Memory and Non- memory Antibacterial Cytokine ResponsesAntigens in tea-Beverage Prime Human Vγ2Vδ2 T Cells in vitro and in vivo for Memory and Non- memory Antibacterial Cytokine Responses, Kamath et.al., Proceedings of the National Academy of Sciences, May 13, Participants were randomized to drink five or six cups of either tea (black) or coffee every day for two weeks (both drinks have caffeine but only tea has L-theanine) After two weeks, blood samples were exposed to an antigen, and production of interferon gamma (immune system response) was measured Explanatory variable: tea or coffee Response variable: measure of interferon gamma

Statistics: Unlocking the Power of Data Lock 5 Tea and the Immune System In study comparing tea and coffee and levels of interferon gamma, if tea drinkers have significantly higher levels of interferon gamma, can we conclude that drinking tea rather than coffee caused an increase in this aspect of the immune response? a) Yes b) No

Statistics: Unlocking the Power of Data Lock 5 Tea and Immune System The explanatory variable is tea or coffee, and the response variable is immune system response measured in amount of interferon gamma produced. How could we visualize this data? a) Bar chart b) Histogram c) Side-by-side boxplots d) Scatterplot

Statistics: Unlocking the Power of Data Lock 5 Tea and Immune System

Statistics: Unlocking the Power of Data Lock 5 The explanatory variable is tea or coffee, and the response variable is immune system response measured in amount of interferon gamma produced. How might we summarize this data? a) Mean b) Proportion c) Difference in means d) Difference in proportions e) Correlation Tea and Immune System

Statistics: Unlocking the Power of Data Lock 5 Tea and Immune System

Statistics: Unlocking the Power of Data Lock 5 Hypothesis Test A hypothesis test uses data from a sample to assess a claim about a population One mean is higher than the other in the sample Is this difference large enough to conclude the difference is real, and holds for the true population parameters?

Statistics: Unlocking the Power of Data Lock 5 Hypotheses Null Hypothesis (H 0 ): Claim that there is no effect or difference. Alternative Hypothesis (H a ): Claim for which we seek evidence. Hypothesis tests are framed formally in terms of two competing hypotheses:

Statistics: Unlocking the Power of Data Lock 5 Tea and Immune Respose Null Hypothesis (H 0 ): No difference between drinking tea and coffee regarding interferon gamma Alternative Hypothesis (H a ): Drinking tea increases interferon gamma production more than drinking coffee No “effect” or no “difference” Claim we seek “evidence” for

Statistics: Unlocking the Power of Data Lock 5 Hypotheses: parameters More formal hypotheses: µ T = true mean interferon gamma response after drinking tea µ C = true mean interferon gamma response after drinking coffee H 0 : µ T = µ C H a : µ T > µ C

Statistics: Unlocking the Power of Data Lock 5 Difference in Hypotheses Note: the following two sets of hypotheses are equivalent, and can be used interchangeably: H 0 : µ 1 = µ 2 H a : µ 1 ≠ µ 2 H 0 : µ 1 – µ 2 = 0 H a : µ 1 – µ 2 ≠ 0

Statistics: Unlocking the Power of Data Lock 5 Alternative Hypothesis If the researchers were simply comparing tea and coffee, with no a priori hypothesis about which would yield a higher immune response, what would the alternative hypothesis be? a) H a : µ T = µ C b) H a : µ T < µ C c) H a : µ T > µ C d) H a : µ T ≠ µ C

Statistics: Unlocking the Power of Data Lock 5 Hypothesis Helpful Hints Hypotheses are always about population parameters, not sample statistics The null hypothesis always contains an equality The alternative hypothesis always contains an inequality (, ≠) The type of inequality in the alternative comes from the wording of the question of interest

Statistics: Unlocking the Power of Data Lock 5 Statistical Hypotheses Null Hypothesis Alternative Hypothesis ALL POSSIBILITIES Can we reject the null hypothesis? Usually the null is a very specific statement ?

Statistics: Unlocking the Power of Data Lock 5 Two Plausible Explanations If the sample data support the alternative, there are two plausible explanations: 1. The alternative hypothesis (H a ) is true 2. The null hypothesis (H 0 ) is true, and the sample results were just due to random chance Key question: Do the data provide enough evidence to rule out #2?

Statistics: Unlocking the Power of Data Lock 5 Two Plausible Explanations Why might the tea drinkers have higher levels of interferon gamma? Two plausible explanations:  Alternative true: Tea causes increase in interferon gamma production  Null true, random chance: the people who got randomly assigned to the tea group have better immune systems than those who got randomly assigned to the coffee group

Statistics: Unlocking the Power of Data Lock 5 The Plausibility of the Null The goal is determine whether the null hypothesis and random chance are a plausible explanation, given the observed data Key idea: How unlikely would it be to see a sample statistic as extreme as we’ve observed, just by random chance, if the null hypothesis were true? How do we figure this out? SIMULATE what would happen if H 0 were true!

Statistics: Unlocking the Power of Data Lock 5 Tea and Immune Response RRRRR RRRRR RRRRR RRRRR Tea Coffee 1. Randomize units to treatment groups RRR RRRRR RRR RRRRR R R

Statistics: Unlocking the Power of Data Lock 5 Tea and Immune Response RRR RRRRR RRR RRRRR RRRR R Tea Coffee 1.Randomize units to treatment groups 2.Conduct experiment 3.Measure response variable

Statistics: Unlocking the Power of Data Lock 5 Tea and Immune Response RRR RRRRR RRR RRRRR RRRR R Tea Coffee 1.Randomize units to treatment groups 2.Conduct experiment 3.Measure response variable 4.Calculate statistic

Statistics: Unlocking the Power of Data Lock 5 Tea and Immune Response RRR RRRRR RRR RRRRR RRRR 1.Randomize units to treatment groups 2.Conduct experiment 3.Measure response variable 4.Calculate statistic 5.Simulate statistics we could get, just by random chance, if the null hypothesis were true R Tea Coffee

Statistics: Unlocking the Power of Data Lock 5 To see if a statistic provides evidence against H 0, we need to see what kind of sample statistics we would observe, just by random chance, if H 0 were true Measuring Evidence against H 0

Statistics: Unlocking the Power of Data Lock 5 “by random chance” means the random assignment to the two treatment groups “if H 0 were true” means that interferon gamma levels would be the same, regardless of whether you drink tea or coffee To simulate what would happen just by random chance, if H 0 were true… Re-randomize units to treatment groups, keeping the response values unchanged Simulation

Statistics: Unlocking the Power of Data Lock 5 Tea and Immune Response RRR RRRRR RRR RRRRR RRRR R Tea Coffee RRR RRRRR RR R 58 RRR RRRRR RR

Statistics: Unlocking the Power of Data Lock 5 Simulation RRR RRRRR RRR RRRRR R Tea Coffee RRR RRRRR RR R 58 R RRRRR Re-randomize units to treatment groups

Statistics: Unlocking the Power of Data Lock 5 Simulation RRR RRRRR RRR RRRRR R Tea Coffee 1. Re-randomize units to treatment groups 03 RR Calculate statistic: Repeat Many Times!

Statistics: Unlocking the Power of Data Lock 5 Distribution of Statistic Under H 0 How extreme is the observed statistic??? Is the null hypothesis a plausible explanation?

Statistics: Unlocking the Power of Data Lock 5 Randomization Distribution A randomization distribution is a collection of statistics from samples simulated assuming the null hypothesis is true The randomization distribution shows what types of statistics would be observed, just by random chance, if the null hypothesis were true

Statistics: Unlocking the Power of Data Lock 5 Green Tea and Prostate Cancer A study was conducted on 60 men with PIN lesions, some of which turn into prostate cancer Half of these men were randomized to take 600 mg of green tea extract daily, while the other half were given a placebo pill The study was double-blind, neither the participants nor the doctors knew who was actually receiving green tea After one year, only 1 person taking green tea had gotten cancer, while 9 taking the placebo had gotten cancer

Statistics: Unlocking the Power of Data Lock 5 Green Tea and Prostate Cancer In the study about green tea and prostate cancer, if the difference is statistically significant, could we conclude that green tea really does help prevent prostate cancer? (a) Yes (b) No

Statistics: Unlocking the Power of Data Lock 5 The explanatory variable is green tea extract of placebo, the response variable is whether or not the person developed prostate cancer. What statistic and parameter is most relevant? a) Mean b) Proportion c) Difference in means d) Difference in proportions e) Correlation Green Tea and Prostate Cancer

Statistics: Unlocking the Power of Data Lock 5 Green Tea and Prostate Cancer p 1 = proportion of green tea consumers to get prostate cancer p 2 = proportion of placebo consumers to get prostate cancer State the null hypotheses. a) H 0 : p 1 = p 2 b) H 0 : p 1 < p 2 c) H 0 : p 1 > p 2 d) H 0 : p 1 ≠ p 2

Statistics: Unlocking the Power of Data Lock 5 Green Tea and Prostate Cancer p 1 = proportion of green tea consumers to get prostate cancer p 2 = proportion of placebo consumers to get prostate cancer State the alternative hypotheses. a) H 0 : p 1 = p 2 b) H 0 : p 1 < p 2 c) H 0 : p 1 > p 2 d) H 0 : p 1 ≠ p 2

Statistics: Unlocking the Power of Data Lock 5 Randomization Test 1. State hypotheses 2. Collect data 3. Calculate statistic: 4. Simulate statistics that could be observed, just by random chance, if the null hypothesis were true (create a randomization distribution) 5. How extreme is the observed statistic? 6. Is the null hypothesis (random chance) a plausible explanation?

Statistics: Unlocking the Power of Data Lock 5 Randomization Distribution Based on the randomization distribution, would the observed statistic be extreme if the null hypothesis were true? a) Yes b) No

Statistics: Unlocking the Power of Data Lock 5 Randomization Distribution Do you think the null hypothesis is a plausible explanation for these results? a) Yes b) No

Statistics: Unlocking the Power of Data Lock 5 Randomization Distribution In a hypothesis test for H 0 :  = 12 vs H a :  < 12, we have a sample with n = 45 and. What do we require about the method to produce randomization samples?

Statistics: Unlocking the Power of Data Lock 5 Randomization Distribution In a hypothesis test for H 0 :  = 12 vs H a :  < 12, we have a sample with n = 45 and. Where will the randomization distribution be centered? a) 10.2 b) 12 c) 45 d) 1.8

Statistics: Unlocking the Power of Data Lock 5 Randomization Distribution Center A randomization distribution is centered at the value of the parameter given in the null hypothesis. A randomization distribution simulates samples assuming the null hypothesis is true, so

Statistics: Unlocking the Power of Data Lock 5 Randomization Distribution In a hypothesis test for H 0 :  = 12 vs H a :  < 12, we have a sample with n = 45 and. What will we look for on the randomization distribution? a) How extreme 10.2 is b) How extreme 12 is c) How extreme 45 is d) What the standard error is e) How many randomization samples we collected

Statistics: Unlocking the Power of Data Lock 5 Randomization Distribution In a hypothesis test for H 0 :  1 =  2 vs H a :  1 >  2, we have a sample with 26 and 21. What do we require about the method to produce randomization samples?

Statistics: Unlocking the Power of Data Lock 5 Randomization Distribution In a hypothesis test for H 0 :  1 =  2 vs H a :  1 >  2, we have a sample with 26 and 21. Where will the randomization distribution be centered? a) 0 b) 1 c) 21 d) 26 e) 5

Statistics: Unlocking the Power of Data Lock 5 Randomization Distribution In a hypothesis test for H 0 :  1 =  2 vs H a :  1 >  2, we have a sample with 26 and 21. What do we look for on the randomization distribution? a) The standard error b) The center point c) How extreme 26 is d) How extreme 21 is e) How extreme 5 is

Statistics: Unlocking the Power of Data Lock 5 Summary Hypothesis tests use data from a sample to assess a claim about a population Hypothesis tests are usually formalized with competing hypotheses:  Null hypothesis (H 0 ): no effect or no difference  Alternative hypothesis (H a ): what we seek evidence for We assess whether the null hypothesis is plausible by: 1. seeing what kinds of statistics we would observe by random chance, if the null hypothesis were true 2. assessing the extremity of our observed statistic

Statistics: Unlocking the Power of Data Lock 5 To Do Read Section 4.1 HW 4.1 due Friday, 10/16

Statistics: Unlocking the Power of Data Lock 5 Null Hypothesis