Stat 512 – Lecture 13 Chi-Square Analysis (Ch. 8).


Comparing Proportions
Decrease in the population proportion rating the paper as largely believable?
- H0: π_98 − π_02 = 0
- Ha: π_98 − π_02 > 0
- z = 1.64, p-value = .051
- Weak evidence of a decrease (0-.07) in the population proportion
- No cause and effect
Increase in survival rate with letrozole?
- H0: treatment effect = 0
- Ha: treatment effect (l − p) > 0
- z = 7.14, p-value < .001
- Very strong evidence of an increase in survival rate ( ) due to letrozole
- At least for these volunteers

Last Time: Comparing Proportions
If you have independent random samples or a randomized experiment with large sample sizes (at least 5 successes and 5 failures in each group), then you can use 2-sample z-procedures (2 proportions).
- For an experiment with small group sizes, use the two-way table simulation as before
Keep in mind:
- The parameter is the “difference” in population proportions or the true treatment effect
- The confidence interval is for the difference in population proportions/true treatment effect
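
The two-sample z-procedures described here can be sketched in a few lines of Python. This is a minimal illustration, not the applet or Minitab routine used in class: the function name and the counts are hypothetical, and it assumes the usual textbook choices of a pooled standard error for the test and an unpooled one for the interval.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2, conf_level=0.95):
    """Two-sample z-test (H0: no difference) and confidence interval for p1 - p2."""
    # Technical condition from the slide: at least 5 successes and 5 failures per group.
    for x, n in ((x1, n1), (x2, n2)):
        if min(x, n - x) < 5:
            raise ValueError("need at least 5 successes and 5 failures in each group")
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Test: the pooled proportion estimates the common rate under H0.
    pooled = (x1 + x2) / (n1 + n2)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_null
    p_one_sided = 1 - NormalDist().cdf(z)  # for Ha: p1 - p2 > 0
    # Interval: unpooled standard error.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z, p_one_sided, (diff - z_star * se, diff + z_star * se)

# Hypothetical counts, for illustration only (not data from the lecture).
z, p_value, ci = two_proportion_z(x1=60, n1=100, x2=45, n2=100)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With small groups the function raises an error instead of returning a z-statistic, which mirrors the advice to fall back on the two-way table simulation.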

Practice Problem
Compare the proportion of all men voting for AS to the proportion of all women voting for AS.
Descriptive: The conditional proportion voting for Arnold is higher for the men (.49) than for the women (.43) in this sample.

Practice Problem
Inference: π_m vs. π_f
- H0: π_m − π_f = 0 (no difference in the population proportions)
- Ha: π_m − π_f > 0 (the male population would say they voted for AS at a higher rate)
- The sample sizes are large (at least 5 voting and 5 not voting for Arnie in each sample), and we trust CNN to have collected representative samples. We are also willing to treat the samples of men and women as independent.

Practice Problem
Inference: π_m vs. π_f
- Using the applet, z = 3.91 and p-value < .0001
- With such a small p-value, we reject the null hypothesis of equal population proportions. We have strong evidence that males are more likely to say they voted for Arnie. We don’t know why, but assuming CNN did their job right, we will generalize this difference to the population of voters.
- We are 90% confident that the proportion of CA voting males who would say they voted for Arnold is 3.5 to 8.5 percentage points higher than the proportion of CA voting females.
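
As a quick check on the reported p-values, the one-sided tail probability can be computed from each z-statistic quoted in these slides (z = 1.64 and z = 7.14 in the earlier comparisons, z = 3.91 here). The helper name `upper_tail_p` is hypothetical, and the sketch assumes a one-sided "greater than" alternative in each case, as the hypotheses are stated.

```python
from statistics import NormalDist

def upper_tail_p(z):
    """One-sided p-value P(Z >= z) under the standard normal distribution."""
    return 1 - NormalDist().cdf(z)

# z-values reported in the lecture for its three one-sided comparisons.
for label, z_value in [("believability", 1.64), ("letrozole", 7.14), ("Arnold vote", 3.91)]:
    print(f"{label}: z = {z_value}, one-sided p-value = {upper_tail_p(z_value):.2g}")
```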

Next Step
Comparing two population means/treatment effects with a quantitative response variable
Example 3:
- Observational units = volunteers in Shigella vaccination trials
- Treat as samples from a larger population of healthy adults

Example 3: Body Temperatures
Minitab commands depend on the format in which the data were typed.
Descriptive analysis
- The samples show a slight tendency for higher body temperatures among women (mean = vs F) but similar variability and shape

Example 3: Body Temperatures
Perhaps the population means are equal, and these sample means differ just because of random sampling variability.
- H0: μ_1 − μ_2 = 0
- Ha: μ_1 − μ_2 ≠ 0 (“differs”)
Technical conditions
- Normal populations (works OK if n_1, n_2 > 20)
- Large populations (N > 20n in each case)
- Independent random samples

Example 3: Body Temperatures (test statistic)
- The result is statistically significant at the 5% level but not at the 1% level.
- Moderate evidence that these sample means are further apart than we would expect from random sampling variability alone if the population means were equal.
- Conclude that the population mean body temperatures differ by 0.039 °F to 0.54 °F.
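
The slide's test statistic did not survive in the transcript, so the following is only a sketch of how a two-sample t statistic is formed from summary statistics. It assumes the unpooled (Welch) version; the slides do not say which Minitab option was used, and the summary values below are hypothetical, not the body temperature data.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors of each sample mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics (degrees F), NOT the values from the lecture.
t, df = welch_t(mean1=98.5, sd1=0.8, n1=50, mean2=98.2, sd2=0.6, n2=40)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The p-value would then come from a t distribution with `df` degrees of freedom, which Minitab or a t table supplies.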

Example 4: Sleep Deprivation
Case 2: Randomized Experiment
When the sample sizes are large or each group distribution is normal, the randomization distribution is well approximated by the t distribution.
- Pooled t test?
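
The randomization distribution that the t distribution approximates can also be simulated directly. The sketch below re-randomizes the group labels many times and counts how often the difference in means is at least as large as the one observed. The function name, the group labels, and the scores are all hypothetical, chosen only to illustrate the idea.

```python
import random

def randomization_p_value(group_a, group_b, reps=10_000, seed=0):
    """Approximate one-sided p-value for mean(group_a) - mean(group_b) by re-randomizing."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    combined = list(group_a) + list(group_b)
    count = 0
    for _ in range(reps):
        rng.shuffle(combined)  # one possible random assignment to the two groups
        diff = sum(combined[:n_a]) / n_a - sum(combined[n_a:]) / n_b
        if diff >= observed:
            count += 1
    return observed, count / reps

# Hypothetical improvement scores, for illustration only (not the study's data).
unrestricted = [25.2, 14.5, 21.3, 12.6, 18.0, 30.1, 9.8, 16.4]
deprived = [10.1, 4.3, -7.0, 12.2, 3.9, -1.5, 8.8, 6.0]
obs_diff, p_value = randomization_p_value(unrestricted, deprived)
print(f"observed difference = {obs_diff:.2f}, approximate p-value = {p_value:.4f}")
```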

Example 4: Sleep Deprivation Case 2: Randomized Experiment Validity?

Example 4 Conclusions
1. Statistically significant
2. Cause and effect conclusion valid
3. Generalizing to a larger population?
Is it possible that we are making the wrong decision?
- Yes, type I error…

Summary
Type of study
- Do you have (independent) random samples from two populations? OR do you have a randomized experiment?
- Same calculations, different conclusions
Are the sample sizes large enough for you to use normal/t procedures?
- With small sample sizes, use Fisher’s Exact Test (two-way table simulation) or randomization tests from before
- With larger samples, get the test statistic and confidence interval conveniently

Example 1: Dr. Spock’s Trial
Proportion of women on the jury for each judge
Let π_i = the probability that a woman is selected in judge i’s jury selection process

Example 1: Dr. Spock’s Trial What does it mean to say there is no “judge effect” or difference across the judges?

Example 1: Dr. Spock’s Trial
H0: π_1 = π_2 = … (the same probability for every judge)
Big change?
- Now trying to compare more than two populations
- Would it be reasonable to analyze all of the two-sample comparisons?
- The probability of making at least one type I error increases as we increase the number of tests
- Would prefer one procedure, one type I error
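
To see how quickly the chance of at least one type I error grows, here is a rough sketch. It assumes every pairwise test is run at the same level α and, as a simplification that does not hold exactly for comparisons sharing a group, that the tests are independent. The function name and the group counts are hypothetical.

```python
from math import comb

def familywise_error(num_groups, alpha=0.05):
    """Number of pairwise tests and P(at least one type I error), assuming independent tests."""
    num_tests = comb(num_groups, 2)
    return num_tests, 1 - (1 - alpha) ** num_tests

for k in (2, 3, 5, 7):  # k is a hypothetical number of groups being compared
    m, fwer = familywise_error(k)
    print(f"{k} groups -> {m} pairwise tests, P(at least one type I error) ~ {fwer:.2f}")
```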

Example 1: Dr. Spock’s Trial
How do we determine the “expected results” when the null hypothesis is true?
- Apply the common rate to each judge…
How do we measure the discrepancy between the observed counts and the expected counts?

Chi-Squared Statistic
New test statistic: χ² = Σ (observed − expected)² / expected, summed over all cells of the table
- But it doesn’t follow a normal distribution!
- Chi-square distribution: skewed to the right, characterized by “degrees of freedom”
- Observed χ² = 62.7
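
A minimal sketch of the chi-square calculation: each cell contributes (observed − expected)² / expected, and the degrees of freedom for a two-way table are (rows − 1) × (columns − 1). The function names and the small table are hypothetical; the table is not the jury data, so it does not reproduce the observed value of 62.7.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells of a two-way table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a two-way table."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical 2x3 table (rows: women, men; columns: three judges), NOT the trial data.
observed = [[30, 20, 10], [70, 80, 90]]
expected = [[20, 20, 20], [80, 80, 80]]  # common rate of 20% women applied to each judge
chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2:.2f} on {degrees_of_freedom(observed)} df")
```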

Using Minitab
- Enter the two-way table
- Select Stat > Tables > Chi-Square Test (Table in Worksheet)
- The output provides observed counts, expected counts, test statistic value, degrees of freedom, and p-value

Minitab output
Strongly reject H0; conclude that at least one of the judges has a different long-run probability of selecting a female (assuming these cases are representative of the overall performance for each judge).

Follow-up Analysis
If you find a statistically significant difference, you might want to say more about which population(s) appear to differ.
- Look at the terms that are being added together to get the chi-square sum
- In the highlighted column (rows: women, men): observed fewer women than expected and more men than expected
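
One way to carry out this follow-up is to list each cell's contribution to the chi-square sum along with whether the observed count fell above or below the expected count. The sketch below uses a hypothetical table and hypothetical labels (Judge A, B, C), not the Minitab output from the slide.

```python
def cell_contributions(observed, expected, row_labels, col_labels):
    """List (contribution, row, column, direction) per cell, largest contribution first."""
    cells = []
    for r, (obs_row, exp_row) in enumerate(zip(observed, expected)):
        for c, (o, e) in enumerate(zip(obs_row, exp_row)):
            direction = "fewer" if o < e else "more" if o > e else "same"
            cells.append(((o - e) ** 2 / e, row_labels[r], col_labels[c], direction))
    return sorted(cells, key=lambda cell: cell[0], reverse=True)

# Hypothetical counts and labels, for illustration only.
observed = [[30, 20, 10], [70, 80, 90]]
expected = [[20, 20, 20], [80, 80, 80]]
ranked = cell_contributions(observed, expected, ["women", "men"], ["Judge A", "Judge B", "Judge C"])
for contribution, row, col, direction in ranked:
    print(f"{col}, {row}: contribution {contribution:.2f} ({direction} than expected)")
```

The cells at the top of the list are the ones driving the significant result.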

Example 2: Near-sightedness
What would this bar graph look like if there were no association between lighting condition and eyesight?
- It is not that the proportion with each eye condition would be the same, but that the distribution of eye condition would be the same for each lighting group

Example 2: Near-sightedness
- H0: Eye condition and lighting are statistically independent (i.e., the two variables are not associated)
- Ha: Not statistically independent (the two variables are associated)

Example 2: Near-sightedness
Expected counts
- Proportion with hyperopia = .190
- So of the 172 children in darkness, 19% with hyperopia = 32.68
- For the 232 children with night light, 19% with hyperopia = 44.08
- For the 75 children with room light, 19% with hyperopia = 14.25
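
The arithmetic on this slide can be checked directly: apply the overall hyperopia proportion to each lighting group's size. The group sizes and the proportion are the ones stated on the slide; the variable names are just for illustration, and .19 is used as the rounded proportion.

```python
# Group sizes and the overall hyperopia proportion, as given on the slide.
group_sizes = {"darkness": 172, "night light": 232, "room light": 75}
overall_proportion = 0.19

# Expected count under no association: same proportion in every lighting group.
expected_hyperopia = {group: n * overall_proportion for group, n in group_sizes.items()}
for group, count in expected_hyperopia.items():
    print(f"{group}: expected hyperopia count = {count:.2f}")
```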

In general
Expected count = (row total × column total) / table total
- Goal: the same distribution across all explanatory variable groups
- To measure the discrepancy between observed and expected counts, we can again use the chi-squared test statistic
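
The general rule translates into a short function that builds the whole table of expected counts from the observed counts. This is a sketch with a hypothetical table; the smallest expected count is also reported, since the large-sample condition for the chi-square test looks at expected counts.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Hypothetical 2x3 table of observed counts, for illustration only.
observed = [[30, 50, 20], [20, 40, 40]]
expected = expected_counts(observed)
smallest = min(min(row) for row in expected)
print(expected)
print(f"smallest expected count = {smallest} (want more than 5)")
```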

Example 2: Near-sightedness
Technical conditions
- All expected counts exceed 5 (smallest = 14.25)
- Assuming a random sample of children…
Conclusions
- The small p-value provides strong evidence of a real association between eye condition and lighting
- Observational, so no causation
- Even a little worried about generalizing beyond this particular clinic

Summary – Chi-Square Procedures
Chi-square tests arise in several situations:
1. Comparing 2 or more population proportions
   H0: π_1 = π_2 = … (all population proportions equal)
   Ha: at least one π_i differs
2. Comparing 2 or more population distributions on a categorical response variable
   H0: the population distributions are the same
   Ha: the population distributions are not all the same

Summary – Chi-Square Procedures
3. Association between 2 categorical variables
   H0: no association between variable 1 and variable 2 (independent)
   Ha: there is an association between the variables
Technical conditions:
- Random
  Cases 1 and 2: independent random samples from each population, or a randomized experiment
  Case 3: random sample from the population of interest
- Large sample(s)
  All expected cell counts > 5

For Tuesday
- Start reading Ch. 12
- Submit PP 11 in Blackboard
- HW 6 covers two-sample comparisons and chi-square procedures
- Remember to include all relevant computer output