Stat 512 – Lecture 13 Chi-Square Analysis (Ch. 8).


1 Stat 512 – Lecture 13 Chi-Square Analysis (Ch. 8)

2 Comparing Proportions Decrease in population proportion rating paper as largely believable? $H_0: \pi_{98} - \pi_{02} = 0$; $H_a: \pi_{98} - \pi_{02} > 0$. z = 1.64, p-value = .051. Weak evidence of a decrease (0-.07) in the population proportion. No cause and effect. Increase in survival rate with letrozole? $H_0$: treatment effect = 0; $H_a$: treatment effect (l - p) > 0. z = 7.14, p-value < .001. Very strong evidence of an increase in survival rate (.044-.077) due to letrozole, at least for these volunteers.

3 Last Time: Comparing Proportions If you have independent random samples or a randomized experiment with large sample sizes (at least 5 successes and 5 failures in each group), then you can use two-sample z-procedures (2 proportions). If you have an experiment with small group sizes, use two-way table simulation as before. Keep in mind: the parameter is the “difference” in population proportions or the true treatment effect; the confidence interval is for the difference in population proportions/true treatment effect.
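The two-sample z-procedure recalled on this slide can be sketched in a few lines. This is a minimal illustration that assumes the usual pooled standard error under the null hypothesis; the function name `two_proportion_z` and the counts passed to it are hypothetical and are not taken from the lecture.

```python
import math

def two_proportion_z(successes1, n1, successes2, n2):
    """Return (z, one-sided p-value) for H0: pi1 - pi2 = 0 vs Ha: pi1 - pi2 > 0."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    # Pooled proportion: the common success rate assumed under H0.
    pooled = (successes1 + successes2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail standard normal probability via the complementary error function.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts (assumption): 60 of 100 successes vs. 45 of 100 successes.
z, p = two_proportion_z(60, 100, 45, 100)
print(round(z, 2), round(p, 3))
```

Both hypothetical groups have at least 5 successes and 5 failures, so the large-sample condition stated on the slide would be met for these made-up counts.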

4 Practice Problem Compare proportion of all men voting for AS to proportion of all women voting for AS Descriptive: The conditional proportion voting for Arnold is higher for the men (.49) than for the women (.43) in this sample.

5 Practice Problem Inference: $\pi_m$ vs. $\pi_f$. $H_0: \pi_m - \pi_f = 0$ (no difference in the population proportions). $H_a: \pi_m - \pi_f > 0$ (the male population would say they voted for AS at a higher rate). The sample sizes are large (at least 5 voting and not voting for Arnie in each sample) and we trust CNN to have collected representative samples. We are also willing to treat the samples of men and women as independent.

6 Practice Problem Inference: $\pi_m$ vs. $\pi_f$. Using the applet, z = 3.91 and p-value < .0001. With such a small p-value, we reject the null hypothesis of equal population proportions. We have strong evidence that males are more likely to say they voted for Arnie. We don’t know why, but assuming CNN did their job right, we will generalize this difference to the population of voters. We are 90% confident that the proportion of CA voting males who would say they voted for Arnold is 3.5 to 8.5 percentage points higher than the proportion of CA voting females.

7 Next Step Comparing two population means/treatment effect with a quantitative response variable Example 3:  Observational units = volunteers in Shigella vaccination trials  Treat as samples from larger population of healthy adults

8 Example 3: Body Temperatures Minitab commands depend on the format in which the data were entered. Descriptive analysis: the samples show a slight tendency for higher body temperatures among women (mean = 98.4°F vs. 98.1°F) but similar variability and shape.

9 Example 3: Body Temperatures Perhaps the population means are equal, and these sample means differ just because of random sampling variability. $H_0: \mu_1 = \mu_2$; $H_a: \mu_1 \neq \mu_2$ (“differs”). Technical conditions: normal populations (works OK if $n_1, n_2 > 20$); large populations (N > 20n in each case); independent random samples.

10 Example 3: Body Temperatures The test result is statistically significant at the 5% level but not the 1% level. Moderate evidence that these sample means are farther apart than we would expect from random sampling variability alone if the population means were equal. Conclude that the population mean body temperatures differ by 0.039°F to 0.54°F.
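The formula for the test statistic did not survive in this transcript. As a rough sketch, an unpooled two-sample t statistic divides the difference in sample means by its estimated standard error. The function name `two_sample_t` is hypothetical, and the standard deviations and sample sizes below are placeholders chosen for illustration; only the two sample means come from the slides.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic for H0: mu1 = mu2."""
    # Standard error of the difference in sample means (variances not pooled).
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se

# Means 98.4 and 98.1 are from the slides; sd = 0.7 and n = 65 are
# assumptions made only so the example runs.
t_stat = two_sample_t(98.4, 0.7, 65, 98.1, 0.7, 65)
print(round(t_stat, 2))
```

Because the spreads and sample sizes are assumed, the printed value is not expected to match the Minitab output shown in lecture.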

11 Example 4: Sleep Deprivation Case 2: Randomized Experiment When sample sizes are large or each group distribution is normal, the randomization distribution is well approximated by the t distribution. Pooled t test?

12 Example 4: Sleep Deprivation Case 2: Randomized Experiment Validity?

13 Example 4 Conclusions 1. Statistically significant 2. Cause and effect conclusion valid 3. Generalizing to larger population? Is it possible that we are making the wrong decision?  Yes, type I error…

14 Summary Type of study: Do you have (independent) random samples from two populations? OR Do you have a randomized experiment? Same calculations, different conclusions. Are the sample sizes large enough for you to use normal/t procedures? With small sample sizes, use Fisher’s Exact Test (two-way table simulation) or randomization tests from before. With larger samples, get the test statistic and confidence interval conveniently.

15 Example 1: Dr. Spock’s Trial Proportion of women on jury for each judge. Let $\pi_i$ = probability that a woman is selected in judge i’s jury selection process.

16 Example 1: Dr. Spock’s Trial What does it mean to say there is no “judge effect” or difference across the judges?

17 Example 1: Dr. Spock’s Trial $H_0: \pi_1 = \pi_2 = \cdots = \pi_k$, where k is the number of judges. Big change? Now trying to compare more than two populations. Would it be reasonable to analyze all of the two-sample comparisons? The probability of making at least one type I error increases as we increase the number of tests. Would prefer one procedure, one type I error.
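To see how quickly the chance of at least one type I error grows, here is a rough sketch. It assumes the tests are independent, which pairwise comparisons on shared groups are not, so the number is only indicative. The function name `familywise_error` and the choice of 5 groups are hypothetical.

```python
from math import comb

def familywise_error(alpha, num_tests):
    """P(at least one type I error) across num_tests independent tests at level alpha."""
    return 1 - (1 - alpha) ** num_tests

# Hypothetical setting (assumption): 5 groups compared two at a time.
num_groups = 5
pairs = comb(num_groups, 2)            # number of two-sample comparisons
fwer = familywise_error(0.05, pairs)   # each test run at the 5% level
print(pairs, round(fwer, 3))
```

With 10 pairwise tests at the 5% level, the chance of at least one false rejection is roughly 40% under this independence assumption, which is why a single overall procedure is preferred.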

18 Example 1: Dr. Spock’s Trial How do we determine the “expected results” when the null hypothesis is true? Apply the common rate to each judge… How do we measure the discrepancy between the observed counts and the expected counts?

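19 Chi-Squared Statistic New test statistic: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, summed over all cells of the table. But it doesn’t follow a normal distribution! Chi-square distribution: skewed to the right; characterized by “degrees of freedom”. Observed $\chi^2$ = 62.7.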
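A minimal sketch of the chi-square sum: each cell contributes (observed - expected)^2 / expected, and the contributions are added over the whole table. The function names and the 2×2 counts are hypothetical; the degrees of freedom use the standard (rows - 1) × (columns - 1) rule for a two-way table.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1) for a two-way table."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical counts (assumption), not the jury data from the lecture.
observed = [[10, 20], [20, 10]]
expected = [[15, 15], [15, 15]]
stat = chi_square_statistic(observed, expected)
df = degrees_of_freedom(observed)
print(round(stat, 3), df)
```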

20 Using Minitab Enter two-way table Select Stat > Tables > Chi-Square Test (Table in Worksheet) Output provides observed counts, expected counts, test statistic value, degrees of freedom, p-value

21 Minitab output Strongly reject $H_0$; conclude that at least one of the judges has a different long-run probability of selecting a female (assuming these cases are representative of the overall performance for each judge).

22 Follow-up Analysis If we find a statistically significant difference, we might want to say more about which population(s) appear to differ. Look at the terms that are being added together to get the chi-square sum. Observed fewer women than expected; observed more men than expected.

23 Example 2: Near-sightedness What would this bar graph look like if there were no association between lighting condition and eyesight? Not that the proportion with each eye condition is the same, but that the distribution of eye condition is the same for each lighting group.

24 Example 2: Near-sightedness $H_0$: Eye condition and lighting are statistically independent (i.e., the two variables are not associated). $H_a$: Not statistically independent (the two variables are associated).

25 Example 2: Near sightedness Expected counts  Proportion with hyperopia =.190  So of the 172 children in darkness, 19% with hyperopia = 32.68  For the 232 children with night light, 19% with hyperopia = 44.08  For the 75 children with room light, 19% with hyperopia = 14.25
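The arithmetic on this slide can be checked directly: under the null hypothesis, the overall hyperopia rate of .190 is applied to each lighting group's size. The variable names are illustrative; the rate and group sizes are the ones given on the slide.

```python
# Overall proportion with hyperopia and group sizes, as given on the slide.
hyperopia_rate = 0.190
group_sizes = {"darkness": 172, "night light": 232, "room light": 75}

# Expected hyperopia count in each group if every group shares the same rate.
expected_hyperopia = {
    group: round(size * hyperopia_rate, 2) for group, size in group_sizes.items()
}
print(expected_hyperopia)
```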

26 In general Expected count = (row total × column total) / table total. Goal: the same distribution across all explanatory variable groups. To measure the discrepancy between observed and expected counts, we can again use the chi-squared test statistic.
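The general rule can be sketched as a small function that builds the whole table of expected counts from the margins. The function name `expected_counts` and the observed table are hypothetical, not the near-sightedness data.

```python
def expected_counts(observed):
    """Expected count per cell = row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Hypothetical 2x3 table of observed counts (assumption).
observed_table = [[10, 20, 30], [30, 20, 10]]
expected_table = expected_counts(observed_table)

# Large-sample check: every expected count should exceed 5.
all_large = min(min(row) for row in expected_table) > 5
print(expected_table, all_large)
```

Every row of the expected table has the same distribution across columns, which is exactly what "no association" means for a two-way table.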

27 Example 2: Near sightedness  Small p-value provides strong evidence of a real association between eye condition and lighting  Observational so no causation  Even a little worried about generalizing beyond this particular clinic All expected counts exceed 5 (smallest = 14.25) Assuming random sample of children…

28 Summary – Chi-Square Procedures Chi-square tests arise in several situations: 1. Comparing 2 or more population proportions. $H_0: \pi_1 = \pi_2 = \cdots = \pi_k$; $H_a$: at least one $\pi_i$ differs. 2. Comparing 2 or more population distributions on a categorical response variable. $H_0$: the population distributions are the same; $H_a$: the population distributions are not all the same.

29 Summary – Chi-Square Procedures 3. Association between 2 categorical variables. $H_0$: no association between var 1 and var 2 (independent); $H_a$: there is an association between the variables. Technical conditions: Random: for Cases 1 and 2, independent random samples from each population or a randomized experiment; for Case 3, a random sample from the population of interest. Large sample(s): all expected cell counts > 5.

30 For Tuesday Start reading Ch. 12 Submit PP 11 in Blackboard HW 6 covers two-sample comparisons and chi-square procedures  Remember to include all relevant computer output

