## More about Inference for Categorical Variables (Chapter 15)

Copyright ©2011 Brooks/Cole, Cengage Learning

Principal question: Is there a relationship between the two variables, so that the category into which individuals fall for one variable seems to depend on the category they are in for the other variable?

15.1 Chi-Square Test for Two-Way Tables. Data are displayed in a contingency table, also called a two-way table. Each combination of a row and a column is a cell of the table. There are two types of conditional percents: row and column. Row percents are percents across a row, based on the total number in the row. Column percents are percents down a column, based on the total number in the column. If one variable is explanatory, use it to define the rows and use row percents.

Recall the five steps for assessing statistical significance. Step 1: Null and alternative hypotheses. $H_0$: The two variables are not related. $H_a$: The two variables are related. Sometimes "associated" is used instead of "related."

Example 15.1 Ear Infections and Xylitol. Experiment: n = 533 children randomized to 3 groups. Group 1: placebo gum; Group 2: xylitol gum; Group 3: xylitol lozenge. Response: Did the child have an ear infection? Only 16.2% of the children in the xylitol gum group had an infection.

Example 15.1 Infections and Xylitol. Let $p_1$ = proportion who would get an ear infection in the population given placebo gum, $p_2$ = proportion who would get an ear infection in the population given xylitol gum, and $p_3$ = proportion who would get an ear infection in the population given xylitol lozenges. $H_0$: $p_1 = p_2 = p_3$ (no relationship between treatment and outcome). $H_a$: $p_1$, $p_2$, $p_3$ are not all the same (there is a relationship).

Example 15.2 Making Friends. Q: With whom do you find it easiest to make friends: opposite sex, same sex, or no difference? $H_0$: There is no difference in the distribution of responses of men and women (no relationship between gender and response). $H_a$: There is a difference in the distribution of responses of men and women (there is a relationship between gender and response).

Tech Note: Homogeneity and Independence. There are two variations of the general hypothesis statements, depending on the method of sampling. If samples have been taken from separate populations, the null hypothesis is a statement of homogeneity (sameness) among the populations. If a sample has been taken from a single population, and two categorical variables have been measured for each individual, the statement of no relationship is a statement of independence between the two variables.

Step 2: Chi-square Statistic and Necessary Conditions. Compute the expected count for each cell:

$$\text{Expected count} = \frac{\text{Row total} \times \text{Column total}}{\text{Total } n}$$

Compute the test statistic by totaling over all cells:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

The chi-square statistic measures the difference between the observed counts and the counts that would be expected if there were no relationship (i.e., if the null hypothesis were true).
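
As a rough illustration of the two formulas in Step 2, the sketch below computes expected counts and the chi-square statistic for a two-way table in plain Python. The function names and the cell counts are assumptions: the transcript does not reproduce the xylitol table, so the counts here are illustrative values chosen to be consistent with n = 533 and the 16.2% infection rate in the xylitol gum group.

```python
def expected_counts(table):
    """Expected count per cell = row total * column total / total n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Total of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Assumed counts (not given in the transcript).
# Rows are treatment groups; columns are (infection, no infection).
observed = [
    [49, 129],  # placebo gum
    [29, 150],  # xylitol gum
    [39, 137],  # xylitol lozenge
]

expected = expected_counts(observed)
print(round(expected[0][0], 2))                  # "Placebo Gum, Yes Infection" cell
print(round(chi_square_statistic(observed), 2))  # chi-square statistic
```

Under these assumed counts the statistic comes out near 6.69, the value quoted for this example in the transcript.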

More on the Chi-square Statistic. A large difference between observed and expected counts is evidence of a relationship. Guidelines for a large sample: 1. All expected counts should be greater than 1. 2. At least 80% of the cells should have an expected count greater than 5.
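
The two large-sample guidelines can be checked mechanically. The following is a minimal sketch; the function name `large_sample_ok` and both tables of expected counts are hypothetical, made up only to show one passing and one failing case.

```python
def large_sample_ok(expected):
    """Check both guidelines on a table (list of rows) of expected counts."""
    cells = [e for row in expected for e in row]
    all_above_one = all(e > 1 for e in cells)                  # guideline 1
    share_above_five = sum(e > 5 for e in cells) / len(cells)  # guideline 2
    return all_above_one and share_above_five >= 0.80


# Hypothetical expected counts for two 2x3 tables.
passes = [[12.0, 8.5, 6.1], [20.3, 14.2, 4.9]]  # 5 of 6 cells exceed 5
fails = [[0.8, 8.5, 6.1], [20.3, 14.2, 9.9]]    # one cell is not above 1

print(large_sample_ok(passes))
print(large_sample_ok(fails))
```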

Example 15.3 Infections and Xylitol. Expected count for the “Placebo Gum, Yes Infection” cell: [calculation not reproduced in the transcript]. Expected counts: [table not reproduced in the transcript].

Example 15.3 Infections and Xylitol. Chi-square test statistic: [calculation not reproduced in the transcript].

Step 3: p-value of the Chi-square Test. The p-value is the probability that the chi-square test statistic would be as large as or larger than the observed value if the null hypothesis were true. A large test statistic is evidence of a relationship, so how large is large enough to declare significance? The chi-square probability distribution is used to find the p-value. Degrees of freedom: df = (Rows – 1)(Columns – 1) = (r – 1)(c – 1).

Chi-square Distributions. These distributions are skewed to the right. The minimum value is 0. They are indexed by the degrees of freedom (df).

Example 15.4 Infections and Xylitol. The chi-square statistic was 6.69 with df = (3 – 1)(2 – 1) = 2, giving p-value = 0.035.
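
For readers without a statistics package at hand, the p-value can be approximated with the standard library alone. The sketch below is one possible approach, not the method used in the slides: the helper `chi2_sf` (a hypothetical name) evaluates the upper-tail probability of a chi-square distribution using the recurrence Q(a + 1, z) = Q(a, z) + z^a e^(−z) / Γ(a + 1) for the regularized upper incomplete gamma function, starting from the closed forms for df = 1 and df = 2.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom >= x)."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # closed form for df = 1
    # Step a upward by 1 (df upward by 2) until a == df / 2.
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


print(round(chi2_sf(6.69, 2), 3))  # Example 15.4: statistic 6.69, df = 2
```

With the statistic 6.69 and df = 2 this reproduces the p-value of about 0.035 stated in the example.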

Finding the p-value from Table A.5: Look in the corresponding “df” row of Table A.5 and scan across until you find where the statistic falls. If the value of the statistic falls between two table entries, the p-value is between the values of p (column headings) for these entries. If the value of the statistic is larger than the entry in the rightmost column (labeled p = 0.001), the p-value is less than 0.001 (p < 0.001). If the value of the statistic is smaller than the entry in the leftmost column (labeled p = 0.50), the p-value is greater than 0.50 (p > 0.50).

Example 15.5 Infections and Xylitol. The chi-square statistic was 6.69 with df = (3 – 1)(2 – 1) = 2, so 0.025 < p-value < 0.05. There is a statistically significant relationship between the risk of an ear infection and the preventative treatment.

Example 15.6 A Moderate p-Value. The table has three rows and three columns. The computed chi-square statistic is 8.12. Degrees of freedom are df = (3 – 1)(3 – 1) = 4. Finding the p-value: Scan the df = 4 row in Table A.5; the value 8.12 is between the entries 7.78 (p = 0.10) and 8.50 (p = 0.075). Thus, 0.075 < p-value < 0.10.

Steps 4 and 5: Making a Decision and Reporting a Conclusion. Two equivalent rules: reject $H_0$ when (1) the p-value ≤ 0.05, or (2) the chi-square statistic is greater than the entry in the 0.05 column of Table A.5 (the critical value). A large test statistic means a small p-value, which is evidence that a real relationship exists in the population. Note: For 2×2 tables, a test statistic of 3.84 or larger is significant at the 0.05 level.

Reporting a Conclusion. Example: Testing whether there is a relationship between smoking (yes or no) and drinking alcohol (never, occasionally, often). Ways to write “do not reject $H_0$”: (1) The relationship between smoking and drinking alcohol is not statistically significant. (2) The proportions of smokers who never drink, drink occasionally, and drink often are not significantly different from the proportions of non-smokers who do so. (3) There is insufficient evidence to conclude that there is a relationship in the population between smoking and drinking alcohol.

Reporting a Conclusion. Example: Testing whether there is a relationship between smoking (yes or no) and drinking alcohol (never, occasionally, often). Ways to write “reject $H_0$”: (1) There is a statistically significant relationship between smoking and drinking alcohol. (2) The proportions of smokers who never drink, drink occasionally, and drink often are not the same as the proportions of non-smokers who do so. (3) Smokers have significantly different drinking behavior than non-smokers.

Example 15.8 Making Friends. Q: With whom do you find it easiest to make friends: opposite sex, same sex, or no difference? df = (2 – 1)(3 – 1) = 2. Table A.5: the value 8.515 falls between the entries in the 0.025 column (7.38) and the 0.01 column (9.21), so 0.01 < p-value < 0.025. There is a statistically significant relationship at the 0.05 level. There appears to be a difference in the distribution of responses of men and women if the populations were asked this question.

Supporting Analyses. To learn about the specific nature of the relationship: Description of row (or column) percents. Bar chart of counts or percents. Examination of each cell’s “contribution to chi-square”: cells with the largest values have contributed most to the significance of the relationship and deserve attention in any description of the relationship. Confidence intervals for important proportions or for differences between proportions.
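
To make the “contribution to chi-square” idea concrete, the sketch below lists each cell's (observed – expected)² / expected term separately. The function name is hypothetical, and the counts are assumed for illustration only (the transcript does not reproduce any table); they are chosen to be consistent with n = 533 and a 16.2% infection rate in the xylitol gum group.

```python
def cell_contributions(table):
    """Return each cell's (observed - expected)^2 / expected term."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for row, r in zip(table, row_totals):
        terms = []
        for obs, c in zip(row, col_totals):
            exp = r * c / n                    # expected count for this cell
            terms.append((obs - exp) ** 2 / exp)
        result.append(terms)
    return result


# Assumed counts; columns are (infection, no infection).
labels = ["placebo gum", "xylitol gum", "xylitol lozenge"]
observed = [[49, 129], [29, 150], [39, 137]]

contrib = cell_contributions(observed)
for label, terms in zip(labels, contrib):
    print(label, [round(t, 2) for t in terms])
```

With these assumed counts, the infection cells for the placebo gum and xylitol gum rows carry most of the total, so a description of the relationship would focus on those two groups.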

Chi-Square Test or Z-Test for Difference in Two Proportions? Does it make a difference? If the desired $H_a$ has no specific direction (two-sided), the two tests give exactly the same p-value, because the squared value of the z-statistic equals the chi-square statistic. If the desired $H_a$ has a direction (one-sided), the z-test should be used.
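
The equivalence for a 2×2 table can be checked numerically. The sketch below uses made-up counts (30 of 100 versus 45 of 100) and assumes the z-statistic is the usual pooled two-proportion version; it compares z² with the chi-square statistic and the two-sided z p-value with the df = 1 chi-square p-value.

```python
import math

# Hypothetical data: successes and sample sizes in two groups.
x1, n1 = 30, 100
x2, n2 = 45, 100

# Pooled two-proportion z-statistic.
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (x1 / n1 - x2 / n2) / se

# Chi-square statistic for the same data laid out as a 2x2 table.
table = [[x1, n1 - x1], [x2, n2 - x2]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)

# Two-sided normal p-value and chi-square (df = 1) p-value.
p_z = math.erfc(abs(z) / math.sqrt(2))
p_chi2 = math.erfc(math.sqrt(chi2 / 2))

print(round(z ** 2, 4), round(chi2, 4))
print(round(p_z, 4), round(p_chi2, 4))
```

Both printed pairs agree, which is the sense in which the two tests are interchangeable for a two-sided alternative.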

15.3 Testing Hypotheses about One Categorical Variable: Goodness of Fit (GOF) Test. Step 1: Determine the null and alternative hypotheses. $H_0$: The probabilities for the $k$ categories are $p_1, p_2, \ldots, p_k$. $H_a$: Not all probabilities specified in $H_0$ are correct. Note: Probabilities in the null hypothesis must sum to 1.

Goodness of Fit (GOF) Test. Step 2: Verify necessary data conditions, and if met, summarize the data into an appropriate test statistic. If at least 80% of the expected counts are greater than 5 and none are less than 1, compute

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

where the expected count for the $i$th category is computed as $np_i$.

Goodness of Fit (GOF) Test. Step 3: Assuming the null hypothesis is true, find the p-value. Use the chi-square distribution with df = k – 1. Step 4: Decide whether or not the result is statistically significant based on the p-value. The result is statistically significant if the p-value ≤ α, the level of significance. Step 5: Report the conclusion in the context of the situation.
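
The goodness-of-fit steps can be sketched end to end in plain Python. Everything specific here is an assumption for illustration: the data are hypothetical counts from 120 rolls of a six-sided die, the function names are made up, and the p-value helper uses a standard-library recurrence for the chi-square upper tail rather than Table A.5.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (stdlib only)."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # closed form for df = 1
    while a < df / 2.0:                       # raise df by 2 each pass
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def gof_test(observed, null_probs):
    """Return (statistic, df, p_value) for a goodness-of-fit test."""
    if abs(sum(null_probs) - 1.0) > 1e-9:
        raise ValueError("null probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in null_probs]    # expected count is n * p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                    # df = k - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts for faces 1..6 in 120 rolls; H0: each face has p = 1/6.
observed = [18, 22, 16, 25, 19, 20]
stat, df, p_value = gof_test(observed, [1 / 6] * 6)
print(round(stat, 2), df, round(p_value, 2))
print("significant at 0.05" if p_value <= 0.05 else "not significant at 0.05")
```

All expected counts here equal 20, so the Step 2 conditions hold for this made-up data set.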

Example 15.15 Pennsylvania Daily Number. State lottery game: a three-digit number is made by drawing a digit between 0 and 9 from each of three different containers. Focus: draws from the first container. If the numbers are randomly selected, each value is equally likely to occur. $H_0$: $p = 1/10$ for each of the 10 possible digits. $H_a$: Not $H_0$.

Example 15.15 Daily Number. Data: n = 500 days between 7/19/99 and 11/29/00.

Example 15.15 Daily Number. Chi-square goodness of fit statistic: [value not reproduced in the transcript]. From Table A.5 with df = k – 1 = 10 – 1 = 9, p-value > 0.50. The result is not statistically significant; the null hypothesis is not rejected.