## Presentation on theme: "Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. More About Categorical Variables Chapter 15."— Presentation transcript:

Principal Question: Is there a relationship between the two variables, so that the category into which individuals fall for one variable seems to depend on the category they are in for the other variable?

Recall: Data are displayed in a contingency (two-way) table. Each combination of a row and a column is a cell of the table. There are two types of conditional percents, row and column. Row percents: percents across a row, based on the total number in the row. Column percents: percents down a column, based on the total number in the column. If one variable is explanatory, use it to define the rows and use row percents.
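As a small illustration of the two kinds of conditional percents, the sketch below computes row and column percents for a two-way table stored as a list of rows. The counts and the function names are hypothetical and are not taken from the slides.

```python
def row_percents(table):
    """Percents across each row, based on that row's total."""
    return [[100.0 * count / sum(row) for count in row] for row in table]


def column_percents(table):
    """Percents down each column, based on that column's total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100.0 * count / col_totals[j] for j, count in enumerate(row)]
            for row in table]


# Hypothetical counts: rows = explanatory groups, columns = response categories.
counts = [[30, 70],
          [45, 55]]

print(row_percents(counts))
print(column_percents(counts))
```

With the explanatory variable in the rows, the row percents are the ones to compare across groups.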

15.1 Chi-Square Test for Two-Way Tables. Recall that there are five steps for assessing statistical significance. Step 1: Determine the null and alternative hypotheses. $H_0$: The two variables are not related. $H_a$: The two variables are related. Sometimes “associated” is used instead of “related.”

Example 15.1 Ear Infections and Xylitol. Experiment: n = 533 children randomized to 3 groups. Group 1: Placebo Gum; Group 2: Xylitol Gum; Group 3: Xylitol Lozenge. Response: Did the child have an ear infection? Only 16.2% of children in the Xylitol Gum group had an infection.

Example 15.1 Infections and Xylitol (cont). Let $p_1$ = proportion who would get an ear infection in a population given placebo gum, $p_2$ = proportion who would get an ear infection in a population given xylitol gum, and $p_3$ = proportion who would get an ear infection in a population given xylitol lozenges. $H_0$: $p_1 = p_2 = p_3$ (no relationship between treatment and outcome). $H_a$: $p_1$, $p_2$, $p_3$ are not all the same (there is a relationship).

Example 15.2 Making Friends. Q: With whom do you find it easiest to make friends: opposite sex, same sex, or no difference? $H_0$: No difference in the distribution of responses of men and women (no relationship between gender and response). $H_a$: There is a difference in the distribution of responses of men and women (there is a relationship between gender and response).

Tech Note: Homogeneity and Independence. There are two variations of the general hypothesis statements, and which one applies depends on the method of sampling. If samples have been taken from separate populations, the null hypothesis is a statement of homogeneity (sameness) among the populations. If a sample has been taken from a single population, and two categorical variables are measured for each individual, the statement of no relationship is a statement of independence between the two variables.

Step 2: Chi-square Statistic and Necessary Conditions. Compute the expected count for each cell:

$$\text{Expected count} = \frac{(\text{Row total}) \times (\text{Column total})}{\text{Total } n}$$

Compute the test statistic by totaling over all cells:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Guidelines for a large sample: 1. All expected counts should be greater than 1. 2. At least 80% of the cells should have an expected count greater than 5.
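The expected-count formula, the chi-square sum, and the two large-sample guidelines can be sketched in a few lines. This is a minimal illustration, not the textbook's software output; the function names and the two small tables are hypothetical.

```python
def expected_counts(table):
    """Expected count per cell = (row total * column total) / total n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def large_sample_ok(table):
    """Guidelines: every expected count > 1 and at least 80% of them > 5."""
    cells = [e for row in expected_counts(table) for e in row]
    share_above_5 = sum(e > 5 for e in cells) / len(cells)
    return min(cells) > 1 and share_above_5 >= 0.80


# Hypothetical tables, for illustration only.
balanced = [[20, 30], [30, 20]]
sparse = [[1, 1], [1, 20]]

print(chi_square_statistic(balanced), large_sample_ok(balanced))
print(large_sample_ok(sparse))
```

The second table shows a case where the guidelines fail because one expected count falls below 1.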

More on the Chi-square Statistic. The chi-square statistic measures the difference between the observed counts and the counts that would be expected if there were no relationship (i.e., if the null hypothesis were true). Large difference => evidence of a relationship. The chi-square probability distribution is used to find the p-value. Degrees of freedom: df = (Rows – 1)(Columns – 1).

Example 15.1 Infections and Xylitol (cont). Output for testing significance of the relationship: p-value = 0.035, which is < 0.05. There is a statistically significant relationship between the risk of an ear infection and the preventative treatment.

Example 15.1 Infections and Xylitol (cont). Expected count for “Placebo Gum, Yes Infection” cell: Expected Counts:

Example 15.1 Infections and Xylitol (cont). Chi-square Test Statistic:
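The table of counts for this example did not survive in the transcript. The sketch below therefore uses assumed cell counts, chosen to agree with the figures that do appear in the slides (n = 533, 16.2% infections in the xylitol gum group, chi-square statistic 6.69). Treat the counts as an assumption and check them against the textbook before relying on them.

```python
# ASSUMED counts (not in the transcript): [infection yes, infection no] per group.
xylitol_counts = {
    "Placebo Gum":     [49, 129],
    "Xylitol Gum":     [29, 150],
    "Xylitol Lozenge": [39, 137],
}

table = list(xylitol_counts.values())
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count = (row total * column total) / total n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all six cells.
chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(table, expected)
                 for o, e in zip(obs_row, exp_row))

placebo_yes_expected = expected[0][0]   # the "Placebo Gum, Yes Infection" cell
df = (len(table) - 1) * (len(table[0]) - 1)

print(round(placebo_yes_expected, 2), round(chi_square, 2), df)
```

Under these assumed counts the statistic rounds to the 6.69 reported in the slides, with df = (3 – 1)(2 – 1) = 2.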

Step 3: p-value of Chi-square Test. The p-value is the probability that the chi-square test statistic could have been as large as or larger than the observed value if the null hypothesis were true. Large test statistic => evidence of a relationship. So how large is enough to declare significance? The chi-square probability distribution is used to find the p-value. Degrees of freedom: df = (Rows – 1)(Columns – 1) = (r – 1)(c – 1).
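In place of a printed table, the upper-tail probability can be computed directly. The following is a minimal standard-library sketch that uses the closed-form series for integer degrees of freedom; the function name is hypothetical, and statistical packages provide equivalent routines (for example `scipy.stats.chi2.sf`).

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for chi-square with integer df."""
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of
    # x^(r - 1/2) / (1 * 3 * ... * (2r - 1)) for r = 1 .. (df - 1)/2
    term, total = math.sqrt(statistic), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= statistic / (2 * r - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)


def degrees_of_freedom(rows, columns):
    """df = (r - 1)(c - 1) for a two-way table."""
    return (rows - 1) * (columns - 1)


print(chi_square_p_value(6.69, degrees_of_freedom(3, 2)))
```

The call at the end uses the statistic and table dimensions from Example 15.1.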

Chi-square Distributions. The distributions are skewed to the right. The minimum value is 0. They are indexed by the degrees of freedom.

Example 15.1 Infections and Xylitol (cont). The chi-square statistic was 6.69, with df = (3 – 1)(2 – 1) = 2, so the p-value = 0.035.
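For the special case df = 2, the chi-square upper-tail probability has a simple closed form, which gives a quick check of the reported p-value (this shortcut is an addition here and is not part of the slides):

$$p\text{-value} = P(\chi^2_{2} \ge 6.69) = e^{-6.69/2} = e^{-3.345} \approx 0.035$$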

Finding the p-value from Table A.5: Look in the corresponding “df” row of Table A.5. Scan across until you find where the statistic falls. If the value of the statistic falls between two table entries, the p-value is between the values of p (column headings) for these two entries. If the value of the statistic is larger than the entry in the rightmost column (labeled p = 0.001), the p-value is less than 0.001 (written as p < 0.001). If the value of the statistic is smaller than the entry in the leftmost column (labeled p = 0.50), the p-value is greater than 0.50 (written as p > 0.50).

Example 15.3. The table has three rows and three columns. The computed chi-square statistic is 8.12. Degrees of freedom are df = (3 – 1)(3 – 1) = 4. Finding the p-value: Scan the df = 4 row in Table A.5; the value 8.12 is between the entries 7.78 (p = 0.10) and 8.50 (p = 0.075). Thus, 0.075 < p-value < 0.10.
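The table lookup in this example can be sketched as a small bracketing function. Only the entries 7.78 (p = 0.10) and 8.50 (p = 0.075) come from the slide; the other values in the df = 4 row below are standard chi-square critical values filled in as an assumption, and the exact set of columns in Table A.5 may differ.

```python
# (p column heading, table entry) pairs for the df = 4 row, left to right.
# ASSUMPTION: only 7.78 and 8.50 appear in the slides; the rest are standard values.
DF4_ROW = [(0.50, 3.36), (0.25, 5.39), (0.10, 7.78), (0.075, 8.50),
           (0.05, 9.49), (0.025, 11.14), (0.01, 13.28), (0.005, 14.86),
           (0.001, 18.47)]


def bracket_p_value(statistic, row):
    """Return (lower, upper) bounds on the p-value from one table row."""
    if statistic < row[0][1]:
        return (row[0][0], 1.0)      # smaller than leftmost entry: p > 0.50
    if statistic > row[-1][1]:
        return (0.0, row[-1][0])     # larger than rightmost entry: p < 0.001
    for (p_left, x_left), (p_right, x_right) in zip(row, row[1:]):
        if x_left <= statistic <= x_right:
            return (p_right, p_left)


low, high = bracket_p_value(8.12, DF4_ROW)
print(f"{low} < p-value < {high}")
```

The open-ended cases return 1.0 or 0.0 as the missing bound, which corresponds to writing p > 0.50 or p < 0.001.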

Step 4: Making a Decision. Large test statistic => small p-value => evidence that a real relationship exists in the population. Two equivalent rules: Reject $H_0$ when (1) the p-value ≤ 0.05, or (2) the chi-square statistic is greater than the entry in the 0.05 column of Table A.5 (the critical value). Note: For 2 × 2 tables, a test statistic of 3.84 or larger is significant.

Step 5: Reporting a Conclusion. Example: Testing whether there is a relationship between smoking (yes or no) and drinking alcohol (never, occasionally, often). Ways to write “do not reject $H_0$”: The relationship between smoking and drinking alcohol is not statistically significant. The proportions of smokers who never drink, drink occasionally, and drink often are not significantly different from the proportions of non-smokers who do so. There is insufficient evidence to conclude that there is a relationship in the population between smoking and drinking alcohol.

Step 5: Reporting a Conclusion. Example: Testing whether there is a relationship between smoking (yes or no) and drinking alcohol (never, occasionally, often). Ways to write “reject $H_0$”: There is a statistically significant relationship between smoking and drinking alcohol. The proportions of smokers who never drink, drink occasionally, and drink often are not the same as the proportions of non-smokers who do so. Smokers have significantly different drinking behavior than non-smokers.

Example 15.2 Making Friends (cont). Q: With whom do you find it easiest to make friends: opposite sex, same sex, or no difference? df = (2 – 1)(3 – 1) = 2. Table A.5: the chi-square statistic of 8.515 falls between the entries in the 0.025 column (7.38) and the 0.01 column (9.21), so 0.01 < p-value < 0.025. There is a statistically significant relationship at the 0.05 level. There appears to be a difference between the distributions of responses that men and women in the populations would give if asked this question.

Supporting Analyses. To learn about the specific nature of the relationship: Description of row (or column) percents. Bar chart of counts or percents. Examination of each cell’s “contribution to chi-square.” Cells with the largest values have contributed most to the significance of the relationship => they deserve attention in any description of the relationship. Confidence intervals for important proportions or for differences between proportions.
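A cell's contribution to chi-square is its own (Observed – Expected)² / Expected term. The sketch below computes these per cell for a hypothetical table, so the largest contributors can be picked out; the names and counts are illustrative only.

```python
def cell_contributions(table):
    """Per-cell (observed - expected)^2 / expected, in the table's layout."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[(o - r * c / n) ** 2 / (r * c / n)
             for o, c in zip(row, col_totals)]
            for row, r in zip(table, row_totals)]


# Hypothetical 2 x 2 counts, for illustration only.
counts = [[10, 40],
          [25, 25]]

contributions = cell_contributions(counts)
for row in contributions:
    print([round(value, 3) for value in row])

# The contributions add up to the chi-square statistic.
total = sum(sum(row) for row in contributions)
print(round(total, 3))
```

Here the first-column cells contribute most, so a description of the relationship would focus on how the two rows differ in that category.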

15.2 Analyzing 2 × 2 Tables. Shortcut Formula: For a 2 × 2 table the test statistic can be computed with a shortcut formula, and it is based on df = 1.
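The formula image for this slide is missing from the transcript. A commonly used shortcut for a 2 × 2 table with cells A, B (first row) and C, D (second row), row totals R1 and R2, column totals C1 and C2, and total N is N(AD – BC)² / (R1 · R2 · C1 · C2). The sketch below assumes that this is the form intended and checks it against the general sum over cells; the counts are hypothetical.

```python
def chi_square_shortcut(a, b, c, d):
    """Assumed shortcut: N (AD - BC)^2 / (R1 * R2 * C1 * C2)."""
    n = a + b + c + d
    r1, r2 = a + b, c + d
    c1, c2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


def chi_square_general(table):
    """General form: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((o - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, row_totals)
               for o, c in zip(row, col_totals))


# Hypothetical counts, for illustration only.
print(chi_square_shortcut(20, 30, 30, 20))
print(chi_square_general([[20, 30], [30, 20]]))
```

Both routes give the same value, which is compared against 3.84 at the 0.05 level since df = 1.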

Example 6.10 Randomly Pick S or Q (cont). College students were asked either “Randomly choose one of the letters S or Q” or “Randomly choose one of the letters Q or S.”

Chi-Square Test or Z-Test for Difference in Two Proportions? Does it make a difference? If the desired $H_a$ has no specific direction (two-sided), the two tests give exactly the same p-value, and the squared value of the z-statistic equals the chi-square statistic. If the desired $H_a$ has a direction (one-sided), the z-test should be used.
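The equivalence can be checked numerically. The sketch below assumes the usual pooled-proportion form of the two-sample z-statistic and hypothetical counts (20 of 50 versus 30 of 50); it compares z² with the chi-square statistic and the two p-values.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z-statistic for p1 - p2 using the pooled proportion (assumed form)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic for the same data arranged as a 2 x 2 table."""
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    col_totals = [sum(col) for col in zip(*table)]
    n = n1 + n2
    return sum((o - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, (n1, n2))
               for o, c in zip(row, col_totals))


z = two_proportion_z(20, 50, 30, 50)
chi_sq = chi_square_2x2(20, 50, 30, 50)

p_two_sided_z = math.erfc(abs(z) / math.sqrt(2))   # 2 * P(Z >= |z|)
p_chi_square = math.erfc(math.sqrt(chi_sq / 2))    # chi-square tail, df = 1

print(z ** 2, chi_sq, p_two_sided_z, p_chi_square)
```

The same relationship explains the 2 × 2 critical value: 1.96² is approximately 3.84.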

Fisher’s Exact Test for 2 × 2 Tables. It can be used for any 2 × 2 table, but is most commonly used when the necessary sample size conditions for using the z-test or the chi-square test are violated. Although the computations are cumbersome, most statistical software programs include Fisher’s Exact Test.
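To give a sense of what the software computes, here is a minimal sketch of a two-sided Fisher's exact test. It assumes one common convention for "two-sided": add up the hypergeometric probabilities of every table with the same margins whose probability is no larger than that of the observed table. The function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Probability that the top-left cell equals x with all margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lowest, highest = max(0, c1 - r2), min(r1, c1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical small-sample table where chi-square conditions would fail.
print(fisher_exact_two_sided([[3, 1], [1, 3]]))
```

With only 8 observations every expected count is 2, so the chi-square guidelines are not met and an exact calculation like this one is the appropriate choice.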

15.3 Testing Hypotheses about One Categorical Variable: Goodness of Fit (GOF) Test. Step 1: Determine the null and alternative hypotheses. $H_0$: The probabilities for the k categories are $p_1, p_2, \ldots, p_k$. $H_a$: Not all probabilities specified in $H_0$ are correct. Note: The probabilities in the null hypothesis must sum to 1.

Goodness of Fit (GOF) Test (cont). Step 2: Verify necessary data conditions, and if met, summarize the data into an appropriate test statistic. If at least 80% of the expected counts are greater than 5 and none are less than 1, compute

$$\chi^2 = \sum_{i=1}^{k} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$$

where the expected count for the $i$th category is computed as $np_i$.

Goodness of Fit (GOF) Test (cont). Step 3: Assuming the null hypothesis is true, find the p-value. Use the chi-square distribution with df = k – 1. Step 4: Decide whether or not the result is statistically significant based on the p-value. The result is statistically significant if the p-value ≤ α. Step 5: Report the conclusion in the context of the situation.
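Steps 1 through 3 of the goodness-of-fit test can be sketched as follows. The function name and the observed counts are hypothetical; the check on the null probabilities and the expected-count conditions mirror the notes in the slides.

```python
import math


def goodness_of_fit(observed, null_probs):
    """Return (chi-square statistic, df, conditions_met) for a GOF test."""
    if len(observed) != len(null_probs):
        raise ValueError("need one null probability per category")
    if not math.isclose(sum(null_probs), 1.0):
        raise ValueError("null probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in null_probs]          # expected count = n * p_i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    share_above_5 = sum(e > 5 for e in expected) / len(expected)
    conditions_met = min(expected) >= 1 and share_above_5 >= 0.80
    return statistic, len(observed) - 1, conditions_met


# Hypothetical counts for three equally likely categories.
statistic, df, ok = goodness_of_fit([18, 22, 20], [1 / 3, 1 / 3, 1 / 3])
print(statistic, df, ok)

# With df = 2 the upper-tail probability is exp(-statistic / 2).
print(math.exp(-statistic / 2))
```

The p-value line relies on the df = 2 closed form; for other values of k a chi-square table or statistical software would be used instead.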

Example 15.8 Pennsylvania Daily Number. State lottery game: A three-digit number is made by drawing a digit between 0 and 9 from each of three different containers. Focus = draws from the first container. If the numbers are randomly selected, each value would be equally likely to occur. $H_0$: p = 1/10 for each of the 10 possible digits. $H_a$: Not $H_0$.

Example 15.8 Daily Number (cont). Data: n = 500 days between 7/19/99 and 11/29/00.
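The observed counts for the ten digits are not in the transcript, but the expected counts under the null hypothesis follow directly from n = 500 and p = 1/10:

$$\text{Expected}_i = n p_i = 500 \times \tfrac{1}{10} = 50 \quad \text{for each digit } i = 0, 1, \ldots, 9$$

Every expected count is well above 5, so the conditions for the chi-square goodness of fit test are met.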

Example 15.8 Daily Number (cont). Chi-square goodness of fit statistic: From Table A.5 with df = k – 1 = 10 – 1 = 9, the p-value > 0.50. The result is not statistically significant; the null hypothesis is not rejected.