# 1 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. Analysis of Categorical Data Goodness-of-Fit Tests.


1 Analysis of Categorical Data: Goodness-of-Fit Tests

2 Univariate Categorical Data

Univariate categorical data is best summarized in a one-way frequency table. For example, consider the following sample of observations on faculty status for faculty in a large university system.

3 Univariate Categorical Data

A local newsperson might be interested in testing hypotheses about the proportions of the population that fall in each of the categories. For example, the newsperson might want to test whether the five categories occur with equal frequency throughout the whole university system. To deal with this type of question, we need to establish some notation.

4 Notation

- $k$ = number of categories of a categorical variable
- $\pi_1$ = true proportion for category 1
- $\pi_2$ = true proportion for category 2
- $\vdots$
- $\pi_k$ = true proportion for category $k$

(Note: $\pi_1 + \pi_2 + \cdots + \pi_k = 1$.)

5 Hypotheses

$H_0$:

- $\pi_1$ = hypothesized proportion for category 1
- $\pi_2$ = hypothesized proportion for category 2
- $\vdots$
- $\pi_k$ = hypothesized proportion for category $k$

$H_a$: $H_0$ is not true, so at least one of the true category proportions differs from the corresponding hypothesized value.

6 Expected Counts

For each category, the expected count is the product of the total number of observations and the hypothesized proportion for that category.
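As a minimal sketch of this rule, the Python function below multiplies the total number of observations by each hypothesized proportion. The function name `expected_counts` and the numbers in the illustration (200 observations, three categories) are hypothetical and do not come from the slides.

```python
def expected_counts(n_total, hypothesized_proportions):
    """Return the expected count for each category: n_total * proportion."""
    # The hypothesized proportions must cover all categories, so they sum to 1.
    if abs(sum(hypothesized_proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [n_total * p for p in hypothesized_proportions]


# Hypothetical illustration (not the faculty data): 200 observations
# spread over three categories with hypothesized proportions 0.5, 0.3, 0.2.
counts = expected_counts(200, [0.5, 0.3, 0.2])
print(counts)
```

Expected counts need not be whole numbers, so the function returns the raw products without rounding.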

7 Expected Counts - Example

Consider the sample of faculty from a large university system, and recall that the newsperson wanted to test whether each of the groups occurred with equal frequency.
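The worked computation for this slide did not survive in the transcript. As a hedged reconstruction: a later slide states that every expected count is 30.8, and with five equally likely categories each hypothesized proportion is 0.2. Together these imply a total sample size of $n = 154$, which is an inference and is not stated in the transcript. Under that assumption, the expected count for each category would be

$$\text{expected count} = n \times \text{hypothesized proportion} = 154 \times 0.2 = 30.8.$$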

8 Goodness-of-fit statistic, $\chi^2$

The goodness-of-fit statistic, $\chi^2$, results from first computing the quantity

$$\frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

for each cell. The value of the $\chi^2$ statistic is the sum of these terms.
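The computation can be sketched in a few lines of Python: for each cell, take (observed − expected)² / expected, then add the results. The function name `chi_square_statistic` and the observed and expected counts below are hypothetical values chosen only to illustrate the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Sum (observed - expected)^2 / expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts (not the faculty data) for three categories.
observed = [30, 50, 20]
expected = [40.0, 40.0, 20.0]

# Cell terms: 100/40 = 2.5, 100/40 = 2.5, 0/20 = 0, so the sum is 5.0.
stat = chi_square_statistic(observed, expected)
print(stat)
```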

9 Chi-square distributions

10 Upper-tail Areas for Chi-square Distributions

11 Goodness-of-Fit Test Procedure

Hypotheses:

$H_0$:

- $\pi_1$ = hypothesized proportion for category 1
- $\pi_2$ = hypothesized proportion for category 2
- $\vdots$
- $\pi_k$ = hypothesized proportion for category $k$

$H_a$: $H_0$ is not true

Test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

12 Goodness-of-Fit Test Procedure

P-values: When $H_0$ is true and all expected counts are at least 5, $\chi^2$ has approximately a chi-square distribution with df = $k - 1$. Therefore, the P-value associated with the computed test statistic value is the area to the right of $\chi^2$ under the df = $k - 1$ chi-square curve.
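The slides read this upper-tail area from a table. As an alternative sketch, the area can be computed directly with the Python standard library, using the series expansion of the regularized lower incomplete gamma function. The function name `chi2_upper_tail` is hypothetical, and the approach is a swapped-in numerical method rather than anything the slides prescribe; it loses accuracy for very small tail areas. If SciPy is available, `scipy.stats.chi2.sf(x, df)` returns the same quantity.

```python
import math


def chi2_upper_tail(x, df):
    """Area to the right of x under the chi-square curve with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series: P(a, z) = z^a e^(-z) / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


# Illustration: 9.49 is close to the tabled 0.05 critical value for df = 4,
# so the computed upper-tail area should be close to 0.05.
p_value = chi2_upper_tail(9.49, 4)
print(round(p_value, 3))
```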

13 Goodness-of-Fit Test Procedure

Assumptions:

1. Observed cell counts are based on a random sample.
2. The sample size is large. The sample size is large enough for the chi-squared test to be appropriate as long as every expected count is at least 5.
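The large-sample condition is easy to check programmatically. The helper below is a small sketch; its name, `large_sample_condition_met`, is hypothetical. The first call uses the expected counts of 30.8 reported later in the slides for the five faculty categories, and the second uses made-up counts to show a failing case.

```python
def large_sample_condition_met(expected, minimum=5):
    """Return True when every expected count is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Expected counts from the faculty example: 30.8 in each of five categories.
faculty_ok = large_sample_condition_met([30.8] * 5)

# Hypothetical counts where one category falls below 5.
small_ok = large_sample_condition_met([12.0, 4.5, 8.5])

print(faculty_ok, small_ok)
```

The random-sample assumption cannot be verified from the counts; it depends on how the data were collected.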

14 Example

Consider the newsperson's desire to determine whether the faculty of a large university system are equally distributed among the five categories. Let us test this hypothesis at a significance level of 0.05.

Let $\pi_1$, $\pi_2$, $\pi_3$, $\pi_4$, and $\pi_5$ denote the proportions of all faculty in this university system that are full professors, associate professors, assistant professors, instructors, and adjunct/part-time, respectively.

$H_0$: $\pi_1 = 0.2$, $\pi_2 = 0.2$, $\pi_3 = 0.2$, $\pi_4 = 0.2$, $\pi_5 = 0.2$

$H_a$: $H_0$ is not true

15 Example

Significance level: $\alpha = 0.05$

Assumptions: As we saw in an earlier slide, the expected counts were all 30.8, which is greater than 5. Although we do not know for sure how the sample was obtained, for the purposes of this example we shall assume that the selection procedure generated a random sample.

Test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

16 Example

Calculation: recall

17 Example

P-value: The P-value is based on a chi-squared distribution with df = 5 - 1 = 4. The computed value of $\chi^2$, 7.56, is smaller than 7.77, the lowest value of $\chi^2$ in the table for df = 4, so the P-value is greater than 0.100.

Conclusion: Since the P-value > 0.05 = $\alpha$, $H_0$ cannot be rejected. There is not sufficient evidence to refute the claim that the proportion of faculty in each of the different categories is the same.
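As a cross-check on the table lookup (not part of the original slides), the upper-tail area of a chi-square distribution with df = 4 has a simple closed form, $P(\chi^2_4 > x) = e^{-x/2}\,(1 + x/2)$. Evaluating it at the computed statistic gives

$$P(\chi^2_4 > 7.56) = e^{-3.78}\,(1 + 3.78) \approx 0.0228 \times 4.78 \approx 0.109,$$

which agrees with the conclusion that the P-value exceeds 0.100 and therefore exceeds $\alpha = 0.05$.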