Chapter 12: The Analysis of Categorical Data and Goodness-of-Fit Test


1 Chapter 12: The Analysis of Categorical Data and Goodness-of-Fit Test

2 Chi-Square Tests for Univariate Categorical Data

3 One-way frequency table: univariate categorical data are most conveniently summarized in a one-way frequency table.

            Cash  Credit  Exchange  Refused
Frequency     34      18        31       17

4 Ha: H0 is not true, so at least one of the true category proportions differs from the corresponding hypothesized value.

5 Example A number of psychological studies have considered the relationship between various deviant behaviors and other variables, such as lunar phase. An article focused on the existence of any relationship between date of patient admission for specified treatment and patient’s birthday. Admission date was partitioned into four categories according to how close it was to the patient’s birthday:

6
1. Within 7 days of the birthday
2. Between 8 and 30 days, inclusive, from the birthday
3. Between 31 and 90 days, inclusive, from the birthday
4. More than 90 days from the birthday

7 Let π1, π2, π3, and π4 denote the true proportions in categories 1, 2, 3, and 4, respectively. If there is no relationship between admission date and birthday, then π1 = 15/365 = .041, because there are 15 days included in the first category (from 7 days before the patient's birthday to 7 days after, including, of course, the birthday itself).

8 The hypotheses of interest are then
H0: π1 = .041, π2 = .126, π3 = .329, π4 = .504
Ha: H0 is not true
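The hypothesized proportions in H0 can be checked by counting the days that fall in each category. The following is a minimal sketch, assuming a 365-day year (leap years ignored) and that each "days from the birthday" window extends both before and after the birthday; the variable names are illustrative and do not come from the slides.

```python
# Days in each admission-date category, assuming a 365-day year (assumption).
DAYS_IN_YEAR = 365

days = {
    1: 2 * 7 + 1,          # within 7 days on either side, plus the birthday itself
    2: 2 * (30 - 8 + 1),   # 8 to 30 days away, before or after the birthday
    3: 2 * (90 - 31 + 1),  # 31 to 90 days away, before or after the birthday
}
days[4] = DAYS_IN_YEAR - sum(days.values())  # all remaining days of the year

# Hypothesized proportion for each category under "no relationship".
proportions = {cat: round(count / DAYS_IN_YEAR, 3) for cat, count in days.items()}
print(proportions)
```

Under these assumptions the rounded proportions agree with the values stated in H0.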

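9 The cited article gave data for n = 200 patients admitted for alcoholism treatment. If H0 is true, each expected count is n times the hypothesized proportion: 200(.041) = 8.2, 200(.126) = 25.2, 200(.329) = 65.8, and 200(.504) = 100.8.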

10
Category     1      2      3      4
Observed    11     24     69     96
Expected   8.2   25.2   65.8  100.8
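The expected counts and the goodness-of-fit statistic can be reproduced from the observed counts and the hypothesized proportions. This is a minimal sketch, assuming the usual statistic X² = sum of (observed − expected)² / expected over all categories; the variable names are illustrative.

```python
# Goodness-of-fit statistic for the admission-date data (n = 200).
n = 200
hypothesized = [0.041, 0.126, 0.329, 0.504]  # proportions under H0
observed = [11, 24, 69, 96]                  # observed counts, categories 1-4

# Expected count for each category is n times its hypothesized proportion.
expected = [n * p for p in hypothesized]

# X^2 = sum over categories of (observed - expected)^2 / expected.
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 1) for e in expected])
print(round(x2, 2))
```

The statistic comes out at about 1.40, well below 6.25.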

11

12 Example We use the data from the previous example to test the hypothesis that admission date is unrelated to birthday. Let's use a .05 significance level and the nine-step hypothesis-testing procedure.

13
1. Let π1, π2, π3, and π4 denote the proportions of all admissions for treatment of alcoholism falling in the four categories.
2. H0: π1 = .041, π2 = .126, π3 = .329, π4 = .504
3. Ha: H0 is not true.
4. Significance level: α = .05

14 6. Assumptions: The expected cell counts (from Example 12.1) are 8.2, 25.2, 65.8, and 100.8, all of which are greater than 5. The article did not indicate how the patients were selected. We can proceed with the chi-square test if it is reasonable to assume that the 200 patients in the sample can be regarded as a random sample of patients admitted for treatment of alcoholism.

15
8. P-value: The P-value is based on a chi-square distribution with df = 4 – 1 = 3. The computed value of X² is smaller than 6.25 (the smallest entry in the df = 3 column), so P-value > .10.
9. Conclusion: Because P-value > α, H0 cannot be rejected. There is not sufficient evidence to conclude that date admitted for treatment and birthday are related.
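Instead of bounding the P-value from a table, the upper-tail area can be computed directly. This is a minimal sketch using only the standard library; it relies on the closed-form upper-tail area of the chi-square distribution with 3 degrees of freedom, and the helper name chi2_sf_df3 is hypothetical, not something from the slides.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail area P(chi-square with 3 df > x), closed form valid for df = 3 only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

observed = [11, 24, 69, 96]
expected = [8.2, 25.2, 65.8, 100.8]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

alpha = 0.05
p_value = chi2_sf_df3(x2)
reject_h0 = p_value <= alpha

print(round(x2, 2), p_value, reject_h0)
```

The computed P-value is roughly 0.7, consistent with the table-based bound P-value > .10, so H0 is not rejected at the .05 level.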

16 Example Does the color of a car influence the chance that it will be stolen? The following information was reported for a random sample of 830 stolen vehicles: 140 were white, 100 were blue, 270 were red, 230 were black, and 90 were other colors. We use the X² goodness-of-fit test and a significance level of .01 to test the hypothesis that the color proportions for stolen cars are identical to the population color proportions.

17 Suppose that it is known that 15% of all cars are white, 15% are blue, 35% are red, 30% are black, and 5% are other colors. If these same population color proportions hold for stolen cars, the expected counts are:

18
Expected for white = 830(0.15) = 124.5
Expected for blue = 830(0.15) = 124.5
Expected for red = 830(0.35) = 290.5
Expected for black = 830(0.30) = 249.0
Expected for other = 830(0.05) = 41.5

19 Observed and Expected Counts

Category  Color  Observed Count  Expected Count
1         White  140             124.5
2         Blue   100             124.5
3         Red    270             290.5
4         Black  230             249.0
5         Other   90              41.5
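The table of expected counts and the resulting test statistic can be reproduced as follows. This is a minimal sketch, assuming the usual statistic X² = sum of (observed − expected)² / expected; the dictionary and variable names are illustrative.

```python
# Goodness-of-fit statistic for the stolen-car color data (n = 830).
observed = {"white": 140, "blue": 100, "red": 270, "black": 230, "other": 90}
population = {"white": 0.15, "blue": 0.15, "red": 0.35, "black": 0.30, "other": 0.05}

n = sum(observed.values())

# Expected count for each color is n times its population proportion.
expected = {color: n * p for color, p in population.items()}

# Each color's contribution to X^2, then the total.
contributions = {
    color: (observed[color] - expected[color]) ** 2 / expected[color]
    for color in observed
}
x2 = sum(contributions.values())

print({color: round(e, 1) for color, e in expected.items()})
print(round(x2, 2))
```

The statistic is about 66.33, and most of it comes from the "other" category, where 90 vehicles were observed against 41.5 expected.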

20
1. Let π1, π2, …, π5 denote the true proportions of stolen cars that fall into the five color categories.
2. H0: π1 = .15, π2 = .15, π3 = .35, π4 = .30, π5 = .05
3. Ha: H0 is not true
4. Significance level: α = .01

21 6. Assumptions: The sample was a random sample of stolen vehicles. All expected counts are greater than 5, so the sample size is large enough to use the chi-square test.

22
8. P-value: All expected counts exceed 5, so the P-value can be based on a chi-square distribution with df = 5 – 1 = 4. The computed value is larger than 18.46, the largest value in the df = 4 column, so P-value < .001.
9. Conclusion: Because P-value ≤ α, H0 is rejected. There is convincing evidence that at least one of the color proportions for stolen cars differs from the corresponding proportion for all cars.
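The table-based bound on the P-value can be checked by computing the upper-tail area directly. This is a minimal sketch using only the standard library; for df = 4 the chi-square upper-tail area has the closed form exp(−x/2)(1 + x/2), and the helper name chi2_sf_df4 is hypothetical, not something from the slides.

```python
import math

def chi2_sf_df4(x):
    """Upper-tail area P(chi-square with 4 df > x), closed form valid for df = 4 only."""
    return math.exp(-x / 2) * (1 + x / 2)

observed = [140, 100, 270, 230, 90]
expected = [124.5, 124.5, 290.5, 249.0, 41.5]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

alpha = 0.01
p_value = chi2_sf_df4(x2)
reject_h0 = p_value <= alpha

print(round(x2, 2), p_value, reject_h0)
```

The computed P-value is far smaller than .001, in line with the table-based conclusion, so H0 is rejected at the .01 level.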

