Chapter 12: Inference on Categorical Data. © 2010 Pearson Prentice Hall. All rights reserved.

1 © 2010 Pearson Prentice Hall. All rights reserved Chapter 12 Inference on Categorical Data

2 © 2010 Pearson Prentice Hall. All rights reserved Section 12.1 Goodness-of-Fit Test

3 © 2010 Pearson Prentice Hall. All rights reserved Objective Perform a goodness-of-fit test

7 © 2010 Pearson Prentice Hall. All rights reserved Characteristics of the Chi-Square Distribution
1. It is not symmetric.
2. The shape of the chi-square distribution depends on the degrees of freedom, just like Student’s t-distribution.
3. As the number of degrees of freedom increases, the chi-square distribution becomes more nearly symmetric.
4. The values of $\chi^2$ are nonnegative, i.e., the values of $\chi^2$ are greater than or equal to 0.
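The third characteristic can be checked numerically. The sketch below relies on a standard result that the slides do not state: the skewness of a chi-square distribution with k degrees of freedom is $\sqrt{8/k}$. The function name is an illustrative choice, not something from the presentation.

```python
import math


def chi_square_skewness(df: int) -> float:
    """Skewness of a chi-square distribution with `df` degrees of freedom.

    Uses the standard closed form sqrt(8 / df); a value of 0 would mean
    perfect symmetry.
    """
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return math.sqrt(8 / df)


# Skewness shrinks toward 0 as the degrees of freedom grow,
# so the curve becomes more nearly symmetric.
for df in (1, 2, 5, 10, 30, 100):
    print(f"df = {df:3d}  skewness = {chi_square_skewness(df):.3f}")
```

The skewness never reaches 0 for a finite number of degrees of freedom, which matches the first characteristic: the distribution is never exactly symmetric.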

9 © 2010 Pearson Prentice Hall. All rights reserved A goodness-of-fit test is an inferential procedure used to determine whether a frequency distribution follows a specific distribution.

10 © 2010 Pearson Prentice Hall. All rights reserved Expected Counts Suppose that there are n independent trials of an experiment with k ≥ 3 mutually exclusive possible outcomes. Let $p_1$ represent the probability of observing the first outcome and $E_1$ represent the expected count of the first outcome; $p_2$ represent the probability of observing the second outcome and $E_2$ represent the expected count of the second outcome; and so on. The expected counts for each possible outcome are given by $E_i = \mu_i = n p_i$ for $i = 1, 2, \ldots, k$.

11 © 2010 Pearson Prentice Hall. All rights reserved Parallel Example 1: Finding Expected Counts A sociologist wishes to determine whether the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000. According to the United States Census Bureau, in 2000, 22.8% of grandparents had been responsible for their grandchildren for less than 1 year; 23.9% for 1 or 2 years; 17.6% for 3 or 4 years; and 35.7% for 5 or more years. If the sociologist randomly selects 1,000 care-giving grandparents, compute the expected number within each category assuming the distribution has not changed from 2000.

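12 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 1: The probabilities are the relative frequencies from the 2000 distribution: $p_{<1\,\mathrm{yr}} = 0.228$, $p_{1\text{-}2\,\mathrm{yr}} = 0.239$, $p_{3\text{-}4\,\mathrm{yr}} = 0.176$, $p_{\ge 5\,\mathrm{yr}} = 0.357$.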

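13 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 2: There are n = 1,000 trials of the experiment, so the expected counts are:
$E_{<1\,\mathrm{yr}} = n p_{<1\,\mathrm{yr}} = 1000(0.228) = 228$
$E_{1\text{-}2\,\mathrm{yr}} = n p_{1\text{-}2\,\mathrm{yr}} = 1000(0.239) = 239$
$E_{3\text{-}4\,\mathrm{yr}} = n p_{3\text{-}4\,\mathrm{yr}} = 1000(0.176) = 176$
$E_{\ge 5\,\mathrm{yr}} = n p_{\ge 5\,\mathrm{yr}} = 1000(0.357) = 357$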
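The same calculation can be written as a short Python sketch. The proportions and sample size come from the example; the function and variable names are illustrative choices, not part of the slides.

```python
# Proportions from the 2000 distribution, as given in the example.
proportions_2000 = {
    "<1 yr": 0.228,
    "1-2 yr": 0.239,
    "3-4 yr": 0.176,
    ">=5 yr": 0.357,
}
n = 1000  # number of care-giving grandparents sampled


def expected_counts(n, proportions):
    """Return E_i = n * p_i for every category."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return {category: n * p for category, p in proportions.items()}


expected = expected_counts(n, proportions_2000)
for category, count in expected.items():
    print(f"{category:>7}: {count:.0f}")
```

Because the probabilities sum to 1, the expected counts always sum to n, which is a quick check on the arithmetic.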

14 © 2010 Pearson Prentice Hall. All rights reserved Test Statistic for Goodness-of-Fit Tests Let $O_i$ represent the observed counts of category i, $E_i$ represent the expected counts of category i, k represent the number of categories, and n represent the number of independent trials of an experiment. Then the formula
$\chi^2_0 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
approximately follows the chi-square distribution with $k - 1$ degrees of freedom, provided that
1. all expected frequencies are greater than or equal to 1 (all $E_i \ge 1$) and
2. no more than 20% of the expected frequencies are less than 5.

15 © 2010 Pearson Prentice Hall. All rights reserved CAUTION! Goodness-of-fit tests are used to test hypotheses regarding the distribution of a variable based on a single population. If you wish to compare two or more populations, you must use the tests for homogeneity presented in Section 12.2.

16 © 2010 Pearson Prentice Hall. All rights reserved The Goodness-of-Fit Test To test the hypotheses regarding a distribution, we use the steps that follow. Step 1: Determine the null and alternative hypotheses. $H_0$: The random variable follows a certain distribution. $H_1$: The random variable does not follow a certain distribution.

17 © 2010 Pearson Prentice Hall. All rights reserved Step 2: Decide on a level of significance, $\alpha$, depending on the seriousness of making a Type I error.

18 © 2010 Pearson Prentice Hall. All rights reserved Step 3: (a) Calculate the expected counts for each of the k categories. The expected counts are $E_i = n p_i$ for $i = 1, 2, \ldots, k$, where n is the number of trials and $p_i$ is the probability of the ith category, assuming that the null hypothesis is true.

19 © 2010 Pearson Prentice Hall. All rights reserved Step 3: (b) Verify that the requirements for the goodness-of-fit test are satisfied. 1. All expected counts are greater than or equal to 1 (all $E_i \ge 1$). 2. No more than 20% of the expected counts are less than 5. (c) Compute the test statistic:
$\chi^2_0 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
Note: $O_i$ is the observed count for the ith category.

20 © 2010 Pearson Prentice Hall. All rights reserved CAUTION! If the requirements in Step 3(b) are not satisfied, one option is to combine two or more of the low-frequency categories into a single category.

21 © 2010 Pearson Prentice Hall. All rights reserved Classical Approach Step 4: Determine the critical value. All goodness-of-fit tests are right-tailed tests, so the critical value is $\chi^2_\alpha$ with $k - 1$ degrees of freedom.

22 © 2010 Pearson Prentice Hall. All rights reserved Classical Approach Step 5: Compare the critical value to the test statistic. If $\chi^2_0 > \chi^2_\alpha$, reject the null hypothesis.

23 © 2010 Pearson Prentice Hall. All rights reserved Step 4: Use Table VII to obtain an approximate P-value by determining the area under the chi-square distribution with k-1 degrees of freedom to the right of the test statistic. P-Value Approach

24 © 2010 Pearson Prentice Hall. All rights reserved P-Value Approach Step 5: If the P-value < $\alpha$, reject the null hypothesis.

25 © 2010 Pearson Prentice Hall. All rights reserved Step 6: State the conclusion.
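Steps 3 through 5 of the P-value approach can be sketched in Python using only the standard library. Everything named here (`chi2_right_tail`, `goodness_of_fit`, and the die-roll data) is a hypothetical illustration rather than material from the slides. The right-tail area uses the standard closed-form series for integer degrees of freedom in place of a lookup in Table VII.

```python
import math


def chi2_right_tail(x, df):
    """Area to the right of x under the chi-square curve (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        series = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * series
    # Odd df: complementary error function plus a finite correction sum.
    series = sum(half ** (j - 0.5) / math.gamma(j + 0.5)
                 for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


def goodness_of_fit(observed, probabilities, alpha=0.05):
    """Return (statistic, df, p_value, reject_null) for a goodness-of-fit test."""
    if len(observed) != len(probabilities):
        raise ValueError("need one probability per category")
    n = sum(observed)
    expected = [n * p for p in probabilities]            # Step 3(a)
    if min(expected) < 1:                                # Step 3(b), requirement 1
        raise ValueError("an expected count is below 1; combine categories")
    if sum(e < 5 for e in expected) > 0.20 * len(expected):  # requirement 2
        raise ValueError("more than 20% of expected counts are below 5")
    statistic = sum((o - e) ** 2 / e                     # Step 3(c)
                    for o, e in zip(observed, expected))
    df = len(observed) - 1
    p_value = chi2_right_tail(statistic, df)             # Step 4
    return statistic, df, p_value, p_value < alpha       # Step 5


# Hypothetical data (not from the slides): a six-sided die rolled 60 times.
observed_rolls = [8, 12, 9, 11, 10, 10]
statistic, df, p_value, reject = goodness_of_fit(observed_rolls, [1 / 6] * 6)
print(f"chi-square = {statistic:.3f}, df = {df}, P-value = {p_value:.4f}, "
      f"reject H0: {reject}")
```

Raising an error when the requirements fail is one design choice; a caller could instead merge low-frequency categories, as the CAUTION slide suggests, and rerun the test.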

26 © 2010 Pearson Prentice Hall. All rights reserved Parallel Example 2: Conducting a Goodness-of-Fit Test A sociologist wishes to determine whether the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000. According to the United States Census Bureau, in 2000, 22.8% of grandparents had been responsible for their grandchildren for less than 1 year; 23.9% for 1 or 2 years; 17.6% for 3 or 4 years; and 35.7% for 5 or more years. The sociologist randomly selects 1,000 care-giving grandparents and obtains the following data.

27 © 2010 Pearson Prentice Hall. All rights reserved Test the claim that the distribution is different today than it was in 2000 at the $\alpha = 0.05$ level of significance.

28 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 1: We want to know if the distribution today is different than it was in 2000. The hypotheses are then: $H_0$: The distribution for the number of years care-giving grandparents are responsible for their grandchildren is the same today as it was in 2000. $H_1$: The distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000.

29 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 2: The level of significance is $\alpha = 0.05$. Step 3: (a) The expected counts were computed in Example 1.
Number of Years   Observed Counts   Expected Counts
< 1               …                 228
1-2               …                 239
3-4               …                 176
≥ 5               …                 357

30 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 3: (b) Since all expected counts are greater than or equal to 5, the requirements for the goodness-of-fit test are satisfied. (c) The test statistic is $\chi^2_0 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}$.

31 © 2010 Pearson Prentice Hall. All rights reserved Solution: Classical Approach Step 4: There are k = 4 categories, so we find the critical value using 4 − 1 = 3 degrees of freedom. The critical value is $\chi^2_{0.05} = 7.815$.

32 © 2010 Pearson Prentice Hall. All rights reserved Solution: Classical Approach Step 5: Since the test statistic, $\chi^2_0$, is less than the critical value, $\chi^2_{0.05} = 7.815$, we fail to reject the null hypothesis.

33 © 2010 Pearson Prentice Hall. All rights reserved Solution: P-Value Approach Step 4: There are k = 4 categories. The P-value is the area under the chi-square distribution with 4 − 1 = 3 degrees of freedom to the right of the test statistic $\chi^2_0$. Thus, P-value ≈ 0.09.

34 © 2010 Pearson Prentice Hall. All rights reserved Solution: P-Value Approach Step 5: Since the P-value ≈ 0.09 is greater than the level of significance $\alpha = 0.05$, we fail to reject the null hypothesis.

35 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 6: There is insufficient evidence to conclude that the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000 at the $\alpha = 0.05$ level of significance.

36 © 2010 Pearson Prentice Hall. All rights reserved Section 12.2 Tests for Independence and the Homogeneity of Proportions

37 © 2010 Pearson Prentice Hall. All rights reserved Objectives 1. Perform a test for independence 2. Perform a test for homogeneity of proportions

38 © 2010 Pearson Prentice Hall. All rights reserved Objective 1 Perform a Test for Independence

39 © 2010 Pearson Prentice Hall. All rights reserved The chi-square test for independence is used to determine whether there is an association between a row variable and column variable in a contingency table constructed from sample data. The null hypothesis is that the variables are not associated; in other words, they are independent. The alternative hypothesis is that the variables are associated, or dependent.

40 © 2010 Pearson Prentice Hall. All rights reserved “In Other Words” In a chi-square independence test, the null hypothesis is always $H_0$: The variables are independent. The alternative hypothesis is always $H_1$: The variables are not independent.

41 © 2010 Pearson Prentice Hall. All rights reserved The idea behind testing these types of claims is to compare actual counts to the counts we would expect if the null hypothesis were true (if the variables are independent). If a significant difference between the actual counts and expected counts exists, we would take this as evidence against the null hypothesis.

42 © 2010 Pearson Prentice Hall. All rights reserved If two events are independent, then P(A and B) = P(A)P(B). We can use the Multiplication Rule for Independent Events to obtain the expected proportion of observations within each cell under the assumption of independence, and then multiply this result by n, the sample size, to obtain the expected count within each cell.

43 © 2010 Pearson Prentice Hall. All rights reserved Parallel Example 1: Determining the Expected Counts in a Test for Independence In a poll, 883 males and 893 females were asked “If you could have only one of the following, which would you pick: money, health, or love?” Their responses are presented in the table below. Determine the expected counts within each cell assuming that gender and response are independent.
         Money   Health   Love
Men      82      446      355
Women    46      574      273
Source: Based on a Fox News Poll conducted in January, 1999

44 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 1: We first compute the row and column totals:
                Money   Health   Love   Row Totals
Men             82      446      355    883
Women           46      574      273    893
Column totals   128     1020     628    1776

45 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 2: Next compute the relative marginal frequencies for the row variable and column variable:
                     Money               Health               Love                Relative Frequency
Men                  82                  446                  355                 883/1776 ≈ 0.4972
Women                46                  574                  273                 893/1776 ≈ 0.5028
Relative Frequency   128/1776 ≈ 0.0721   1020/1776 ≈ 0.5743   628/1776 ≈ 0.3536   1

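46 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 3: Assuming gender and response are independent, we use the Multiplication Rule for Independent Events to compute the proportion of observations we would expect in each cell.
         Money                       Health                      Love
Men      (0.4972)(0.0721) ≈ 0.0358   (0.4972)(0.5743) ≈ 0.2855   (0.4972)(0.3536) ≈ 0.1758
Women    (0.5028)(0.0721) ≈ 0.0362   (0.5028)(0.5743) ≈ 0.2888   (0.5028)(0.3536) ≈ 0.1778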

47 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 4: We multiply the expected proportions from Step 3 by 1776, the sample size, to obtain the expected counts under the assumption of independence.
         Money                  Health                  Love
Men      1776(0.0358) ≈ 63.58   1776(0.2855) ≈ 507.05   1776(0.1758) ≈ 312.22
Women    1776(0.0362) ≈ 64.29   1776(0.2888) ≈ 512.91   1776(0.1778) ≈ 315.77

48 © 2010 Pearson Prentice Hall. All rights reserved Expected Frequencies in a Chi-Square Test for Independence To find the expected frequencies in a cell when performing a chi-square independence test, multiply the row total of the row containing the cell by the column total of the column containing the cell and divide this result by the table total. That is,
$\text{Expected frequency} = \frac{(\text{row total})(\text{column total})}{\text{table total}}$
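The row-total-times-column-total rule can be sketched in Python with the observed counts from the money, health, or love poll. The function name is an illustrative choice. Because the code carries full precision, its results differ slightly from hand calculations that first round the proportions to four decimal places (for example, about 63.64 rather than 63.58 for men who chose money).

```python
# Observed counts from the poll: rows are Men, Women;
# columns are Money, Health, Love.
observed = [
    [82, 446, 355],
    [46, 574, 273],
]


def expected_frequencies(table):
    """Expected count per cell: (row total)(column total) / (table total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(("Men", "Women"), expected):
    print(label.ljust(6), "  ".join(f"{value:8.2f}" for value in row))
```

Each row of expected counts sums to the corresponding row total (883 and 893), which gives a quick consistency check.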

49 © 2010 Pearson Prentice Hall. All rights reserved Test Statistic for the Test of Independence Let $O_i$ represent the observed number of counts in the ith cell and $E_i$ represent the expected number of counts in the ith cell. Then
$\chi^2_0 = \sum \frac{(O_i - E_i)^2}{E_i}$
approximately follows the chi-square distribution with $(r-1)(c-1)$ degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table, provided that (1) all expected frequencies are greater than or equal to 1 and (2) no more than 20% of the expected frequencies are less than 5.

50 © 2010 Pearson Prentice Hall. All rights reserved Chi-Square Test for Independence To test the association (or independence) of two variables in a contingency table: Step 1: Determine the null and alternative hypotheses. $H_0$: The row variable and column variable are independent. $H_1$: The row variable and column variable are dependent.

51 © 2010 Pearson Prentice Hall. All rights reserved Step 2: Choose a level of significance, $\alpha$, depending on the seriousness of making a Type I error.

52 © 2010 Pearson Prentice Hall. All rights reserved Step 3: (a) Calculate the expected frequencies (counts) for each cell in the contingency table. (b) Verify that the requirements for the chi-square test for independence are satisfied: 1. All expected frequencies are greater than or equal to 1 (all $E_i \ge 1$). 2. No more than 20% of the expected frequencies are less than 5.

53 © 2010 Pearson Prentice Hall. All rights reserved Step 3: (c) Compute the test statistic:
$\chi^2_0 = \sum \frac{(O_i - E_i)^2}{E_i}$
Note: $O_i$ is the observed count for the ith cell.

54 © 2010 Pearson Prentice Hall. All rights reserved Classical Approach Step 4: Determine the critical value. All chi-square tests for independence are right-tailed tests, so the critical value is $\chi^2_\alpha$ with $(r-1)(c-1)$ degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table.

56 © 2010 Pearson Prentice Hall. All rights reserved Classical Approach Step 5: Compare the critical value to the test statistic. If $\chi^2_0 > \chi^2_\alpha$, reject the null hypothesis.

57 © 2010 Pearson Prentice Hall. All rights reserved P-Value Approach Step 4: Use Table VII to determine an approximate P-value by determining the area under the chi-square distribution with $(r-1)(c-1)$ degrees of freedom to the right of the test statistic.

58 © 2010 Pearson Prentice Hall. All rights reserved P-Value Approach Step 5: If the P-value < $\alpha$, reject the null hypothesis.

59 © 2010 Pearson Prentice Hall. All rights reserved Step 6: State the conclusion.

60 © 2010 Pearson Prentice Hall. All rights reserved Parallel Example 2: Performing a Chi-Square Test for Independence In a poll, 883 males and 893 females were asked “If you could have only one of the following, which would you pick: money, health, or love?” Their responses are presented in the table below. Test the claim that gender and response are independent at the $\alpha = 0.05$ level of significance.
         Money   Health   Love
Men      82      446      355
Women    46      574      273
Source: Based on a Fox News Poll conducted in January, 1999

61 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 1: We want to know whether gender and response are dependent or independent, so the hypotheses are: $H_0$: Gender and response are independent. $H_1$: Gender and response are dependent. Step 2: The level of significance is $\alpha = 0.05$.

62 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 3: (a) The expected frequencies were computed in Example 1 and are given in parentheses in the table below, along with the observed frequencies.
         Money        Health         Love
Men      82 (63.58)   446 (507.05)   355 (312.22)
Women    46 (64.29)   574 (512.91)   273 (315.77)

63 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 3: (b) Since none of the expected frequencies are less than 5, the requirements for the chi-square test for independence are satisfied. (c) The test statistic is $\chi^2_0 = \sum \frac{(O_i - E_i)^2}{E_i} \approx 36.82$.

64 © 2010 Pearson Prentice Hall. All rights reserved Solution: Classical Approach Step 4: There are r = 2 rows and c = 3 columns, so we find the critical value using (2 − 1)(3 − 1) = 2 degrees of freedom. The critical value is $\chi^2_{0.05} = 5.991$.

65 © 2010 Pearson Prentice Hall. All rights reserved Solution: Classical Approach Step 5: Since the test statistic, $\chi^2_0 \approx 36.82$, is greater than the critical value, $\chi^2_{0.05} = 5.991$, we reject the null hypothesis.

66 © 2010 Pearson Prentice Hall. All rights reserved Solution: P-Value Approach Step 4: There are r = 2 rows and c = 3 columns, so we find the P-value using (2 − 1)(3 − 1) = 2 degrees of freedom. The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of $\chi^2_0 \approx 36.82$, which is approximately 0.

67 © 2010 Pearson Prentice Hall. All rights reserved Solution: P-Value Approach Step 5: Since the P-value is less than the level of significance $\alpha = 0.05$, we reject the null hypothesis.

68 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 6: There is sufficient evidence to conclude that gender and response are dependent at the $\alpha = 0.05$ level of significance.
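The whole test for this example can be reproduced with a short Python sketch. The function name and structure are illustrative assumptions, not the textbook's procedure verbatim. Two shortcuts are used: the critical value 5.991 is the standard table entry for 2 degrees of freedom at the 0.05 level, and the P-value uses the fact that, for exactly 2 degrees of freedom, the right-tail area equals exp(−x/2). Carrying full precision gives a statistic near 36.84; hand calculations that round the proportions to four decimals land slightly lower.

```python
import math

# Observed counts: rows are Men, Women; columns are Money, Health, Love.
observed = [
    [82, 446, 355],
    [46, 574, 273],
]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    cells = [e for row in expected for e in row]
    # Requirements: every expected count >= 1, at most 20% below 5.
    if min(cells) < 1 or sum(e < 5 for e in cells) > 0.20 * len(cells):
        raise ValueError("expected-count requirements are not satisfied")
    statistic = sum((o - e) ** 2 / e
                    for o_row, e_row in zip(table, expected)
                    for o, e in zip(o_row, e_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


statistic, df, expected = chi_square_independence(observed)

alpha = 0.05
critical_value = 5.991               # chi-square table value, df = 2, alpha = 0.05
assert df == 2                       # the closed form below is valid only for df = 2
p_value = math.exp(-statistic / 2)   # right-tail area when df = 2

print(f"chi-square = {statistic:.2f}, df = {df}, P-value = {p_value:.2e}")
print("reject H0 (classical):", statistic > critical_value)
print("reject H0 (P-value):  ", p_value < alpha)
```

Both approaches lead to the same decision, as they always do for a given level of significance.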

69 © 2010 Pearson Prentice Hall. All rights reserved To see the relation between response and gender, we draw bar graphs of the conditional distributions of response by gender. Recall that a conditional distribution lists the relative frequency of each category of a variable, given a specific value of the other variable in a contingency table.

70 © 2010 Pearson Prentice Hall. All rights reserved Parallel Example 3: Constructing a Conditional Distribution and Bar Graph Find the conditional distribution of response by gender for the data from the previous example, reproduced below.
         Money   Health   Love
Men      82      446      355
Women    46      574      273
Source: Based on a Fox News Poll conducted in January, 1999

71 © 2010 Pearson Prentice Hall. All rights reserved Solution We first compute the conditional distribution of response by gender.
         Money             Health             Love
Men      82/883 ≈ 0.0929   446/883 ≈ 0.5051   355/883 ≈ 0.4020
Women    46/893 ≈ 0.0515   574/893 ≈ 0.6428   273/893 ≈ 0.3057
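A conditional distribution of response by gender is just each row of counts divided by its row total. A minimal Python sketch, with illustrative names, follows; it prints the relative frequencies and leaves the bar graph to whatever plotting tool is at hand.

```python
responses = ("Money", "Health", "Love")
counts_by_gender = {
    "Men": [82, 446, 355],
    "Women": [46, 574, 273],
}


def conditional_distribution(counts):
    """Divide each count by its row total to get relative frequencies."""
    total = sum(counts)
    return [count / total for count in counts]


conditional = {gender: conditional_distribution(counts)
               for gender, counts in counts_by_gender.items()}

for gender, distribution in conditional.items():
    cells = ", ".join(f"{response}: {value:.4f}"
                      for response, value in zip(responses, distribution))
    print(f"{gender:<6} {cells}")
```

Each row of relative frequencies sums to 1, so the two rows can be compared directly even though the sample sizes differ.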

72 © 2010 Pearson Prentice Hall. All rights reserved Solution (Figure: bar graph of the conditional distribution of response by gender.)

73 © 2010 Pearson Prentice Hall. All rights reserved Objective 2 Perform a Test for Homogeneity of Proportions

74 © 2010 Pearson Prentice Hall. All rights reserved In a chi-square test for homogeneity of proportions, we test whether different populations have the same proportion of individuals with some characteristic.

75 © 2010 Pearson Prentice Hall. All rights reserved The procedures for performing a test of homogeneity are identical to those for a test of independence.

76 © 2010 Pearson Prentice Hall. All rights reserved Parallel Example 5: A Test for Homogeneity of Proportions The following question was asked of a random sample of individuals in 1992, 2002, and 2008: “Would you tell me if you feel being a teacher is an occupation of very great prestige?” The results of the survey are presented below:
       1992   2002   2008
Yes    418    479    525
No     602    541    485
Test the claim that the proportion of individuals who feel being a teacher is an occupation of very great prestige is the same for each year at the $\alpha = 0.01$ level of significance. Source: The Harris Poll

77 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 1: The null hypothesis is a statement of “no difference,” so the proportions for each year who feel that being a teacher is an occupation of very great prestige are equal. We state the hypotheses as follows: $H_0$: $p_1 = p_2 = p_3$. $H_1$: At least one of the proportions is different from the others. Step 2: The level of significance is $\alpha = 0.01$.

78 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 3: (a) The expected frequencies are found by multiplying the appropriate row and column totals and then dividing by the total sample size. They are given in parentheses in the table below, along with the observed frequencies.
       1992           2002           2008
Yes    418 (475.55)   479 (475.55)   525 (470.89)
No     602 (544.45)   541 (544.45)   485 (539.11)

79 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 3: (b) Since none of the expected frequencies are less than 5, the requirements are satisfied. (c) The test statistic is $\chi^2_0 = \sum \frac{(O_i - E_i)^2}{E_i} \approx 24.74$.

80 © 2010 Pearson Prentice Hall. All rights reserved Solution: Classical Approach Step 4: There are r = 2 rows and c = 3 columns, so we find the critical value using (2 − 1)(3 − 1) = 2 degrees of freedom. The critical value is $\chi^2_{0.01} = 9.210$.

81 © 2010 Pearson Prentice Hall. All rights reserved Solution: Classical Approach Step 5: Since the test statistic, $\chi^2_0 \approx 24.74$, is greater than the critical value, $\chi^2_{0.01} = 9.210$, we reject the null hypothesis.

82 © 2010 Pearson Prentice Hall. All rights reserved Solution: P-Value Approach Step 4: There are r = 2 rows and c = 3 columns, so we find the P-value using (2 − 1)(3 − 1) = 2 degrees of freedom. The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of $\chi^2_0 \approx 24.74$, which is approximately 0.

83 © 2010 Pearson Prentice Hall. All rights reserved Solution: P-Value Approach Step 5: Since the P-value is less than the level of significance $\alpha = 0.01$, we reject the null hypothesis.

84 © 2010 Pearson Prentice Hall. All rights reserved Solution Step 6: There is sufficient evidence to reject the null hypothesis at the $\alpha = 0.01$ level of significance. We conclude that the proportion of individuals who believe that teaching is a very prestigious career is different for at least one of the three years.
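The homogeneity example can be checked with a Python sketch that takes a slightly different but equivalent route to the expected counts: under the null hypothesis every year shares one pooled proportion of “Yes” answers, so each year's expected “Yes” count is its sample size times that pooled proportion. Variable names are illustrative, 9.210 is the standard table value for 2 degrees of freedom at the 0.01 level, and the P-value again uses the df = 2 closed form exp(−x/2).

```python
import math

years = (1992, 2002, 2008)
yes_counts = [418, 479, 525]
no_counts = [602, 541, 485]

sample_sizes = [y + n for y, n in zip(yes_counts, no_counts)]
# Under H0 (p1 = p2 = p3), estimate the common proportion by pooling all years.
pooled_yes = sum(yes_counts) / sum(sample_sizes)

statistic = 0.0
for year, yes, no, size in zip(years, yes_counts, no_counts, sample_sizes):
    expected_yes = size * pooled_yes         # same as (row total)(column total)/n
    expected_no = size * (1 - pooled_yes)
    statistic += (yes - expected_yes) ** 2 / expected_yes
    statistic += (no - expected_no) ** 2 / expected_no
    print(f"{year}: expected Yes = {expected_yes:.2f}, expected No = {expected_no:.2f}")

alpha = 0.01
df = (2 - 1) * (len(years) - 1)      # (rows - 1)(columns - 1) = 2
critical_value = 9.210               # chi-square table value, df = 2, alpha = 0.01
p_value = math.exp(-statistic / 2)   # right-tail area, valid because df = 2

print(f"chi-square = {statistic:.2f}, df = {df}, P-value = {p_value:.2e}")
print("reject H0:", statistic > critical_value and p_value < alpha)
```

The pooled-proportion view makes the null hypothesis concrete: the statistic measures how far each year's counts stray from what a single shared proportion would produce.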

