© 2010 Pearson Prentice Hall. All rights reserved Chapter Inference on Categorical Data 12.

© 2010 Pearson Prentice Hall. All rights reserved 12-106 1.It is not symmetric. 2.The shape of the chi-square distribution depends on the degrees of freedom, just like Student’s t-distribution. Characteristics of the Chi-Square Distribution

© 2010 Pearson Prentice Hall. All rights reserved 12-107 1.It is not symmetric. 2.The shape of the chi-square distribution depends on the degrees of freedom, just like Student’s t-distribution. 3.As the number of degrees of freedom increases, the chi-square distribution becomes more nearly symmetric. Characteristics of the Chi-Square Distribution

© 2010 Pearson Prentice Hall. All rights reserved 12-108 1.It is not symmetric. 2.The shape of the chi-square distribution depends on the degrees of freedom, just like Student’s t-distribution. 3.As the number of degrees of freedom increases, the chi-square distribution becomes more nearly symmetric. 4.The values of  2 are nonnegative, i.e., the values of  2 are greater than or equal to 0. Characteristics of the Chi-Square Distribution

© 2010 Pearson Prentice Hall. All rights reserved 12-111 Expected Counts Suppose that there are n independent trials of an experiment with k ≥ 3 mutually exclusive possible outcomes. Let p 1 represent the probability of observing the first outcome and E 1 represent the expected count of the first outcome; p 2 represent the probability of observing the second outcome and E 2 represent the expected count of the second outcome; and so on. The expected counts for each possible outcome are given by E i =  i = np i for i = 1, 2, …, k

© 2010 Pearson Prentice Hall. All rights reserved 12-112 A sociologist wishes to determine whether the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000. According to the United States Census Bureau, in 2000, 22.8% of grandparents have been responsible for their grandchildren less than 1 year; 23.9% of grandparents have been responsible for their grandchildren for 1 or 2 years; 17.6% of grandparents have been responsible for their grandchildren 3 or 4 years; and 35.7% of grandparents have been responsible for their grandchildren for 5 or more years. If the sociologist randomly selects 1,000 care-giving grandparents, compute the expected number within each category assuming the distribution has not changed from 2000. Parallel Example 1: Finding Expected Counts

© 2010 Pearson Prentice Hall. All rights reserved 12-113 Step 1: The probabilities are the relative frequencies from the 2000 distribution: p <1yr = 0.228 p 1-2yr = 0.239 p 3-4yr = 0.176 p ≥5yr = 0.357 Solution

© 2010 Pearson Prentice Hall. All rights reserved 12-114 Step 2: There are n=1,000 trials of the experiment so the expected counts are: E <1yr = np <1yr = 1000(0.228) = 228 E 1-2yr = np 1-2yr = 1000(0.239) = 239 E 3-4yr = np 3-4yr =1000(0.176) = 176 E ≥5yr = np ≥5yr = 1000(0.357) = 357 Solution

© 2010 Pearson Prentice Hall. All rights reserved 12-115 Test Statistic for Goodness-of-Fit Tests Let O i represent the observed counts of category i, E i represent the expected counts of category i, k represent the number of categories, and n represent the number of independent trials of an experiment. Then the formula approximately follows the chi-square distribution with k-1 degrees of freedom, provided that 1.all expected frequencies are greater than or equal to 1 (all E i ≥ 1) and 2.no more than 20% of the expected frequencies are less than 5.

© 2010 Pearson Prentice Hall. All rights reserved 12-116 CAUTION! Goodness-of-fit tests are used to test hypotheses regarding the distribution of a variable based on a single population. If you wish to compare two or more populations, you must use the tests for homogeneity presented in Section 12.2.

© 2010 Pearson Prentice Hall. All rights reserved 12-117 Step 1: Determine the null and alternative hypotheses. H 0 : The random variable follows a certain distribution H 1 : The random variable does not follow a certain distribution The Goodness-of-Fit Test To test the hypotheses regarding a distribution, we use the steps that follow.

© 2010 Pearson Prentice Hall. All rights reserved 12-119 Step 3: a)Calculate the expected counts for each of the k categories. The expected counts are E i =np i for i = 1, 2, …, k where n is the number of trials and p i is the probability of the ith category, assuming that the null hypothesis is true.

© 2010 Pearson Prentice Hall. All rights reserved 12-120 Step 3: b)Verify that the requirements for the goodness- of-fit test are satisfied. 1.All expected counts are greater than or equal to 1 (all E i ≥ 1). 2.No more than 20% of the expected counts are less than 5. c) Compute the test statistic: Note: O i is the observed count for the ith category.

© 2010 Pearson Prentice Hall. All rights reserved 12-121 CAUTION! If the requirements in Step 3(b) are not satisfied, one option is to combine two or more of the low- frequency categories into a single category.

© 2010 Pearson Prentice Hall. All rights reserved 12-122 Step 4: Determine the critical value. All goodness- of-fit tests are right-tailed tests, so the critical value is with k-1 degrees of freedom. Classical Approach

© 2010 Pearson Prentice Hall. All rights reserved 12-124 Step 4: Use Table VII to obtain an approximate P-value by determining the area under the chi-square distribution with k-1 degrees of freedom to the right of the test statistic. P-Value Approach

© 2010 Pearson Prentice Hall. All rights reserved 12-127 A sociologist wishes to determine whether the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000. According to the United States Census Bureau, in 2000, 22.8% of grandparents have been responsible for their grandchildren less than 1 year; 23.9% of grandparents have been responsible for their grandchildren for 1 or 2 years; 17.6% of grandparents have been responsible for their grandchildren 3 or 4 years; and 35.7% of grandparents have been responsible for their grandchildren for 5 or more years. The sociologist randomly selects 1,000 care-giving grandparents and obtains the following data. Parallel Example 2: Conducting a Goodness-of -Fit Test

© 2010 Pearson Prentice Hall. All rights reserved 12-129 Step 1: We want to know if the distribution today is different than it was in 2000. The hypotheses are then: H 0 : The distribution for the number of years care-giving grandparents are responsible for their grandchildren is the same today as it was in 2000 H 1 : The distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000 Solution

© 2010 Pearson Prentice Hall. All rights reserved 12-130 Step 2: The level of significance is  =0.05. Step 3: (a) The expected counts were computed in Example 1. Solution Number of Years Observed Counts Expected Counts <1252228 1-2255239 3-4162176 ≥5331357

© 2010 Pearson Prentice Hall. All rights reserved 12-131 Step 3: (b)Since all expected counts are greater than or equal to 5, the requirements for the goodness-of-fit test are satisfied. (c)The test statistic is Solution

© 2010 Pearson Prentice Hall. All rights reserved 12-132 Step 4: There are k = 4 categories, so we find the critical value using 4-1=3 degrees of freedom. The critical value is Solution: Classical Approach

© 2010 Pearson Prentice Hall. All rights reserved 12-134 Step 4: There are k = 4 categories. The P-value is the area under the chi-square distribution with 4-1=3 degrees of freedom to the right of. Thus, P-value ≈ 0.09. Solution: P-Value Approach

© 2010 Pearson Prentice Hall. All rights reserved 12-135 Step 5: Since the P-value ≈ 0.09 is greater than the level of significance  = 0.05, we fail to reject the null hypothesis. Solution: P-Value Approach

© 2010 Pearson Prentice Hall. All rights reserved 12-136 Step 6: There is insufficient evidence to conclude that the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000 at the  = 0.05 level of significance. Solution

© 2010 Pearson Prentice Hall. All rights reserved 12-140 The chi-square test for independence is used to determine whether there is an association between a row variable and column variable in a contingency table constructed from sample data. The null hypothesis is that the variables are not associated; in other words, they are independent. The alternative hypothesis is that the variables are associated, or dependent.

© 2010 Pearson Prentice Hall. All rights reserved 12-141 “In Other Words” In a chi-square independence test, the null hypothesis is always H 0 : The variables are independent The alternative hypothesis is always H 0 : The variables are not independent

© 2010 Pearson Prentice Hall. All rights reserved 12-142 The idea behind testing these types of claims is to compare actual counts to the counts we would expect if the null hypothesis were true (if the variables are independent). If a significant difference between the actual counts and expected counts exists, we would take this as evidence against the null hypothesis.

© 2010 Pearson Prentice Hall. All rights reserved 12-143 If two events are independent, then P(A and B) = P(A)P(B) We can use the Multiplication Principle for independent events to obtain the expected proportion of observations within each cell under the assumption of independence and multiply this result by n, the sample size, in order to obtain the expected count within each cell.

© 2010 Pearson Prentice Hall. All rights reserved 12-144 In a poll, 883 males and 893 females were asked “If you could have only one of the following, which would you pick: money, health, or love?” Their responses are presented in the table below. Determine the expected counts within each cell assuming that gender and response are independent. Source: Based on a Fox News Poll conducted in January, 1999 Parallel Example 1: Determining the Expected Counts in a Test for Independence

© 2010 Pearson Prentice Hall. All rights reserved 12-145 Step 1: We first compute the row and column totals: Solution MoneyHealthLoveRow Totals Men82446355883 Women46574273893 Column totals12810206281776

© 2010 Pearson Prentice Hall. All rights reserved 12-146 Step 2: Next compute the relative marginal frequencies for the row variable and column variable: Solution MoneyHealthLoveRelative Frequency Men82446355883/1776 ≈ 0.4972 Women46574273893/1776 ≈0.5028 Relative Frequency 128/1776 ≈0.0721 1020/1776 ≈0.5743 628/1776 ≈0.35361

© 2010 Pearson Prentice Hall. All rights reserved 12-147 Step 3: Assuming gender and response are independent, we use the Multiplication Rule for Independent Events to compute the proportion of observations we would expect in each cell. Solution MoneyHealthLove Men0.03580.28550.1758 Women0.03620.28880.1778

© 2010 Pearson Prentice Hall. All rights reserved 12-148 Step 4: We multiply the expected proportions from step 3 by 1776, the sample size, to obtain the expected counts under the assumption of independence. Solution MoneyHealthLove Men1776(0.0358) ≈ 63.5808 1776(0.2855) ≈ 507.048 1776(0.1758) ≈ 312.2208 Wome n 1776(0.0362) ≈ 64.2912 1776(0.2888) ≈ 512.9088 1776(0.1778) ≈ 315.7728

© 2010 Pearson Prentice Hall. All rights reserved 12-149 Expected Frequencies in a Chi-Square Test for Independence To find the expected frequencies in a cell when performing a chi-square independence test, multiply the row total of the row containing the cell by the column total of the column containing the cell and divide this result by the table total. That is,

© 2010 Pearson Prentice Hall. All rights reserved 12-150 Test Statistic for the Test of Independence Let O i represent the observed number of counts in the ith cell and E i represent the expected number of counts in the ith cell. Then approximately follows the chi-square distribution with (r-1)(c-1) degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table, provided that (1) all expected frequencies are greater than or equal to 1 and (2) no more than 20% of the expected frequencies are less than 5.

© 2010 Pearson Prentice Hall. All rights reserved 12-151 Step 1: Determine the null and alternative hypotheses. H 0 : The row variable and column variable are independent. H 1 : The row variable and column variables are dependent. Chi-Square Test for Independence To test the association (or independence of) two variables in a contingency table:

© 2010 Pearson Prentice Hall. All rights reserved 12-153 Step 3: a)Calculate the expected frequencies (counts) for each cell in the contingency table. b)Verify that the requirements for the chi- square test for independence are satisfied: 1.All expected frequencies are greater than or equal to 1 (all E i ≥ 1). 2.No more than 20% of the expected frequencies are less than 5.

© 2010 Pearson Prentice Hall. All rights reserved 12-155 Step 4: Determine the critical value. All chi-square tests for independence are right-tailed tests, so the critical value is with (r-1)(c-1) degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table. Classical Approach

© 2010 Pearson Prentice Hall. All rights reserved 12-158 Step 4: Use Table VII to determine an approximate P- value by determining the area under the chi- square distribution with (r-1)(c-1) degrees of freedom to the right of the test statistic. P-Value Approach

© 2010 Pearson Prentice Hall. All rights reserved 12-161 In a poll, 883 males and 893 females were asked “If you could have only one of the following, which would you pick: money, health, or love?” Their responses are presented in the table below. Test the claim that gender and response are independent at the  = 0.05 level of significance. Source: Based on a Fox News Poll conducted in January, 1999 Parallel Example 2: Performing a Chi-Square Test for Independence

© 2010 Pearson Prentice Hall. All rights reserved 12-162 Step 1: We want to know whether gender and response are dependent or independent so the hypotheses are: H 0 : gender and response are independent H 1 : gender and response are dependent Step 2: The level of significance is  =0.05. Solution

© 2010 Pearson Prentice Hall. All rights reserved 12-163 Step 3: (a) The expected frequencies were computed in Example 1 and are given in parentheses in the table below, along with the observed frequencies. Solution MoneyHealthLove Men82 (63.5808) 446 (507.048) 355 (312.2208) Women46 (64.2912) 574 (512.9088) 273 (315.7728)

© 2010 Pearson Prentice Hall. All rights reserved 12-164 Step 3: (b)Since none of the expected frequencies are less than 5, the requirements for the goodness-of-fit test are satisfied. (c)The test statistic is Solution

© 2010 Pearson Prentice Hall. All rights reserved 12-165 Step 4: There are r = 2 rows and c =3 columns, so we find the critical value using (2-1)(3-1) = 2 degrees of freedom. The critical value is. Solution: Classical Approach

© 2010 Pearson Prentice Hall. All rights reserved 12-167 Step 4: There are r = 2 rows and c =3 columns so we find the P-value using (2-1)(3-1) = 2 degrees of freedom. The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of which is approximately 0. Solution: P-Value Approach

© 2010 Pearson Prentice Hall. All rights reserved 12-170 To see the relation between response and gender, we draw bar graphs of the conditional distributions of response by gender. Recall that a conditional distribution lists the relative frequency of each category of a variable, given a specific value of the other variable in a contingency table.

© 2010 Pearson Prentice Hall. All rights reserved 12-171 Find the conditional distribution of response by gender for the data from the previous example, reproduced below. Source: Based on a Fox News Poll conducted in January, 1999 Parallel Example 3: Constructing a Conditional Distribution and Bar Graph

© 2010 Pearson Prentice Hall. All rights reserved 12-172 We first compute the conditional distribution of response by gender. Solution MoneyHealthLove Men82/883 ≈ 0.0929 446/883 ≈ 0.5051 355/883 ≈ 0.4020 Women46/893 ≈ 0.0515 574/893 ≈ 0.6428 273/893 ≈ 0.3057

© 2010 Pearson Prentice Hall. All rights reserved 12-175 In a chi-square test for homogeneity of proportions, we test whether different populations have the same proportion of individuals with some characteristic.

© 2010 Pearson Prentice Hall. All rights reserved 12-177 The following question was asked of a random sample of individuals in 1992, 2002, and 2008: “Would you tell me if you feel being a teacher is an occupation of very great prestige?” The results of the survey are presented below: Test the claim that the proportion of individuals that feel being a teacher is an occupation of very great prestige is the same for each year at the  = 0.01 level of significance. Source: The Harris Poll Parallel Example 5: A Test for Homogeneity of Proportions 199220022008 Yes418479525 No602541485

© 2010 Pearson Prentice Hall. All rights reserved 12-178 Step 1: The null hypothesis is a statement of “no difference” so the proportions for each year who feel that being a teacher is an occupation of very great prestige are equal. We state the hypotheses as follows: H 0 : p 1 = p 2 = p 3 H 1 : At least one of the proportions is different from the others. Step 2: The level of significance is  =0.01. Solution

© 2010 Pearson Prentice Hall. All rights reserved 12-179 Step 3: (a) The expected frequencies are found by multiplying the appropriate row and column totals and then dividing by the total sample size. They are given in parentheses in the table below, along with the observed frequencies. Solution 199220022008 Yes 418 (475.554) 479 (475.554) 525 (470.892) No 602 (544.446) 541 (544.446) 485 (539.108)

© 2010 Pearson Prentice Hall. All rights reserved 12-181 Step 4: There are r = 2 rows and c =3 columns, so we find the critical value using (2-1)(3-1) = 2 degrees of freedom. The critical value is. Solution: Classical Approach

© 2010 Pearson Prentice Hall. All rights reserved 12-183 Step 4: There are r = 2 rows and c =3 columns so we find the P-value using (2-1)(3-1) = 2 degrees of freedom. The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of which is approximately 0. Solution: P-Value Approach

© 2010 Pearson Prentice Hall. All rights reserved 12-185 Step 6: There is sufficient evidence to reject the null hypothesis at the  = 0.01 level of significance. We conclude that the proportion of individuals who believe that teaching is a very prestigious career is different for at least one of the three years. Solution

© 2010 Pearson Prentice Hall. All rights reserved Chapter Inference on Categorical Data 12.

Similar presentations

Presentation on theme: "© 2010 Pearson Prentice Hall. All rights reserved Chapter Inference on Categorical Data 12."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

© 2010 Pearson Prentice Hall. All rights reserved Chapter Inference on Categorical Data 12.

Similar presentations

Presentation on theme: "© 2010 Pearson Prentice Hall. All rights reserved Chapter Inference on Categorical Data 12."— Presentation transcript:

Similar presentations

About project

Feedback