12-2 The chi-square test for independence is used to determine whether there is an association between a row variable and column variable in a contingency table constructed from sample data taken from a population of interest. The null hypothesis is that the variables are not associated; in other words, they are independent. The alternative hypothesis is that the variables are associated, or dependent.
12-3 “In Other Words”
In a chi-square independence test, the null hypothesis is always
H0: The variables are independent.
The alternative hypothesis is always
H1: The variables are not independent.
12-4 The idea behind testing these types of claims is to compare actual counts to the counts we would expect if the null hypothesis were true (if the variables are independent). If a significant difference between the actual counts and expected counts exists, we would take this as evidence against the null hypothesis.
12-5 If two events A and B are independent, then P(A and B) = P(A)P(B). We can use the Multiplication Rule for Independent Events to obtain the expected proportion of observations within each cell under the assumption of independence. Multiplying this proportion by n, the sample size, gives the expected count within each cell.
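As a sketch of how this rule turns into a cell count, consider a cell in row i and column j. The symbols R_i (row total), C_j (column total), and E_ij (expected count) are notation introduced here for illustration and do not appear on the original slides. Under independence, the expected proportion in the cell is the product of the two marginal proportions, and multiplying by n gives:

```latex
E_{ij} \;=\; n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} \;=\; \frac{R_i \, C_j}{n}
```

One factor of n cancels, which is why the expected count reduces to a row total times a column total divided by the table total.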
12-6 Parallel Example 1: Determining the Expected Counts in a Test for Independence
In a poll, 883 males and 893 females were asked “If you could have only one of the following, which would you pick: money, health, or love?” Their responses are presented in the table below. Determine the expected counts within each cell assuming that gender and response are independent.

        Money   Health   Love
Men        82      446    355
Women      46      574    273

Source: Based on a Fox News Poll conducted in January, 1999
12-7 Solution
Step 1: We first compute the row and column totals:

                Money   Health   Love   Row totals
Men                82      446    355          883
Women              46      574    273          893
Column totals     128     1020    628         1776
12-8 Solution
Step 2: Next, compute the relative marginal frequencies for the row variable and the column variable:

                    Money               Health               Love                Relative frequency
Men                 82                  446                  355                 883/1776 ≈ 0.4972
Women               46                  574                  273                 893/1776 ≈ 0.5028
Relative frequency  128/1776 ≈ 0.0721   1020/1776 ≈ 0.5743   628/1776 ≈ 0.3536   1
12-9 Solution
Step 3: Assuming gender and response are independent, we use the Multiplication Rule for Independent Events to compute the proportion of observations we would expect in each cell. For example, the expected proportion for men who chose money is (0.4972)(0.0721) ≈ 0.0358.

        Money    Health   Love
Men     0.0358   0.2855   0.1758
Women   0.0362   0.2888   0.1778
12-10 Solution
Step 4: We multiply the expected proportions from Step 3 by 1776, the sample size, to obtain the expected counts under the assumption of independence.

        Money                     Health                     Love
Men     1776(0.0358) ≈ 63.5808    1776(0.2855) ≈ 507.048     1776(0.1758) ≈ 312.2208
Women   1776(0.0362) ≈ 64.2912    1776(0.2888) ≈ 512.9088    1776(0.1778) ≈ 315.7728
12-11 Expected Frequencies in a Chi-Square Test for Independence
To find the expected frequency in a cell when performing a chi-square independence test, multiply the row total of the row containing the cell by the column total of the column containing the cell and divide this result by the table total. That is,
$$\text{Expected frequency} = \frac{(\text{row total})(\text{column total})}{\text{table total}}$$
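The shortcut formula can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `expected_counts` and the variable names are hypothetical choices made here, not part of the original slides.

```python
# Observed counts from the poll (rows: Men, Women; columns: Money, Health, Love).
observed = [
    [82, 446, 355],
    [46, 574, 273],
]


def expected_counts(table):
    """Expected counts under independence: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for label, row in zip(["Men", "Women"], expected):
    print(label, [round(e, 2) for e in row])
```

The values produced this way differ slightly from the Step 4 table in Parallel Example 1 (for instance, about 63.64 rather than 63.5808 for men who chose money), because the four-step solution rounds the proportions to four decimal places before multiplying by 1776.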
12-12 Test Statistic for the Test of Independence
Let O_i represent the observed number of counts in the ith cell and E_i represent the expected number of counts in the ith cell. Then
$$\chi^2_0 = \sum \frac{(O_i - E_i)^2}{E_i}$$
approximately follows the chi-square distribution with (r-1)(c-1) degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table, provided that (1) all expected frequencies are greater than or equal to 1 and (2) no more than 20% of the expected frequencies are less than 5.
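The statistic, its degrees of freedom, and the two requirements can be sketched as follows. This is an illustrative standard-library example, not a reference implementation; the function names are hypothetical, and the expected counts are the rounded values reported in Step 4 of Parallel Example 1.

```python
# Observed counts (rows: Men, Women; columns: Money, Health, Love).
observed = [[82, 446, 355], [46, 574, 273]]
# Expected counts as reported in Step 4 of Parallel Example 1 (rounded proportions).
expected = [[63.5808, 507.048, 312.2208], [64.2912, 512.9088, 315.7728]]


def requirements_met(expected):
    """True if all E_i >= 1 and no more than 20% of the E_i are below 5."""
    cells = [e for row in expected for e in row]
    all_at_least_one = all(e >= 1 for e in cells)
    share_below_five = sum(e < 5 for e in cells) / len(cells)
    return all_at_least_one and share_below_five <= 0.20


def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


def degrees_of_freedom(table):
    """(r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)


print("requirements met:", requirements_met(expected))
print("test statistic:", round(chi_square_statistic(observed, expected), 2))
print("degrees of freedom:", degrees_of_freedom(observed))
```

Checking the requirements before computing the statistic mirrors the order of the conditions in the statement above: the chi-square approximation is only trusted when the expected counts are large enough.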
12-13 Chi-Square Test for Independence
To test the association (or independence) of two variables in a contingency table:
Step 1: Determine the null and alternative hypotheses.
H0: The row variable and column variable are independent.
H1: The row variable and column variable are dependent.
12-14 Step 2: Choose a level of significance, α, depending on the seriousness of making a Type I error.
12-15 Step 3:
a) Calculate the expected frequencies (counts) for each cell in the contingency table.
b) Verify that the requirements for the chi-square test for independence are satisfied:
1. All expected frequencies are greater than or equal to 1 (all E_i ≥ 1).
2. No more than 20% of the expected frequencies are less than 5.
12-16 Step 3:
c) Compute the test statistic:
$$\chi^2_0 = \sum \frac{(O_i - E_i)^2}{E_i}$$
Note: O_i is the observed count and E_i is the expected count for the ith cell.
12-17 P-Value Approach
Step 4: Use Table VII to determine an approximate P-value by determining the area under the chi-square distribution with (r-1)(c-1) degrees of freedom to the right of the test statistic.
12-18 P-Value Approach
Step 5: If the P-value < α, reject the null hypothesis. If the P-value ≥ α, fail to reject the null hypothesis.
12-19 Step 6: State the conclusion in the context of the problem.
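Steps 4 and 5 can also be carried out numerically instead of with a printed table. The sketch below is a hedged illustration: it relies on the closed-form right-tail area that the chi-square distribution has when the degrees of freedom are even, so it does not cover odd degrees of freedom (a general routine such as `scipy.stats.chi2.sf` would be one option if SciPy is available). The function names and the test statistic value 7.5 are hypothetical and chosen only for demonstration.

```python
import math


def chi_square_p_value_even_df(statistic, df):
    """Right-tail area for even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even degrees of freedom")
    half = statistic / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


def decide(p_value, alpha):
    """Apply the Step 5 decision rule."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


rows, cols = 2, 3                 # shape of the contingency table
df = (rows - 1) * (cols - 1)      # (r - 1)(c - 1) degrees of freedom
alpha = 0.05                      # level of significance chosen in Step 2
statistic = 7.5                   # hypothetical test statistic, for illustration only

p_value = chi_square_p_value_even_df(statistic, df)
print(f"df = {df}, P-value = {p_value:.4f}: {decide(p_value, alpha)}")
```

The decision is reported as a string so that Step 6, stating the conclusion in the context of the problem, remains a separate, human-written step.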
12-20 Parallel Example 2: Performing a Chi-Square Test for Independence
In a poll, 883 males and 893 females were asked “If you could have only one of the following, which would you pick: money, health, or love?” Their responses are presented in the table below. Test the claim that gender and response are independent at the α = 0.05 level of significance.

        Money   Health   Love
Men        82      446    355
Women      46      574    273

Source: Based on a Fox News Poll conducted in January, 1999
12-21 Solution
Step 1: We want to know whether gender and response are dependent or independent, so the hypotheses are:
H0: Gender and response are independent.
H1: Gender and response are dependent.
Step 2: The level of significance is α = 0.05.
12-22 Solution
Step 3: (a) The expected frequencies were computed in Example 1 and are given in parentheses in the table below, along with the observed frequencies.

        Money           Health           Love
Men     82 (63.5808)    446 (507.048)    355 (312.2208)
Women   46 (64.2912)    574 (512.9088)   273 (315.7728)
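12-23 Solution
Step 3: (b) Since none of the expected frequencies is less than 5, the requirements for the chi-square test for independence are satisfied.
(c) The test statistic is
$$\chi^2_0 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(82 - 63.5808)^2}{63.5808} + \frac{(446 - 507.048)^2}{507.048} + \cdots + \frac{(273 - 315.7728)^2}{315.7728} \approx 36.82$$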
12-24 Solution: P-Value Approach
Step 4: There are r = 2 rows and c = 3 columns, so we find the P-value using (2-1)(3-1) = 2 degrees of freedom. The P-value is the area under the chi-square distribution with 2 degrees of freedom to the right of $\chi^2_0 \approx 36.82$, which is approximately 0.
12-25 Solution: P-Value Approach
Step 5: Since the P-value is less than the level of significance α = 0.05, we reject the null hypothesis.
12-26 Solution
Step 6: There is sufficient evidence to conclude that gender and response are dependent at the α = 0.05 level of significance.
12-27 To see the relation between response and gender, we draw bar graphs of the conditional distributions of response by gender. Recall that a conditional distribution lists the relative frequency of each category of a variable, given a specific value of the other variable in a contingency table.
12-28 Parallel Example 3: Constructing a Conditional Distribution and Bar Graph
Find the conditional distribution of response by gender for the data from the previous example, reproduced below.

        Money   Health   Love
Men        82      446    355
Women      46      574    273

Source: Based on a Fox News Poll conducted in January, 1999
12-29 Solution
We first compute the conditional distribution of response by gender.

        Money              Health              Love
Men     82/883 ≈ 0.0929    446/883 ≈ 0.5051    355/883 ≈ 0.4020
Women   46/893 ≈ 0.0515    574/893 ≈ 0.6428    273/893 ≈ 0.3057
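The same conditional distribution can be computed with a short script. The following is a minimal sketch using only the standard library; the function name `conditional_distribution` is a hypothetical choice, and the text bars are only a rough stand-in for a proper bar graph.

```python
responses = ["Money", "Health", "Love"]
# Observed counts for each gender, in the order given by `responses`.
observed = {"Men": [82, 446, 355], "Women": [46, 574, 273]}


def conditional_distribution(counts):
    """Relative frequency of each response within one row (one gender)."""
    total = sum(counts)
    return [count / total for count in counts]


for gender, counts in observed.items():
    print(gender)
    for response, share in zip(responses, conditional_distribution(counts)):
        bar = "#" * round(share * 50)  # crude text bar, 50 characters = 100%
        print(f"  {response:<6} {share:.4f} {bar}")
```

Dividing each count by its own row total, rather than by the table total, is what makes the two rows directly comparable even though the numbers of men and women polled differ.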