## Presentation on theme: "© 2010 Pearson Prentice Hall. All rights reserved The Chi-Square Goodness-of-Fit Test."— Presentation transcript:

12-2 to 12-5 Characteristics of the Chi-Square Distribution

1. It is not symmetric.
2. The shape of the chi-square distribution depends on the degrees of freedom, just like Student's t-distribution.
3. As the number of degrees of freedom increases, the chi-square distribution becomes more nearly symmetric.
4. The values of $\chi^2$ are nonnegative, i.e., the values of $\chi^2$ are greater than or equal to 0.
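The third characteristic can be made concrete with a small sketch. It relies on a standard result that is not stated on the slides: a chi-square distribution with df degrees of freedom has skewness sqrt(8/df), which shrinks toward 0 (perfect symmetry) as df grows. The function name `chi_square_skewness` is a hypothetical helper chosen for illustration.

```python
import math


def chi_square_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom.

    Uses the standard closed form sqrt(8 / df); this formula is an
    assumption brought in from general theory, not from the slides.
    """
    return math.sqrt(8.0 / df)


# The skewness falls as the degrees of freedom rise, so the curve
# becomes more nearly symmetric.
for df in (2, 5, 10, 30, 100):
    print(df, round(chi_square_skewness(df), 3))
```

The values stay positive for every df, which matches the first characteristic: the distribution is right-skewed and never exactly symmetric.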

12-7 A goodness-of-fit test is an inferential procedure used to determine whether a frequency distribution follows a specific distribution.

12-8 Expected Counts: Suppose that there are $n$ independent trials of an experiment with $k \geq 3$ mutually exclusive possible outcomes. Let $p_1$ represent the probability of observing the first outcome and $E_1$ represent the expected count of the first outcome; $p_2$ represent the probability of observing the second outcome and $E_2$ represent the expected count of the second outcome; and so on. The expected counts for each possible outcome are given by $E_i = \mu_i = np_i$ for $i = 1, 2, \ldots, k$.

12-9 Parallel Example 1: Finding Expected Counts. A sociologist wishes to determine whether the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000. According to the United States Census Bureau, in 2000, 22.8% of grandparents have been responsible for their grandchildren less than 1 year; 23.9% of grandparents have been responsible for their grandchildren for 1 or 2 years; 17.6% of grandparents have been responsible for their grandchildren 3 or 4 years; and 35.7% of grandparents have been responsible for their grandchildren for 5 or more years. If the sociologist randomly selects 1,000 care-giving grandparents, compute the expected number within each category assuming the distribution has not changed from 2000.

12-10 Solution, Step 1: The probabilities are the relative frequencies from the 2000 distribution:

- $p_{<1\text{yr}} = 0.228$
- $p_{1\text{-}2\text{yr}} = 0.239$
- $p_{3\text{-}4\text{yr}} = 0.176$
- $p_{\geq 5\text{yr}} = 0.357$

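12-11 Solution, Step 2: There are $n = 1{,}000$ trials of the experiment, so the expected counts are:

- $E_{<1\text{yr}} = np_{<1\text{yr}} = 1000(0.228) = 228$
- $E_{1\text{-}2\text{yr}} = np_{1\text{-}2\text{yr}} = 1000(0.239) = 239$
- $E_{3\text{-}4\text{yr}} = np_{3\text{-}4\text{yr}} = 1000(0.176) = 176$
- $E_{\geq 5\text{yr}} = np_{\geq 5\text{yr}} = 1000(0.357) = 357$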
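The arithmetic in Example 1 can be reproduced with a short sketch. The dictionary keys and variable names below are illustrative choices, not notation from the slides; the proportions and sample size are the ones given in the example.

```python
# Proportions from the 2000 distribution (Example 1) and the sample size.
n = 1000
probabilities = {"<1": 0.228, "1-2": 0.239, "3-4": 0.176, ">=5": 0.357}

# Expected count for each category: E_i = n * p_i.
expected = {category: n * p for category, p in probabilities.items()}

for category, count in expected.items():
    # Format to whole numbers, since floating-point products can carry
    # tiny rounding noise.
    print(f"{category}: {count:.0f}")
```

Because the four proportions sum to 1, the expected counts sum to the sample size, which is a quick sanity check on the inputs.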

12-12 Test Statistic for Goodness-of-Fit Tests: Let $O_i$ represent the observed counts of category $i$, $E_i$ represent the expected counts of category $i$, $k$ represent the number of categories, and $n$ represent the number of independent trials of an experiment. Then the formula

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

approximately follows the chi-square distribution with $k-1$ degrees of freedom, provided that

1. all expected frequencies are greater than or equal to 1 (all $E_i \geq 1$) and
2. no more than 20% of the expected frequencies are less than 5.
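As a minimal sketch, the test statistic is a sum over categories of the squared difference between observed and expected counts, scaled by the expected count. The function name `chi_square_statistic` and the three-category counts below are hypothetical and only serve to illustrate the computation.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for illustration only (not from the slides).
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]

stat = chi_square_statistic(observed, expected)
df = len(observed) - 1  # k - 1 degrees of freedom

print(stat, df)
```

Categories whose observed count equals the expected count contribute nothing to the sum, so a larger statistic signals a larger overall departure from the hypothesized distribution.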

12-13 The Goodness-of-Fit Test: To test the hypotheses regarding a distribution, we use the steps that follow.

Step 1: Determine the null and alternative hypotheses.

- $H_0$: The random variable follows a certain distribution
- $H_1$: The random variable does not follow a certain distribution

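12-14 The Goodness-of-Fit Test (hypotheses cont.): Typically the hypotheses can be symbolically represented as $H_0: p_1 = p_{1,0},\ p_2 = p_{2,0},\ \ldots,\ p_k = p_{k,0}$, where $p_{i,0}$ denotes the proportion that the hypothesized distribution assigns to category $i$, vs. the alternative $H_1$: at least one $p_i$ differs from its hypothesized value.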

12-15 Step 2: Decide on a level of significance, $\alpha$, depending on the seriousness of making a Type I error.

12-16 Step 3: (a) Calculate the expected counts for each of the $k$ categories. The expected counts are $E_i = np_i$ for $i = 1, 2, \ldots, k$, where $n$ is the number of trials and $p_i$ is the probability of the $i$th category, assuming that the null hypothesis is true.

12-17 Step 3: (b) Verify that the requirements for the goodness-of-fit test are satisfied.

1. All expected counts are greater than or equal to 1 (all $E_i \geq 1$).
2. No more than 20% of the expected counts are less than 5.

(c) Compute the test statistic:

$$\chi^2_0 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Note: $O_i$ is the observed count for the $i$th category.

12-18 CAUTION! If the requirements in Step 3(b) are not satisfied, one option is to combine two or more of the low-frequency categories into a single category.
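The requirement check and the category-combining remedy can be sketched as follows. The helper names `requirements_met` and `merge_categories`, and all the counts, are hypothetical and chosen only to show a case where merging two low-frequency categories fixes the problem. Observed and expected counts must be merged in the same way.

```python
def requirements_met(expected):
    """Check both goodness-of-fit requirements on the expected counts."""
    all_at_least_one = all(e >= 1 for e in expected)
    share_below_five = sum(e < 5 for e in expected) / len(expected)
    return all_at_least_one and share_below_five <= 0.20


def merge_categories(counts, i, j):
    """Return a new list with categories i and j combined into one."""
    merged = [c for idx, c in enumerate(counts) if idx not in (i, j)]
    merged.insert(min(i, j), counts[i] + counts[j])
    return merged


# Hypothetical counts for illustration only (not from the slides).
expected = [0.8, 6.2, 40.0, 53.0]
observed = [2, 5, 44, 49]

print(requirements_met(expected))  # the first expected count is below 1

# Combine the first two categories in both lists, then re-check.
merged_expected = merge_categories(expected, 0, 1)
merged_observed = merge_categories(observed, 0, 1)
print(requirements_met(merged_expected))
```

Merging reduces the number of categories, so the degrees of freedom for the test drop accordingly (here from 3 to 2).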

12-19 P-Value Approach, Step 4: Use Table VII to obtain an approximate P-value by determining the area under the chi-square distribution with $k-1$ degrees of freedom to the right of the test statistic.

12-20 P-Value Approach, Step 5: If the P-value $< \alpha$, reject the null hypothesis. If the P-value $\geq \alpha$, fail to reject the null hypothesis.
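In place of a table lookup, the right-tail area can be computed directly. The sketch below is one possible approach using only the Python standard library: it relies on closed-form expressions for the chi-square tail that hold when the degrees of freedom are a positive integer. The function names `chi_square_p_value` and `decide` are hypothetical.

```python
import math


def chi_square_p_value(statistic, df):
    """Area to the right of `statistic` under a chi-square curve.

    Assumes df is a positive integer, which is always the case for
    k - 1 degrees of freedom in a goodness-of-fit test.
    """
    if statistic <= 0:
        return 1.0
    h = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{j=0}^{df/2 - 1} h^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{j=1}^{(df-1)/2} h^(j-1/2) / Gamma(j+1/2)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += h ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total


def decide(p_value, alpha):
    """Apply the Step 5 decision rule."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# 7.815 is the familiar 0.05 critical value for 3 degrees of freedom,
# so the right-tail area here should be very close to 0.05.
p = chi_square_p_value(7.815, 3)
print(round(p, 4), decide(p, alpha=0.05))
```

A statistical library would normally be used for this in practice; the closed forms are shown here only to keep the sketch self-contained.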

12-21 Step 6: State the conclusion in the context of the problem. Note: in many cases, when the null hypothesis is rejected at the conclusion of the test, we will have to attempt to explain what the nonconformity was.

12-22 Parallel Example 2: Conducting a Goodness-of-Fit Test. A sociologist wishes to determine whether the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000. According to the United States Census Bureau, in 2000, 22.8% of grandparents have been responsible for their grandchildren less than 1 year; 23.9% of grandparents have been responsible for their grandchildren for 1 or 2 years; 17.6% of grandparents have been responsible for their grandchildren 3 or 4 years; and 35.7% of grandparents have been responsible for their grandchildren for 5 or more years. The sociologist randomly selects 1,000 care-giving grandparents and obtains the following observed counts: less than 1 year, 252; 1 or 2 years, 255; 3 or 4 years, 162; 5 or more years, 331.

12-23 Test the claim that the distribution is different today than it was in 2000 at the $\alpha = 0.05$ level of significance.

12-24 Solution, Step 1: We want to know if the distribution today is different than it was in 2000. The hypotheses are then:

- $H_0$: The distribution for the number of years care-giving grandparents are responsible for their grandchildren is the same today as it was in 2000
- $H_1$: The distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000

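12-25 Or alternatively: $H_0: p_{<1\text{yr}} = 0.228,\ p_{1\text{-}2\text{yr}} = 0.239,\ p_{3\text{-}4\text{yr}} = 0.176,\ p_{\geq 5\text{yr}} = 0.357$ vs. $H_1$: at least one of the proportions is different from the others.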

12-26 Solution, Step 2: The level of significance is $\alpha = 0.05$.

Step 3: (a) The expected counts were computed in Example 1.

| Number of Years | Observed Counts | Expected Counts |
|---|---|---|
| <1 | 252 | 228 |
| 1-2 | 255 | 239 |
| 3-4 | 162 | 176 |
| ≥5 | 331 | 357 |

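12-27 Solution, Step 3: (b) Since all expected counts are greater than or equal to 5, the requirements for the goodness-of-fit test are satisfied. (c) The test statistic is

$$\chi^2_0 = \frac{(252-228)^2}{228} + \frac{(255-239)^2}{239} + \frac{(162-176)^2}{176} + \frac{(331-357)^2}{357} \approx 6.605$$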

12-28 Solution: P-Value Approach, Step 4: There are $k = 4$ categories. The P-value is the area under the chi-square distribution with $4 - 1 = 3$ degrees of freedom to the right of $\chi^2_0 \approx 6.605$. Thus, P-value ≈ 0.09.

12-29 Solution: P-Value Approach, Step 5: Since the P-value ≈ 0.09 is greater than the level of significance $\alpha = 0.05$, we fail to reject the null hypothesis.

12-30 Solution, Step 6: There is insufficient evidence to conclude that the distribution for the number of years care-giving grandparents are responsible for their grandchildren is different today than it was in 2000 at the $\alpha = 0.05$ level of significance.
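The whole of Example 2 can be retraced in a few lines as a check on the hand computation. This is a sketch using only the standard library; the P-value line uses a closed form that is valid specifically for 3 degrees of freedom (an assumption tied to this example, not a general formula), and the variable names are illustrative.

```python
import math

# Data from Example 2: observed counts and the 2000 proportions.
observed = [252, 255, 162, 331]
proportions_2000 = [0.228, 0.239, 0.176, 0.357]
alpha = 0.05

# Step 3(a): expected counts under the null hypothesis.
n = sum(observed)
expected = [n * p for p in proportions_2000]

# Step 3(b): requirements on the expected counts.
requirements_ok = (
    all(e >= 1 for e in expected)
    and sum(e < 5 for e in expected) / len(expected) <= 0.20
)

# Step 3(c): test statistic.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Step 4: right-tail area for 3 degrees of freedom only.
p_value = (
    math.erfc(math.sqrt(statistic / 2))
    + math.sqrt(2 * statistic / math.pi) * math.exp(-statistic / 2)
)

# Step 5: decision rule.
reject_null = p_value < alpha

print(f"chi-square = {statistic:.3f}, P-value = {p_value:.3f}, "
      f"reject H0: {reject_null}")
```

The computed P-value falls between 0.05 and 0.10, consistent with the table-based approximation of 0.09 and with the decision not to reject the null hypothesis.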