Chapter 11 Chi-Square Procedures 11.1 Chi-Square Goodness of Fit.



Characteristics of the Chi-Square Distribution
1. It is not symmetric.
2. The shape of the chi-square distribution depends on the degrees of freedom, just like Student's t-distribution.
3. As the number of degrees of freedom increases, the chi-square distribution becomes more symmetric, as illustrated in Figure 1.
4. The values are non-negative. That is, the values of $\chi^2$ are greater than or equal to 0.

The Chi-Square Distribution
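One way to see characteristic 3 numerically is through the skewness of the chi-square distribution, which equals $\sqrt{8/\text{df}}$ and shrinks toward 0 (perfect symmetry) as the degrees of freedom grow. The sketch below is only an illustration; the function name `chi_square_skewness` is a hypothetical helper, not part of the presentation.

```python
import math


def chi_square_skewness(df: int) -> float:
    """Skewness of a chi-square distribution with `df` degrees of freedom.

    Uses the standard result skewness = sqrt(8 / df).
    """
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return math.sqrt(8 / df)


# The skewness is always positive (right-skewed) and decreases as df grows.
for df in (1, 3, 10, 30, 100):
    print(f"df = {df:>3}: skewness = {chi_square_skewness(df):.3f}")
```

The printed values fall steadily as df increases, which matches the flattening, more symmetric curves described above.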

A goodness-of-fit test is an inferential procedure used to determine whether a frequency distribution follows a claimed distribution.

Expected Counts
Suppose there are $n$ independent trials of an experiment with $k \geq 3$ mutually exclusive possible outcomes. Let $p_1$ represent the probability of observing the first outcome and $E_1$ the expected count of the first outcome, $p_2$ the probability of observing the second outcome and $E_2$ the expected count of the second outcome, and so on. The expected count for each possible outcome is given by $E_i = \mu_i = np_i$ for $i = 1, 2, \ldots, k$.

EXAMPLE: Finding Expected Counts
A sociologist wishes to determine whether the distribution of the number of years that grandparents have been responsible for their grandchildren is different today than it was in 2000. According to the United States Census Bureau, in 2000, 22.8% of grandparents responsible for their grandchildren had been responsible for less than 1 year; 23.9% for 1 or 2 years; 17.6% for 3 or 4 years; and 35.7% for 5 or more years. If the sociologist randomly selects 1,000 grandparents who are responsible for their grandchildren, compute the expected number within each category, assuming the distribution has not changed from 2000.
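The expected counts in this example follow directly from $E_i = np_i$ with $n = 1000$ and the 2000 Census proportions. A minimal sketch of that arithmetic is shown below; the variable names are illustrative choices, not notation from the slides.

```python
import math

# Proportions from the 2000 Census figures quoted in the example.
proportions_2000 = {
    "Less than 1 year": 0.228,
    "1 or 2 years": 0.239,
    "3 or 4 years": 0.176,
    "5 or more years": 0.357,
}
n = 1000  # number of grandparents sampled

# The category probabilities must sum to 1 for a valid distribution.
assert math.isclose(sum(proportions_2000.values()), 1.0)

# Expected count for each category: E_i = n * p_i
# (rounded to one decimal to hide floating-point noise).
expected_counts = {
    category: round(n * p, 1) for category, p in proportions_2000.items()
}

for category, e in expected_counts.items():
    print(f"{category:<18} E = {e}")
```

Under the assumption that the distribution has not changed, the sociologist would expect 228, 239, 176, and 357 grandparents in the four categories.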

Test Statistic for Goodness-of-Fit Tests
Let $O_i$ represent the observed count of category $i$, $E_i$ the expected count of category $i$, $k$ the number of categories, and $n$ the number of independent trials of an experiment. Then

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

approximately follows the chi-square distribution with $k - 1$ degrees of freedom, provided (1) all expected frequencies are greater than or equal to 1 (all $E_i \geq 1$) and (2) no more than 20% of the expected frequencies are less than 5.
NOTE: $E_i = np_i$ for $i = 1, 2, \ldots, k$.
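The test statistic is the sum, over all categories, of the squared difference between observed and expected counts divided by the expected count. The sketch below implements that sum and tries it on made-up data: 60 rolls of a die assumed to be fair. Both the function name `chi_square_statistic` and the die counts are hypothetical and serve only to illustrate the formula.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical illustration: 60 rolls of a die assumed to be fair.
n = 60
probabilities = [1 / 6] * 6
expected = [n * p for p in probabilities]   # E_i = n * p_i = 10 for each face
observed = [8, 12, 9, 11, 10, 10]           # made-up observed counts

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1      # k - 1

print(f"chi-square = {statistic:.3f} with {degrees_of_freedom} degrees of freedom")
```

Categories whose observed count is far from the expected count contribute the most to the statistic, so large values indicate a poor fit to the claimed distribution.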


The Chi-Square Goodness-of-Fit Test
If a claim is made regarding a distribution, we can use the following steps to test the claim, provided that
1. the data are randomly selected;
2. all expected frequencies are greater than or equal to 1;
3. no more than 20% of the expected frequencies are less than 5.

Step 1: A claim is made regarding a distribution. The claim is used to determine the null and alternative hypotheses.
$H_0$: the random variable follows the claimed distribution
$H_1$: the random variable does not follow the claimed distribution

Step 2: Calculate the expected frequencies for each of the $k$ categories. The expected frequencies are $E_i = np_i$ for $i = 1, 2, \ldots, k$, where $n$ is the number of trials and $p_i$ is the probability of the $i$th category, assuming the null hypothesis is true.

Step 3: Verify that the requirements for the goodness-of-fit test are satisfied: (1) all expected frequencies are greater than or equal to 1 (all $E_i \geq 1$); (2) no more than 20% of the expected frequencies are less than 5.

EXAMPLE: Testing a Claim Using the Goodness-of-Fit Test
A sociologist wishes to determine whether the distribution of the number of years that grandparents have been responsible for their grandchildren is different today than it was in 2000. According to the United States Census Bureau, in 2000, 22.8% of grandparents responsible for their grandchildren had been responsible for less than 1 year; 23.9% for 1 or 2 years; 17.6% for 3 or 4 years; and 35.7% for 5 or more years. The sociologist randomly selects 1,000 grandparents who are responsible for their grandchildren and obtains the following data.

Solution:
Step 1. Construct the hypotheses.
$H_0$: The distribution of the number of years that grandparents have been responsible for their grandchildren is the same today as it was in 2000.
$H_1$: The distribution of the number of years that grandparents have been responsible for their grandchildren is different today from what it was in 2000.

Step 2. Compute the expected counts for each category, assuming that the null hypothesis is true.

Number of Years     Observed count (O_i)   Expected count (E_i)
Less than 1 year    252                    228
1 or 2 years        255                    239
3 or 4 years        162                    176
5 or more years     331                    357

Solution (cont'd):
Step 3. Verify that the requirements for the goodness-of-fit test are satisfied.
1. Are all expected frequencies (expected counts) greater than or equal to 1? Yes: the smallest expected count is 176.
2. Are no more than 20% of the expected frequencies less than 5? Yes: none of the expected counts is less than 5.
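The two requirements checked in Step 3 are easy to verify mechanically. The sketch below is one possible way to do so; `meets_requirements` is a hypothetical helper name, and the second list of counts is made up to show a case that fails.

```python
def meets_requirements(expected):
    """Check the two goodness-of-fit requirements on the expected counts.

    (1) every expected count is at least 1, and
    (2) no more than 20% of the expected counts are below 5.
    """
    all_at_least_one = all(e >= 1 for e in expected)
    number_below_five = sum(1 for e in expected if e < 5)
    # "no more than 20%" written with integers: count / total <= 1/5
    at_most_twenty_percent = 5 * number_below_five <= len(expected)
    return all_at_least_one and at_most_twenty_percent


# Expected counts from the grandparents example.
example_expected = [228, 239, 176, 357]
print(meets_requirements(example_expected))      # requirements hold

# Hypothetical counts that violate requirement (1): one value is below 1.
hypothetical_expected = [0.5, 10, 10, 10, 10]
print(meets_requirements(hypothetical_expected))  # requirements fail
```

When the check fails, the chi-square approximation to the test statistic is unreliable and the test should not be applied to the data as categorized.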

Step 4. Find the critical value and determine the critical region. With $\alpha = 0.05$ and $k = 4$, the degrees of freedom are $k - 1 = 3$. From Table IV, $\chi^2_{\alpha} = \chi^2_{0.05} = 7.815$, so the critical region is $C = (7.815, \infty)$.
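The tabled critical value can be cross-checked without a statistics library. For exactly 3 degrees of freedom, the right-tail area of the chi-square distribution has the closed form $P(X > x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$, and the critical value is the $x$ at which this area equals $\alpha$. The sketch below finds it by bisection; the function names are hypothetical, and the closed form applies only to 3 degrees of freedom.

```python
import math


def right_tail_area_df3(x: float) -> float:
    """P(X > x) for a chi-square variable with exactly 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def critical_value_df3(alpha: float, low: float = 0.0, high: float = 100.0) -> float:
    """Find x with right-tail area alpha by bisection (3 degrees of freedom only)."""
    for _ in range(200):
        mid = (low + high) / 2
        if right_tail_area_df3(mid) > alpha:
            low = mid   # tail area still too large: move right
        else:
            high = mid  # tail area too small: move left
    return (low + high) / 2


critical_value = critical_value_df3(0.05)
print(f"critical value for alpha = 0.05, df = 3: {critical_value:.3f}")
```

The result agrees with the table value of 7.815 to three decimal places.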

Step 5. Compute the test statistic.
$$\chi^2 = \frac{(252-228)^2}{228} + \frac{(255-239)^2}{239} + \frac{(162-176)^2}{176} + \frac{(331-357)^2}{357} = 6.605$$
Step 6. Compare the test statistic with the critical value. Since $6.605 < 7.815$, the test statistic is less than the critical value; that is, it does not lie in the critical region.
Step 7. Conclusion? There is not sufficient evidence at the $\alpha = 0.05$ level of significance to reject the null hypothesis, i.e., the claim that the distribution of the number of years that grandparents have been responsible for their grandchildren is the same today as it was in 2000. Or ….
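Steps 5 and 6 can be reproduced with a few lines of arithmetic on the observed and expected counts from the table. The sketch below is an illustration only; the critical value 7.815 is taken from Step 4, and the variable names are not from the slides.

```python
# Observed and expected counts from the grandparents example.
observed = [252, 255, 162, 331]
expected = [228, 239, 176, 357]

# Step 5: test statistic, the sum of (O_i - E_i)^2 / E_i.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
test_statistic = sum(contributions)

# Step 6: compare with the critical value for alpha = 0.05 and 3 degrees of freedom.
critical_value = 7.815
reject_null = test_statistic > critical_value

print(f"chi-square = {test_statistic:.3f}")
print("Reject H0" if reject_null else "Do not reject H0")
```

Because the test statistic falls below 7.815, the code reaches the same decision as Step 7: the null hypothesis is not rejected at the 0.05 level.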