Chapter 11 Inference for Tables: Chi-Square Procedures 11.1 Target Goal:I can compute expected counts, conditional distributions, and contributions to.


Chapter 11 Inference for Tables: Chi-Square Procedures 11.1 Target Goal: I can compute expected counts, conditional distributions, and contributions to the chi-square statistic. H.W.: pg. 621: 1, 3, 5, 9, 11

Test for Goodness of Fit. To analyze categorical data, we construct two-way tables and examine the counts or percents of the explanatory and response variables. Count and record M&M colors per bag. Expected count: the number of candies in the bag times the claimed proportion for that color.

M&Ms Color Distribution (%), according to their website

                        Brown  Yellow  Red  Blue  Orange  Green
Plain                      13      14   13    24      20     16
Peanut                     12      15   12    23             15
Peanut Butter/Almond       10      20   10    20
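A minimal sketch of the expected-count calculation for the Plain row. The bag size of 50 is an assumption borrowed from the simulation of 50 draws; the function name `expected_counts` is illustrative only.

```python
# Claimed color percentages for Plain M&Ms, taken from the table above.
PLAIN_PERCENT = {
    "Brown": 13, "Yellow": 14, "Red": 13,
    "Blue": 24, "Orange": 20, "Green": 16,
}


def expected_counts(percents, n):
    """Expected count per color = bag size times the claimed proportion."""
    return {color: n * pct / 100 for color, pct in percents.items()}


# Assumed bag size of 50 candies (hypothetical, not from the slides' data).
expected = expected_counts(PLAIN_PERCENT, 50)
for color, count in expected.items():
    print(f"{color:7s} {count:5.1f}")
```

The expected counts need not be whole numbers, and they always add up to the bag size.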

We want to compare the observed counts to the expected counts. The null hypothesis is that there is no difference between the observed and expected counts. The alternative hypothesis is that there is a difference between the observed and expected counts.

Simulate the count of an M&M’s bag or use your own M&M’s bag. Label: 1-13 Brown; 14-27 Yellow; 28-40 Red; 41-64 Blue; 65-84 Orange; 85-00 Green. MATH:PRB:randInt(0,99,50), store in L1. Sort in ascending order and tally.
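For anyone without the calculator, the same simulation can be sketched in Python. This assumes that a draw of 0 plays the role of "00" in the label ranges; the names `color_for` and `simulate_bag` are made up for this sketch.

```python
import random
from collections import Counter

# Upper end of each label range from the slide; 0 is treated as "00" (100).
LABELS = [(13, "Brown"), (27, "Yellow"), (40, "Red"),
          (64, "Blue"), (84, "Orange"), (100, "Green")]


def color_for(number):
    """Map a random integer 0..99 to a color using the label ranges."""
    value = 100 if number == 0 else number
    for upper, color in LABELS:
        if value <= upper:
            return color
    raise ValueError(f"number out of range: {number}")


def simulate_bag(size=50, seed=None):
    """Draw `size` random integers, sort them, and tally the colors."""
    rng = random.Random(seed)
    draws = sorted(rng.randint(0, 99) for _ in range(size))
    return Counter(color_for(d) for d in draws)


print(simulate_bag(50, seed=1))
```

Passing a seed makes the bag reproducible; leave it as `None` to get a fresh bag each run.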

Chi-square statistic: It measures how well the observed counts fit the expected counts, assuming that the null hypothesis is true. Go to blank student notes.
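A short sketch of the statistic itself: sum, over the categories, the squared difference between observed and expected counts divided by the expected count. The observed tally below is hypothetical (a made-up bag of 50 Plain M&Ms), not data from the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Order: Brown, Yellow, Red, Blue, Orange, Green.
observed = [8, 7, 5, 10, 12, 8]           # hypothetical tally, 50 candies
expected = [6.5, 7.0, 6.5, 12.0, 10.0, 8.0]  # 50 times the Plain percentages

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 4))
```

A statistic near zero means the bag looks like the claimed distribution; larger values mean a worse fit.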

The distribution of the chi-square statistic is called the chi-square distribution, χ². This distribution is a density curve. The total area under the curve is 1. The curve begins at zero on the horizontal axis and is skewed right. As the degrees of freedom increase, the shape of the curve becomes more symmetric.
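One way to see this distribution is to simulate many bags with the null hypothesis true and look at the statistics that come out. The sketch below assumes bags of 50 Plain M&Ms, 2000 repetitions, and an arbitrary seed; all three are illustrative choices.

```python
import random

# Claimed Plain proportions: Brown, Yellow, Red, Blue, Orange, Green.
PLAIN_PROPORTIONS = [0.13, 0.14, 0.13, 0.24, 0.20, 0.16]


def simulated_statistic(rng, proportions, n):
    """Draw one bag of n candies under H0 and return its chi-square statistic."""
    draws = rng.choices(range(len(proportions)), weights=proportions, k=n)
    stat = 0.0
    for i, p in enumerate(proportions):
        observed = draws.count(i)
        expected = n * p
        stat += (observed - expected) ** 2 / expected
    return stat


rng = random.Random(2024)  # fixed seed so the run is reproducible
stats = [simulated_statistic(rng, PLAIN_PROPORTIONS, 50) for _ in range(2000)]

mean_stat = sum(stats) / len(stats)
share_above_mean = sum(s > mean_stat for s in stats) / len(stats)
print(f"mean = {mean_stat:.2f}, share above mean = {share_above_mean:.2f}")
```

The statistics are never negative, their average lands near the degrees of freedom (6 - 1 = 5), and fewer than half of them sit above the average, which is what a right-skewed curve looks like.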

Pg. 703

“Goodness of Fit Test.” Using the M&M Minis® chi-square statistic, find the probability of obtaining a χ² value at least this extreme, assuming the null hypothesis is true. Use your chi-square statistic and df = 6 - 1 = 5. P-value = χ²cdf(lower bound, upper bound, df).
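The calculator's χ²cdf with a very large upper bound gives the area to the right of the statistic. A standard-library sketch of that upper-tail area for whole-number degrees of freedom follows; `chi2_sf` is a name chosen here, and the statistic 1.43 is a hypothetical value, not a result from the slides.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive whole number")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: a finite sum times exp(-x/2).
        total = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail (erfc) and add one term per 2 df.
    m = (df - 1) // 2
    total = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Hypothetical statistic from one bag, with df = 6 - 1 = 5.
p_value = chi2_sf(1.43, 5)
print(f"P-value = {p_value:.4f}")
```

As a sanity check, the tail area beyond the usual 5% critical value for df = 5 (about 11.07) comes out close to 0.05.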

CONDITIONS for Individual Expected Counts: The Goodness of Fit Test may be used when all expected counts are at least 1 and no more than 20% of the expected counts are less than 5. Following the Goodness of Fit Test, check to see which component made the greatest contribution to the chi-square statistic to see where the biggest changes occurred.
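Both checks are easy to automate. The sketch below tests the expected-count condition and then finds the component with the largest contribution; the function names and the observed tally are hypothetical.

```python
def counts_large_enough(expected):
    """All expected counts >= 1 and no more than 20% of them below 5."""
    if any(e < 1 for e in expected):
        return False
    small = sum(1 for e in expected if e < 5)
    return small * 5 <= len(expected)  # same as small / len <= 20%


def contributions(observed, expected):
    """Each category's term (O - E)^2 / E of the chi-square statistic."""
    return {c: (observed[c] - expected[c]) ** 2 / expected[c] for c in expected}


# Expected counts for an assumed bag of 50 Plain M&Ms.
expected = {"Brown": 6.5, "Yellow": 7.0, "Red": 6.5,
            "Blue": 12.0, "Orange": 10.0, "Green": 8.0}
# Hypothetical observed tally for one bag.
observed = {"Brown": 8, "Yellow": 7, "Red": 5,
            "Blue": 10, "Orange": 12, "Green": 8}

ok = counts_large_enough(list(expected.values()))
terms = contributions(observed, expected)
largest = max(terms, key=terms.get)
print(ok, largest, round(terms[largest], 3))
```

The largest term points to the color whose count strays furthest from the claim, relative to what was expected.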

Conditions for Chi-Square Test. Random: The data come from a random sample or a randomized experiment. Large sample size: All expected counts are at least 5. Independent: Individual observations are independent. When sampling without replacement, check the 10% condition.

Ex: The Graying of America. It is believed that with better medicine and healthier lifestyles, people are living longer and consequently a larger percentage of the population is of retirement age. Compare the distribution of the 1980 population to the 1996 population.

Step 1: State - Identify the population of interest and the parameter you want to draw a conclusion about. State the hypotheses in words and symbols. We want to determine if the distribution of age groups in the United States in 1996 has changed significantly from the 1980 distribution. H₀: the age group dist. in 1996 is the same as the 1980 dist. Hₐ: the age group dist. in 1996 is different from the 1980 dist.

Or, state the hypotheses as proportions. H₀: p(0-24) = 0.4139, p(25-44) = 0.2768, p(45-64) = 0.1964, p(65+) = 0.1128. Hₐ: at least one of the proportions differs from the stated values.

Goal of “Goodness of Fit Tests”: The more the observed counts differ from the expected counts, the more evidence we have to reject H₀ and thus conclude that the population dist. in 1996 is significantly different from 1980.

Always a good idea to plot the data.

Step 2: Plan - Choose the appropriate inference procedure. Verify the conditions for using the selected procedure. If the conditions are met, conduct a chi-square goodness of fit test. Random: We must assume the two distributions of age groups come from a randomized experiment.

Calculate expected counts in each age category and verify that they are large enough (see conditions). Yes, all > 5; proceed with chi-square calculations.
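The expected counts can be sketched as the 1996 sample size times each 1980 proportion. The proportions are the ones stated in the hypotheses; the sample size of 500 is taken from the 10% condition check, and the variable names are illustrative.

```python
# 1980 proportions by age group, as stated in H0.
PROPORTIONS_1980 = {"0-24": 0.4139, "25-44": 0.2768,
                    "45-64": 0.1964, "65+": 0.1128}
SAMPLE_SIZE_1996 = 500  # sample size used in the 10% condition check

expected = {group: SAMPLE_SIZE_1996 * p for group, p in PROPORTIONS_1980.items()}
all_at_least_5 = all(count >= 5 for count in expected.values())

for group, count in expected.items():
    print(f"{group:6s} {count:7.2f}")
print("All expected counts at least 5:", all_at_least_5)
```

The smallest expected count belongs to the 65+ group and is still far above 5.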

Independent: We clearly have two independent age groups, one from 1980 and one from 1996. We must check the 10% condition. There are at least 10(286,598) U.S. citizens in 1980 and at least 10(500) U.S. citizens in 1996.

Step 3: Do - If the conditions are met, carry out the inference procedure. Calculate the χ² statistic to measure how much the observed counts (O) differ from the expected counts (E) under H₀: χ² = Σ (O - E)² / E, summed over all categories.

A large value of χ² shows more evidence against H₀ and also results in a small P-value.

Calculate P-value. df: use n - 1 degrees of freedom, where n is the number of categories. This is because the χ² family of curves is used to assess evidence against H₀. Since we are using percentages, 3 of the 4 percentages are allowed to vary; the 4th is not. df = 4 - 1 = 3.

Table C: for a P-value of 0.05, the critical value is 7.81. Calc: 2nd VARS: χ²cdf(8.2275, E99, 3) = .0415
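The calculator result can be cross-checked without a table. For df = 3 the upper-tail area has a closed form, sketched below with only the standard library; the function name is made up for this sketch, and the formula applies to df = 3 only.

```python
import math


def chi2_upper_tail_df3(x):
    """Area to the right of x under the chi-square curve with df = 3."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


p_value = chi2_upper_tail_df3(8.2275)          # roughly 0.0415
tail_at_critical = chi2_upper_tail_df3(7.81)   # roughly 0.05

print(f"P-value for 8.2275: {p_value:.4f}")
print(f"Tail area beyond 7.81: {tail_at_critical:.4f}")
```

Both routes agree: 8.2275 lies beyond the 5% critical value, so its P-value is below 0.05.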

Step 4: Conclude - Interpret the results in the context of the problem. Since our value of 8.2275 is more extreme than 7.81, we reject H₀ and conclude that the population dist. in 1996 is significantly different from the 1980 dist. at the 5% level.

To be cont.
