Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 11 Inference for Distributions of Categorical Data

Similar presentations


Presentation on theme: "Chapter 11 Inference for Distributions of Categorical Data"— Presentation transcript:

1 Chapter 11 Inference for Distributions of Categorical Data
Section 11.2 Inference for Two-Way Tables

2 Inference for Two-Way Tables
STATE appropriate hypotheses and COMPUTE the expected counts and chi-square test statistic for a chi-square test based on data in a two- way table. STATE and CHECK the Random, 10%, and Large Counts conditions for a chi-square test based on data in a two-way table. Calculate the degrees of freedom and P-value for a chi-square test based on data in a two-way table. PERFORM a chi-square test for homogeneity. PERFORM a chi-square test for independence. CHOOSE the appropriate chi-square test in a given setting.

3 Tests for Homogeneity: Stating Hypotheses
Does background music influence what customers buy? One experiment in a European restaurant compared three randomly assigned treatments: no music, French accordion music, and Italian string music. Under each condition, the researchers recorded the number of customers who ordered French, Italian, and other entrées.

4 Tests for Homogeneity: Stating Hypotheses
Does background music influence what customers buy? One experiment in a European restaurant compared three randomly assigned treatments: no music, French accordion music, and Italian string music. Under each condition, the researchers recorded the number of customers who ordered French, Italian, and other entrées. H0: There is no difference in the true distributions of entrées ordered at this restaurant when no music, French accordion music, or Italian string music is played.

5 Tests for Homogeneity: Stating Hypotheses
Does background music influence what customers buy? One experiment in a European restaurant compared three randomly assigned treatments: no music, French accordion music, and Italian string music. Under each condition, the researchers recorded the number of customers who ordered French, Italian, and other entrées. H0: There is no difference in the true distributions of entrées ordered at this restaurant when no music, French accordion music, or Italian string music is played. Ha: There is a difference in the true distributions of entrées ordered at this restaurant when no music, French accordion music, or Italian string music is played.

6 Stating Hypotheses for GOF (11.1)
Ramón Rivera Moret H0: The distribution of color in the large bag of M&M’S Milk Chocolate Candies is the same as the claimed distribution. Ha: The distribution of color in the large bag of M&M’S Milk Chocolate Candies is not the same as the claimed distribution. With GOF (11.1) we say the same in Ho. With Homogeneity (11.2) we say there is no diff. in Ho.

7 Tests for Homogeneity: Stating Hypotheses
Does background music influence what customers buy? One experiment in a European restaurant compared three randomly assigned treatments: no music, French accordion music, and Italian string music. Under each condition, the researchers recorded the number of customers who ordered French, Italian, and other entrées. The alternative hypothesis does not state that each distribution is different from each of the others. CAUTION: H0: There is no difference in the true distributions of entrées ordered at this restaurant when no music, French accordion music, or Italian string music is played. Ha: There is a difference in the true distributions of entrées ordered at this restaurant when no music, French accordion music, or Italian string music is played.

8 Tests for Homogeneity: Stating Hypotheses

9 Tests for Homogeneity: Stating Hypotheses

10 Tests for Homogeneity: Stating Hypotheses
Do the differences in these distributions provide convincing evidence that background music affects customer behavior at this restaurant? Or is it plausible that the background music has no effect on customer behavior and that these differences are due to the chance involved in the random assignment of treatments?

11 Tests for Homogeneity: Stating Hypotheses
We could compare many pairs of proportions, ending up with many tests and many P-values. This is a bad idea.

12 Tests for Homogeneity: Stating Hypotheses
We could compare many pairs of proportions, ending up with many tests and many P-values. This is a bad idea. CAUTION: Performing multiple tests on the same data increases the probability that we make a Type I error in at least one of the tests.

13 Tests for Homogeneity: Stating Hypotheses
The problem of how to do many comparisons at once without increasing the overall probability of a Type I error is common in statistics. Statistical methods for dealing with multiple comparisons usually have two steps:

14 Tests for Homogeneity: Stating Hypotheses
The problem of how to do many comparisons at once without increasing the overall probability of a Type I error is common in statistics. Statistical methods for dealing with multiple comparisons usually have two steps: Perform an overall test to see if there is convincing evidence of any differences among the parameters that we want to compare.

15 Tests for Homogeneity: Stating Hypotheses
The problem of how to do many comparisons at once without increasing the overall probability of a Type I error is common in statistics. Statistical methods for dealing with multiple comparisons usually have two steps: Perform an overall test to see if there is convincing evidence of any differences among the parameters that we want to compare. When the overall test shows there is convincing evidence of a difference, perform a detailed follow-up analysis to decide which of the parameters differ and to estimate how large the differences are.

16 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
To find the expected counts, we start by assuming that H0 is true.

17 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
To find the expected counts, we start by assuming that H0 is true. If the specific type of music that’s playing has no effect on entrée orders, the proportion of French entrées ordered under each music condition should be 99/243 =

18 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
To find the expected counts, we start by assuming that H0 is true. If the specific type of music that’s playing has no effect on entrée orders, the proportion of French entrées ordered under each music condition should be 99/243 = Because there were 84 total entrées ordered when no music was playing, we would expect 84(99/243) = 841(0.4074) = of those entrées to be French, on average.

19 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
Calculating Expected Counts for a Chi-Square Test Based on Data in a Two-Way Table When H0 is true, the expected count in any cell of a two-way table is expected count= row total∙column total table total

20 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
Calculating Expected Counts for a Chi-Square Test Based on Data in a Two-Way Table When H0 is true, the expected count in any cell of a two-way table is expected count= row total∙column total table total

21 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
Calculating Expected Counts for a Chi-Square Test Based on Data in a Two-Way Table When H0 is true, the expected count in any cell of a two-way table is expected count= row total∙column total table total

22 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
AP® Exam Tip As with chi-square tests for goodness of fit, the expected counts should not be rounded to the nearest whole number. While an observed count of entrées ordered must be a whole number, an expected count need not be a whole number. The expected count gives the average number of entrées ordered if H0 is true and the random assignment process is repeated many times.

23 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
𝜒 2 = (𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑐𝑜𝑢𝑛𝑡 −𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑐𝑜𝑢𝑛𝑡) 2 𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑐𝑜𝑢𝑛𝑡

24 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
𝜒 2 = (𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑐𝑜𝑢𝑛𝑡 −𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑐𝑜𝑢𝑛𝑡) 2 𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑐𝑜𝑢𝑛𝑡 The sum is over all cells (not including the totals!) in the two-way table.

25 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
𝜒 2 = (𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑐𝑜𝑢𝑛𝑡 −𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑐𝑜𝑢𝑛𝑡) 2 𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑐𝑜𝑢𝑛𝑡 The sum is over all cells (not including the totals!) in the two-way table. 𝜒 2 = (30−34.22) (39−30.56) ⋯+ (35−39.06)

26 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
𝜒 2 = (𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑐𝑜𝑢𝑛𝑡 −𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑐𝑜𝑢𝑛𝑡) 2 𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑐𝑜𝑢𝑛𝑡 The sum is over all cells (not including the totals!) in the two-way table. 𝜒 2 = (30−34.22) (39−30.56) ⋯+ (35−39.06) = ⋯+0.42 =18.28

27 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
𝜒 2 = (𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑐𝑜𝑢𝑛𝑡 −𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑐𝑜𝑢𝑛𝑡) 2 𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑐𝑜𝑢𝑛𝑡 The sum is over all cells (not including the totals!) in the two-way table. 𝜒 2 = (30−34.22) (39−30.56) ⋯+ (35−39.06) = ⋯+0.42 =18.28 AP® Exam Tip In the “Do” step, you aren’t required to show every term in the chi-square test statistic. Writing the first few terms of the sum followed by “. . .” is considered as “showing work.” We suggest that you do this and then let your calculator tackle the computations.

28 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
Problem: For a class project, Abby and Mia wanted to know if the gender of an interviewer could affect the responses to a survey question. The subjects in their experiment were 100 males from their school. Half of the males were randomly assigned to be asked, “Would you vote for a female president?” by a female interviewer. The other half of the males were asked the same question by a male interviewer. The table shows the results. (a) State the appropriate null and alternative hypotheses. (b) Show the calculation for the expected count in the Male/Yes cell. Then provide a complete table of expected counts. (c) Calculate the value of the chi-square test statistic.

29 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
Problem: The table shows the results. (a) State the appropriate null and alternative hypotheses. (b) Show the calculation for the expected count in the Male/Yes cell. Then provide a complete table of expected counts. (c) Calculate the value of the chi-square test statistic.

30 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
Problem: The table shows the results. (a) State the appropriate null and alternative hypotheses. (b) Show the calculation for the expected count in the Male/Yes cell. Then provide a complete table of expected counts. (c) Calculate the value of the chi-square test statistic. (a) H0: There is no difference in the true distributions of response to this question when asked by a male interviewer and when asked by a female interviewer for subjects like these. Ha: There is a difference in the true distributions of response to this question when asked by a male interviewer and when asked by a female interviewer for subjects like these.

31 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
Problem: The table shows the results. (a) State the appropriate null and alternative hypotheses. (b) Show the calculation for the expected count in the Male/Yes cell. Then provide a complete table of expected counts. (c) Calculate the value of the chi-square test statistic. (b) The expected count for the Male/Yes cell is (69 · 50)/100 = The rest of the expected counts are shown in the table.

32 Tests for Homogeneity: Expected Counts and the Chi-Square Test Statistic
Problem: The table shows the results. (a) State the appropriate null and alternative hypotheses. (b) Show the calculation for the expected count in the Male/Yes cell. Then provide a complete table of expected counts. (c) Calculate the value of the chi-square test statistic. (c) 𝜒 2 = (30−34.5) (39−34.5) ⋯+ (8−10) 2 10 =4.25

33 Tests for Homogeneity: Conditions and P-values
Conditions for Performing a Chi-Square Test for Homogeneity • Random: The data come from independent random samples or from groups in a randomized experiment. ◦ 10%: When sampling without replacement, n < 0.10N for each sample. • Large Counts: All expected counts are at least 5.

34 Tests for Homogeneity: Conditions and P-values
Problem: In the preceding example, you read about an experiment to determine if the gender of an interviewer affects responses to the question “Would you vote for a female president?” Here are tables showing the observed and expected counts: Earlier, we calculated 2 = 4.25. (a) Verify that the conditions for inference are met. (b) Use Table C to find the P-value. Then use your calculator’s 2 cdf command. (c) Interpret the P-value from the calculator. (d) What conclusion would you draw?

35 Tests for Homogeneity: Conditions and P-values
Problem: Earlier, we calculated 2 = 4.25. (a) Verify that the conditions for inference are met.

36 Tests for Homogeneity: Conditions and P-values
Problem: Earlier, we calculated 2 = 4.25. (a) Verify that the conditions for inference are met. (a) Random: Treatments were randomly assigned. ✓ Large Counts: All expected counts ≥ 5 (see table of expected counts). ✓ Note: We don’t check the 10% condition because Abby and Mia didn’t randomly select subjects from some population.

37 Tests for Homogeneity: Conditions and P-values
Problem: Earlier, we calculated 2 = 4.25. (b) Use Table C to find the P-value. Then use your calculator’s 2 cdf command.

38 Tests for Homogeneity: Conditions and P-values
Problem: Earlier, we calculated 2 = 4.25. (b) Use Table C to find the P-value. Then use your calculator’s 2 cdf command. (b) df = (3 – 1)(2 – 1) = 2 Using Table C: The P-value is between 0.10 and 0.15. Using technology: 2 cdf(lower:4.25, upper:10000, df:2) = 0.119

39 Tests for Homogeneity: Conditions and P-values
Problem: Earlier, we calculated 2 = 4.25. (c) Interpret the P-value from the calculator.

40 Tests for Homogeneity: Conditions and P-values
Problem: Earlier, we calculated 2 = 4.25. (c) Interpret the P-value from the calculator. (c) Assuming that the gender of the interviewer doesn’t affect responses to this question, there is a probability of observing differences in the distributions of responses as large as or larger than those in this study by chance alone.

41 Tests for Homogeneity: Conditions and P-values
Problem: Earlier, we calculated 2 = 4.25. (d) What conclusion would you draw?

42 Tests for Homogeneity: Conditions and P-values
Problem: Earlier, we calculated 2 = 4.25. (d) What conclusion would you draw? (d) Because the P-value of > α = 0.05, we fail to reject H0. There is not convincing evidence of a difference in the true distributions of response to this question when asked by a male interviewer and when asked by a female interviewer for subjects like these.

43 Tests for Homogeneity: Conditions and P-values
As usual, technology saves time and gets the arithmetic right.

44 Tests for Homogeneity: Conditions and P-values
As usual, technology saves time and gets the arithmetic right.

45 Tests for Homogeneity: Conditions and P-values
As usual, technology saves time and gets the arithmetic right.

46 Tests for Homogeneity: Conditions and P-values
As usual, technology saves time and gets the arithmetic right.

47 Tests for Homogeneity: Conditions and P-values
As usual, technology saves time and gets the arithmetic right.

48 Tests for Homogeneity: Conditions and P-values
AP® Exam Tip You can use your calculator to carry out the mechanics of a significance test on the AP® Statistics exam—but there’s a risk involved. If you just give the calculator answer without showing work, and one or more of your entries is incorrect, you will likely get no credit for the “Do” step. We recommend writing out the first few terms of the chi-square calculation followed by “. . .”. This approach may help you earn partial credit if you enter a number incorrectly. Be sure to name the procedure (2 test for homogeneity) and to report the test statistic (2 = ), degrees of freedom (df = 4), and P-value (0.0011).

49 Putting It All Together: The Chi-Square Test for Homogeneity
Suppose the conditions are met. To perform a test of H0: There is no difference in the distribution of a categorical variable for several populations or treatments compute the chi-square test statistic 𝜒 2 = (Observed count −Expected count) 2 Expected count where the sum is over all cells (not including totals) in the two-way table. The P-value is the area to the right of 2 under the chi-square density curve with degrees of freedom = (num. of rows – 1)(num. of columns – 1).

50 Putting It All Together: The Chi-Square Test for Homogeneity
Problem: The Pew Research Center conducts surveys about a variety of topics in many different countries. In one survey, it wanted to investigate how residents of different countries feel about the importance of speaking the national language. Separate random samples of residents of Australia, the United Kingdom, and the United States were asked many questions, including the following: “Some people say that the following things are important for being truly [survey country nationality]. Others say they are not important. How important do you think it is to be able to speak English?” The two-way table summarizes the responses to this question. Do these data provide convincing evidence at the α = 0.05 level that the distributions of opinion about speaking English differ for residents of Australia, the U.K., and the U.S.? Scott Olson/Getty Images

51 Putting It All Together: The Chi-Square Test for Homogeneity
Problem: Do these data provide convincing evidence at the α = 0.05 level that the distributions of opinion about speaking English differ for residents of Australia, the U.K., and the U.S.? Scott Olson/Getty Images

52 Putting It All Together: The Chi-Square Test for Homogeneity
Problem: Do these data provide convincing evidence at the α = 0.05 level that the distributions of opinion about speaking English differ for residents of Australia, the U.K., and the U.S.? Scott Olson/Getty Images STATE H0: There is no difference in the true distributions of opinion about speaking English for residents of Australia, the U.K., and the U.S. Ha: There is a difference in the true distributions of opinion about speaking English for residents of Australia, the U.K., and the U.S. We’ll use α = 0.05.

53 Putting It All Together: The Chi-Square Test for Homogeneity
Problem: Do these data provide convincing evidence at the α = 0.05 level that the distributions of opinion about speaking English differ for residents of Australia, the U.K., and the U.S.? Scott Olson/Getty Images PLAN • Random: Independent random samples of residents from the three countries. ✓ ◦ 10%: 1000 is < 10% of all Australian residents, 1460 is < 10% of all U.K. residents, and 1003 is < 10% of all U.S. residents. ✓ • Large Counts: All expected counts are ≥ 5 (see table below). ✓

54 Putting It All Together: The Chi-Square Test for Homogeneity
Problem: Do these data provide convincing evidence at the α = 0.05 level that the distributions of opinion about speaking English differ for residents of Australia, the U.K., and the U.S.? Scott Olson/Getty Images PLAN

55 Putting It All Together: The Chi-Square Test for Homogeneity
Problem: Do these data provide convincing evidence at the α = 0.05 level that the distributions of opinion about speaking English differ for residents of Australia, the U.K., and the U.S.? Scott Olson/Getty Images DO 𝜒 2 = (690−741.8) (1177−1083.1) ⋯=68.57 df = (4 – 1)(3 – 1) = 6 Using Table C: P-value < Using technology: 2 cdf(lower:68.57, upper:10000, df:6) ≈ 0

56 Putting It All Together: The Chi-Square Test for Homogeneity
Problem: Do these data provide convincing evidence at the α = 0.05 level that the distributions of opinion about speaking English differ for residents of Australia, the U.K., and the U.S.? Scott Olson/Getty Images CONCLUDE Because the P-value of approximately 0 < α = 0.05, we reject H0. There is convincing evidence that there is a difference in the true distributions of opinion about speaking English for residents of Australia, the U.K., and the U.S.

57 Putting It All Together: The Chi-Square Test for Homogeneity
AP® Exam Tip Many students lose credit on the AP® Statistics exam because they don’t write down and label the expected counts in their response. It isn’t enough to claim that all the expected counts are at least 5. You must provide clear evidence.

58 Relationships Between Two Categorical Variables
Are people who are prone to sudden anger more likely to develop heart disease? An observational study followed a random sample of 8474 people with normal blood pressure for about four years.

59 Relationships Between Two Categorical Variables
Are people who are prone to sudden anger more likely to develop heart disease? An observational study followed a random sample of 8474 people with normal blood pressure for about four years. Each person took the Spielberger Trait Anger Scale test. Researchers also recorded whether each individual developed coronary heart disease (CHD).

60 Relationships Between Two Categorical Variables
Are people who are prone to sudden anger more likely to develop heart disease? An observational study followed a random sample of 8474 people with normal blood pressure for about four years. Each person took the Spielberger Trait Anger Scale test. Researchers also recorded whether each individual developed coronary heart disease (CHD).

61 Relationships Between Two Categorical Variables
Are people who are prone to sudden anger more likely to develop heart disease? An observational study followed a random sample of 8474 people with normal blood pressure for about four years. Each person took the Spielberger Trait Anger Scale test. Researchers also recorded whether each individual developed coronary heart disease (CHD). Do these data provide convincing evidence of an association between the variables in the larger population?

62 Tests for Independence: Stating Hypotheses
H0: There is no association between anger level and heart-disease status in the population of people with normal blood pressure. Ha: There is an association between anger level and heart-disease status in the population of people with normal blood pressure.

63 Tests for Independence: Stating Hypotheses
H0: There is no association between anger level and heart-disease status in the population of people with normal blood pressure. Ha: There is an association between anger level and heart-disease status in the population of people with normal blood pressure. An equivalent way to state the hypotheses is

64 Tests for Independence: Stating Hypotheses
H0: There is no association between anger level and heart-disease status in the population of people with normal blood pressure. Ha: There is an association between anger level and heart-disease status in the population of people with normal blood pressure. An equivalent way to state the hypotheses is H0: Anger and heart-disease status are independent in the population of people with normal blood pressure. Ha: Anger and heart-disease status are not independent in the population of people with normal blood pressure.

65 Tests for Independence: Expected Counts

66 Tests for Independence: Expected Counts
If we imagine choosing one of these people at random, P(Yes) = 190/8474.

67 Tests for Independence: Expected Counts
If we imagine choosing one of these people at random, P(Yes) = 190/8474. If we assume that H0 is true, then anger level and CHD status are independent.

68 Tests for Independence: Expected Counts
If we imagine choosing one of these people at random, P(Yes) = 190/8474. If we assume that H0 is true, then anger level and CHD status are independent. So, P(Yes | Low anger) = P(Yes) = 190/8474 =

69 Tests for Independence: Expected Counts
If we imagine choosing one of these people at random, P(Yes) = 190/8474. Of the 3110 low-anger people in the study, we’d expect 3110·(190/8474) = 3110( ) = 69.73 to get CHD. If we assume that H0 is true, then anger level and CHD status are independent. So, P(Yes | Low anger) = P(Yes) = 190/8474 =

70 Tests for Independence: Expected Counts
If we imagine choosing one of these people at random, P(Yes) = 190/8474. Of the 3110 low-anger people in the study, we’d expect 3110·(190/8474) = 3110( ) = 69.73 to get CHD. If we assume that H0 is true, then anger level and CHD status are independent. So, P(Yes | Low anger) = P(Yes) = 190/8474 =

71 Tests for Independence: Expected Counts

72 Tests for Independence: Expected Counts
expected count= row total∙column total table total

73 Tests for Independence: Expected Counts
expected count= row total∙column total table total

74 Tests for Independence: Conditions and Calculations
Conditions for Performing a Chi-Square Test for Independence • Random: The data come from a random sample from the population of interest. ◦ 10%: When sampling without replacement, n < 0.10N. • Large Counts: All expected counts are at least 5.

75 Putting It All Together: The Chi-Square Test for Independence
Suppose the conditions are met. To perform a test of H0: There is no association between two categorical variables in the population of interest compute the chi-square test statistic: 𝜒 2 = (Observed count −Expected count) 2 Expected count where the sum is over all cells (not including totals) in the two-way table. The P-value is the area to the right of 2 under the chi-square density curve with degrees of freedom = (num. of rows – 1)(num. of columns – 1).

76 Putting It All Together: The Chi-Square Test for Independence
Problem: In Chapter 1, you read about a random sample of winter visitors to Yellowstone National Park. Each of the visitors in the sample was asked two questions: 1. Do you belong to an environmental club (like the Sierra Club)? 2. What is your experience with a snowmobile: own, rent, or never used? The two-way table summarizes the results. Do these data provide convincing evidence of an association between environmental club status and type of snowmobile use in the population of winter visitors to Yellowstone National Park? korolOK/Depositphotos

77 Putting It All Together: The Chi-Square Test for Independence
Problem: Do these data provide convincing evidence of an association between environmental club status and type of snowmobile use in the population of winter visitors to Y.N.P.? korolOK/Depositphotos

78 Putting It All Together: The Chi-Square Test for Independence
Problem: Do these data provide convincing evidence of an association between environmental club status and type of snowmobile use in the population of winter visitors to Y.N.P.? korolOK/Depositphotos STATE H0: There is no association between environmental club status and type of snowmobile use in the population of winter visitors to YNP. Ha: There is an association between environmental club status and type of snowmobile use in the population of winter visitors to YNP. We’ll use α = 0.05.

79 Putting It All Together: The Chi-Square Test for Independence
Problem: Do these data provide convincing evidence of an association between environmental club status and type of snowmobile use in the population of winter visitors to Y.N.P.? korolOK/Depositphotos PLAN Chi-square test for independence • Random: Random sample of 1526 winter visitors to Yellowstone. ✓ º 10%: It is reasonable to assume that 1526 * 10% of all winter visitors to Yellowstone. ✓ • Large Counts: All expected counts are at least 5 (see table below).✓

80 Putting It All Together: The Chi-Square Test for Independence
Problem: Do these data provide convincing evidence of an association between environmental club status and type of snowmobile use in the population of winter visitors to Y.N.P.? korolOK/Depositphotos PLAN

81 Putting It All Together: The Chi-Square Test for Independence
Problem: Do these data provide convincing evidence of an association between environmental club status and type of snowmobile use in the population of winter visitors to Y.N.P.? korolOK/Depositphotos DO df = (3 – 1)(2 – 1) = 2 Using Table C: P-value < Using technology: calculator 2 –Test gives P-value = 4.82 × 10-26 𝜒 2 = (445−525.7) (212−131.3) ⋯=116.6

82 Putting It All Together: The Chi-Square Test for Independence
Problem: Do these data provide convincing evidence of an association between environmental club status and type of snowmobile use in the population of winter visitors to Y.N.P.? korolOK/Depositphotos CONCLUDE Because the P-value of approximately 0 < α = 0.05, we reject H0. We have convincing evidence of an association between environmental club status and type of snowmobile use in the population of winter visitors to Yellowstone National Park.

83 Putting It All Together: The Chi-Square Test for Independence
AP® Exam Tip When the P-value is very small, the calculator will report it using scientific notation. Remember that P-values are probabilities and must be between 0 and 1. If your calculator reports the P-value with a number that appears to be greater than 1, look to the right, and you will see that the P-value is being expressed in scientific notation. If you claim that the P-value is 4.82, you will certainly lose credit.

84 Using Chi-Square Tests Wisely
The chi-square test for homogeneity and the chi-square test for independence are very similar.

85 Using Chi-Square Tests Wisely
The chi-square test for homogeneity and the chi-square test for independence are very similar. The best way to distinguish these two tests is to consider how the data were produced.

86 Using Chi-Square Tests Wisely
The chi-square test for homogeneity and the chi-square test for independence are very similar. The best way to distinguish these two tests is to consider how the data were produced. If the data come from two or more independent random samples or treatment groups in a randomized experiment, then do a chi-square test for homogeneity.

87 Using Chi-Square Tests Wisely
The chi-square test for homogeneity and the chi-square test for independence are very similar. The best way to distinguish these two tests is to consider how the data were produced. If the data come from two or more independent random samples or treatment groups in a randomized experiment, then do a chi-square test for homogeneity. If the data come from a single random sample, with the individuals classified according to two categorical variables, use a chi-square test for independence.

88 Using Chi-Square Tests Wisely
Problem: Are men and women equally likely to suffer lingering fear from watching scary movies as children? Researchers asked a random sample of 117 college students to write narrative accounts of their exposure to scary movies before the age of 13. More than one-fourth of the students said that some of the fright symptoms are still present when they are awake. The following table breaks down these results by gender. Assume that the conditions for performing inference are met. Minitab output for a chi-square test using these data is shown. (continued) Alamy

89 Using Chi-Square Tests Wisely
Problem: (a) Should a chi-square test for independence or a chi-square test for homogeneity be used in this setting? Explain your reasoning. Alamy

90 Using Chi-Square Tests Wisely
Problem: (a) Should a chi-square test for independence or a chi-square test for homogeneity be used in this setting? Explain your reasoning. Alamy (a) Chi-square test for independence. The data were produced using a single random sample of college students, who were then classified according to two variables: gender and whether or not they had lingering fright symptoms. The chi-square test for homogeneity requires independent random samples from each population.

91 Using Chi-Square Tests Wisely
Problem: (b) State an appropriate pair of hypotheses for researchers to test in this setting. Alamy

92 Using Chi-Square Tests Wisely
Problem: (b) State an appropriate pair of hypotheses for researchers to test in this setting. Alamy (b) H0: There is no association between gender and whether or not college students have lingering fright symptoms. Ha: There is an association between gender and whether or not college students have lingering fright symptoms.

93 Using Chi-Square Tests Wisely
Problem: (c) Which cell contributes most to the chi-square test statistic? In what way does this cell differ from what the null hypothesis suggests? Alamy

94 Using Chi-Square Tests Wisely
Problem: (c) Which cell contributes most to the chi-square test statistic? In what way does this cell differ from what the null hypothesis suggests? Alamy (c) Men who admit to having lingering fright symptoms account for the largest component of the chi-square test statistic (1.883). Far fewer men in the sample admitted to lingering fright symptoms (7) than we would expect if H0 were true (11.69).

95 Using Chi-Square Tests Wisely
Problem: (d) Interpret the P-value. What conclusion would you draw at α = 0.01? Alamy

96 Using Chi-Square Tests Wisely
Problem: (d) Interpret the P-value. What conclusion would you draw at α = 0.01? Alamy (d) If there is no association between gender and whether or not college students have lingering fright symptoms, there is a probability of obtaining an association as strong as or stronger than the one observed in the random sample of 117 students.

97 Using Chi-Square Tests Wisely
Problem: (d) Interpret the P-value. What conclusion would you draw at α = 0.01? Alamy (d) Because the P-value of > α = 0.01, we fail to reject H0. We do not have convincing evidence that there is an association between gender and whether or not college students have lingering fright symptoms.

98 Section Summary STATE appropriate hypotheses and COMPUTE the expected counts and chi-square test statistic for a chi-square test based on data in a two- way table. STATE and CHECK the Random, 10%, and Large Counts conditions for a chi-square test based on data in a two-way table. Calculate the degrees of freedom and P-value for a chi-square test based on data in a two-way table. PERFORM a chi-square test for homogeneity. PERFORM a chi-square test for independence. CHOOSE the appropriate chi-square test in a given setting.

99 Assignment 11.2 p #30, 34, 38, 40, 42, 48, all If you are stuck on any of these, look at the odd before or after and the answer in the back of your book. If you are still not sure text a friend or me for help (before 8pm). Tomorrow we will check homework and review for 11.2 Quiz.


Download ppt "Chapter 11 Inference for Distributions of Categorical Data"

Similar presentations


Ads by Google