Comparing Observed Distributions A test comparing the distribution of counts for two or more groups on the same categorical variable is called a chi-square.

Slides:



Advertisements
Similar presentations
Chapter 26 Comparing Counts
Advertisements

Chi-Squared Hypothesis Testing Using One-Way and Two-Way Frequency Tables of Categorical Variables.
CHAPTER 23: Two Categorical Variables: The Chi-Square Test
Chapter 13: Inference for Distributions of Categorical Data
© 2010 Pearson Prentice Hall. All rights reserved The Chi-Square Test of Homogeneity.
© 2010 Pearson Prentice Hall. All rights reserved The Chi-Square Test of Independence.
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 25, Slide 1 Chapter 25 Comparing Counts.
Chapter 26: Comparing Counts
CHAPTER 11 Inference for Distributions of Categorical Data
11-3 Contingency Tables In this section we consider contingency tables (or two-way frequency tables), which include frequency counts for categorical data.
Presentation 12 Chi-Square test.
GOODNESS OF FIT TEST & CONTINGENCY TABLE
Copyright © 2012 Pearson Education. All rights reserved Copyright © 2012 Pearson Education. All rights reserved. Chapter 15 Inference for Counts:
 Involves testing a hypothesis.  There is no single parameter to estimate.  Considers all categories to give an overall idea of whether the observed.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 26 Comparing Counts.
Chapter 26: Comparing Counts AP Statistics. Comparing Counts In this chapter, we will be performing hypothesis tests on categorical data In previous chapters,
Copyright © 2010 Pearson Education, Inc. Warm Up- Good Morning! If all the values of a data set are the same, all of the following must equal zero except.
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on Categorical Data 12.
For testing significance of patterns in qualitative data Test statistic is based on counts that represent the number of items that fall in each category.
Chapter 11 Chi-Square Procedures 11.3 Chi-Square Test for Independence; Homogeneity of Proportions.
AP Statistics Chapter 26 Notes
CHAPTER 26: COMPARING COUNTS OF CATEGORICAL DATA To test claims and make inferences about counts for categorical variables Objective:
Chi-square test or c2 test
Copyright © 2009 Cengage Learning 15.1 Chapter 16 Chi-Squared Tests.
Chapter 26: Comparing counts of categorical data
Chi-Square Procedures Chi-Square Test for Goodness of Fit, Independence of Variables, and Homogeneity of Proportions.
Other Chi-Square Tests
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 25, Slide 1 Chapter 26 Comparing Counts.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Slide 26-1 Copyright © 2004 Pearson Education, Inc.
Chapter 11 The Chi-Square Test of Association/Independence Target Goal: I can perform a chi-square test for association/independence to determine whether.
Copyright © 2010 Pearson Education, Inc. Slide
Section 10.2 Independence. Section 10.2 Objectives Use a chi-square distribution to test whether two variables are independent Use a contingency table.
Comparing Counts.  A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a.
© Copyright McGraw-Hill CHAPTER 11 Other Chi-Square Tests.
Chapter 13 Inference for Counts: Chi-Square Tests © 2011 Pearson Education, Inc. 1 Business Statistics: A First Course.
Chapter Outline Goodness of Fit test Test of Independence.
Copyright © 2010 Pearson Education, Inc. Warm Up- Good Morning! If all the values of a data set are the same, all of the following must equal zero except.
11.2 Tests Using Contingency Tables When data can be tabulated in table form in terms of frequencies, several types of hypotheses can be tested by using.
Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
ContentFurther guidance  Hypothesis testing involves making a conjecture (assumption) about some facet of our world, collecting data from a sample,
Chapter 12 The Analysis of Categorical Data and Goodness of Fit Tests.
Chapter 14 – 1 Chi-Square Chi-Square as a Statistical Test Statistical Independence Hypothesis Testing with Chi-Square The Assumptions Stating the Research.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Chapter 12 Tests of Goodness of Fit and Independence n Goodness of Fit Test: A Multinomial.
Comparing Counts Chapter 26. Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 11 Inference for Distributions of Categorical.
Chi-Squared Test of Homogeneity Are different populations the same across some characteristic?
Statistics 26 Comparing Counts. Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
Textbook Section * We already know how to compare two proportions for two populations/groups. * What if we want to compare the distributions of.
Chi Square Test of Homogeneity. Are the different types of M&M’s distributed the same across the different colors? PlainPeanutPeanut Butter Crispy Brown7447.
Chapter 26 Comparing Counts. Objectives Chi-Square Model Chi-Square Statistic Knowing when and how to use the Chi- Square Tests; Goodness of Fit Test.
Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a goodness-of-fit.
Copyright © 2009 Pearson Education, Inc. Chapter 26 Comparing Counts.
Comparing Counts Chi Square Tests Independence.
CHAPTER 26 Comparing Counts.
Chi-square test or c2 test
Chapter 25 Comparing Counts.
Chi Square Two-way Tables
Contingency Tables: Independence and Homogeneity
15.1 Goodness-of-Fit Tests
Inference on Categorical Data
Paired Samples and Blocks
Analyzing the Association Between Categorical Variables
Chapter 26 Comparing Counts.
Chapter 26 Comparing Counts Copyright © 2009 Pearson Education, Inc.
Chapter 26 Comparing Counts.
Presentation transcript:

Comparing Observed Distributions A test comparing the distribution of counts for two or more groups on the same categorical variable is called a chi-square test of homogeneity. A test of homogeneity is actually the generalization of the two-proportion z-test.

Comparing Observed Distributions (cont.) The statistic that we calculate for this test is identical to the chi-square statistic for goodness- of-fit. In this test, however, we ask whether choices have changed (i.e., there is no model). The expected counts are found directly from the data and we have different degrees of freedom.

Assumptions and Conditions The assumptions and conditions are the same as for the chi-square goodness-of-fit test: –Counted Data Condition: The data must be counts. –Randomization Condition: As long as we don’t want to generalize, we don’t have to check this condition. –Expected Cell Frequency Condition: The expected count in each cell must be at least 5.

Calculations To find the expected counts, we multiply the row total by the column total and divide by the grand total. We calculated the chi-square statistic as we did in the goodness-of-fit test: In this situation we have (R – 1)(C – 1) degrees of freedom, where R is the number of rows and C is the number of columns. –We’ll need the degrees of freedom to find a P-value for the chi- square statistic.

Examining the Residuals When we reject the null hypothesis, it’s always a good idea to examine residuals. For chi-square tests, we want to work with standardized residuals, since we want to compare residuals for cells that may have very different counts. To standardize a cell’s residual, we just divide by the square root of its expected value:

Examining the Residuals (cont.) These standardized residuals are just the square roots of the components we calculated for each cell, with the + or the – sign indicating whether we observed more cases than we expected, or fewer. The standardized residuals give us a chance to think about the underlying patterns and to consider the ways in which the distribution might not match what we hypothesized to be true.

Example A researcher wishes to test whether the proportion of college students who smoke is the same in four different colleges. She randomly selects 100 students from each college and records the number that smoke. The results are shown below. Are the proportions of students smoking the same for all four colleges? OBSERVEDCollege ACollege BCollege CCollege D Smoke Don’t Smoke EXPECTEDCollege ACollege BCollege CCollege D Smoke22 Don’t Smoke78

1. Hypothesis Chi-Square Test for Homogeneity H o : The proportion of students smoking is the same for all four colleges H a : The proportion of students smoking is different for all four colleges Example A researcher wishes to test whether the proportion of college students who smoke is the same in four different colleges. She randomly selects 100 students from each college and records the number that smoke. The results are shown below. Are the proportions of students smoking the same for all four colleges? Observed (Expected)College ACollege BCollege CCollege D Smoke17 (22)26 (22)11 (22)34 (22) Don’t Smoke83 (78)74 (78)89 (78)66 (78)

2. Check Assumptions/Conditions SRS is assumed We are working with counts All expected counts  5 Example A researcher wishes to test whether the proportion of college students who smoke is the same in four different colleges. She randomly selects 100 students from each college and records the number that smoke. The results are shown below. Are the proportions of students smoking the same for all four colleges? Observed (Expected)College ACollege BCollege CCollege D Smoke17 (22)26 (22)11 (22)34 (22) Don’t Smoke83 (78)74 (78)89 (78)66 (78)

3. Calculate Test Example A researcher wishes to test whether the proportion of college students who smoke is the same in four different colleges. She randomly selects 100 students from each college and records the number that smoke. The results are shown below. Are the proportions of students smoking the same for all four colleges? Observed (Expected)College ACollege BCollege CCollege D Smoke17 (22)26 (22)11 (22)34 (22) Don’t Smoke83 (78)74 (78)89 (78)66 (78)

4. Conclusion  = 0.05 Since the p-value is less than alpha, we reject the claim that the proportion of students smoking is the same for all four colleges Example A researcher wishes to test whether the proportion of college students who smoke is the same in four different colleges. She randomly selects 100 students from each college and records the number that smoke. The results are shown below. Are the proportions of students smoking the same for all four colleges? Observed (Expected)College ACollege BCollege CCollege D Smoke17 (22)26 (22)11 (22)34 (22) Don’t Smoke83 (78)74 (78)89 (78)66 (78)

Independence Contingency tables categorize counts on two (or more) variables so that we can see whether the distribution of counts on one variable is contingent on the other. A test of whether the two categorical variables are independent examines the distribution of counts for one group of individuals classified according to both variables in a contingency table. A chi-square test of independence uses the same calculation as a test of homogeneity.

Assumptions and Conditions We still need counts and enough data so that the expected values are at least 5 in each cell. If we’re interested in the independence of variables, we usually want to generalize from the data to some population. –In that case, we’ll need to check that the data are a representative random sample from that population.

Example The table below shows the age and favorite type of music of 668 randomly selected people. Are age and preferred music type associated? OBSERVEDRockPopClassical EXPECTEDRockPopClassical

1. Hypothesis Chi-Square Test for Independence H o : Age and preferred music type are independent H a : Age and preferred music type are dependent Example The table below shows the age and favorite type of music of 668 randomly selected people. Are age and preferred music type associated? Observed (Expected)RockPopClassical (64.77)85 (77.84)73 (65.39) (68.19)91 (81.96)60 (68.85) (75.04)74 (90.19)77 (75.76)

2. Check Assumptions/Conditions SRS is assumed We are working with counts All expected counts  5 Example The table below shows the age and favorite type of music of 668 randomly selected people. Are age and preferred music type associated? Observed (Expected)RockPopClassical (64.77)85 (77.84)73 (65.39) (68.19)91 (81.96)60 (68.85) (75.04)74 (90.19)77 (75.76)

3. Calculate Test Example The table below shows the age and favorite type of music of 668 randomly selected people. Are age and preferred music type associated? Observed (Expected)RockPopClassical (64.77)85 (77.84)73 (65.39) (68.19)91 (81.96)60 (68.85) (75.04)74 (90.19)77 (75.76)

4. Conclusion  = 0.05 Since the p-value is less than alpha, we reject the claim that age and preferred music type are independent of each other. Example The table below shows the age and favorite type of music of 668 randomly selected people. Are age and preferred music type associated? Observed (Expected)RockPopClassical (64.77)85 (77.84)73 (65.39) (68.19)91 (81.96)60 (68.85) (75.04)74 (90.19)77 (75.76)

Chi-Square and Causation Chi-square tests are common, and tests for independence are especially widespread. We need to remember that a small P- value is not proof of causation. –Since the chi-square test for independence treats the two variables symmetrically, we cannot differentiate the direction of any possible causation even if it existed. –And, there’s never any way to eliminate the possibility that a lurking variable is responsible for the lack of independence.

Chi-Square and Causation (cont.) In some ways, a failure of independence between two categorical variables is less impressive than a strong, consistent, linear association between quantitative variables. Two categorical variables can fail the test of independence in many ways. –Examining the standardized residuals can help you think about the underlying patterns.

What have we learned? We’ve learned how to test hypotheses about categorical variables. All three methods we examined look at counts of data in categories and rely on chi-square models. –Goodness-of-fit tests compare the observed distribution of a single categorical variable to an expected distribution based on theory or model. –Tests of homogeneity compare the distribution of several groups for the same categorical variable. –Tests of independence examine counts from a single group for evidence of an association between two categorical variables.

What have we learned? (cont.) Mechanically, these tests are almost identical. While the tests appear to be one-sided, conceptually they are many-sided, because there are many ways that the data can deviate significantly from what we hypothesize. When we reject the null hypothesis, we know to examine standardized residuals to better understand the patterns in the data.