Chi-Squared Test of Homogeneity Are different populations the same across some characteristic?


1 Chi-Squared Test of Homogeneity Are different populations the same across some characteristic?

2 $\chi^2$ test for homogeneity Used with a single categorical variable from two (or more) independent samples. Used to see if the two (or more) populations are the same (homogeneous). Several groups but STILL ONE VARIABLE

3 Assumptions & formula remain the same! Samples come from random sampling. All expected counts are greater than 5.

4 Hypotheses, written in words $H_0$: the proportions for the two (or more) distributions are the same. $H_a$: at least one of the proportions for the distributions is different. Be sure to write in context!

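5 Expected Counts Assuming $H_0$ is true, $\text{expected count} = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$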

6 Degrees of freedom $df = (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns. Or cover up one row & one column & count the number of cells remaining!
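The usual rule for a two-way table, df = (rows − 1)(columns − 1), gives the same count as the "cover up one row and one column" shortcut. A minimal sketch follows; the function name `degrees_of_freedom` is an illustrative choice, not something from the slides.

```python
def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a chi-squared test on an r x c two-way table."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("a two-way table needs at least 2 rows and 2 columns")
    # Covering one row and one column leaves (r - 1) * (c - 1) cells.
    return (n_rows - 1) * (n_cols - 1)


# Table shapes that appear in the examples in this presentation.
print(degrees_of_freedom(2, 5))  # consumers/dentists by 5 responses
print(degrees_of_freedom(4, 3))  # 4 activities by 3 graduating classes
print(degrees_of_freedom(3, 2))  # 3 tattoo groups by 2 hepatitis statuses
```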

7 Should Dentists Advertise? It may seem hard to believe, but until the 1970s most professional organizations prohibited their members from advertising. In 1977, the U.S. Supreme Court ruled that prohibiting doctors and lawyers from advertising violated their free speech rights. Why do you think professional organizations sought to prohibit their members from advertising?

8 Should Dentists Advertise? The paper “Should Dentists Advertise?” (J. of Advertising Research (June 1982): 33–38) compared the attitudes of consumers and dentists toward the advertising of dental services. Separate samples of 101 consumers and 124 dentists were asked to respond to the following statement: “I favor the use of advertising by dentists to attract new patients.”

9 Should Dentists Advertise? Possible responses were: strongly agree, agree, neutral, disagree, strongly disagree. The authors were interested in determining whether the two groups (dentists and consumers) differed in their attitudes toward advertising.

10 Should Dentists Advertise? This is done by a chi-squared test of homogeneity; that is, we are testing the claim that different populations have the same proportions across the categories of some characteristic. So how should we state the null and alternative hypotheses for this test?

11 Should Dentists Advertise? $H_0$: The true category proportions for all responses are the same for both populations of consumers and dentists. $H_a$: The true category proportions for all responses are not the same for both populations of consumers and dentists.

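12 Observed Data How do we determine the expected cell count under the assumption of homogeneity? The expected cell counts are estimated from the sample data (assuming that $H_0$ is true) by using $\text{expected count} = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$. Row totals: consumers 101, dentists 124.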

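13 Expected Values So the calculation for the first cell is $\frac{(101)(\text{first column total})}{225} \approx 19.30$, where 101 is the consumer row total and 225 = 101 + 124 is the grand total.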

14 Expected Values Students on the right side of the classroom finish the first row and the left side find the expected values for the dentists.
            Strongly agree  Agree  Neutral  Disagree  Strongly disagree  Total
Consumers   19.30           30.08  14.36    14.36     22.89              101
Dentists    23.70           36.92  17.64    17.64     28.11              124
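The expected counts on this slide can be reproduced from the table margins alone. The sketch below assumes the column totals 43, 67, 32, 32 and 51; these are not shown in the transcript and are inferred here from the expected counts themselves, so treat them as an assumption.

```python
# Row totals are given on the slides.
row_totals = {"Consumers": 101, "Dentists": 124}

# ASSUMPTION: column totals are not in the transcript; they are inferred
# from the expected counts shown on the slide (e.g. 19.30 * 225 / 101 ~ 43).
col_totals = [43, 67, 32, 32, 51]

grand_total = sum(row_totals.values())  # 101 + 124 = 225


def expected_row(row_total, col_totals, grand_total):
    """Expected counts for one row: row total * column total / grand total."""
    return [row_total * c / grand_total for c in col_totals]


expected = {
    group: expected_row(total, col_totals, grand_total)
    for group, total in row_totals.items()
}

for group, counts in expected.items():
    print(group, [round(x, 2) for x in counts])
```

Working from the margins works because, under homogeneity, every group is expected to split across the response categories in the same overall proportions.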

15 Conditions So now we can consider the conditions of our analysis. –We will assume the data was randomly selected. –The sample was large enough because every cell in the contingency table had an expected frequency of at least 5.

16 Test Statistic Now we can calculate the $\chi^2$ test statistic: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

17 Sampling Distribution The two-way table for this situation has 2 rows and 5 columns, so the appropriate number of degrees of freedom is (2 – 1)(5 – 1) = 4. The P-value, the probability of seeing differences this large between the observed frequencies and what we would have expected to see if the two populations were homogeneous, is very small (approximately 0), so there is strong evidence against the assumption that the proportions in the response categories are the same for the populations of consumers and dentists.

18 Post-graduation activities of graduates from an upstate NY high school
                           1980  1990  2000  Total
College/post HS education  320   245   288   853
Employment                 98    24    17    139
Military                   18    19    5     42
Travel                     17    2     5     24
Total                      453   290   315   1058
Has what kids do after graduation changed across three graduating classes?

19 We could test whether two proportions are the same using a two-proportion z test… but we have 3 groups. Chi-square goodness-of-fit tests compare against given proportions (theoretical models)… but we want to know if choices have changed. So… we’ll use a chi-square test of homogeneity. Homogeneity means that things are the same, so we have a built-in null hypothesis: the distribution does not change from group to group. This test looks for differences larger than what we might expect from random sample-to-sample variation.

20 Observed counts (expected counts in parentheses)
                           1980         1990         2000         Total
College/post HS education  320 (365.2)  245 (233.8)  288 (254.0)  853
Employment                 98 (59.5)    24 (38.1)    17 (41.4)    139
Military                   18 (17.98)   19 (11.5)    5 (12.5)     42
Travel                     17 (10.3)    2 (6.6)      5 (7.1)      24
Total                      453          290          315          1058
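The expected counts in parentheses can be recomputed directly from the observed table. This is a minimal sketch using only the standard library; the function name `expected_counts` is an illustrative choice.

```python
# Observed counts from the slide (columns: classes of 1980, 1990, 2000).
observed = {
    "College/post HS education": [320, 245, 288],
    "Employment": [98, 24, 17],
    "Military": [18, 19, 5],
    "Travel": [17, 2, 5],
}


def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    return {
        name: [rt * ct / grand_total for ct in col_totals]
        for name, rt in zip(table, row_totals)
    }


expected = expected_counts(observed)
for name, counts in expected.items():
    print(f"{name:28s}", [round(x, 1) for x in counts])

# Condition check: every expected count should be at least 5.
print(min(min(counts) for counts in expected.values()) >= 5)
```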

21 $H_0$: The post-high school choices made by the classes of 1980, 1990, and 2000 have the same distributions. $H_a$: The post-high school choices made by the classes of 1980, 1990, and 2000 do not have the same distributions.
Conditions:
* categorical data with counts
* expected values are all at least 5
Degrees of freedom: $(R - 1)(C - 1) = 3 \times 2 = 6$
Test statistic: $\chi^2 = \sum \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$

22 $\chi^2 = 72.77$ P-value = $P(\chi^2 > 72.77) < 0.0001$ The P-value is very small, so I reject the null hypothesis and conclude there is evidence that the choices made by high-school graduates have changed over the three classes examined.
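The statistic and P-value on this slide can be checked with a short standard-library sketch. One assumption to flag: rather than calling a statistics library, the P-value below uses the closed-form upper tail of the chi-squared distribution, which is only valid when the degrees of freedom are even (here df = 6). The function names are illustrative.

```python
import math

# Observed counts (rows: activity; columns: classes of 1980, 1990, 2000).
observed = [
    [320, 245, 288],  # College/post HS education
    [98, 24, 17],     # Employment
    [18, 19, 5],      # Military
    [17, 2, 5],       # Travel
]


def chi_squared_statistic(rows):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for r, rt in zip(rows, row_totals):
        for obs, ct in zip(r, col_totals):
            exp = rt * ct / n
            stat += (obs - exp) ** 2 / exp
    return stat


def chi2_upper_tail_even_df(x, df):
    """P(chi-squared > x); closed form that holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


stat = chi_squared_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_upper_tail_even_df(stat, df)
print(round(stat, 2), df, p_value < 0.0001)
```

For odd degrees of freedom a library routine for the chi-squared distribution would be needed instead of this shortcut.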

23 When we reject the null hypothesis, it’s a good idea to examine residuals. To standardize the residuals: $\frac{\text{Obs} - \text{Exp}}{\sqrt{\text{Exp}}}$
                           1980    1990    2000
College/post HS education  -2.366  .732    2.136
Employment                 4.989   -2.284  -3.791
Military                   .004    2.207   -2.122
Travel                     2.098   -1.785  -.803
What can this show us?
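The standardized residuals for the graduation table can be recomputed as (observed − expected) / √expected. This is a sketch with illustrative names, using only the standard library.

```python
import math

classes = ["1980", "1990", "2000"]
observed = {
    "College/post HS education": [320, 245, 288],
    "Employment": [98, 24, 17],
    "Military": [18, 19, 5],
    "Travel": [17, 2, 5],
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

residuals = {}
for name, counts in observed.items():
    residuals[name] = []
    for obs, ct in zip(counts, col_totals):
        exp = row_totals[name] * ct / grand_total
        # Positive: more cases than expected; negative: fewer.
        residuals[name].append((obs - exp) / math.sqrt(exp))

for name, values in residuals.items():
    print(f"{name:28s}", [round(v, 3) for v in values])

# The cell with the largest absolute residual deviates most from homogeneity.
largest = max(
    ((name, classes[j], v) for name, vals in residuals.items() for j, v in enumerate(vals)),
    key=lambda cell: abs(cell[2]),
)
print(largest[0], largest[1])
```

Squaring and summing these residuals gives back the chi-squared statistic, so the largest residuals point to the cells that contribute most to rejecting the null hypothesis.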

24 The following data is on drinking behavior for independently chosen random samples of male and female students. Does there appear to be a gender difference with respect to drinking behavior? (Note: low = 1-7 drinks/wk, moderate = 8-24 drinks/wk, high = 25 or more drinks/wk)

25 Assumptions: Have two independent random samples of students. All expected counts are greater than 5. $H_0$: the proportions of drinking behaviors are the same for female & male students. $H_a$: at least one of the proportions of drinking behavior is different for female & male students. P-value = .000, df = 3, $\alpha$ = .05 Since P-value < $\alpha$, I reject $H_0$. There is sufficient evidence to suggest that drinking behavior is not the same for female & male students.
Expected counts:
            Male    Female
None (0)    158.6   167.4
Low         554.0   585.0
Moderate    230.1   243.0
High        38.4    40.6

26 $\chi^2$ test for independence Used with categorical, bivariate data from ONE sample. Used to see if the two categorical variables are associated (dependent) or not associated (independent). One sample but two variables

27 Hypotheses, written in words $H_0$: the two variables are independent. $H_a$: the two variables are dependent. Be sure to write in context!

28 Assumptions & formula remain the same! Expected counts & df are found the same way as in the test for homogeneity. The only change is the hypotheses!

29 A study from the University of Texas Southwestern Medical Center examined whether the risk of hepatitis C was related to whether people had tattoos and to where they got their tattoos.
                    Hepatitis C  No Hepatitis C  Total
Tattoo, parlor      17           35              52
Tattoo, elsewhere   8            53              61
None                22           491             513
Total               47           579             626
These data differ from other kinds because they categorize subjects from a single group on two categorical variables rather than on only one.

30 Is the chance of having hepatitis C independent of tattoo status? If hepatitis status is independent of tattoos, we expect the proportion of people testing positive for hepatitis to be the same for the three levels of tattoo status. Are the categorical variables tattoo status and hepatitis statistically independent? A chi-square test for independence

31 $H_0$: Tattoo status and hepatitis status are independent. $H_a$: Tattoo status and hepatitis status are not independent.
Conditions:
* categorical data with counts
* expected values are all at least 5
Degrees of freedom: $(R - 1)(C - 1) = 2 \times 1 = 2$
Test statistic: $\chi^2 = \sum \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$

32 Observed counts (expected counts in parentheses)
                    Hepatitis C   No Hepatitis C  Total
Tattoo, parlor      17 (3.904)    35 (48.096)     52
Tattoo, elsewhere   8 (4.580)     53 (56.420)     61
None                22 (38.516)   491 (474.484)   513
Total               47            579             626
$\chi^2 = 57.91$ P-value = $P(\chi^2 > 57.91) < 0.0001$

33 The p-value is very small, so I reject the null hypothesis and conclude that hepatitis status is not independent of tattoo status. Because the expected cell frequency condition was violated, I need to check that the two cells with small expected counts did not influence this result too greatly.
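One way to follow up on the violated condition is to flag the cells whose expected count falls below 5 and see how much each cell contributes to the statistic. The sketch below uses only the counts from the tattoo table; the variable names are illustrative.

```python
observed = {
    "Tattoo, parlor": {"Hepatitis C": 17, "No Hepatitis C": 35},
    "Tattoo, elsewhere": {"Hepatitis C": 8, "No Hepatitis C": 53},
    "None": {"Hepatitis C": 22, "No Hepatitis C": 491},
}
statuses = ["Hepatitis C", "No Hepatitis C"]

row_totals = {g: sum(cells.values()) for g, cells in observed.items()}
col_totals = {s: sum(observed[g][s] for g in observed) for s in statuses}
grand_total = sum(row_totals.values())

small_cells = []      # cells that violate the expected-count condition
contributions = {}    # each cell's share of the chi-squared statistic
for group in observed:
    for status in statuses:
        exp = row_totals[group] * col_totals[status] / grand_total
        if exp < 5:
            small_cells.append((group, status, round(exp, 3)))
        contributions[(group, status)] = (observed[group][status] - exp) ** 2 / exp

chi_squared = sum(contributions.values())

print("cells with expected count below 5:", small_cells)
print("chi-squared:", round(chi_squared, 2))
for cell, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(cell, round(value, 2))
```

In this table the parlor / hepatitis C cell both has a small expected count and supplies the largest single contribution, which is why the slide recommends checking its influence before trusting the result.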

34 Whenever we reject the null hypothesis, it’s a good idea to examine the residuals. Since counts may be different for cells, we are better off standardizing the residuals. To standardize a cell’s residuals, divide by the square root of its expected value. The + and the – sign indicate whether we observed more cases than we expected, or fewer.

35 Standardized residuals
                    Hepatitis C  No Hepatitis C
Tattoo, parlor      6.628        -1.888
Tattoo, elsewhere   1.598        -.455
None                -2.661       .758
Examining the residuals:
Largest component: Hepatitis C / tattoo parlor. This suggests that a principal source of infection may be tattoo parlors.
Second largest component: Hepatitis C / no tattoo. Those who have no tattoos are less likely to be infected with hepatitis C than we might expect if the two variables were independent.

36 A beef distributor wishes to determine whether there is a relationship between geographic region and cut of meat preferred. If there is no relationship, we will say that beef preference is independent of geographic region. Suppose that, in a random sample of 500 customers, 300 are from the North and 200 from the South. Also, 150 prefer cut A, 275 prefer cut B, and 75 prefer cut C.

37 If beef preference is independent of geographic region, how would we expect this table to be filled in?
         North  South  Total
Cut A    90     60     150
Cut B    165    110    275
Cut C    45     30     75
Total    300    200    500
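Under independence, the expected proportion in each cell is the product of the two marginal proportions, so the table can be filled in from the totals alone. A minimal sketch with illustrative variable names:

```python
n = 500
region_totals = {"North": 300, "South": 200}
cut_totals = {"Cut A": 150, "Cut B": 275, "Cut C": 75}

# Independence: P(cut and region) = P(cut) * P(region),
# so the expected count is n * P(cut) * P(region).
expected = {
    cut: {
        region: n * (cut_total / n) * (region_total / n)
        for region, region_total in region_totals.items()
    }
    for cut, cut_total in cut_totals.items()
}

for cut, row in expected.items():
    print(cut, {region: round(value, 1) for region, value in row.items()})
```

This is algebraically the same as (row total × column total) / grand total; writing it as a product of proportions makes the independence assumption explicit.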

38 Now suppose that in the actual sample of 500 consumers the observed numbers were as follows: (on your paper) Is there sufficient evidence to suggest that geographic regions and beef preference are not independent? (Is there a difference between the expected and observed counts?)

39 Assumptions: Have a random sample of people. All expected counts are greater than 5. $H_0$: geographic region and beef preference are independent. $H_a$: geographic region and beef preference are dependent. P-value = .0226, df = 2, $\alpha$ = .05 Since P-value < $\alpha$, I reject $H_0$. There is sufficient evidence to suggest that geographic region and beef preference are dependent.
Expected counts:
         North  South
Cut A    90     60
Cut B    165    110
Cut C    45     30

