
1 Chi-square Test of Independence

2 The chi-square test of independence is probably the most frequently used hypothesis test in the social sciences. In this exercise, we will use the chi-square test of independence to evaluate group differences when the test variable is nominal, dichotomous, ordinal, or grouped interval. The chi-square test of independence can be used for any variable; the group (independent) and the test variable (dependent) can be nominal, dichotomous, ordinal, or grouped interval.

3 Independence Defined Two variables are independent if, for all cases, the classification of a case into a particular category of one variable (the group variable, such as gender) has no effect on the probability that the case will fall into any particular category of the second variable (the test variable, such as major choice). When two variables are independent, there is no relationship between them. We would expect the frequency breakdowns of the test variable to be similar for all groups.

4 Suppose we are interested in the relationship between gender and attending college. If there is no relationship between gender and attending college and 40% of our total sample attend college, we would expect 40% of the males in our sample to attend college and 40% of the females to attend college. If there is a relationship between gender and attending college, we would expect a higher proportion of one group to attend college than the other group, e.g. 60% to 20%.

5 Displaying Independent and Dependent Relationships When the variables are independent, the proportion in both groups is close to the same size as the proportion for the total sample. When group membership makes a difference, the dependent relationship is indicated by one group having a higher proportion than the proportion for the total sample.

6 Independence Demonstrated Suppose we are interested in the relationship between gender and preferred major of study. A group of 300 students was randomly selected, and each was asked whether he or she prefers taking liberal arts courses in the area of math–science, social science, or humanities.

7 Looking at the bar chart, it appears that females and males differ in the proportion preferring each subject. Does this sample present sufficient evidence to reject the null hypothesis “Preference for math–science, social science, or humanities is independent of the gender of a college student”? (In layman's terms, the null hypothesis says that preference for subject area is not related to gender.) We need a statistical test to verify our visual impression that there is a statistical dependence between the two variables.

8 Expected Frequencies Expected frequencies are computed as if there is no difference between the groups, i.e. both groups have the same proportion as the total sample in each category of the test variable. Two random variables X and Y are called independent if the probability distribution of one variable is not affected by the value of the other. In terms of conditional probability, independence requires P(MS | M) = P(MS | F) = P(MS); that is, gender has no effect on the probability of a person’s choice of subject area. If two events A and B are independent, then P(A ∩ B) = P(A) × P(B).

9 Since the proportion of subjects in each category of the group variable can differ, we take group category into account in computing expected frequencies as well. To summarize, the expected frequencies for each cell are computed to be proportional to both the breakdown for the test variable and the breakdown for the group variable.
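The proportionality rule above can be written as a short derivation. The notation is introduced here for illustration and does not appear on the slides: R_i is the total for row i, C_j is the total for column j, n is the sample size, and E_ij is the expected count for the cell in row i and column j when the two variables are independent.

```latex
E_{ij} = n \cdot P(\text{row } i) \cdot P(\text{column } j)
       = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n}
       = \frac{R_i \, C_j}{n}
```

In words, each expected count is the row total times the column total, divided by the sample size.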

10 Expected Frequency Calculation The data from “Observed Frequencies for Sample Data” is the source for information to compute the expected frequencies. Percentages are computed for the column of all students and for the row of all GPA’s. These percentages are then multiplied by the total number of students in the sample (453) to compute the expected frequency for each cell in the table.

11 Expected Frequencies versus Observed Frequencies The chi-square test of independence plugs the observed frequencies and expected frequencies into a formula which computes how the pattern of observed frequencies differs from the pattern of expected frequencies. The chi-square test of independence is a test of the influence or impact that a subject’s value on one variable has on the same subject’s value for a second variable. To test the relationship, we use the chi-square test statistic, which follows the chi-square distribution.

12 Steps in hypothesis testing H0: Subject area preference is independent of (not related to) the gender of the college student. Ha: Subject area preference is not independent of (is related to) the gender of the college student. Test statistic: $\chi^2 = \sum (O - E)^2 / E$, summed over all cells of the table, where O = observed frequency and E = expected frequency. E is the expected count if X and Y are independent. Since the null hypothesis asserts that these factors are independent, we would expect the values to be distributed in proportion to the marginal totals.

13 The expected value is calculated on the assumption that H0 is true, i.e. gender and subject choice are independent. There are 122 males; we would expect them to be distributed among MS, SS, and H proportionally to the 72, 113, and 115 totals. Thus, the expected cell counts for males are 122 × 72/300 = 29.28 (MS), 122 × 113/300 = 45.95 (SS), and 122 × 115/300 = 46.77 (H). For the remaining 178 females we would expect 42.72 (MS), 67.05 (SS), and 68.23 (H).
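The expected-count arithmetic for this example can be sketched in a few lines of Python. The marginal totals (122 males; 72, 113, and 115 of 300 students) come from the slides, the female total is derived as 300 - 122, and the function and variable names are hypothetical.

```python
# Marginal totals quoted on the slides; the female total is derived as 300 - 122.
n = 300
group_totals = {"male": 122, "female": n - 122}
subject_totals = {"MS": 72, "SS": 113, "H": 115}


def expected_counts(row_totals, col_totals, total):
    """Expected count per cell under independence: row total * column total / n."""
    return {
        (row, col): row_total * col_total / total
        for row, row_total in row_totals.items()
        for col, col_total in col_totals.items()
    }


expected = expected_counts(group_totals, subject_totals, n)
for (group, subject), value in expected.items():
    print(f"{group:6s} {subject:2s} {value:6.2f}")
```

The expected counts add up to the sample size, which is a quick check that the marginal totals were entered correctly.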

14 Typically, the contingency table is written so that it contains all this information. The calculated chi-square is 2.035 + 0.533 + 0.164 + 1.395 + 0.365 + 0.112 = 4.604.

15 χ² distribution table and degrees of freedom Like the t distribution, the chi-square distribution is indexed by its degrees of freedom. For a two-way table, the degrees of freedom are (C-1)(R-1), where C = number of columns and R = number of rows. If the contingency table has 3 columns and 2 rows, the degrees of freedom are (3-1)(2-1) = 2. If C = 4 and R = 5, then d.f. = (4-1)(5-1) = 12.
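As a minimal sketch, the degrees-of-freedom rule can be wrapped in a small helper. The function name is hypothetical, and the two calls reproduce the two cases worked on the slide.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a chi-square test on an R x C two-way table."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("the table needs at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)


print(degrees_of_freedom(2, 3))  # 2 rows, 3 columns -> 2
print(degrees_of_freedom(5, 4))  # 5 rows, 4 columns -> 12
```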

16

17 A chi-square statistic was computed for a two-way table having 4 degrees of freedom. The value of the statistic was 9.49. What is the p-value? A. 0.005 B. 0.01 C. 0.05 A chi-square statistic was computed for a two-way table having 20 degrees of freedom. The value of the statistic was 29.69. What is the p-value? A. 0.025 B. 0.05 C. 0.075

18 Decision making – CV approach First, look at the level of significance (α). If it is set at 5% and df = 2, the critical value is 5.99. Second, compare the computed chi-square value with the critical value. The computed χ² is 4.604 and CV = 5.99, so the computed χ² is smaller than the critical value. As it lies inside the non-rejection region, we fail to reject the null hypothesis. We conclude that, at the 5% significance level, the preference for academic subject areas does not depend on the sex of the respondents. Alternatively, we say that the sex of the students has no significant relationship with the preference for subject areas.

19 If the computed χ² is larger than the critical value, e.g. χ² = 8.00 and CV = 5.99, we reject the null hypothesis. We conclude that the preference for academic subject areas is significantly related to the sex of the respondents at the 5% significance level.
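The critical-value rule can be expressed as a short Python sketch. The critical value 5.99 (α = 0.05, df = 2) and the two test statistics are taken from the slides; the function name is hypothetical.

```python
def decide(chi_square, critical_value):
    """Decision for H0 under the critical-value approach."""
    if chi_square > critical_value:
        return "reject H0"
    return "fail to reject H0"


CRITICAL_VALUE = 5.99  # alpha = 0.05, df = 2, as quoted on the slides

print(decide(4.604, CRITICAL_VALUE))  # statistic from the subject-preference example
print(decide(8.00, CRITICAL_VALUE))   # illustrative statistic above the critical value
```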

20 Example The value of the χ² test statistic is 5.33. Are the results statistically significant at the 5% significance level if the degrees of freedom are 2? A. Yes, because 5.33 is greater than the critical value of 3.84. B. Yes, because 5.33 is greater than the critical value of 4.01. C. No, because 5.33 is smaller than the critical value of 5.99. D. No, because 5.33 is smaller than the critical value of 11.07.

21 Decision making – P-value approach The null hypothesis of independence is rejected if the p-value of the chi-square test statistic is less than the chosen significance level α. In that case, we conclude that there is a relationship between the variables, i.e. they are not independent.

22 If the probability (p-value) of the test statistic is greater than the alpha error rate (α), we fail to reject the null hypothesis. We conclude that there is no evidence of a relationship between the variables, i.e. the data are consistent with independence. Given the χ² test statistic 4.604, the corresponding p-value at d.f. = 2 is around 10%. If we set α = 0.05, then with p-value = 0.1 (10%) we cannot reject the null hypothesis.
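The 10% figure can be checked without a table. The sketch below uses only the Python standard library and a closed-form expression for the upper-tail probability that holds only when the degrees of freedom are a positive even integer (which covers df = 2 here). The function name is hypothetical, and for odd df a statistics library would be needed instead.

```python
import math


def chi2_p_value_even_df(chi_square, df):
    """Upper-tail probability P(X >= chi_square) for a positive even df.

    Uses exp(-x/2) * sum_{k < df/2} (x/2)^k / k!, which is exact
    only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = chi_square / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


alpha = 0.05
p_value = chi2_p_value_even_df(4.604, 2)
print(f"p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With df = 2 the formula reduces to exp(-χ²/2), so a statistic of 4.604 gives a p-value very close to 0.10.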

23 In the General Social Survey, respondents were asked what they thought was most important to get ahead: hard work, lucky breaks, or both. What is the null hypothesis for this situation? A. There is a relationship between gender and opinion on what is important to get ahead in the sample. B. There is no relationship between gender and opinion on what is important to get ahead in the sample. C. There is a relationship between gender and opinion on what is important to get ahead in the population. D. There is no relationship between gender and opinion on what is important to get ahead in the population. What is the alternative hypothesis?

24 What is the value of the test statistic? What are the degrees of freedom for this test? At the significance level of 0.05, what conclusion can you draw about the relationship between gender and opinion on what is important to get ahead?

25 A researcher wants to know with whom it is easiest to make friends. 205 first-year students from the FSS were randomly selected to be interviewed. Do we have evidence, at the 5% significance level, that men and women differ in whether they find it easier to make friends with the same sex or the opposite sex? Observed counts, with expected values in parentheses:

          Opposite sex   Same sex     No difference   Total
Female    58 (48.79)     16 (19.38)   63 (68.83)      137
Male      15 (24.21)     13 (9.62)    40 (34.17)       68
Total     73             29           103             205

26 H0: There is no relationship between the sex of students and the perceived ease of making friends in the population. Ha: There is a relationship between the sex of students and the perceived ease of making friends in the population. Test statistic = 8.515. Assume the significance level is 5%; at df = (3-1)(2-1) = 2, the critical value is 5.99. As the χ² test statistic (8.515) is larger than the critical value (5.99), we reject the null hypothesis and conclude that, in the population, there is a relationship between the respondents’ sex and their response to the question asked.
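The test statistic for this example can be reproduced from the observed counts with a short standard-library sketch. The counts are those given for the friendship example (female: 58, 16, 63; male: 15, 13, 40), and the function name is hypothetical.

```python
# Observed counts from the friendship example
# (rows: female, male; columns: opposite sex, same sex, no difference).
observed = [
    [58, 16, 63],
    [15, 13, 40],
]


def chi_square_statistic(table):
    """Pearson chi-square statistic and expected counts for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, expected


statistic, expected = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {statistic:.3f}, df = {df}")
```

The result can then be compared with the critical value 5.99 for df = 2 at the 5% level.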

27 Suppose a random sample of 650 of the 1 million residents of a city is taken, in which every resident of each of four neighborhoods, A, B, C, and D, is equally likely to be chosen. A null hypothesis says the randomly chosen person's neighborhood of residence is independent of the person's occupational classification, which is either "blue collar", "white collar", or "service". Test the hypothesis at α = 5%.

28 Based on the following table, use the chi-square test to determine whether the smoking habit and exercise level of students are independent at the 5% significance level.

                  Exercise
Smoking habit     Frequently   Never   Some
Heavy                      7       1      3
Never                     87      18     84
Occasionally              12       3      4
Regularly                  9       1      7

29 Exercise 1. Which of the following relationships could be analyzed using a chi-square test? A. The relationship between height (inches) and weight (pounds). B. The relationship between satisfaction with K-12 schools (satisfied or not) and political party affiliation. C. The relationship between gender and amount willing to spend on a stereo system (in dollars). D. The relationship between opinion on gun control and income earned last year (in thousands of dollars).

30 1. A student survey was done to study the relationship between where students live (dormitory, apartment, house, co-op, or parent’s home) and how they usually get to campus (walking, bus, bicycle, car, or subway). What are the degrees of freedom for the chi-square statistic? A. 5 B. 16 C. 20 D. 25 2. A chi-square test involves a set of counts called “expected counts.” What are the expected counts? A. Hypothetical counts that would occur if the alternative hypothesis were true. B. Hypothetical counts that would occur if the null hypothesis were true. C. The actual counts that did occur in the observed data. D. The long-run counts that would be expected if the observed counts are representative.

31 3. A researcher conducted a study on college students to see if there was a link between gender and how often they have cheated on an exam. She asked two questions on a survey: (1) What is your gender? Male ___ Female ___ (2) How many times have you cheated on an exam while in college? Never ___ 1 or 2 times ___ 3 or more times ___ a. What is the appropriate null hypothesis? b. What are the degrees of freedom for the test statistic?

32 Using SPSS for the χ² test Analyze → Descriptive Statistics → Crosstabs → move one variable into Column(s) and the other into Row(s) → under Statistics, choose Chi-square → under Cells, the observed count is given by default; also choose Expected to see the expected counts. The expected values are normally not reported. Choose either column or row percentages, which are reported as descriptive statistics in the write-up.

33 P-value Compare the p-value with α = 0.05 and make a decision.

34 Should we reject the null hypothesis?

