# Contingency Tables For Tests of Independence



Multinomials Over Various Categories: So far we have discussed a single qualitative variable with multiple outcomes, without regard to anything else. Now we ask whether two qualitative variables are related, i.e., whether they are independent.

EXAMPLES (1) Can it be concluded that cola preference and gender are dependent? (2) Can it be concluded that cola preference and age are dependent?

RULE OF 5  2 (Chi-squared) is actually only an approximate distribution for the test statistic. To be a “valid” approximation: ALL e i ’s should be  5 If the rule of 5 is violated, combine some categories so that the condition is met.

COLA PREFERENCE VS. GENDER: The 1000 cola drinkers were further classified as to whether they were male or female.

| Cola | Male | Female | Row total |
|---|---|---|---|
| Coke | 240 | 170 | $r_1 = 410$ |
| Pepsi | 200 | 150 | $r_2 = 350$ |
| RC | 50 | 30 | $r_3 = 80$ |
| Shasta | 35 | 15 | $r_4 = 50$ |
| Jolt | 75 | 35 | $r_5 = 110$ |
| Column total | $c_1 = 600$ | $c_2 = 400$ | $n = 1000$ |

HYPOTHESIS TEST: Can we conclude cola preference and gender are dependent?

- $H_0$: (NO) Cola preference and gender are independent.
- $H_A$: (YES) Cola preference and gender are dependent.
- $\alpha = .05$; reject $H_0$ if $\chi^2 > \chi^2_{.05,\,DF}$.
  - The correct DF $= (r-1)(c-1) = (5-1)(2-1) = (4)(1) = 4$, where $r$ = number of rows and $c$ = number of columns.
- Reject $H_0$ if $\chi^2 > \chi^2_{.05,4} = 9.48773$.
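The critical value $\chi^2_{.05,4}$ is normally read from a table. As a cross-check, here is a minimal stdlib-only Python sketch that computes the chi-squared CDF from the power series for the regularized lower incomplete gamma function and bisects for the upper 5% point. The function names `chi2_cdf`, `chi2_critical` and `degrees_of_freedom` are illustrative and not from the slides; in practice a library routine such as SciPy's `scipy.stats.chi2.ppf` would be the usual choice.

```python
import math

def degrees_of_freedom(r: int, c: int) -> int:
    """DF for an r x c test of independence: (r - 1)(c - 1)."""
    return (r - 1) * (c - 1)

def chi2_cdf(x: float, df: int) -> float:
    """Chi-squared CDF: regularized lower incomplete gamma P(df/2, x/2), by power series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a              # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    n = 0
    while term > 1e-15 * total:         # stop once terms no longer change the sum
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(-z + a * math.log(z) - math.lgamma(a)) * total

def chi2_critical(alpha: float, df: int) -> float:
    """Upper-tail critical value: the x with P(chi2_df > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:   # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

df = degrees_of_freedom(5, 2)              # 5 colas x 2 genders
print(df, round(chi2_critical(0.05, df), 5))
```

The printed value can be compared with the table value 9.48773 used in the decision rule above.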

HOW DO WE GET THE $e_{ij}$'s? Let $P(A)$ = probability a respondent favors Coke, and let $P(B)$ = probability a respondent is male. If $H_0$ is true, the classifications are independent, so $P(A \text{ and } B) = P(A)P(B)$.

- Best guess for $P(A)$: $410/1000 = .41$
- Best guess for $P(B)$: $600/1000 = .6$
- Thus $P(A \text{ and } B) \approx (.41)(.6) = .246$
- Expected number (Coke and male) $= e_{11} = 1000(.246) = 246$

This can also be obtained directly as $r_1 c_1 / n = (410)(600)/1000 = 246$.
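A small Python sketch of the two routes to $e_{11}$: multiplying the estimated marginal probabilities (independence) and scaling by $n$, versus the shortcut $r_1 c_1 / n$. It uses `fractions.Fraction` so the comparison is exact rather than subject to floating-point rounding; the variable names are illustrative.

```python
from fractions import Fraction

n = 1000      # respondents
r1 = 410      # row total: Coke
c1 = 600      # column total: male

# Route 1: estimate P(A) and P(B), multiply them (independence), then scale by n
p_coke = Fraction(r1, n)    # .41
p_male = Fraction(c1, n)    # .6
e11_via_probs = n * p_coke * p_male

# Route 2: the shortcut r1 * c1 / n
e11_shortcut = Fraction(r1 * c1, n)

assert e11_via_probs == e11_shortcut   # the n's cancel, so both routes agree exactly
print(float(e11_shortcut))
```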

CONTINGENCY TABLES: Contingency tables are a convenient way of expressing the results when there are two classifications; a contingency table is the two-classification equivalent of a multinomial table. We put the $e_{ij}$'s in parentheses under (or next to) the $f_{ij}$'s in the table; then we calculate:

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

$e_{ij}$'s for Cola vs. Gender:

- Coke/Male: $e_{11} = (410)(600)/1000 = 246$
- Coke/Female: $e_{12} = (410)(400)/1000 = 164$
- Pepsi/Male: $e_{21} = (350)(600)/1000 = 210$
- Pepsi/Female: $e_{22} = (350)(400)/1000 = 140$
- RC/Male: $e_{31} = (80)(600)/1000 = 48$
- RC/Female: $e_{32} = (80)(400)/1000 = 32$
- Shasta/Male: $e_{41} = (50)(600)/1000 = 30$
- Shasta/Female: $e_{42} = (50)(400)/1000 = 20$
- Jolt/Male: $e_{51} = (110)(600)/1000 = 66$
- Jolt/Female: $e_{52} = (110)(400)/1000 = 44$
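The ten calculations above follow one pattern, $e_{ij} = r_i c_j / n$, so they can be generated from the observed table in one pass. This is a minimal Python sketch; `expected_counts` is a hypothetical helper name, and the values are kept as exact fractions so nothing is rounded to whole numbers.

```python
from fractions import Fraction

# Observed counts from the cola-vs-gender survey: rows = colas, columns = (male, female)
colas = ["Coke", "Pepsi", "RC", "Shasta", "Jolt"]
observed = [
    [240, 170],
    [200, 150],
    [50, 30],
    [35, 15],
    [75, 35],
]

def expected_counts(table):
    """e_ij = (row i total)(column j total) / n, kept as exact fractions (no rounding)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[Fraction(r * c, n) for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for name, row in zip(colas, expected):
    print(name, [float(e) for e in row])
```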

Notes on Calculating the $e_{ij}$'s:

- The column totals may be set in advance or may be random, depending on the survey.
- These $e_{ij}$'s were all whole numbers; if they are not, DO NOT ROUND THEM TO WHOLE NUMBERS.
- All these $e_{ij}$'s are $\ge 5$, but suppose $e_{52}$ were actually 3:
  - We might combine the results from the Shasta and Jolt colas.
  - This would reduce the number of rows and hence the degrees of freedom.
  - Here $e_{52} = 44$ is not less than 5, so we do not have to do this.
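The notes above describe a check-then-merge step. Here is a minimal Python sketch of it, assuming categories are merged by adding their observed counts cell by cell; `satisfies_rule_of_5` and `merge_rows` are hypothetical helper names. For this table the check passes, so the merge is shown only to illustrate how combining Shasta and Jolt drops a row and one degree of freedom.

```python
from fractions import Fraction

def expected_counts(table):
    """e_ij = (row i total)(column j total) / n, as exact fractions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[Fraction(r * c, n) for c in col_totals] for r in row_totals]

def satisfies_rule_of_5(table):
    """True when every expected count is at least 5."""
    return all(e >= 5 for row in expected_counts(table) for e in row)

def merge_rows(labels, table, i, j):
    """Combine categories i and j (0-based) by adding their observed counts cell by cell."""
    merged_row = [a + b for a, b in zip(table[i], table[j])]
    keep = [k for k in range(len(table)) if k not in (i, j)]
    new_labels = [labels[k] for k in keep] + [labels[i] + "+" + labels[j]]
    new_table = [table[k] for k in keep] + [merged_row]
    return new_labels, new_table

labels = ["Coke", "Pepsi", "RC", "Shasta", "Jolt"]
observed = [[240, 170], [200, 150], [50, 30], [35, 15], [75, 35]]

print(satisfies_rule_of_5(observed))   # every e_ij here is >= 5, so no merge is needed

# Illustration only: merging Shasta (index 3) and Jolt (index 4)
labels2, merged = merge_rows(labels, observed, 3, 4)
df_merged = (len(merged) - 1) * (len(merged[0]) - 1)
print(labels2, merged, df_merged)
```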

CONTINGENCY TABLE FOR COLA vs. GENDER (expected counts $e_{ij}$ in parentheses):

| Cola | Men | Women | Total |
|---|---|---|---|
| Coke | 240 (246) | 170 (164) | 410 |
| Pepsi | 200 (210) | 150 (140) | 350 |
| RC | 50 (48) | 30 (32) | 80 |
| Shasta | 35 (30) | 15 (20) | 50 |
| Jolt | 75 (66) | 35 (44) | 110 |
| Total | 600 | 400 | 1000 |

 2 for Cola vs. Gender  2 = (240-246) 2 /246 + (170-164) 2 /164 + (200-210) 2 /210 + (150-140) 2 /140 + ( 50 - 48) 2 / 48 + ( 30- 32) 2 / 32 + ( 35 - 30) 2 / 30 + ( 15- 20) 2 / 20 + ( 75- 66) 2 / 66 + ( 35- 44) 2 / 44 = 6.92  2 = 6.92 <  2.05,4 = 9.48773 There is not enough evidence to conclude gender and cola preference are dependent.There is not enough evidence to conclude gender and cola preference are dependent.

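COLA PREFERENCE vs. AGE: Survey results:

| Cola | Age group 1 | Age group 2 | Age group 3 | Age group 4 | Total |
|---|---|---|---|---|---|
| Coke | 155 | 140 | 75 | 40 | 410 |
| Pepsi | 155 | 95 | 75 | 25 | 350 |
| RC | 30 | 20 | 15 | 15 | 80 |
| Shasta | 20 | 15 | 10 | 5 | 50 |
| Jolt | 40 | 30 | 25 | 15 | 110 |
| Total | 400 | 300 | 200 | 100 | 1000 |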

HYPOTHESIS TEST: There are $r = 5$ rows and $c = 4$ columns.

- $H_0$: (NO) Cola preference and age are independent.
- $H_A$: (YES) Cola preference and age are dependent.
- $\alpha = .05$; reject $H_0$ if $\chi^2 > \chi^2_{.05,\,DF}$.
  - DF $= (r-1)(c-1) = (5-1)(4-1) = (4)(3) = 12$
- Reject $H_0$ if $\chi^2 > \chi^2_{.05,12} = 21.0261$.

Sample $e_{ij}$'s:

- $e_{34} = \dfrac{(\text{Row 3 Total})(\text{Column 4 Total})}{\text{Grand Total}} = \dfrac{(80)(100)}{1000} = 8$
- $e_{41} = \dfrac{(\text{Row 4 Total})(\text{Column 1 Total})}{\text{Grand Total}} = \dfrac{(50)(400)}{1000} = 20$
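The same formula written as a Python function that uses the slide's 1-based row and column indices, so `e(3, 4)` and `e(4, 1)` match $e_{34}$ and $e_{41}$ above. The function name `e` is illustrative. Scanning every cell also checks the rule of 5 for the age table: the smallest expected count is exactly 5 (Shasta in age group 4), so the condition is met, though only just.

```python
# Observed counts, cola (rows) by age group (columns 1..4, in the slide's order)
observed = [
    [155, 140, 75, 40],   # Coke
    [155, 95, 75, 25],    # Pepsi
    [30, 20, 15, 15],     # RC
    [20, 15, 10, 5],      # Shasta
    [40, 30, 25, 15],     # Jolt
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

def e(i, j):
    """Expected count e_ij = (row i total)(column j total) / n, with 1-based i and j."""
    return row_totals[i - 1] * col_totals[j - 1] / n

print(e(3, 4), e(4, 1))

# Rule of 5: the smallest expected count over all 5 x 4 cells
smallest = min(e(i, j) for i in range(1, 6) for j in range(1, 5))
print(smallest)
```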

CONTINGENCY TABLE FOR COLA vs. AGE (expected counts $e_{ij}$ in parentheses):

| Cola | Age group 1 | Age group 2 | Age group 3 | Age group 4 | Total |
|---|---|---|---|---|---|
| Coke | 155 (164) | 140 (123) | 75 (82) | 40 (41) | 410 |
| Pepsi | 155 (140) | 95 (105) | 75 (70) | 25 (35) | 350 |
| RC | 30 (32) | 20 (24) | 15 (16) | 15 (8) | 80 |
| Shasta | 20 (20) | 15 (15) | 10 (10) | 5 (5) | 50 |
| Jolt | 40 (44) | 30 (33) | 25 (22) | 15 (11) | 110 |
| Total | 400 | 300 | 200 | 100 | 1000 |

 2 for Cola vs. Age  2 = (155-164) 2 /164 + (140-123) 2 /123 + (75-82) 2 /82 + (40-41) 2 /41 + … + ( 40 - 44) 2 / 44 + ( 30- 33) 2 / 33 + ( 25- 22) 2 / 22 + ( 15- 11) 2 / 11 = 18.72  2 = 18.72 <  2.05,12 = 21.0261 There is not enough evidence to conclude cola preference and age are dependent.There is not enough evidence to conclude cola preference and age are dependent.

Excel: CHITEST gives the p-value for the test: `=CHITEST(Observed Values, Expected Values)`. You must first calculate the expected values, the $e_{ij}$'s. See the next slide for an easy way to calculate these values.

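Excel layout for Cola vs. Gender (observed counts in B4:C8):

- Row totals: `=SUM(B4:C4)` in D4, dragged to D5:D8
- Column totals: `=SUM(B4:B8)` in B9, dragged to C9:D9
- Expected values: `=$D4*B$9/$D$9` in B13, dragged to C13; then drag B13:C13 down to B17:C17
- p-value: `=CHITEST(B4:C8,B13:C17)`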

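Excel layout for Cola vs. Age (observed counts in B4:E8):

- Row totals: `=SUM(B4:E4)` in F4, dragged to F5:F8
- Column totals: `=SUM(B4:B8)` in B9, dragged to C9:F9
- Expected values: `=$F4*B$9/$F$9` in B13, dragged to E13; then drag B13:E13 down to B17:E17
- p-value: `=CHITEST(B4:E8,B13:E17)`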
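For readers working outside Excel, here is a rough Python analogue of what the slides use CHITEST for: compute $\chi^2$ from the observed and expected ranges and return its upper-tail p-value with $(r-1)(c-1)$ degrees of freedom. It is a sketch under that assumption, handling only tables with at least two rows and two columns; `chitest_like` and `chi2_sf` are hypothetical names, not an Excel or library API. Both p-values exceed $\alpha = .05$, which agrees with the "not enough evidence" conclusions above.

```python
import math
from fractions import Fraction

def chi2_sf(x, df):
    """P(chi2_df > x) = 1 - P(df/2, x/2), using the lower incomplete gamma power series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return 1.0 - math.exp(-z + a * math.log(z) - math.lgamma(a)) * total

def expected_counts(table):
    """e_ij = (row i total)(column j total) / n, as exact fractions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[Fraction(r * c, n) for c in col_totals] for r in row_totals]

def chitest_like(observed, expected):
    """Hypothetical analogue of =CHITEST(observed, expected) for r > 1 and c > 1."""
    stat = sum((f - e) ** 2 / e
               for f_row, e_row in zip(observed, expected)
               for f, e in zip(f_row, e_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2_sf(float(stat), df)

observed_gender = [[240, 170], [200, 150], [50, 30], [35, 15], [75, 35]]
observed_age = [[155, 140, 75, 40], [155, 95, 75, 25], [30, 20, 15, 15],
                [20, 15, 10, 5], [40, 30, 25, 15]]

p_gender = chitest_like(observed_gender, expected_counts(observed_gender))
p_age = chitest_like(observed_age, expected_counts(observed_age))
print(round(p_gender, 4), round(p_age, 4))
```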

Review:

- Contingency tables allow us to determine whether two different classifications are independent.
- Excel: CHITEST is used to generate the p-value for the chi-squared test.
- Expected values: $e_{ij} = (\text{Row Total})(\text{Column Total})/n$.
- By hand: the total degrees of freedom are $(r-1)(c-1)$, and the $\chi^2$ statistic is calculated by:

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
