Testing for a Relationship Between Two Categorical Variables
The Chi-Square Test …
Relationship between owning a bike and having a significant other?

Rows: Bike   Columns: SigOther

            No      Yes      All
No          37       27       64
         57.81    42.19   100.00
Yes         10       18       28
         35.71    64.29   100.00
All         47       45       92
         51.09    48.91   100.00

Cell Contents -- Count
                 % of Row
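Our Hypotheses
If there is no relationship, we’d expect the percentage (proportion) having a significant other to be equal in each group. Let $p_N$ and $p_Y$ be the proportions with a significant other among those who do not and who do own a bike. So:
$H_0$: There is no relationship between owning a bike and having a significant other. Or, $p_N = p_Y$.
$H_A$: There is a relationship. Or, $p_N \neq p_Y$.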
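What would the table look like if there were no relationship?
Overall, 45/92, or 48.9%, have an SO. With no relationship, 48.9% would have an SO regardless of owning a bike. So 0.489(64), or 31.3, non-bikers would have an SO, and 0.489(28), or 13.7, bikers would have an SO. The rest of each row would not: 64 - 31.3 = 32.7 non-bikers and 28 - 13.7 = 14.3 bikers.

Rows: Bike   Columns: SigOther

Observed Counts               Expected Counts
        No   Yes   All                No    Yes   All
No      37    27    64        No    32.7   31.3    64
Yes     10    18    28        Yes   14.3   13.7    28
All     47    45    92        All     47     45    92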
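The calculation above is a special case of a general rule: with no relationship, the expected count in a cell is (row total × column total) / grand total; for example, 64 × 45 / 92 ≈ 31.3 non-bikers with an SO. A minimal Python sketch of that rule follows. The function name `expected_counts` is hypothetical, and the observed counts (37 and 27 for non-bikers, 10 and 18 for bikers) are reconstructed from the expected counts and the chi-square value of 3.807 reported in this section.

```python
def expected_counts(observed):
    """Expected counts under 'no relationship': row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Bike (No, Yes); columns: SigOther (No, Yes)
observed = [[37, 27],
            [10, 18]]

for row in expected_counts(observed):
    print([round(e, 1) for e in row])
# [32.7, 31.3]
# [14.3, 13.7]
```

The same function works unchanged for tables with more than two rows or columns.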
Are observed counts very different from expected counts?
Calculate $(\text{observed} - \text{expected})^2/\text{expected}$ for each of the cells.
For first cell: $(37 - 32.7)^2/32.7 = 0.565$
For second cell: $(27 - 31.3)^2/31.3 = 0.591$
For third cell: $(10 - 14.3)^2/14.3 = 1.293$
For fourth cell: $(18 - 13.7)^2/13.7 = 1.350$
Are observed counts very different from expected counts?
Add up the resulting quantities to get the value of the “chi-square statistic” for the table:
$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = 0.565 + 0.591 + 1.293 + 1.350 = 3.80$
If the chi-square statistic is large, then the observed counts are very different from the counts we’d expect to get if there were no relationship.
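The same sum can be computed directly from the observed counts. Below is a minimal sketch, assuming the table is given as a list of rows; the name `chi_square_statistic` is hypothetical, and the counts 37, 27, 10, 18 are reconstructed from the figures in this section. Because it uses unrounded expected counts, it gives 3.807 (Minitab's value) rather than the 3.80 obtained from the rounded expected counts.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count for this cell if there is no relationship
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat


observed = [[37, 27],
            [10, 18]]
print(round(chi_square_statistic(observed), 3))  # 3.807
```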
The P-value
How likely is it that we’d get a chi-square statistic as large as we did if the proportions are equal? If there is no relationship, the chi-square statistic approximately follows the chi-square distribution with $(r-1)(c-1)$ degrees of freedom, where r and c are the number of rows and columns, respectively, in the table. We’ll let Minitab calculate the P-value.
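For a 2-by-2 table there is (2-1)(2-1) = 1 degree of freedom, and a chi-square variable with 1 degree of freedom is the square of a standard normal Z, so the P-value is P(|Z| ≥ √χ²). The sketch below does that calculation with Python's standard library only, and it is limited to the 1-degree-of-freedom case; the function names are hypothetical. Tables larger than 2-by-2 would need a general chi-square distribution function (for example from a statistics library), which this sketch does not implement.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """(r - 1)(c - 1) degrees of freedom for an r-by-c table."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_p_value_df1(stat):
    """Upper-tail area of the chi-square distribution with 1 degree of freedom.

    With 1 df the statistic is Z squared, so
    P(chi2 >= stat) = P(|Z| >= sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


df = degrees_of_freedom(2, 2)
print(df)                                       # 1
print(round(chi_square_p_value_df1(3.807), 3))  # 0.051
```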
Relationship between owning a bike and having a significant other?

Rows: Bike   Columns: SigOther

            No      Yes      All
No          37       27       64
         32.70    31.30    64.00
Yes         10       18       28
         14.30    13.70    28.00
All         47       45       92
         47.00    45.00    92.00

Chi-Square = 3.807, DF = 1, P-Value = 0.051

Cell Contents -- Count
                 Exp Freq

DF = (2-1)(2-1) = 1
Chi-Square Test in Minitab when data are not summarized
– Select Stat >> Tables >> Cross Tabulation.
– Select two Classification Variables. The first (second) variable you select will be the row (column) variable.
– Under Display, select what you want shown, perhaps counts and row percents.
– Click on the box labeled Chi-Square Analysis.
– Select OK.
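Outside Minitab, the cross-tabulation step amounts to counting each (row, column) combination in the raw data. Here is a minimal Python sketch, not Minitab's implementation: it assumes one (Bike, SigOther) record per person, and both the helper `cross_tabulate` and the five records are hypothetical.

```python
from collections import Counter


def cross_tabulate(pairs, row_levels, col_levels):
    """Count each (row, column) pair and compute row percents."""
    counts = Counter(pairs)
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    row_pcts = []
    for row in table:
        total = sum(row)
        # Guard against an empty row to avoid dividing by zero
        row_pcts.append([100 * x / total if total else 0.0 for x in row])
    return table, row_pcts


# Hypothetical raw data: one (Bike, SigOther) record per person
records = [("No", "Yes"), ("No", "No"), ("Yes", "Yes"), ("Yes", "Yes"), ("No", "No")]
table, row_pcts = cross_tabulate(records, ["No", "Yes"], ["No", "Yes"])
print(table)  # [[2, 1], [0, 2]]
print(row_pcts)
```

The resulting `table` is the summarized form that the next slide's procedure starts from.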
Chi-Square Test in Minitab when data are summarized
– Enter the observed counts in table format.
– Select Stat >> Tables >> Chi-Square Test.
– Specify the columns containing the table.
– Select OK.
Miscellaneous issues
– Relationship of the chi-square test to the Z test
– Significant relationships are not necessarily true relationships
– Assumptions
Relationship between owning a bike and having a significant other?
Success = Having Significant Other

Bike     X     N   Sample p
No      27    64   0.421875
Yes     18    28   0.642857

Estimate for p(No) - p(Yes): -0.220982
% CI for p(No) - p(Yes): ( , )
Test for p(No) - p(Yes) = 0 (vs not = 0): Z = -1.95  P-Value = 0.051
Relationship between Z test and chi-square test
– A two-tailed Z-test for two proportions (using a pooled estimate of p) and a chi-square test for a 2-by-2 table give exactly the same P-value, because the chi-square statistic is the square of the Z statistic.
– Use the Z-test for one-tailed tests (to see if one proportion is larger than the other).
– Use the chi-square test for two-tailed tests and for tables larger than 2-by-2.
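The equivalence can be checked numerically: with the pooled estimate of p, the squared two-proportion Z statistic equals the 2-by-2 chi-square statistic, so the two-tailed P-values match. The Python sketch below uses the bike example (27 of 64 non-bikers and 18 of 28 bikers have an SO, counts reconstructed from the figures in this section). The function names are hypothetical, and `chi_square_2x2` uses the standard 2-by-2 shortcut n(ad - bc)^2 / (product of row and column totals), which equals the cell-by-cell sum.

```python
import math


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, using the pooled estimate of p."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2-by-2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Success = having an SO; group 1 = non-bikers, group 2 = bikers
z = pooled_two_proportion_z(27, 64, 18, 28)
chi2 = chi_square_2x2(37, 27, 10, 18)
print(round(z, 2), round(z ** 2, 3), round(chi2, 3))  # -1.95 3.807 3.807

# Two-tailed P-value from Z: P(|Z| >= |z|)
two_tailed_p = math.erfc(abs(z) / math.sqrt(2))
print(round(two_tailed_p, 3))  # 0.051
```

A one-tailed Z-test would use half of `two_tailed_p` when z falls in the hypothesized direction, which the chi-square statistic cannot express because squaring discards the sign.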
Relationship between owning a bike and having a significant other?
Rows: bike   Columns: steady
Chi-Square = 0.053, DF = 1, P-Value ≈ 0.82
Cell Contents -- Count, % of Row
Using Fall 1998 data, we conclude there is no evidence of a relationship.
If the test suggests a relationship exists...
– Is there a reasonable explanation for a relationship? If not, consider the possibility of having made a Type I error.
– If so, collect data on another random sample and see if the new data also suggest a relationship.
– If they do, start believing it … but still go collect more data …
Ah, those darn assumptions...
The P-value will only be accurate if you have a large enough sample. “Large enough” here means:
– no cells have an expected count less than 1
– no more than 20% of the cells have an expected count less than 5 (in a 2-by-2 table, this means no cells)
Minitab will print a warning if the assumptions are violated.
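The rule of thumb can be checked before trusting the P-value. Here is a minimal Python sketch; the function name `check_expected_counts` is hypothetical, the small second table is made up for illustration, and the function checks only the two conditions listed above, not whatever additional checks Minitab may perform. With 4 cells, 20% is 0.8 of a cell, so a single expected count below 5 already fails the rule in a 2-by-2 table.

```python
def check_expected_counts(observed):
    """Large-sample rule of thumb on expected counts.

    Returns True if no expected count is below 1 and at most 20% of the
    cells have an expected count below 5.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    if min(expected) < 1:
        return False
    n_small = sum(1 for e in expected if e < 5)
    return n_small <= 0.2 * len(expected)


print(check_expected_counts([[37, 27], [10, 18]]))  # True
print(check_expected_counts([[8, 2], [1, 1]]))      # False (one expected count is 0.5)
```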