# Chi-square: A very brief intro


## Distinctions

- The distribution
  - Chi-square is a probability distribution
    - A special case of the gamma distribution
  - The t and F distributions are derived from it
    - t = ratio of a standard normal to the square root of a chi-square divided by its degrees of freedom
    - F = ratio of two independent chi-squares, each divided by its degrees of freedom
- Goodness of fit tests
  - You may see it as the test statistic in a variety of procedures to determine whether some data ‘fit’ what is theoretically expected
- Tests of independence
  - Assess whether paired observations on two categorical variables are independent of each other
    - Contingency table
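The relationships named on this slide can be written out in standard notation. This is a sketch rather than something taken from the slides: the symbols $Z$, $Z_i$, $U$, $V$, $k$, $m$, and $n$ are introduced here for illustration, with $Z$ and $Z_i$ standard normal, $U$ and $V$ chi-square, and all of them mutually independent.

```latex
\begin{aligned}
\chi^2_k &= \sum_{i=1}^{k} Z_i^2
  \;\sim\; \mathrm{Gamma}\!\left(\text{shape} = \tfrac{k}{2},\ \text{scale} = 2\right) \\[4pt]
t_k &= \frac{Z}{\sqrt{U / k}}, \qquad U \sim \chi^2_k \\[4pt]
F_{m,n} &= \frac{U / m}{V / n}, \qquad U \sim \chi^2_m,\ V \sim \chi^2_n
\end{aligned}
```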

## Goodness of Fit

- Do the data conform to expectations?
- The following are program numbers for 5700 (the table of counts is not reproduced in this transcript)
- If we expected a balanced distribution, do the data suggest that is true?
- Calculation: sum the squared differences between the observed and expected frequencies, each divided by the expected frequency: $X^2 = \sum_i (O_i - E_i)^2 / E_i$
- $X^2 = 1.4286$, df = 4, p-value = 0.84
- Conclusion? Not statistically different from expectations
- Note, however, that we wouldn’t expect a balanced distribution, and could have changed our expected values to a more reasonable estimate based on past entry rates.
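The calculation described on this slide can be sketched in a few lines of Python. The slide's table of counts is not available here, so the `observed` list below is hypothetical: it was chosen only so that five categories with a balanced expectation give the same statistic the slide reports. The helper names are also illustrative, and the p-value helper uses a closed form that holds only for even degrees of freedom.

```python
import math


def chi_square_gof(observed, expected=None):
    """Return (statistic, df) for a chi-square goodness-of-fit test.

    If no expected counts are given, a balanced (uniform) distribution
    over the categories is assumed.
    """
    n = sum(observed)
    k = len(observed)
    if expected is None:
        expected = [n / k] * k
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# HYPOTHETICAL counts, not the slide's data (the original table is missing).
observed = [9, 8, 7, 6, 5]

stat, df = chi_square_gof(observed)
p_value = chi2_sf_even_df(stat, df)
print(round(stat, 4), df, round(p_value, 2))  # prints 1.4286 4 0.84
```

Passing a list as `expected` is how the "more reasonable estimate based on past entry rates" mentioned on the slide would enter the calculation.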

## Independence

- Moving beyond the single variable, we can test for the independence of two categorical variables
- What do undergrad stat students do with their free time?

| | Updating their Myspace/Facebook or whatever blog thing whose contents will get them fired from some job in the future | Talking on cell phone about their drama loudly enough that now total strangers know how the ‘tests’ turned out | Texting instead of just calling the person and actually talking to them | Staring at ceiling |
|---|---|---|---|---|
| Males | 30 | 40 | 20 | 10 |
| Females | 20 | 30 | 40 | 10 |

- Is there a relationship between gender and what the stats kids do with their free time?
- Expected $= (R_i \times C_j) / N$
- Example for males, Updating: $(100 \times 50) / 200 = 25$

| | Updating their Myspace/Facebook or whatever blog thing whose contents will get them not hired/fired from some job in the future | Talking on cell phone about their drama loudly enough that now total strangers know how the ‘tests’ turned out | Texting instead of just calling the person and actually talking to them | Staring at the ceiling | Total |
|---|---|---|---|---|---|
| Males | 30 | 40 | 20 | 10 | 100 |
| Females | 20 | 30 | 40 | 10 | 100 |
| Total | 50 | 70 | 60 | 20 | 200 |
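The expected-count rule above, row total times column total divided by the grand total, can be sketched in Python using the counts from the table. The variable names and the dictionary layout are illustrative choices, not part of the slides.

```python
# Observed counts from the slide, in column order:
# updating, talking, texting, staring at the ceiling.
observed = {
    "Males": [30, 40, 20, 10],
    "Females": [20, 30, 40, 10],
}

row_totals = {label: sum(counts) for label, counts in observed.items()}
col_totals = [sum(column) for column in zip(*observed.values())]
n = sum(row_totals.values())

# Expected count for each cell: (row total * column total) / grand total.
expected = {
    label: [row_totals[label] * c / n for c in col_totals]
    for label in observed
}

print(expected["Males"])    # prints [25.0, 35.0, 30.0, 10.0]
print(expected["Females"])  # prints [25.0, 35.0, 30.0, 10.0]
```

Both rows share the same expected counts here only because the two row totals happen to be equal.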

## Table with expectations added

- df $= (R - 1)(C - 1)$

| | Updating their Myspace/Facebook or whatever blog thing whose contents will get them not hired/fired from some job in the future | Talking on cell phone about their drama loudly enough that now total strangers know how the ‘tests’ turned out | Texting instead of just calling the person and actually talking to them | Staring at the ceiling | Total |
|---|---|---|---|---|---|
| Males (E) | 30 (25) | 40 (35) | 20 (30) | 10 (10) | 100 |
| Females (E) | 20 (25) | 30 (35) | 40 (30) | 10 (10) | 100 |
| Total | 50 | 70 | 60 | 20 | 200 |

## Interpretation

- $X^2 = 10.0952$, df = 3, p-value = 0.018
- Reject $H_0$: there is some relationship between gender and how stats students spend their free time
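The reported statistic, degrees of freedom, and p-value can be reproduced from the gender by free-time table with a short Python sketch. The helper `chi2_sf_df3` is an illustrative name, and it is an assumption of this sketch that only the df = 3 case is needed: it uses a closed form for the upper tail that holds for exactly three degrees of freedom, not a general-purpose routine.

```python
import math

# Observed counts from the slides (rows: males, females).
observed = [
    [30, 40, 20, 10],
    [20, 30, 40, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over every cell, with E = row total * column total / N.
stat = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        stat += (o - e) ** 2 / e

# df = (R - 1)(C - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf_df3(x):
    """Upper-tail chi-square probability for exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(stat)
print(round(stat, 4), df, round(p_value, 3))  # prints 10.0952 3 0.018
```

In this table the "staring at the ceiling" column contributes nothing to the statistic, since its observed and expected counts are identical; the texting column contributes the most.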

## Assumptions

- Obviously the data themselves do not have to follow any particular distribution
  - Nonparametric
- Independence
  - As usual, we assume observations are independent of one another
- Inclusion of non-occurrences
  - The data must include all categories of information
  - You put ‘Don’t know’ as a response on your survey, suffer the consequences!

## Other Versions/Extensions

- For 2 × 2: Yates correction, Fisher’s exact test
- Beyond the two-way setting: loglinear analysis (covered in your Howell text)
- Categorical × ordinal outcomes
  - Tests of linear associations
  - Correlational approach (see Howell 10.4)

## Effect Size

- 2 × 2
- d family measures of difference
  - Relative risk
  - Odds ratio
- r family measures of association
  - Phi and Cramér’s phi (also known as Cramér’s V)
- Measure of agreement
  - Kappa
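As one illustration of an r family measure, Cramér's phi (Cramér's V) can be computed for the gender by free-time table from the earlier slides. This is a sketch using the standard formula $V = \sqrt{X^2 / (N \cdot \min(R - 1, C - 1))}$; the slides list the measure but do not compute it, so the resulting value is not one the presentation reports.

```python
import math

# Observed counts from the earlier slides (rows: males, females).
observed = [
    [30, 40, 20, 10],
    [20, 30, 40, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square statistic for the table.
stat = sum(
    (o - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
    for i, row in enumerate(observed)
    for j, o in enumerate(row)
)

# Cramér's V: sqrt(X^2 / (N * min(R - 1, C - 1))).
smaller_dim = min(len(observed) - 1, len(observed[0]) - 1)
cramers_v = math.sqrt(stat / (n * smaller_dim))

print(round(cramers_v, 3))  # prints 0.225
```

For a 2 × 2 table the same formula reduces to phi, since $\min(R - 1, C - 1) = 1$.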

## Summary

- While you may see the chi-square statistic used frequently, chi-square tests themselves are increasingly less common
  - The reason is that it is relatively rare for a research question to entail only categorical variables
- However, the tests are still viable for descriptive and exploratory forays into data, and are often used as such
