
Published by Zoie Lipton. Modified over 2 years ago.

1
**Chi-square, Goodness of fit, and Contingency Tables**

2
**What is the χ² distribution?**

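Basically, a distribution of squared differences. A χ² variable with k degrees of freedom is the sum of the squares of k independent standard normal variables, and the χ² test statistic (a sum of squared, scaled differences between observed and expected counts) approximately follows this distribution.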
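A quick simulation can make this concrete. The sketch below, using only Python's standard library, repeatedly draws k standard normal values, squares them, and sums them. The sample mean should land near k and the sample variance near 2k, which are the mean and variance of a χ² distribution with k degrees of freedom. The function name `simulate_chi_square`, the choice k = 3, and the seed are illustrative assumptions.

```python
import random
from statistics import fmean, pvariance

def simulate_chi_square(k: int, trials: int, seed: int = 1) -> list[float]:
    """Draw `trials` values, each the sum of k squared standard normal draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(trials)]

samples = simulate_chi_square(k=3, trials=50_000)
# Sample mean should be close to k = 3 and sample variance close to 2k = 6.
print(round(fmean(samples), 2), round(pvariance(samples), 2))
```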

3
**Useful for detecting categorical differences**

- Calculate the χ² test statistic: χ² = Σ (observed − expected)² / expected, summed over all categories.
- Degrees of freedom = number of categories − 1.
- Look up the χ² value for that number of degrees of freedom and the chosen α.
- If the test statistic > table value, the result is significant.
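The recipe above can be sketched in Python without a printed table. This is a minimal sketch, assuming a hypothetical helper `chi2_sf` that computes the upper-tail probability from the series for the regularized incomplete gamma function (adequate for the moderate values in these slides, not a production implementation). `chi_square_gof` is likewise a hypothetical name. Instead of comparing against a table value it reports the p-value, which is below α exactly when the statistic exceeds the table value.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Sums the power series for the regularized lower incomplete gamma
    function P(df/2, x/2) and returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_square_gof(observed: list[float], expected: list[float]) -> tuple[float, int, float]:
    """Return (statistic, degrees of freedom, p-value) for a goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories - 1
    return stat, df, chi2_sf(stat, df)

# Counts from the brassica example later in the deck (53 vs 11, expected 48 vs 16).
stat, df, p = chi_square_gof([53, 11], [48, 16])
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")
```

For one degree of freedom the series agrees with the closed form `math.erfc(math.sqrt(x / 2))`, which is a convenient sanity check.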

4
**Two-sided test:** find the column corresponding to α/2 in the table of upper critical values, and reject the null hypothesis if the test statistic is greater than the tabled value. Use the 1 − α/2 column for lower critical values, and reject the null hypothesis if the test statistic is less than the tabled value.

**Upper one-sided test:** find the column corresponding to α in the table of upper critical values. If the test statistic is greater than the tabled value, reject the null hypothesis.
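The table lookups above can also be computed numerically. A χ² table column is labeled by upper-tail area, so α/2 gives the upper critical value and 1 − α/2 gives the lower one. This minimal sketch assumes a series-based upper-tail function as a stand-in for the printed table and finds critical values by bisection. `chi2_sf`, `chi2_critical`, `two_sided_reject`, and `upper_one_sided_reject` are hypothetical helper names.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df) (series form)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return max(0.0, 1.0 - total * math.exp(a * math.log(z) - z - math.lgamma(a)))

def chi2_critical(upper_area: float, df: int) -> float:
    """Table entry c with P(X > c) = upper_area, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > upper_area:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > upper_area:  # tail still too large: c is above mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def two_sided_reject(stat: float, df: int, alpha: float = 0.05) -> bool:
    """Reject H0 if stat is above the alpha/2 value or below the 1 - alpha/2 value."""
    return stat > chi2_critical(alpha / 2, df) or stat < chi2_critical(1 - alpha / 2, df)

def upper_one_sided_reject(stat: float, df: int, alpha: float = 0.05) -> bool:
    """Reject H0 if stat exceeds the upper critical value for alpha."""
    return stat > chi2_critical(alpha, df)

print(round(chi2_critical(0.05, 1), 3))  # familiar df = 1, alpha = 0.05 table entry
print(round(chi2_critical(0.025, 10), 3), round(chi2_critical(0.975, 10), 3))
```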

5
**Also useful for model fitting**

Assume you have fit a model to some data and have some residual errors left over. You want to check whether the residuals are normally distributed:
- Bin them in a histogram.
- Estimate the proportion of residuals expected in each bin under a normal distribution, and compare with the actual data.
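The binning procedure above might look like this in Python. This is a sketch, assuming that the normal's mean and standard deviation are estimated from the residuals themselves, and that the bin edges and the simulated residuals are illustrative choices rather than anything from the slides. The helper name `normality_chi2` is hypothetical. The degrees-of-freedom line also subtracts one per estimated parameter. That is the usual adjustment when parameters are fitted from the same data, although the slide itself only gives "categories − 1".

```python
import random
from statistics import NormalDist, fmean, stdev

def normality_chi2(residuals: list[float], edges: list[float]) -> tuple[float, int]:
    """Chi-square statistic comparing binned residuals with a fitted normal.

    `edges` are sorted interior bin boundaries; the first and last bins are
    open-ended so every residual falls in exactly one bin.
    """
    n = len(residuals)
    fitted = NormalDist(fmean(residuals), stdev(residuals))
    bounds = [float("-inf"), *edges, float("inf")]
    stat = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        observed = sum(1 for r in residuals if lo < r <= hi)
        expected = n * (fitted.cdf(hi) - fitted.cdf(lo))  # n * bin probability
        stat += (observed - expected) ** 2 / expected
    # (number of bins - 1), minus 2 because the mean and SD were estimated
    df = (len(bounds) - 1) - 1 - 2
    return stat, df

# Illustrative residuals only: simulated normal noise with SD 2.
rng = random.Random(7)
residuals = [rng.gauss(0.0, 2.0) for _ in range(500)]
stat, df = normality_chi2(residuals, edges=[-3.0, -1.0, 0.0, 1.0, 3.0])
print(f"chi2 = {stat:.2f} on {df} df")
```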

6
**Model Fitting Example**

Consider a classic genetics experiment.

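The offspring of a cross between the F1 brassicas were 53 dark green and 11 yellow plants. If the plants are heterozygous for color, a ratio of 3 dark green to 1 yellow would be expected.

| | Dark green | Yellow | Total |
|---|---|---|---|
| Observed numbers (O) | 53 | 11 | 64 |
| Expected numbers (E) | 48 | 16 | 64 |
| O − E | 5 | −5 | |
| (O − E)² | 25 | 25 | |
| (O − E)² / E | 25/48 = 0.52 | 25/16 = 1.56 | χ² = 2.08 |

With 2 categories there is 1 degree of freedom, and χ² = 2.08 is below the α = 0.05 critical value of 3.84, so these counts are consistent with a 3:1 ratio.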
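The table can be reproduced row by row. This is a small sketch, assuming the 3:1 ratio is the only input needed to derive the expected counts. It uses the fact that for one degree of freedom the upper-tail probability is `erfc(sqrt(x / 2))`, and 3.841 is the standard df = 1, α = 0.05 critical value.

```python
import math

observed = {"dark green": 53, "yellow": 11}
ratio = {"dark green": 3, "yellow": 1}  # 3:1 expected if the plants are heterozygous

total = sum(observed.values())
ratio_sum = sum(ratio.values())
stat = 0.0
for phenotype, o in observed.items():
    e = total * ratio[phenotype] / ratio_sum  # expected count from the ratio
    contribution = (o - e) ** 2 / e
    stat += contribution
    print(f"{phenotype:>10}: O={o} E={e:g} O-E={o - e:g} (O-E)^2/E={contribution:.2f}")

# Two categories -> df = 1; for df = 1 the upper tail is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}, significant at 0.05: {stat > 3.841}")
```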

7
**Compound Hypotheses and Directionality**

With multiple categories, compound hypotheses are possible:
- H0: Pr(cat 1) = 0.25, Pr(cat 2) = …, and Pr(cat 3) = 0.75
- HA: at least one of the above is not the case

Where there are only 2 categories, a “directional alternative” is possible.

8
**Directional Alternatives**

Only in the case of “dichotomous variables” (effectively, two categories):
1. Check the directionality of the trend: do the data deviate from H0 in the direction HA predicts? If not, the p-value is > 0.5 by necessity. If so, proceed to step 2.
2. The p-value is half what it would be if HA were non-directional.
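The two-step rule above can be sketched as a small function for the two-category (df = 1) case. The name `directional_p_value` and its `predicted_higher` parameter are hypothetical. When the trend goes the wrong way, the sketch returns 1 − p/2, which is the one-sided tail in the opposite direction and is always at least 0.5. The usage reuses the brassica counts purely to show both branches.

```python
import math

def directional_p_value(observed: tuple[int, int], expected: tuple[float, float],
                        predicted_higher: int) -> float:
    """One-sided p-value for a two-category chi-square test (df = 1).

    predicted_higher: index (0 or 1) of the category that H_A says is over-represented.
    """
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_two_sided = math.erfc(math.sqrt(stat / 2))  # non-directional p-value, df = 1
    if observed[predicted_higher] > expected[predicted_higher]:
        # Step 2: the trend matches H_A, so halve the non-directional p-value.
        return p_two_sided / 2
    # Step 1 failed: no trend in the predicted direction, so p >= 0.5.
    return 1 - p_two_sided / 2

print(directional_p_value((53, 11), (48, 16), predicted_higher=0))  # trend matches H_A
print(directional_p_value((53, 11), (48, 16), predicted_higher=1))  # trend opposes H_A
```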

9
**Directional Alternative Example**

Two football teams' records are compared against the average number of wins by an NFL team per year, 9. Team 1 won 14 games this year, and several players were caught doping with HGF. Team 2 won 11 games this year and tested clean. Is there evidence that doping increased the number of wins by Team 1?
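The slide leaves the question open. One way to set it up is as a one-sided χ² test on each team's win/loss split, treating the league average of 9 wins as the expected count. This sketch assumes a 16-game regular season, because the slide does not state the number of games. The helper `one_sided_win_test` and the constant names are hypothetical.

```python
import math

GAMES = 16          # assumption: 16-game regular season (not stated on the slide)
EXPECTED_WINS = 9   # league-average wins per year, from the slide

def one_sided_win_test(wins: int) -> tuple[float, float]:
    """Chi-square (df = 1) of wins/losses against the 9-win average; H_A: more wins."""
    observed = (wins, GAMES - wins)
    expected = (EXPECTED_WINS, GAMES - EXPECTED_WINS)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_two_sided = math.erfc(math.sqrt(stat / 2))
    # Halve the p-value only if the team actually beat the average (directional H_A).
    p = p_two_sided / 2 if wins > EXPECTED_WINS else 1 - p_two_sided / 2
    return stat, p

for team, wins in [("Team 1", 14), ("Team 2", 11)]:
    stat, p = one_sided_win_test(wins)
    print(f"{team}: chi2 = {stat:.2f}, one-sided p = {p:.4f}")
```

A different season length changes the expected counts and therefore the answer, so the numbers are only as good as that assumption.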

10
**Contingency Tables**

Use the χ² test statistic as above, but calculate the expected value for each cell in the table from E = (row total) × (column total) / (grand total). For a 2×2 table, df = 1.
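The expected-count formula can be applied cell by cell. This sketch uses the 2×2 HIV-testing counts that appear on the following slides, with rows for tested/not tested and columns for female/male. Because a 2×2 table has df = 1, the p-value uses the closed form `erfc(sqrt(x / 2))`.

```python
import math

# HIV-test table: rows = tested / not tested, columns = female / male
table = [[9, 8],
         [52, 51]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# E = (row total) * (column total) / (grand total) for every cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

stat = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(table, expected)
           for o, e in zip(obs_row, exp_row))
p_value = math.erfc(math.sqrt(stat / 2))  # df = 1 for a 2x2 table
print(expected)
print(f"chi2 = {stat:.4f}, p = {p_value:.3f}")
```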

11
**2×2 Contingency Tables**

Can indicate either:
- Two independent samples with a dichotomous observed variable
- One sample with two dichotomous observed variables

| | Female | Male | Row total |
|---|---|---|---|
| HIV test | 9 | 8 | 17 |
| No HIV test | 52 | 51 | 103 |
| Column total | 61 | 59 | 120 |

12
**Relation to Independence of data**

You can interpret contingency tables in terms of conditional probabilities: Pr(HIV test | female) = 9/61 and Pr(female | HIV test) = 9/17. The test becomes H0: the likelihood of taking an HIV test is independent of sex.
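The conditional-probability reading can be checked with exact fractions. This sketch computes the probabilities quoted above from the HIV-test table. It also shows why independence leads to the expected-count formula: under H0, the expected number of women tested is N · Pr(test) · Pr(female), which equals (row total) × (column total) / (grand total). The variable names are illustrative.

```python
from fractions import Fraction

# HIV-test counts from the table: tested / not tested, by sex
tested = {"female": 9, "male": 8}
not_tested = {"female": 52, "male": 51}

n_sex = {s: tested[s] + not_tested[s] for s in tested}  # column totals: 61, 59
n_tested = sum(tested.values())                         # row total: 17
n_total = sum(n_sex.values())                           # grand total: 120

p_test_given_female = Fraction(tested["female"], n_sex["female"])  # 9/61
p_test_given_male = Fraction(tested["male"], n_sex["male"])        # 8/59
p_female_given_test = Fraction(tested["female"], n_tested)         # 9/17
p_test = Fraction(n_tested, n_total)                               # 17/120

# Under H0 (independence), Pr(test | female) = Pr(test | male) = Pr(test), so
# the expected count N * Pr(test) * Pr(female) = row total * column total / N.
expected_female_tested = n_total * p_test * Fraction(n_sex["female"], n_total)

for label, value in [
    ("Pr(test | female)", p_test_given_female),
    ("Pr(test | male)", p_test_given_male),
    ("Pr(female | test)", p_female_given_test),
    ("Pr(test)", p_test),
    ("E(female, tested)", expected_female_tested),
]:
    print(f"{label:>18} = {str(value):>9} ~ {float(value):.3f}")
```

The two conditional probabilities of testing (about 0.148 and 0.136) are close to the overall rate (about 0.142), which is what the χ² test formalizes.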

13
**r×k Contingency Tables**

Same as above, but degrees of freedom = (r − 1)(k − 1), where r is the number of rows and k is the number of columns.
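The same calculation generalizes to any number of rows and columns. This sketch uses a hypothetical helper, `contingency_chi2`, that returns the statistic and (r − 1)(k − 1) degrees of freedom. The usage table is constructed with rows exactly proportional to each other, so there is no association and the statistic is exactly 0.

```python
def contingency_chi2(table: list[list[float]]) -> tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an r x k table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(k - 1)
    return stat, df

# Constructed 2x3 example: second row is exactly twice the first, so no association.
print(contingency_chi2([[10, 20, 30], [20, 40, 60]]))
```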

14
**Corrections to the Chi-Squared Test**

It is a requirement that a chi-squared test be applied to discrete data: counts are appropriate, continuous measurements are not. The χ² distribution itself is continuous, however, and using it to approximate the distribution of a statistic computed from discrete counts distorts the p-value and may make false positives more likely.

Frank Yates proposed a correction to the chi-squared formula: subtract 0.5 from each |O − E| before squaring, giving χ² = Σ (|O − E| − 0.5)² / E. This tends to increase the p-value and makes the test more conservative, making false positives less likely. However, the test may now be *too* conservative.

Additionally, the chi-squared test should not be used when the expected count in a cell is < 5. At times, though, it is not inappropriate to pad an empty cell with a small value, since one can assume the result would only be more significant with no value there.
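Yates' correction is easy to compare side by side with the uncorrected statistic. This sketch is for the one-degree-of-freedom case and uses the brassica counts from earlier in the deck. The helper name `chi2_df1` is hypothetical. Flooring the corrected difference at 0 (so that |O − E| < 0.5 contributes nothing instead of being over-corrected) is an implementation choice assumed here, not something the slide specifies.

```python
import math

def chi2_df1(observed, expected, yates: bool = False) -> tuple[float, float]:
    """Chi-square statistic and p-value (df = 1), optionally with Yates' correction.

    Yates' correction subtracts 0.5 from each |O - E| before squaring.
    """
    stat = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(0.0, diff - 0.5)  # assumption: floor at 0 for tiny |O - E|
        stat += diff ** 2 / e
    return stat, math.erfc(math.sqrt(stat / 2))

stat_plain, p_plain = chi2_df1([53, 11], [48, 16])
stat_yates, p_yates = chi2_df1([53, 11], [48, 16], yates=True)
print(f"uncorrected: chi2 = {stat_plain:.3f}, p = {p_plain:.3f}")
print(f"Yates:       chi2 = {stat_yates:.3f}, p = {p_yates:.3f}")
```

The corrected statistic is smaller and its p-value larger, which is the conservative shift described above.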
