# Tutorial: Chi-Square Distribution Presented by: Nikki Natividad Course: BIOL 5081 - Biostatistics.

Purpose

- To analyze discontinuous (categorical or binned) data, in which a number of subjects fall into categories
- We want to compare our observed data to what we expect to see. Are the differences due to chance? Due to association?
- When can we use the Chi-Square Test?
  - Testing the outcome of Mendelian crosses
  - Testing independence: is one factor associated with another?
  - Testing a population for expected proportions

Assumptions:

- 2 or more categories
- Independent observations
- A sample size of at least 10
- Random sampling
- All observations must be used
- For the test to be accurate, the expected frequency should be at least 5

Conducting Chi-Square Analysis

1. Make a hypothesis based on your basic biological question.
2. Determine the expected frequencies.
3. Create a table with observed frequencies, expected frequencies, and chi-square values using the formula $\frac{(O-E)^2}{E}$.
4. Find the degrees of freedom: $(c-1)(r-1)$, where $c$ is the number of columns and $r$ the number of rows. With a single row of categories this is $c-1$.
5. Find the critical chi-square statistic for those degrees of freedom in the Chi-Square Distribution table.
6. If the chi-square statistic from the table > your calculated chi-square value, you do not reject your null hypothesis. If your calculated value is larger, you reject it.
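The steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are made up here, and the counts are hypothetical data for a 1:2:1 cross with 100 offspring. The table value 5.991 is the one quoted in this tutorial for 2 degrees of freedom at α = 0.05.

```python
def chi_square_statistic(observed, expected):
    """Step 3: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def reject_null(calculated, critical):
    """Step 6: reject only when the calculated value exceeds the table value."""
    return calculated > critical


# Hypothetical counts (illustration only): three phenotypes, expected ratio 1:2:1.
observed = [22, 55, 23]
expected = [25, 50, 25]

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1          # single row of categories: c - 1
decision = reject_null(chi2, critical=5.991)

print(f"chi-square = {chi2:.2f}, df = {df}, reject null: {decision}")
```

Keeping the statistic and the decision rule in separate functions makes it easy to swap in a different table value when the degrees of freedom or α change.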

Example 1: Testing for Proportions

$H_O$: Horned lizards eat equal amounts of leaf cutter, carpenter and black ants.

$H_A$: Horned lizards eat more of one species of ant than of the others.

| | Leaf Cutter Ants | Carpenter Ants | Black Ants | Total |
|---|---|---|---|---|
| Observed | 25 | 18 | 17 | 60 |
| Expected | 20 | 20 | 20 | 60 |
| O-E | 5 | -2 | -3 | 0 |
| $\frac{(O-E)^2}{E}$ | 1.25 | 0.2 | 0.45 | $\chi^2$ = 1.90 |

$\chi^2$ = sum of all $\frac{(O-E)^2}{E}$

Calculate degrees of freedom: with a single row of categories, $(c-1)(r-1)$ reduces to $c-1 = 3-1 = 2$.

At a significance level of your choice (e.g. α = 0.05, or 95% confidence), look up the chi-square statistic for those degrees of freedom in a chi-square distribution table.

Example 1: Testing for Proportions

Critical value from the chi-square distribution table, with 2 degrees of freedom: $\chi^2_{\alpha=0.05} = 5.991$

Example 1: Testing for Proportions

- Chi-square statistic from the table (critical value): $\chi^2$ = 5.991
- Our calculated value: $\chi^2$ = 1.90

*If the chi-square statistic from the table > your calculated value, then you do not reject your null hypothesis. If your calculated value is larger, you reject it: there is a significant difference that is not due to chance.

5.991 > 1.90 ∴ We do not reject our null hypothesis.
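The arithmetic of Example 1 can be checked with a short Python sketch. The variable names are chosen here for illustration; the counts and the table value 5.991 come from the slides.

```python
observed = {"Leaf Cutter Ants": 25, "Carpenter Ants": 18, "Black Ants": 17}

total = sum(observed.values())
# Under the null hypothesis every species is eaten equally often.
expected = total / len(observed)

# Per-category contribution (O - E)^2 / E, then the overall statistic.
contributions = {name: (o - expected) ** 2 / expected for name, o in observed.items()}
chi2 = sum(contributions.values())
df = len(observed) - 1
critical = 5.991  # table value for df = 2 at alpha = 0.05

for name, value in contributions.items():
    print(f"{name}: {value:.2f}")
print(f"chi-square = {chi2:.2f}, df = {df}")
print("reject null" if chi2 > critical else "do not reject null")
```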

SAS: Example 1

Annotations on the SAS program (screenshot):

- Included to format the table
- Define your data
- Indicate what you want in your output

SAS: What does the p-value mean?

“The exact p-value for a nondirectional test is the sum of probabilities for the table having a test statistic greater than or equal to the value of the observed test statistic.”

- High p-value: if the null hypothesis is true, there is a high probability of getting a test statistic ≥ the observed test statistic. Do not reject the null hypothesis.
- Low p-value: if the null hypothesis is true, there is a low probability of getting a test statistic ≥ the observed test statistic. Reject the null hypothesis.

SAS: Example 1

High probability that Chi-Square statistic > our calculated chi-square statistic. We do not reject our null hypothesis.
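For Example 1 the p-value can be checked by hand, because with exactly 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form $e^{-x/2}$. The sketch below relies on that special case only. The helper name is made up here, and other degrees of freedom would need a statistics library.

```python
import math


def chi2_upper_tail_df2(x):
    """P(X >= x) for a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-x / 2)


p_value = chi2_upper_tail_df2(1.90)      # our calculated chi-square value
p_at_critical = chi2_upper_tail_df2(5.991)  # should land near alpha = 0.05

print(f"p-value for 1.90: {p_value:.4f}")
print(f"tail probability at 5.991: {p_at_critical:.4f}")
```

The p-value comes out at roughly 0.39, well above 0.05, which agrees with the decision not to reject the null hypothesis.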

Example 2: Testing Association

$H_O$: Gender and eye colour are not associated with each other.

$H_A$: Gender and eye colour are associated with each other.

SAS options used:

- `cellchi2` = displays how much each cell contributes to the overall chi-squared value
- `nocol` = do not display column percentages
- `norow` = do not display row percentages
- `chisq` = display chi-square statistics

Example 2: More SAS Examples

Degrees of freedom: (2-1)(3-1) = 1*2 = 2

High probability (78.25%) that Chi-Square statistic > our calculated chi-square statistic. We do not reject our null hypothesis.

Example 2: More SAS Examples

If there were an association, we could check which cells account for it by looking at how much each cell contributes to the overall chi-square value.

Limitations

- No expected categories should be less than 1
- No more than 1/5 of the expected categories should be less than 5
  - To correct for this, collect larger samples or combine your data for the smaller expected categories until their combined value is 5 or more
- Yates Correction*
  - When there is only 1 degree of freedom, the regular chi-square test should not be used
  - Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O-E term, then continue as usual with the new corrected values

What do these mean?

Likelihood Ratio Chi Square
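The formula for this slide did not survive in the transcript. The standard definition of the likelihood ratio chi-square is $G^2 = 2\sum O \ln(O/E)$, and the sketch below applies it to the ant counts from Example 1. The function name is made up here, and the convention that cells with $O = 0$ contribute nothing is stated in the code.

```python
import math


def likelihood_ratio_chi_square(observed, expected):
    """G^2 = 2 * sum(O * ln(O / E)); cells with O = 0 contribute nothing."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Ant counts from Example 1: observed vs. equal expected frequencies.
g2 = likelihood_ratio_chi_square([25, 18, 17], [20, 20, 20])
print(f"G^2 = {g2:.3f}")
```

For these counts $G^2$ is close to the Pearson value of 1.90, which is typical when the observed and expected frequencies do not differ much.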

Mantel-Haenszel Chi-Square Test

- $Q_{MH} = (n-1)r^2$
  - $r^2$ is the square of the Pearson correlation coefficient (which also measures the linear association between row and column)
  - http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000659.htm
- Tests the alternative hypothesis that there is a linear association between the row and column variables
- Follows a chi-square distribution with 1 degree of freedom
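A rough sketch of $Q_{MH} = (n-1)r^2$ for a table of counts follows, using the heart disease table that appears later in this tutorial. The function name and the default scores (1, 2, 3, ... for rows and columns) are assumptions made for this illustration, not something the slides specify.

```python
def mantel_haenszel_chi_square(table, row_scores=None, col_scores=None):
    """Q_MH = (n - 1) * r^2, with r the Pearson correlation of row and column scores."""
    rows, cols = len(table), len(table[0])
    # Assumed default scores: 1, 2, 3, ... for both rows and columns.
    row_scores = row_scores or list(range(1, rows + 1))
    col_scores = col_scores or list(range(1, cols + 1))

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Count-weighted means of the scores.
    mean_r = sum(s * t for s, t in zip(row_scores, row_totals)) / n
    mean_c = sum(s * t for s, t in zip(col_scores, col_totals)) / n

    # Count-weighted sums of squares and cross-products.
    ss_r = sum(t * (s - mean_r) ** 2 for s, t in zip(row_scores, row_totals))
    ss_c = sum(t * (s - mean_c) ** 2 for s, t in zip(col_scores, col_totals))
    sp = sum(table[i][j] * (row_scores[i] - mean_r) * (col_scores[j] - mean_c)
             for i in range(rows) for j in range(cols))

    r_squared = sp ** 2 / (ss_r * ss_c)
    return (n - 1) * r_squared


# Rows: heart disease / no heart disease; columns: high / low cholesterol.
q_mh = mantel_haenszel_chi_square([[15, 7], [8, 10]])
print(f"Q_MH = {q_mh:.3f}")
```

For a 2 x 2 table this value equals the uncorrected Pearson chi-square multiplied by $(n-1)/n$.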

Phi Coefficient

Contingency Coefficient

Cramer’s V
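The formulas on these three slides were images and are missing from the transcript. The standard definitions are $\phi = \sqrt{\chi^2/n}$, $C = \sqrt{\chi^2/(\chi^2+n)}$ and $V = \sqrt{\chi^2/(n \cdot \min(r-1, c-1))}$. The sketch below applies them to the uncorrected chi-square of 2.28 from the heart disease table (n = 40). The function name is made up here, and the sketch returns magnitudes only (some software reports a signed phi for 2 x 2 tables).

```python
import math


def association_measures(chi2, n, n_rows, n_cols):
    """Return (phi, contingency coefficient, Cramer's V) from a Pearson chi-square."""
    phi = math.sqrt(chi2 / n)
    contingency = math.sqrt(chi2 / (chi2 + n))
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return phi, contingency, cramers_v


phi, contingency, cramers_v = association_measures(chi2=2.28, n=40, n_rows=2, n_cols=2)
print(f"phi = {phi:.3f}, C = {contingency:.3f}, V = {cramers_v:.3f}")
```

For a 2 x 2 table Cramer's V has the same magnitude as phi, because $\min(r-1, c-1) = 1$.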

Yates & 2 x 2 Contingency Tables

$H_O$: Heart Disease is not associated with cholesterol levels.

$H_A$: Heart Disease is more likely in patients with a high cholesterol diet.

Calculate degrees of freedom: (c-1)(r-1) = 1*1 = 1. We need to use the YATES CORRECTION.

Without the correction:

| | High Cholesterol | Low Cholesterol | Total |
|---|---|---|---|
| Heart Disease | 15 | 7 | 22 |
| Expected | 12.65 | 9.35 | 22 |
| Chi-Square | 0.44 | 0.59 | 1.03 |
| No Heart Disease | 8 | 10 | 18 |
| Expected | 10.35 | 7.65 | 18 |
| Chi-Square | 0.53 | 0.72 | 1.25 |
| TOTAL | 23 | 17 | 40 |
| Chi-Square Total | | | 2.28 |

Yates & 2 x 2 Contingency Tables

With the Yates correction applied to each cell:

| | High Cholesterol | Low Cholesterol | Total |
|---|---|---|---|
| Heart Disease | 15 | 7 | 22 |
| Expected | 12.65 | 9.35 | 22 |
| Chi-Square | 0.27 | 0.37 | 0.64 |
| No Heart Disease | 8 | 10 | 18 |
| Expected | 10.35 | 7.65 | 18 |
| Chi-Square | 0.33 | 0.45 | 0.78 |
| TOTAL | 23 | 17 | 40 |
| Chi-Square Total | | | 1.42 |

Worked example for the first cell: $\frac{(|15-12.65| - 0.5)^2}{12.65} = 0.27$

Critical value from the chi-square distribution table, with 1 degree of freedom: $\chi^2_{\alpha=0.05} = 3.841$

Yates & 2 x 2 Contingency Tables

With the Yates correction, the table value still exceeds our calculated value: 3.841 > 1.42 ∴ We do not reject our null hypothesis.
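The heart disease example can be reproduced with a short Python sketch that derives the expected counts from the row and column totals and optionally applies the Yates correction. The function name is made up here, and clamping the corrected difference at zero is an added safeguard that the slides do not mention. Summing unrounded cell values gives about 1.41 rather than 1.42, because the slide adds up cells already rounded to two decimals.

```python
def chi_square_2x2(table, yates=False):
    """Pearson chi-square for a 2 x 2 table, optionally with the Yates correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Subtract 0.5 from |O - E|; clamp at 0 (an assumption, not from the slides).
                diff = max(diff - 0.5, 0.0)
            total += diff ** 2 / expected
    return total


# Rows: heart disease / no heart disease; columns: high / low cholesterol.
heart = [[15, 7], [8, 10]]
plain = chi_square_2x2(heart)
corrected = chi_square_2x2(heart, yates=True)

print(f"uncorrected = {plain:.2f}, Yates-corrected = {corrected:.2f}")
print("reject null" if corrected > 3.841 else "do not reject null")
```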

Fisher’s Exact Test

- Left: Use this one-sided test when the alternative to independence is negative association between the variables. These observations tend to lie in the lower left and upper right cells of the table. Small p-value = likely negative association.
- Right: Use this one-sided test when the alternative to independence is positive association between the variables. These observations tend to lie in the upper left and lower right cells of the table. Small p-value = likely positive association.
- Two-Tail: Use this when there is no prior alternative.
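The three p-values can be sketched directly from the hypergeometric distribution of the upper-left cell, holding the row and column totals fixed. This is an illustrative implementation and not the SAS code. The function name is made up here, and the two-tail rule used (add up every table that is no more probable than the observed one) is one common convention, assumed for this sketch.

```python
from math import comb


def fisher_exact_2x2(table):
    """Return (left, right, two_tail) p-values for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the upper-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)

    left = sum(prob(x) for x in range(lowest, a + 1))    # upper-left cell <= observed
    right = sum(prob(x) for x in range(a, highest + 1))  # upper-left cell >= observed
    # Two-tail: tables no more probable than the observed one (small tolerance for ties).
    two_tail = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                   if p <= p_observed * (1 + 1e-9))
    return left, right, two_tail


# Rows: heart disease / no heart disease; columns: high / low cholesterol.
left, right, two_tail = fisher_exact_2x2([[15, 7], [8, 10]])
print(f"left = {left:.4f}, right = {right:.4f}, two-tail = {two_tail:.4f}")
```

Because the observed upper-left count (15) is above its expected value (12.65), the right-sided p-value is the smaller of the two one-sided values.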

Conclusion

- The chi-square test is important in testing the association between variables and/or checking whether one’s expected proportions meet the reality of one’s experiment
- There are multiple chi-square tests, each suited to a specific sample size, degrees of freedom, and number of categories
- We can use SAS to conduct chi-square tests on our data by using the `proc freq` command

References

- Chi-Square Test Descriptions:
  - http://www.enviroliteracy.org/pdf/materials/1210.pdf
  - http://129.123.92.202/biol1020/Statistics/Appendix%206%20%20The%20Chi-Square%20TEst.pdf
- Ozdemir T and Eyduran E. 2005. Comparison of chi-square and likelihood ratio chi-square tests: power of test. Journal of Applied Sciences Research. 1(2):242-244.
- SAS Support website: http://www.sas.com/index.html “FREQ procedure”
- YouTube Chi-square SAS Tutorial (user: mbate001): http://www.youtube.com/watch?v=ACbQ8FJTq7k