
Do data match an expected ratio



2 Do data match an expected ratio
Chi-square; Fisher’s exact test

3 When to Use Chi-Square Test for Homogeneity
Use it when the following conditions are met:
For each population, the sampling method is simple random sampling.
The variable under study is categorical.
When the sample data are displayed in a contingency table (populations × category levels), the expected frequency count for each cell of the table is at least 5.

4 Ecology example
Do biogeographical realms differ in the relative number of endangered bird species?
Neotropics: 500 endangered, 2000 not endangered
Nearctic: 200 endangered, 1100 not endangered
Prediction / Hypothesis?

5 Contingency table

6 > setwd("~/")
> Chi <- read.csv("ChiClass.csv")
> chisq.test(Chi$Neotropics, Chi$Nearctic)

	Pearson's Chi-squared test with Yates' continuity correction

data:  Chi$Neotropics and Chi$Nearctic
X-squared = 0, df = 1, p-value = 1

Warning message:
In chisq.test(Chi$Neotropics, Chi$Nearctic) :
  Chi-squared approximation may be incorrect
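The warning above appears because `chisq.test()` was given two data-frame columns as if they were paired observations, rather than a table of counts. A minimal sketch of the intended analysis, building the 2×2 contingency table by hand from the counts on the earlier slide:

```r
# 2x2 table of counts (rows = status, columns = realm);
# matrix() fills column-wise by default.
counts <- matrix(c(500, 2000,    # Neotropics: endangered, not endangered
                   200, 1100),   # Nearctic: endangered, not endangered
                 nrow = 2,
                 dimnames = list(Status = c("Endangered", "NotEndangered"),
                                 Realm  = c("Neotropics", "Nearctic")))
chisq.test(counts)   # df = 1, p-value well below 0.01
```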

7 > ?chisq.test
> chisq.test(Chi)

	Pearson's Chi-squared test with Yates' continuity correction

data:  Chi
X-squared = , df = 1, p-value =

8 Yates's correction prevents overestimation of statistical significance for small samples
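To see the correction's effect, compare the corrected and uncorrected statistics on the same 2×2 table; a sketch using the bird counts from the earlier slide (`correct = TRUE` is `chisq.test()`'s default for 2×2 tables):

```r
tbl <- matrix(c(500, 2000, 200, 1100), nrow = 2)
with_yates    <- chisq.test(tbl, correct = TRUE)   # Yates' correction applied
without_yates <- chisq.test(tbl, correct = FALSE)  # raw Pearson statistic
c(with_yates$statistic, without_yates$statistic)   # corrected value is smaller
```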

9 When to Use Chi-Square Goodness-of-Fit test?
The classic example from Mendelian genetics, a dihybrid cross expected to segregate 9:3:3:1:
800 yellow, smooth seeds
250 yellow, wrinkled seeds
255 green, smooth seeds
99 green, wrinkled seeds

10 observed = c(800, 250, 255, 99)       # observed frequencies
expected = c(9/16, 3/16, 3/16, 1/16)     # expected proportions
chisq.test(x = observed, p = expected)

	Chi-squared test for given probabilities

data:  observed
X-squared = , df = 3, p-value =

11 > observed = c(1203, 2919, 1678)
> expected.prop = c(0.211, 0.497, 0.292)
> expected.count = sum(observed)*expected.prop
> chi2 = sum((observed - expected.count)^2 / expected.count)
> chi2
[1]
> pchisq(chi2, df=2, lower.tail=FALSE)   # df = k - 1 = 2 for three categories
[1]
> chisq.test(x=observed, p=expected.prop)

	Chi-squared test for given probabilities

data:  observed
X-squared = , df = 2, p-value =

12 When to use Fisher’s exact test?

13 “The usual rule of thumb for deciding whether the chi-squared approximation is good enough is that the chi-squared test is not suitable when the expected values in any of the cells of a contingency table are below 5, or below 10 when there is only one degree of freedom (this rule is now known to be overly conservative)”
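In practice you can check this rule directly: `chisq.test()` stores the expected counts in the fitted object. A sketch with a hypothetical small table:

```r
tbl <- matrix(c(4, 20, 2, 11), nrow = 2)               # hypothetical small sample
expected <- suppressWarnings(chisq.test(tbl)$expected) # silence the approximation warning
expected
any(expected < 5)   # TRUE here, so Fisher's exact test is the safer choice
```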

14 fisher.test(Chi)

	Fisher's Exact Test for Count Data

data:  Chi
p-value =
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
sample estimates:
odds ratio

15 Fisher, another example (for proportions)
A significantly larger proportion of trees in disturbed landscapes than in undisturbed forests contained at least one cavity (Fisher’s exact test, P < 0.001): cavities occurred in 72% of the 35 dead trees (52% of all trees in plots in disturbed landscapes) and in 9% of the 32 living trees (48% of all trees in plots in disturbed landscapes).
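The reported percentages imply roughly 25 of 35 dead trees and 3 of 32 living trees with at least one cavity; these rounded counts are an assumption reconstructed for illustration, not the authors' raw data:

```r
# Approximate 2x2 table from the reported percentages
# (72% of 35 dead trees, 9% of 32 living trees) -- rounded, hypothetical counts.
cavities <- matrix(c(25, 10,   # dead trees: with cavity, without
                      3, 29),  # living trees: with cavity, without
                   nrow = 2,
                   dimnames = list(Cavity = c("yes", "no"),
                                   Tree   = c("dead", "living")))
fisher.test(cavities)$p.value   # well below 0.001, consistent with the slide
```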


17 Post-hoc tests
But when you have multiple groups, which ones are different?

18 Ecology example
Neotropics: 500 endangered, 2000 not endangered
Nearctic: 200 endangered, 1100 not endangered
Palearctic: 50 endangered, 333 not endangered
Oriental: 600 endangered, 1100 not endangered

19 > Chi2 <- read.csv("ChiClass2.csv")
> chisq.test(Chi2)

	Pearson's Chi-squared test

data:  Chi2
X-squared = 222.1, df = 3, p-value < 2.2e-16
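The same test can be run without the CSV by entering the 2×4 table of counts directly; a sketch (the row and column labels are for readability only):

```r
realms <- matrix(c(500, 2000,   # Neotropics: endangered, not endangered
                   200, 1100,   # Nearctic
                    50,  333,   # Palearctic
                   600, 1100),  # Oriental
                 nrow = 2,
                 dimnames = list(Status = c("Endangered", "NotEndangered"),
                                 Realm  = c("Neotropics", "Nearctic",
                                            "Palearctic", "Oriental")))
chisq.test(realms)   # X-squared = 222.1 on 3 df, matching the output above
```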

20 Package ‘fifer’ chisq.post.hoc(tbl, test = c("fisher.test"), popsInRows = TRUE, control = c("fdr", "BH", "BY", "bonferroni", "holm", "hochberg", "hommel"), digits = 4, ...)

21 > Chi2
  Neotropics Nearctic Palearctic Oriental

> chisq.post.hoc(Chi2, test=c"chisq.test")
Error: unexpected string constant in "chisq.post.hoc(Chi2,test=c"chisq.test""
> chisq.post.hoc(Chi2, test=c("chisq.test")
+ )
Adjusted p-values used the fdr method.
 comparison raw.p adj.p
 vs

22 > Chi3 <- read.csv("ChiClass3.csv")
> chisq.post.hoc(Chi3, test=c("chisq.test"))
Adjusted p-values used the fdr method.
 comparison raw.p adj.p
 vs
 vs
 vs
 vs
 vs
 vs

23 The problem of multiple comparisons

24 Say you have a set of hypotheses that you wish to test simultaneously
Say you have a set of hypotheses that you wish to test simultaneously. The first idea that might come to mind is to test each hypothesis separately, using some level of significance α. At first blush, this doesn’t seem like a bad idea. However, consider a case where you have 20 hypotheses to test and a significance level of 0.05. What’s the probability of observing at least one significant result just due to chance?

P(at least one significant result) = 1 − P(no significant results) = 1 − (1 − 0.05)^20 ≈ 0.64

So, with 20 tests being considered, we have a 64% chance of observing at least one significant result, even if none of the tests is actually significant.
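The calculation above is a one-liner:

```r
alpha <- 0.05   # per-test significance level
m     <- 20     # number of independent tests
1 - (1 - alpha)^m   # probability of at least one false positive, about 0.64
```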

25 Methods for dealing with multiple testing frequently call for adjusting α in some way, so that the probability of observing at least one significant result due to chance remains below your desired significance level. Famous example: Bonferroni Correction
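Base R's `p.adjust()` implements the Bonferroni correction along with the other methods named in the `fifer` call on the earlier slide; a sketch with hypothetical raw p-values:

```r
raw.p <- c(0.001, 0.012, 0.049, 0.2)   # hypothetical raw p-values
p.adjust(raw.p, method = "bonferroni") # multiplies each p by n, capped at 1
p.adjust(raw.p, method = "fdr")        # Benjamini-Hochberg, less conservative
```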

26 The Bonferroni correction sets the significance cut-off at α/n
The Bonferroni correction sets the significance cut-off at α/n. For example, in the example above, with 20 tests and α = 0.05, you’d only reject a null hypothesis if the p-value is less than 0.05/20 = 0.0025. The Bonferroni correction tends to be a bit too conservative. To demonstrate this, let’s calculate the probability of observing at least one significant result when using the correction just described:

P(at least one significant result) = 1 − P(no significant results) = 1 − (1 − 0.0025)^20 ≈ 0.0488

Here, we’re just a shade under our desired 0.05 level. We benefit here from assuming that all tests are independent of each other. In practical applications, that is often not the case. Depending on the correlation structure of the tests, the Bonferroni correction can be extremely conservative, leading to a high rate of false negatives.
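The Bonferroni arithmetic above can be checked directly:

```r
alpha <- 0.05
n     <- 20
alpha / n               # Bonferroni cut-off: 0.0025
1 - (1 - alpha / n)^n   # family-wise error rate, about 0.0488 -- just under 0.05
```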

27 > chisq.post.hoc(Chi3, test=c("chisq.test"), control="bonferroni")
Adjusted p-values used the bonferroni method.
 comparison raw.p adj.p
 vs
 vs
 vs
 vs
 vs
 vs

