Chi-square Assumptions:
1. Finite number of observations.
2. Observations are independent.
3. Samples are collected randomly.
4. Large sample size (>20; >50).
Example: Suppose you caught 5 Bluegill fish, detected two alleles (A1 and A2), and observed that all 5 fish were A1A2 heterozygotes. Calculate allele frequencies and do a χ² test to determine whether the population is in HWE.

Allele frequencies: p(A1) = q(A2) = 5/10 = 0.5

Genotype   Observed (O)   Expected (E)   O - E    (O - E)²   (O - E)²/E
A1A1       0              1.25           -1.25    1.5625     1.25
A1A2       5              2.5             2.5     6.25       2.5
A2A2       0              1.25           -1.25    1.5625     1.25
χ² = 5

Conclusion: Reject H0 at α = 0.05, because the calculated χ² value (= 5) is greater than the critical χ² value with 1 d.f. (≈ 3.84), i.e. the Bluegill population is not in HWE.
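The arithmetic in the Bluegill example can be checked with a short script. This is a minimal sketch in plain Python; the function name `hwe_chi_square` is an illustrative choice, not part of GenAlEx or any other package, and the critical value 3.84 is taken from the example rather than computed.

```python
def hwe_chi_square(n11, n12, n22):
    """Chi-square statistic for HWE at a two-allele locus.

    n11, n12, n22 are the observed counts of A1A1, A1A2 and A2A2.
    Assumes both alleles are present (otherwise an expected count is 0).
    """
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)   # frequency of A1
    q = 1 - p                       # frequency of A2
    observed = [n11, n12, n22]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, q, expected, chi2


p, q, expected, chi2 = hwe_chi_square(0, 5, 0)
print(p, q)       # allele frequencies: 0.5 0.5
print(expected)   # [1.25, 2.5, 1.25]
print(chi2)       # 5.0, which exceeds the critical value of about 3.84 (1 d.f.)
```

Every expected count here is below 5, so the statistic is computed correctly but the chi-square approximation behind the critical value is doubtful.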
Why is the previous conclusion not reliable? Because it violates the assumption of large sample size. As a rule of thumb, the Chi-square test should not be used when the expected number for any genotype class is less than 5.
Exact Test
1. Calculate the probability of observing N11 = 0, N12 = 5, N22 = 0 under HWE using the multinomial probability equation.
2. Generate all possible permutations of 5 A1 alleles and 5 A2 alleles into 3 genotypes, i.e. 10! = 3,628,800.
3. Calculate the probability of observing each of these samples under HWE using the multinomial probability equation.
4. Determine the proportion of samples whose probability is ≤ 0.0313 (the probability of the observed sample from step 1).
5. If this proportion (the p-value) is less than 0.05, reject H0 at α = 0.05.
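Step 1 of the exact test can be sketched as follows. The code covers only the multinomial probability of a single sample, not the enumeration in steps 2 to 4; the function name `multinomial_hwe_probability` is hypothetical, and the allele frequency p = 0.5 is the value estimated from the 5 heterozygous fish.

```python
from math import factorial


def multinomial_hwe_probability(n11, n12, n22, p):
    """Multinomial probability of genotype counts (n11, n12, n22) under HWE.

    p is the frequency of allele A1; the genotype probabilities under HWE
    are p^2, 2pq and q^2.
    """
    q = 1 - p
    n = n11 + n12 + n22
    # Number of ways to assign n individuals to the three genotype classes.
    coefficient = factorial(n) // (factorial(n11) * factorial(n12) * factorial(n22))
    return coefficient * (p * p) ** n11 * (2 * p * q) ** n12 * (q * q) ** n22


prob = multinomial_hwe_probability(0, 5, 0, p=0.5)
print(round(prob, 4))   # 0.0313, the threshold used in step 4
```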
p-value = [number of samples with probability ≤ 0.0313] / [total number of samples] = 3/30 = 0.10
Conclusion: The p-value is greater than 0.05, so we fail to reject H0 at α = 0.05, i.e. there is no evidence that the bluegill population departs from HWE.
Generating all possible samples and calculating the probability of each one is computationally intensive. It requires too much time and is practically impossible for large samples. In practice, exact tests are done by sampling from a distribution generated by a Markov chain (beyond the scope of this course).
Measures of Genetic Variation 1. Heterozygosity (Gene diversity). 2. Number of alleles (Allele diversity). 3. Effective number of alleles. 4. Percentage of polymorphic loci.
1. Heterozygosity (Gene diversity)
- Most commonly used measure of genetic variation.
- Can be thought of as the probability that a randomly sampled individual will have two different alleles (will be heterozygous) at a given locus.
- Observed heterozygosity (H_O) = proportion of heterozygotes in a sample.
- Expected heterozygosity (H_E) = heterozygosity expected under HWE = 1 - expected homozygosity under HWE, where expected homozygosity = $p_1^2 + p_2^2 + p_3^2 + \dots + p_n^2$, so $H_E = 1 - \sum_{i=1}^{n} p_i^2$.
For small sample sizes (< 50), an unbiased H_E can be calculated with the standard small-sample correction (Nei 1978), where N is the number of individuals sampled: $H_E = \frac{2N}{2N - 1}\left(1 - \sum_{i=1}^{n} p_i^2\right)$
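The heterozygosity measures can be illustrated with the Bluegill sample (5 fish, all heterozygous, p = q = 0.5). This is a minimal sketch with hypothetical function names. The unbiased estimator uses the 2N/(2N - 1) correction of Nei (1978), which is assumed here to be the formula the slide refers to.

```python
def observed_heterozygosity(n_heterozygotes, n_individuals):
    """H_O: proportion of heterozygous individuals in the sample."""
    return n_heterozygotes / n_individuals


def expected_heterozygosity(allele_freqs):
    """H_E under HWE: one minus the expected homozygosity (sum of p_i^2)."""
    return 1.0 - sum(p * p for p in allele_freqs)


def unbiased_expected_heterozygosity(allele_freqs, n_individuals):
    """Small-sample correction, assumed to be Nei's (1978) 2N / (2N - 1) factor."""
    two_n = 2 * n_individuals
    return two_n / (two_n - 1) * expected_heterozygosity(allele_freqs)


freqs = [0.5, 0.5]                                   # A1 and A2 in the Bluegill sample
print(observed_heterozygosity(5, 5))                 # 1.0
print(expected_heterozygosity(freqs))                # 0.5
print(unbiased_expected_heterozygosity(freqs, 5))    # 10/9 * 0.5, about 0.556
```

In this sample H_O is much larger than H_E, which is the heterozygote excess that the chi-square and exact tests evaluate.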
2. Number of Alleles (N_a):
- Number of alleles present at a locus in a population.
- Also called allele diversity.
- Strongly influenced by sample size.
3. Effective Number of Alleles (N_e):
- The number of equally frequent alleles that would give the same expected homozygosity as the observed allele frequencies: $N_e = 1 / \sum_{i=1}^{n} p_i^2$.
- N_e equals N_a only when all alleles are at equal frequency; otherwise N_e is smaller than N_a.
4. Proportion of Polymorphic Loci (P):
- Not so useful for highly variable loci like microsatellites.
- Subject to locus selection bias.
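The relationship between the number of alleles and the effective number of alleles can be sketched as below. The function names and the example allele frequencies are made up for illustration. They are not taken from the lab data.

```python
def number_of_alleles(allele_freqs):
    """N_a: count of alleles observed at the locus (frequency > 0)."""
    return sum(1 for p in allele_freqs if p > 0)


def effective_number_of_alleles(allele_freqs):
    """N_e: reciprocal of the expected homozygosity, 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in allele_freqs)


even = [0.25, 0.25, 0.25, 0.25]     # four alleles at equal frequency
skewed = [0.7, 0.1, 0.1, 0.1]       # four alleles, one of them common

print(number_of_alleles(even), effective_number_of_alleles(even))   # 4 4.0
print(number_of_alleles(skewed))                                    # 4
print(round(effective_number_of_alleles(skewed), 2))                # 1.92
```

Both loci have N_a = 4, but the skewed locus has a much lower N_e, so a large gap between N_a and N_e indicates uneven allele frequencies.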
Problem 1. Use GenAlEx to perform the following analyses based on the human SSR data:
a. Calculate the genetic variation measures H_O, H_E, N_a, and N_e for all loci in all populations. Include the estimated values of these measures for all loci in a population you will be assigned during the lab. What can you conclude about the allele frequencies of the 10 loci by comparing N_a to N_e?
b. Calculate the average H_O and H_E across loci for your assigned population. Can you predict anything about the test of HWE based on these values?
c. Perform a Chi-square test of HWE for all loci in all populations. Include a summary of the test for your assigned population in the lab report. How do you interpret the results of this test?
Problem 2. Perform an exact test of HWE for all loci and all populations using Arlequin. Include the results for your assigned population in the lab report.
a. How do these results compare to those from the Chi-square test and why? Which test do you trust more?
b. Why might some populations have significant departures from Hardy-Weinberg expectations, while others do not?
c. GRADUATE STUDENTS ONLY: Find an example from the literature of a human population with genotype frequencies that violate Hardy-Weinberg expectations. What is the main cause of this deviation?