

1 E-mail: ajeckert@ucdavis.edu
From gene sequences to natural selection: how to find a needle in a haystack
Andrew J. Eckert, Section of Evolution and Ecology, University of California at Davis, Davis, CA USA
Ph: (530)
Eckert, Neutrality Testing, 5-Aug

2 Eckert, Neutrality Testing, 5-Aug-2008

3 Many forest trees grow across extreme environmental gradients

4 Genotype-Phenotype Associations along Environmental Gradients
“Even apparently similar adaptations may be built from genetically different components.” T. Dobzhansky (1960)

5 The canonical types of questions
What are the genes underlying adaptive traits in forest trees?
What are the average effects of these genes?
What fraction of a forest tree genome is under selection?

6 Topics
Neutrality: what is it?
Diversity, divergence and neutrality
Types of data
Site-frequency spectrum
Statistics and metrics
The basic tests
Expectations under neutrality
Caveats to expectations and testing
Empirical examples
Available software
DnaSAM demonstration

7 Neutrality
What do we mean when we say “This gene is evolving neutrally”?
The sample of genes resides in a population at drift-mutation equilibrium.
Drift and mutation are the ONLY forces acting within your sampled populations.
Mating is at random.
Offspring numbers follow a Poisson distribution with mean 1.
An outgroup diverged T ~ 4Ne generations ago, with no subsequent gene flow or demographic fluctuations.

8 Time scale of neutrality
[Figure: a time axis running from long ago to recent, spanning phylogenies, pairs of species and samples within a species; labels: phylogenetics, population genetics, neutral theory of molecular evolution]

9 Types of Data
Raw data from commonly used technologies:
Protein: isozymes*
DNA: RAPDs, RFLPs, AFLPs*, SSRs*, DNA sequences*, SNP genotypes*

10 The site-frequency spectrum (sfs)
[Figure panels: unfolded sfs; folded sfs]
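A minimal Python sketch of the two spectra, assuming each segregating site has already been summarized as the count of the derived allele among n sequences. Polarizing alleles as derived requires an outgroup; without one, only the folded sfs is available. The function names and the toy counts are hypothetical, not taken from any package.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Unfolded sfs: entry i-1 is the number of sites whose derived allele
    is carried by exactly i of the n sampled sequences (i = 1..n-1)."""
    counts = Counter(c for c in derived_counts if 0 < c < n)  # skip monomorphic sites
    return [counts.get(i, 0) for i in range(1, n)]


def folded_sfs(xi, n):
    """Fold an unfolded sfs: class i pools derived counts i and n - i,
    for i = 1..floor(n/2); the middle class (n even) is counted once."""
    eta = []
    for i in range(1, n // 2 + 1):
        if i == n - i:
            eta.append(xi[i - 1])
        else:
            eta.append(xi[i - 1] + xi[n - i - 1])
    return eta


# Hypothetical derived-allele counts at 8 sites in a sample of n = 6 sequences.
xi = unfolded_sfs([1, 1, 2, 5, 3, 6, 0, 4], n=6)
print(xi)                    # [2, 1, 1, 1, 1]
print(folded_sfs(xi, n=6))   # [3, 2, 1]
```

Folding discards the derived/ancestral distinction, which is why statistics built on the folded sfs cannot tell an excess of rare derived variants from an excess of high-frequency derived variants.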

11 The fundamental population genetic parameter
Diversity within populations is controlled by the parameter $\theta = 4N_e u$: the product of the effective population size (Ne) and the per-generation mutation rate (u), scaled by 4 for a diploid population. It is proportional to the number of new mutants entering the population each generation, and it is a population-level parameter. If u varies among loci, the genome-wide estimate of θ can be approximated by the geometric average of θ across loci: $\bar{\theta} = \left(\prod_{i=1}^{L} \theta_i\right)^{1/L}$, where θ_i is the estimate for locus i and L is the number of loci.
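The definition and the multilocus average can be written out in a few lines of Python. This is a sketch assuming a diploid population (hence the factor of 4) and strictly positive per-locus estimates; the Ne and u values in the usage lines are hypothetical.

```python
import math


def theta_from_parameters(ne, u):
    """theta = 4 * Ne * u for a diploid population; u is per generation
    (per site or per locus, matching the unit you want theta in)."""
    return 4.0 * ne * u


def geometric_mean_theta(per_locus_theta):
    """Geometric average of per-locus theta estimates (each must be > 0)."""
    if not per_locus_theta or any(t <= 0 for t in per_locus_theta):
        raise ValueError("need one or more strictly positive estimates")
    logs = [math.log(t) for t in per_locus_theta]
    return math.exp(sum(logs) / len(logs))


# Hypothetical values: Ne = 10,000 and u = 1e-8 per site per generation.
print(theta_from_parameters(10_000, 1e-8))        # about 4e-04 per site
print(geometric_mean_theta([0.001, 0.01, 0.1]))   # about 0.01
```

Averaging on the log scale keeps a few highly variable loci from dominating the genome-wide value, which an arithmetic mean would not.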

12 A simple model of the mutational process
Imagine that your DNA sequences are part of an infinite “string of pearls”. Mutations change a pearl from white to black. Because the string is infinite, each pearl is “hit” by a mutation at most once. Therefore, every polymorphic site has exactly two alleles. This is the infinite-sites mutation model.

13 Sample-based estimators of θ using the sfs
Sensitivity | Source
low | Watterson (1975)
intermediate | Tajima (1989)
singleton | Fu and Li (1993)
high | Fay and Wu (2000); Zeng et al. (2006)
Sensitivity = the frequency of observed polymorphisms that makes estimates from a given estimator large relative to the others.
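The estimators in the table can all be written as weighted sums over the unfolded sfs, ξ_i (the number of sites at which the derived allele appears i times among n sequences). The Python sketch below uses the standard forms of θ_W, θ_π, the derived-singleton estimator of Fu and Li, θ_H and θ_L; it assumes the infinite-sites model and an outgroup-polarized sfs, and the dictionary keys are illustrative names. Feeding it the neutral expectation E(ξ_i) = θ/i shows that all five estimators share the same expectation.

```python
def theta_estimators(xi):
    """Estimators of theta from an unfolded sfs.

    xi[i - 1] is the number of sites whose derived allele occurs i times
    among n = len(xi) + 1 sequences."""
    n = len(xi) + 1
    pairs = n * (n - 1) / 2.0
    a_n = sum(1.0 / i for i in range(1, n))
    classes = list(enumerate(xi, start=1))  # (i, xi_i) pairs
    return {
        # Watterson: segregating sites only, so rare variants weigh heavily
        "theta_W": sum(xi) / a_n,
        # Tajima: average pairwise differences, peaks at intermediate frequency
        "theta_pi": sum(i * (n - i) * x for i, x in classes) / pairs,
        # Fu and Li: derived singletons
        "theta_xi1": xi[0],
        # Fay and Wu: weight i^2 emphasizes high-frequency derived variants
        "theta_H": sum(i * i * x for i, x in classes) / pairs,
        # Zeng et al.: weight i, also emphasizing high-frequency variants
        "theta_L": sum(i * x for i, x in classes) / (n - 1),
    }


# Neutral expectation E(xi_i) = theta / i with theta = 5 and n = 10.
expected = [5.0 / i for i in range(1, 10)]
for name, value in theta_estimators(expected).items():
    print(name, round(value, 6))   # every estimator gives 5.0
```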

14 Statistical performance of estimators

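15 Divergence
There are many measures of divergence; here I will use those of Nei (1987):
Average pairwise divergence: $d_{XY} = \sum_{i,j} x_i y_j d_{ij}$, where x_i and y_j are the frequencies of sequences i and j in populations X and Y, and d_ij is the number (or proportion) of differences between them
Net divergence: $d_A = d_{XY} - (d_X + d_Y)/2$, where d_X and d_Y are the average pairwise divergences within X and within Y
Net divergence as a function of u and T: $E(d_A) = 2uT$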
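A small Python sketch of the three divergence quantities, assuming aligned sequences of equal length and uncorrected p-distances (no multiple-hit correction). Averaging over all sequence pairs is equivalent to weighting haplotype pairs by their frequencies. Solving E(d_A) = 2uT for T inherits that equation's assumptions. The toy sequences and the mutation rate are hypothetical.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of aligned sites at which sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(seqs):
    """Average pairwise divergence within one population (d_X)."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def nei_divergence(pop_x, pop_y, u):
    """Return (d_XY, d_A, T), with T = d_A / (2u) in generations."""
    d_x, d_y = mean_within(pop_x), mean_within(pop_y)
    between = list(product(pop_x, pop_y))
    d_xy = sum(p_distance(a, b) for a, b in between) / len(between)
    d_a = d_xy - (d_x + d_y) / 2.0  # net divergence
    return d_xy, d_a, d_a / (2.0 * u)


# Hypothetical sequences (10 sites) and u = 1e-3 per site per generation.
x = ["AAAAAAAAAA", "AAAAAAAAAT"]
y = ["CCAAAAAAAA", "CCAAAAAAAT"]
print(nei_divergence(x, y, u=1e-3))   # roughly (0.25, 0.15, 75.0)
```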

16 Testing for neutrality
Three major types of tests:
Those based on summaries of the sfs
Those based directly on the sfs
Those based on polymorphism and divergence (HKA, MK)

17 Neutrality testing: Summary statistics
The basic idea: compare two estimators of θ. Under neutrality, the different estimators have the same expectation, $E(\hat{\theta}) = \theta$. Although each estimator is unbiased, each is sensitive to changes in a different part of the site-frequency spectrum.

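18 Summary statistic based tests from the folded sfs
Tajima’s D: compares θ_π with θ_W
Fu and Li’s F*: compares θ_π with θ_ηs
Fu and Li’s D*: compares θ_W with θ_ηs
Estimators of θ:
$\theta_W = S / a_n$, with $a_n = \sum_{i=1}^{n-1} 1/i$ (S = number of segregating sites, n = sample size)
$\theta_\pi = \binom{n}{2}^{-1} \sum_{i<j} k_{ij}$ (k_ij = number of differences between sequences i and j)
$\theta_{\eta_s} = \frac{n-1}{n}\,\eta_s$ (η_s = number of singletons)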

19 Tajima’s D
The math: $D = \frac{\theta_\pi - \theta_W}{\sqrt{\widehat{\mathrm{Var}}(\theta_\pi - \theta_W)}}$
The mechanics: When θ_W > θ_π, D is negative; θ_W > θ_π occurs when many rare polymorphisms are observed. When θ_W < θ_π, D is positive; θ_W < θ_π occurs when many intermediate-frequency polymorphisms are observed. Under neutrality, E(θ_W) = E(θ_π), so D is expected to be close to 0.
The biology: When a sample contains too many rare polymorphisms and Tajima’s D is negative, either long-term purifying selection OR a selective sweep may have occurred. When too many polymorphisms are present at intermediate frequencies and Tajima’s D is positive, either you have sampled gene sequences during a selective sweep OR long-term balancing selection has occurred.

20 The sfs under neutrality and selection

21 Summary statistic based tests from the unfolded sfs
Fu and Li’s F, Fu and Li’s D, Fay and Wu’s H, Zeng et al.’s E

22 Fay and Wu’s H
The math: $H = \frac{\theta_\pi - \theta_L}{\sqrt{\widehat{\mathrm{Var}}(\theta_\pi - \theta_L)}}$, with $\theta_L = \frac{1}{n-1}\sum_{i=1}^{n-1} i\,\xi_i$, where ξ_i is the number of sites at which the derived allele occurs i times in a sample of n sequences.
The mechanics: When θ_L > θ_π, H is negative; θ_L > θ_π occurs when many high-frequency polymorphisms are observed. When θ_L < θ_π, H is positive; θ_L < θ_π occurs when intermediate-frequency polymorphisms are overrepresented relative to high-frequency variants. Under neutrality, E(θ_L) = E(θ_π), so H is expected to be close to 0.
The biology: When too many high-frequency polymorphisms occur in a sample and Fay and Wu’s H is negative, a selective sweep may have occurred. When too many polymorphisms are present at intermediate frequencies and Fay and Wu’s H is positive, either you have sampled gene sequences during a selective sweep OR long-term balancing selection has occurred.

23 Normalized H vs. H

24 Summary statistic comparison
Statistic | Frequency of variants compared
Tajima’s D | intermediate vs. low
Fay and Wu’s H | high vs. intermediate
Zeng et al.’s E | high vs. low
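One way to see why the three comparisons complement each other is to perturb the neutral expectation E(ξ_i) = θ/i and watch the sign of each unnormalized contrast. In this hypothetical Python sketch (n = 10, θ = 10), ten extra derived singletons and ten extra derived variants at frequency 9/10 give the same θ_π − θ_W, because θ_π and S do not distinguish derived from ancestral alleles; only the contrasts involving θ_L separate the two scenarios. Variance terms are omitted, so these are numerators, not test statistics.

```python
def contrasts(xi):
    """Unnormalized numerators of D, H and E from an unfolded sfs."""
    n = len(xi) + 1
    pairs = n * (n - 1) / 2.0
    a_n = sum(1.0 / i for i in range(1, n))
    classes = list(enumerate(xi, start=1))
    t_w = sum(xi) / a_n
    t_pi = sum(i * (n - i) * x for i, x in classes) / pairs
    t_l = sum(i * x for i, x in classes) / (n - 1)
    return {"D": t_pi - t_w, "H": t_pi - t_l, "E": t_l - t_w}


n, theta = 10, 10.0
neutral = [theta / i for i in range(1, n)]
rare = neutral[:]
rare[0] += 10    # ten extra derived singletons
high = neutral[:]
high[-1] += 10   # ten extra derived variants at frequency 9/10

for label, xi in [("rare", rare), ("high", high)]:
    print(label, {k: round(v, 3) for k, v in contrasts(xi).items()})
```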

25 Why are different comparisons useful?

26 So, I have estimated diversity and a neutrality statistic. Now what?
What is too extreme a value for my neutrality statistic?

27 Statistical hypothesis testing - A brief overview
1. Formulate the null hypothesis H0 and the alternative hypothesis Ha.
2. Identify a test statistic that can be used to assess the truth of H0.
3. Compute the p-value: the probability, assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed. The smaller the p-value, the stronger the evidence against the null hypothesis.
4. Compare the p-value to an acceptable significance level, α. If p ≤ α, the observed effect is statistically significant: H0 is rejected in favor of Ha.

28 The null and alternative hypotheses
H0: Patterns of diversity are consistent with neutrality.
Ha: Patterns of diversity are inconsistent with neutrality.
[Figure: the Wright-Fisher population]

29 So, how do you get p-values? A brief introduction to the n-coalescent

30 Coalescence of two gene copies

31 Coalescence of k gene copies

32 P-values for neutrality tests
The probability distribution of most neutrality statistics under H0 is unknown. Therefore, we must simulate H0 using the coalescent (i.e., a Monte Carlo approach). But wait: the rate of coalescence depends on Ne, which we do not know...

33 Coalescent simulations and conditioning on diversity
Several solutions:
Condition on S
Condition on an estimator of θ, typically θ_W or θ_π
Condition on S and θ (cf. Simonsen et al., 1995; Genetics 141)
Now we can get a null distribution!

34 How it works
1. Simulate a large number of coalescent trees (say, 10,000).
2. Add mutations to the branches of each tree following a Poisson process with intensity proportional to θ.
3. Calculate your statistic of interest for each of the 10,000 simulations.
4. Calculate the proportion of those 10,000 values that are ≤ your observed statistic. This is your (approximate) p-value, with a resolution of 1/(number of simulations).
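The recipe above can be sketched in plain Python. This is a minimal illustration, not the procedure used by DnaSP or ms: it assumes the standard neutral coalescent with no recombination, infinite sites, and time measured in units of 2Ne generations (so a branch of length t carries Poisson(θt/2) mutations), and it conditions on a fixed θ rather than on S. All function names, θ = 5, n = 20 and the "observed" D = −1.8 are hypothetical. The p-value is left-tailed, as in the recipe.

```python
import math
import random


def tajimas_d(n, S, pi):
    """Tajima's D; NaN when S == 0."""
    if S == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    c1 = (n + 1) / (3.0 * (n - 1)) - 1.0 / a1
    c2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1)) - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


def poisson(rng, mean):
    """Knuth's multiplication method; fine for the small means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_sfs(n, theta, rng):
    """One neutral coalescent tree with mutations; returns xi_1..xi_{n-1}."""
    xi = [0] * n            # xi[i] counts sites with i derived copies
    lineages = [1] * n      # number of sampled descendants of each lineage
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2.0)   # time to next coalescence
        for size in lineages:
            xi[size] += poisson(rng, theta * t / 2.0)
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        merged = lineages.pop(i) + lineages.pop(j)   # pop larger index first
        lineages.append(merged)
    return xi[1:]


def null_distribution(n, theta, reps, seed=1):
    rng = random.Random(seed)
    pairs = n * (n - 1) / 2.0
    null = []
    for _ in range(reps):
        xi = simulate_sfs(n, theta, rng)
        pi = sum(i * (n - i) * x for i, x in enumerate(xi, start=1)) / pairs
        d = tajimas_d(n, sum(xi), pi)
        if not math.isnan(d):   # drop replicates with no polymorphism
            null.append(d)
    return null


def left_tail_p(observed, null):
    """Proportion of simulated values <= observed (left-tailed p-value)."""
    return sum(d <= observed for d in null) / len(null)


null = null_distribution(n=20, theta=5.0, reps=10_000)
print(left_tail_p(-1.8, null))   # approximate p-value, resolution 1/10,000
```

Conditioning on S instead, as suggested on the previous slide, would replace the Poisson step with placing exactly S mutations on the tree with probability proportional to branch length.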

35 A visual interpretation

36 Power of various tests

37 Some things to remember
Your null distribution is approximate.
The simulations assume completely linked polymorphisms within genes and no sequencing errors.
Your p-values are one-tailed, even though you may be interested primarily in two-tailed tests. DnaSP gives one-tailed p-values corresponding to the left tail.
Your null is based on a given value of diversity, as if you knew it without error.
Your null and alternative hypotheses, if interpreted as drift vs. selection, do not cover the entire range of possibilities. Selection is NOT the only process that leads to the rejection of a Wright-Fisher null hypothesis.

38 Extensions to the basic tests
Finite-site models with and without recombination and rate variation among sites (Tajima, 1996; Genetics 143)
Estimators in the presence of sequencing errors (Achaz, 2008; Genetics 179)
Compound tests of neutrality (Zeng et al., 2007; Mol. Biol. Evol. 24)
Incorporation of demographic processes and population structure into the null distribution (Nielsen, 2005; Annu. Rev. Genet. 39)

39 Compound tests of neutrality
Joint tests of neutrality using Tajima’s D, Fay and Wu’s H and a haplotype-based test (EW, the Ewens-Watterson test)

40 Effects of demographic processes and population structure
Several demographic and population processes cause patterns in the sfs similar to those caused by selection:
Statistic | Whole genome | Gene
D < 0 | Population growth; severe bottleneck | Positive selection
D > 0 | Population structure | Balancing selection

41 The structured coalescent
The coalescent process has a HUGE variance under neutrality for quantities such as the TMRCA (time to the most recent common ancestor). The variance in the TMRCA is reduced under population growth, and the coalescent tree has a predictable topology that is conducive to simulation.

42 Improved hypothesis tests
Incorporating demography and subdivision changes the null distribution of the statistics. The coalescent provides a well-studied and mathematically tractable framework for simulating samples under (almost) any conditions. The pitfall is that you must again specify the model and parameters as if you knew them exactly, or at least approximately. ms allows you to specify a range of values supplied in an additional input file.

43 Simulation software
ms (Hudson)
Recodon (Arenas and Posada)

44 Analysis software
DnaSP (http://www.ub.es/dnasp/)
Arlequin
MEGA
PDA
PGEToolkit
Bioperl
libsequence

45 DnaSAM – Part 1
DnaSAM is a set of Perl modules that:
estimate diversity and neutrality statistics for a directory of aligned files
provide a wrapper for ms

46 DnaSAM - Part 2
[Screenshots: command-line interface; summary per locus as it runs; output]

47 Direct utilization of the sfs
Diffusion theory can approximate several quantities under Wright-Fisher mating, with and without selection. These quantities can be used to construct maximum-likelihood functions for the expected folded and unfolded sfs, which can also incorporate demographic scenarios as well as divergence. They can be used to test for neutrality, but also to estimate parameters associated with selection (e.g., selection intensity).

48 Poisson Random Field approach (PRF)
Poisson Random Field (Sawyer and Hartl, 1992; Genetics 132)
Extension to a likelihood ratio test (Bustamante et al., 2001; Genetics 159)
Extension to a hierarchical Bayesian model (Bustamante et al., 2002; Nature 416)
This is implemented as the mkprf program, available at: ...
[Figure: fixation flux and limiting density of polymorphisms, under neutrality and under selection]
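Under the PRF, sites are treated as independent, and the expected sample sfs is obtained by binomial sampling from a diffusion-based density of allele frequencies. The Python sketch below assumes genic (additive) selection with a population-scaled coefficient S entering as f(x) = θ(1 − e^(−S(1−x)))/((1 − e^(−S))x(1 − x)), which reduces to the neutral θ/x at S = 0. Factor-of-2 conventions for S differ between papers, so treat the scaling as an assumption. The integral is evaluated with a simple midpoint rule; this is not the mkprf implementation, and S = 10 is a hypothetical value.

```python
import math


def prf_density(x, theta, S):
    """Density of derived-allele frequency x at equilibrium (S = 0: theta / x)."""
    if S == 0:
        weight = 1.0 - x
    else:
        # (1 - e^{-S(1-x)}) / (1 - e^{-S}), written with expm1 for stability
        weight = math.expm1(-S * (1.0 - x)) / math.expm1(-S)
    return theta * weight / (x * (1.0 - x))


def expected_sfs(n, theta, S, grid=20_000):
    """E(xi_i) = integral of C(n,i) x^i (1-x)^(n-i) f(x) dx, i = 1..n-1."""
    h = 1.0 / grid
    xs = [(k + 0.5) * h for k in range(grid)]   # midpoints avoid x = 0 and 1
    dens = [prf_density(x, theta, S) for x in xs]
    return [
        math.comb(n, i) * h * sum(x**i * (1 - x) ** (n - i) * f for x, f in zip(xs, dens))
        for i in range(1, n)
    ]


neutral = expected_sfs(10, 4.0, 0)     # close to 4 / i
positive = expected_sfs(10, 4.0, 10)   # hypothetical S = 10
for i, (a, b) in enumerate(zip(neutral, positive), start=1):
    print(i, round(a, 3), round(b, 3), round(b / a, 2))
```

The ratio column grows with i: positive selection inflates every frequency class, but high-frequency derived classes the most, which is the signal the PRF likelihood exploits.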

49 Polymorphism-Divergence tests

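50 The Hudson, Kreitman and Aguadé (HKA) test
The HKA test uses data from two different species. Under neutral evolution, the expected number of differences (D_i) at locus i between two homologous sequences (one from each species) is given by:
$E(D_i) = \theta_i\left(T + \frac{1+f}{2}\right)$
Similarly, the expected numbers of polymorphisms (segregating sites) within species A and B are given by:
$E(S_i^A) = \theta_i\, a_{n_A}$ and $E(S_i^B) = f\,\theta_i\, a_{n_B}$, with $a_n = \sum_{j=1}^{n-1} 1/j$
where θ_i = 4Ne u_i for species A, f is the ratio of the effective population size of species B to that of species A, and T is the divergence time in units of 2Ne generations.
The rationale for the HKA test is that under neutral evolution, the diversity within each species depends on θ, but the divergence between species depends on θ AND time (T).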
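Writing the neutral expectations out makes the rationale concrete: θ scales all three quantities, whereas T enters only the divergence. This Python sketch assumes the parameterization of Hudson et al. (1987) as commonly presented, with θ = 4Ne u for species A, f the ratio of species B's effective size to species A's, an ancestral population of size Ne(1 + f)/2, and T in units of 2Ne generations. The parameter values are hypothetical.

```python
def harmonic(n):
    """a_n = sum_{j=1}^{n-1} 1/j for a sample of n sequences."""
    return sum(1.0 / j for j in range(1, n))


def hka_expectations(theta, T, f, n_a, n_b):
    """Neutral expectations (E(S_A), E(S_B), E(D)) for one locus."""
    e_s_a = theta * harmonic(n_a)         # polymorphism in species A
    e_s_b = f * theta * harmonic(n_b)     # polymorphism in species B
    e_d = theta * (T + (1.0 + f) / 2.0)   # divergence: depends on T as well
    return e_s_a, e_s_b, e_d


print(hka_expectations(theta=2.0, T=3.0, f=1.0, n_a=5, n_b=5))
print(hka_expectations(theta=2.0, T=6.0, f=1.0, n_a=5, n_b=5))   # only E(D) changes
```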

51 The HKA test statistic
The equations from the last slide can be used to construct a χ² statistic with 2L − 2 degrees of freedom (L = number of loci; L must be ≥ 2):
$X^2 = \sum_{i=1}^{L}\frac{(S_i^A - \hat{E}(S_i^A))^2}{\widehat{\mathrm{Var}}(S_i^A)} + \sum_{i=1}^{L}\frac{(S_i^B - \hat{E}(S_i^B))^2}{\widehat{\mathrm{Var}}(S_i^B)} + \sum_{i=1}^{L}\frac{(D_i - \hat{E}(D_i))^2}{\widehat{\mathrm{Var}}(D_i)}$
If X² is large enough, the null hypothesis of neutral evolution can be rejected. (If E(S) and E(D) are estimated from the same data as those used in the test, X² must be bigger than ... for L = 2 using a critical p-value of 0.05.)

52 Interpretation of HKA results
A significant HKA result can be due to:
elevated levels of polymorphism relative to divergence, which is consistent with balancing selection; or
elevated levels of divergence relative to polymorphism, which is consistent with positive selection.
However, multilocus HKA tests do not tell you which gene deviates from neutrality, and pairwise comparisons among genes can be very difficult to interpret.

53 Maximum-likelihood HKA tests
Use maximum-likelihood theory to estimate parameters (polymorphism and divergence) and perform likelihood ratio tests (LRTs; cf. Wright and Charlesworth, 2004; Genetics 168). We need the likelihood of polymorphisms and divergence to formulate the entire multilocus likelihood function: ...
Choose candidate genes a priori (n = a), calculate L0 and LA, and then perform the LRT, with test statistic 2(ln LA − ln L0), using a degrees of freedom.

54 An example from Douglas-fir
logL 2 P k1 k2 k3 Null -92.15 --- 1 Alternative -85.85 12.60 0.006 0.32 0.58 0.41 Software available from: Eckert, Neutrality Testing, 5-Aug

55 The McDonald-Kreitman (MK) Test
The MK test removes the effects of the genealogy by dividing the differences observed at a given locus into four types:
Synonymous/Polymorphic
Synonymous/Fixed
Nonsynonymous/Polymorphic
Nonsynonymous/Fixed

56 The MK Table and Tests for Independence
Under neutrality, N_F/S_F = N_P/S_P.
Under positive directional selection, N_F/S_F > N_P/S_P.
A likelihood ratio test for independence in a contingency table is then used to test the null hypothesis of neutrality. If we assume that S_F = S_P under neutrality, the test statistic is given as: ...
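A minimal Python sketch of the likelihood-ratio (G) test of independence on the 2 × 2 MK table, with the neutrality index NI = (N_P/S_P)/(N_F/S_F) reported as a summary (NI < 1 points toward an excess of nonsynonymous fixations). It assumes all four counts are positive and uses the 1-degree-of-freedom chi-square approximation; the counts in the usage line are illustrative only, and small counts would call for an exact test instead.

```python
import math


def mk_test(dn, ds, pn, ps):
    """G-test of independence for the 2 x 2 McDonald-Kreitman table.

    dn, ds: nonsynonymous / synonymous fixed differences
    pn, ps: nonsynonymous / synonymous polymorphisms
    Returns (G, p-value with 1 df, neutrality index)."""
    table = [[dn, pn], [ds, ps]]   # rows: N, S; columns: fixed, polymorphic
    total = dn + ds + pn + ps
    row_sums = [dn + pn, ds + ps]
    col_sums = [dn + ds, pn + ps]
    g = 0.0
    for r in range(2):
        for c in range(2):
            observed = table[r][c]
            expected = row_sums[r] * col_sums[c] / total
            if observed > 0:   # 0 * log(0) is taken as 0
                g += 2.0 * observed * math.log(observed / expected)
    p = math.erfc(math.sqrt(g / 2.0))   # chi-square upper tail, 1 df
    ni = (pn / ps) / (dn / ds)          # NI < 1: excess nonsynonymous fixation
    return g, p, ni


# Illustrative counts: 7 N and 17 S fixed differences, 2 N and 42 S polymorphisms.
g, p, ni = mk_test(dn=7, ds=17, pn=2, ps=42)
print(round(g, 2), round(p, 4), round(ni, 3))
```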

57 MK integration with PRF
[Equations: expectations for synonymous sites; expectations for nonsynonymous sites]

58 A point for discussion
If forest trees maintain a large standing crop of genetic variation, how applicable will these methods be to the detection of adaptive variation? All the mathematics, power calculations and inferences just discussed are based on selection pushing new mutants to fixation, NOT variants already in existence. The effects of selection on standing genetic variation are much less pronounced. Applications to climate change?

