Speciation history inferred from gene trees L. Lacey Knowles Department of Ecology and Evolutionary Biology University of Michigan, Ann Arbor MI

Emphasis on multilocus data in phylogenetics and phylogeography…
- The good
- The bad
- The ugly
Utility of single-locus data for inferences about speciation history?

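Estimating population genetic parameters relevant to the process of species divergence
[Figure: divergence model with parameters for the two descendant populations (subscripts 1 and 2) and the ancestral population (subscript A), divergence time T, migration m, and the present at the tips]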

Parameterized model for making inferences about the divergence process
[Figure: timing of speciation, T]
- Was speciation promoted by displacements into glacial refugia or by recolonization of sky islands during interglacials?
- Was diversification inhibited or promoted during the Pleistocene?
Accurate and precise estimates of T are essential for evaluating when, and therefore in what geographic setting, species diverged.

Divergence of M. oregonensis and M. montanus from the Rocky Mountains (Carstens & Knowles 2007, Mol. Ecol. 16)
- 36 M. oregonensis
- 23 M. montanus
- anonymous nuclear loci
- 1 mitochondrial locus

11 T Present m AA 22 coalescent framework and multilocus versus single locus data set 4.9 x 10 5 to 2.0 x 10 6 estimate from average mtDNA genetic distance: *same mutation rate used in the different approaches divergence of gene lineages within the ancestral species

Parameterized model for making inferences about the divergence process
Assumed species tree of Poephila finches (Jennings & Edwards 2005, Evolution)
[Figure: species tree of hecki and acuticauda (Long-tailed Finch) and cincta (Black-throated Finch) in Australia, with divergence times t_ah and t_ahc − t_ah and ancestral parameters θ_ah and θ_ahc]
Identified the role of geographic barriers in a Pleistocene divergence of the grass finches
Analysis of 30 anonymous nuclear loci
Bayesian Markov chain Monte Carlo (MCMC) method (Yang and Rannala)
- multiple independent loci
- estimates ancestral θ (present θ also)
- estimates population divergence times
- uses branch length information
- accounts for uncertainty in gene trees
Assumptions:
- “know” the species tree
- random mating
- no gene flow after population divergence
- free recombination among loci (not within)

Jennings & Edwards (2005) Evolution
[Figure: prior and posterior probability distributions for t_ah, t_ahc − t_ah, θ_ah and θ_ahc on the hecki, acuticauda, cincta species tree; grey and black lines refer to analyses based on two different priors]
Increasing variance with decreasing number of loci

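Estimating population genetic parameters relevant to the process of species divergence
[Figure: divergence model with parameters for populations 1, 2 and ancestor A, divergence time T, migration m, and the present at the tips]
- The good
- The bad
- The ugly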

Estimating the history (order) of divergence events (i.e., the species tree) for recently derived taxa Effects of sampling scheme: contrast between sequencing single representatives per species versus multiple individuals per species

Gene trees will not always match the species tree
[Figure: a gene tree embedded in a species tree, showing deep coalescence (Maddison 1997)]

While there is a distribution of possible gene trees for a given species tree, the probability of each gene tree differs.
[Figure: gene tree topologies ranked from low P(G_tree | S_tree) to high P(G_tree | S_tree); Degnan & Salter (2005) Evolution]
5 taxa: 105 possible gene tree topologies
* The shape of this distribution will differ depending on the shape of the species tree
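The count of 105 topologies for 5 taxa matches the standard formula for the number of rooted, bifurcating, labeled trees, (2n − 3)!!. A minimal sketch of that arithmetic follows; the function name is hypothetical and not taken from the slides.

```python
def num_rooted_topologies(n_taxa: int) -> int:
    """Number of rooted, bifurcating, labeled tree topologies: (2n - 3)!!"""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


for n in (3, 4, 5, 8):
    print(n, num_rooted_topologies(n))
```

The count grows quickly with the number of taxa, which is one reason the distribution of gene tree probabilities is usually summarized rather than enumerated for larger trees.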

Inferred history of species divergence differs among loci Jennings & Edwards (2005) Evolution Gene trees from 30 anonymous markers with single individual sequenced per species

Estimating the history (order) of divergence events (i.e., the species tree) for recently derived taxa Gene tree from one locus with 9 individuals sequenced in each of 8 different species

Multilocus data
- Concatenation: “THE history”
- Arbitrary criteria
- History of divergence based on single nucleotide difference
What is the true species tree?

Recently developed approaches for estimating the species tree (explicitly consider the process of gene lineage coalescence in the estimation of the history of species divergence)
- Maddison & Knowles 2006
- Edwards et al
- Liu & Pearl 2007
[Figure: gene tree from one locus with multiple individuals sequenced per species, showing discord between the gene tree and the species tree]
Extract the historical signal of species divergence, despite discord between the gene tree and species tree

Goal: estimate the species tree directly (as opposed to estimating a gene tree and equating that gene tree with the history of the species)
[Figure: species tree, with species A labeled]

Can the history of species divergence be recovered from a single gene tree?
[Figure: discord between the gene tree and the species tree]
(1) minimize the number of deep coalescences
(2) shallowest divergence between species
Both consider the process of lineage sorting, but the actual probabilities of incomplete lineage sorting are not quantified using a stochastic model.
STEM and BEST: likelihood and Bayesian approaches that incorporate stochastic models of both nucleotide substitution and lineage sorting processes

Simulation design (Maddison & Knowles 2006):
simulated species trees → simulated gene trees → simulated sequences → reconstructed gene trees → reconstructed species trees
Infer species tree:
- shallowest divergence approach
- minimize the number of deep coalescences
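The shallowest divergence idea can be sketched as a clustering problem: the distance between two species is the shallowest divergence between any pair of their gene copies, and species are then joined from the shallowest pair upward. The sketch below assumes single-linkage joining and a complete table of pairwise depths between gene copies; both are assumptions for illustration, and the function name and toy values are hypothetical.

```python
from itertools import combinations


def shallowest_divergence_tree(copy_depth, species_of):
    """Cluster species by the minimum divergence between their gene copies.

    copy_depth maps frozenset({copy_i, copy_j}) to a divergence depth.
    Returns a nested-tuple species tree (single-linkage joining, an assumption).
    """
    # Shallowest divergence between gene copies for every pair of species.
    dist = {}
    for pair, depth in copy_depth.items():
        a, b = sorted(species_of[c] for c in pair)
        if a != b:
            dist[(a, b)] = min(depth, dist.get((a, b), float("inf")))

    def linkage(x, y):
        return min(dist[tuple(sorted((a, b)))] for a in x for b in y)

    # Each cluster is keyed by its species set and stores its subtree.
    clusters = {frozenset([s]): s for s in set(species_of.values())}
    while len(clusters) > 1:
        ordered = sorted(clusters, key=sorted)  # deterministic tie-breaking
        x, y = min(combinations(ordered, 2), key=lambda p: linkage(*p))
        clusters[x | y] = (clusters.pop(x), clusters.pop(y))
    return next(iter(clusters.values()))


species_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
copy_depth = {
    frozenset(["a1", "a2"]): 0.5,
    frozenset(["a1", "b1"]): 1.0,
    frozenset(["a2", "b1"]): 3.0,
    frozenset(["a1", "c1"]): 4.0,
    frozenset(["a2", "c1"]): 4.0,
    frozenset(["b1", "c1"]): 4.0,
}
print(shallowest_divergence_tree(copy_depth, species_of))
```

Note that the deep a2 and b1 divergence does not affect the result: only the shallowest between-species divergence is used, which is what lets the approach tolerate incomplete lineage sorting in some gene copies.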

Accuracy assessment: simulated species trees versus inferred species trees
Measure: number of partitions of the species in common between the original and inferred species trees (max = 5 for the 8-species trees)
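The shared-partition measure can be sketched as follows. A maximum of 5 for 8 species corresponds to the n − 3 non-trivial bipartitions of an unrooted bifurcating tree, so the sketch assumes partitions are counted as unrooted splits; trees are nested tuples, and the names and example trees are hypothetical.

```python
def leaf_set(tree):
    """Set of leaf labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the taxa implied by the tree's branches."""
    all_taxa = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        for child in node:
            side = leaf_set(child)
            # Skip splits that separate a single taxon from the rest.
            if 1 < len(side) < len(all_taxa) - 1:
                found.add(frozenset([side, all_taxa - side]))
            visit(child)

    visit(tree)
    return found


def shared_partitions(tree_a, tree_b):
    """Number of bipartitions present in both trees."""
    return len(splits(tree_a) & splits(tree_b))


true_tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
inferred = ((("A", "C"), ("B", "D")), (("E", "F"), ("G", "H")))
print(shared_partitions(true_tree, true_tree))  # identical trees share all splits
print(shared_partitions(true_tree, inferred))
```

Swapping two taxa between neighboring clades removes two shared splits here, which illustrates how sensitive the measure is to small changes in tree structure.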

Simulated species trees (Maddison & Knowles 2006)
500 replicate species trees of 8 species each (500 species trees were simulated rather than choosing a single species tree and assessing how well it can be reconstructed with many simulation replicates)
Goals:
- Examine a reasonable spectrum of topologies and branch lengths
- Determine how the extent of incomplete lineage sorting affects the ability to reconstruct species histories
Two sets of trees:
- t = 100,000 (i.e., 1Ne); 500 replicate species trees
- t = 1,000,000 (i.e., 10Ne); 500 replicate species trees
(*topologies of the two sets of trees are identical)

Simulated species trees → simulated gene trees, by neutral coalescence (Ne = 100,000) (Maddison & Knowles 2006)
1, 3, 9 or 27 gene trees representing unlinked loci, simulated independently, with 1, 3, 9 or 27 gene sequences simulated for each locus per species
Accuracy affected by:
- Increasing total sampling effort per species (1, 3, 9 or 27 sequences per species)
- Increasing the number of individuals per locus versus the number of loci per species for a given sampling effort
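To make the link between branch length and incomplete lineage sorting concrete, here is a minimal sketch of the neutral coalescent within a single species-tree branch. It assumes time is measured in coalescent units, in which k lineages coalesce at rate k(k − 1)/2; how generations convert to those units (for example, whether one unit is Ne or 2Ne generations) depends on ploidy and the simulator and is not stated in the slides. The function name is hypothetical.

```python
import random


def lineages_remaining(k, branch_length, rng):
    """Simulate how many of k lineages are still uncoalesced at the top of a branch.

    branch_length is in coalescent units (an assumed scaling).
    """
    elapsed = 0.0
    while k > 1:
        # Waiting time to the next coalescence among k lineages.
        elapsed += rng.expovariate(k * (k - 1) / 2.0)
        if elapsed > branch_length:
            break
        k -= 1
    return k


rng = random.Random(1)
for length in (0.1, 1.0, 10.0):
    runs = [lineages_remaining(3, length, rng) for _ in range(2000)]
    print(length, sum(runs) / len(runs))
```

Shorter branches leave more lineages uncoalesced, which is why the shallow (1Ne) trees are expected to show more discord than the deeper (10Ne) trees.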

[Figure: number of deep coalescences versus gene copies per locus, for 1Ne and 10Ne]
Lots of discord (i.e., our simulated data should reflect well the challenges of reconstructing evolutionary relationships near the species/population level)
Maddison & Knowles 2006

[Figure: average proportion of correct partitions (those in the inferred tree matching the true tree) for 1 locus, by number of gene copies, under the Deep Coalescents and Shallowest Divergence methods; panel a: total tree depth of 1Ne; panel b: total tree depth of 10Ne]
- Gene trees retain some signal of phylogenetic history despite significant discord with the species tree
- * Average accuracy greater as expected
- Inference from 1 locus is reasonably successful, given that the shared partition measure is sensitive to minor changes in tree structure (approximately equivalent to a single terminal taxon being out of place)
Maddison & Knowles 2006

Estimating the history (order) of divergence events (i.e., the species tree) for recently derived taxa
[Figure: gene tree and species tree]
Gene tree from one locus with multiple individuals sequenced per species, and a very simple approach
- The good
- The bad
- The ugly
What would happen if more loci were considered?

[Figure: frequency distribution of species tree accuracy with increasing number of individuals (random, 1, 3, 9, 27 individuals); x-axis: tree accuracy (number of shared partitions with the ‘true’ tree); y-axis: proportion of trees]
[Figure: frequency distribution of species tree accuracy with increasing number of loci (random, 1, 3, 9, 27 loci); x-axis: tree accuracy (number of shared partitions with the ‘true’ tree); y-axis: proportion of trees]
For recent divergence (t = 1Ne), accuracy is similar for a given sampling effort whether multiple individuals or multiple loci are sampled.
The curve marked “random” shows the expected distribution of the accuracy measure in comparing two randomly simulated trees.

Acknowledgements:
- Wayne Maddison
- Bryan Carstens (former postdoc)
Support: NSF (DEB ) and the University of Michigan