Full modeling versus summarizing gene-tree uncertainty: Method choice and species-tree accuracy. L.L. Knowles et al., Molecular Phylogenetics and Evolution.

Presentation transcript:

Full modeling versus summarizing gene-tree uncertainty: Method choice and species-tree accuracy. L.L. Knowles et al., Molecular Phylogenetics and Evolution 65 (2012):


Two representative software examples

STEM
- Maximum-likelihood-based estimation
- Needs known gene trees
- Less computationally intensive

*BEAST
- Bayesian inference using the full coalescent model
- Reads multi-locus nucleotide data
- Technically one of the fastest Bayesian approaches, but still quite costly in computational terms



Two representative software examples, as used in the comparison

ML-GT STEM
- Maximum-likelihood-based estimation
- ML gene trees computed using GARLI
- Less computationally intensive

Consensus-GT STEM
- Maximum-likelihood-based estimation
- Consensus gene tree computed using MrBAYES
- Less computationally intensive, although MrBAYES is slower than GARLI

*BEAST
- Bayesian inference using the full coalescent model
- Reads multi-locus nucleotide data
- Technically one of the fastest Bayesian approaches, but still quite costly in computational terms

Figure: Percent accuracy of species trees estimated with the three methods (ML-GT STEM, consensus-GT STEM, *BEAST) on datasets simulating evolutionary durations of 1N and 10N generations, respectively.
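The slides do not say how "percent accuracy" was scored. As a minimal sketch, assuming accuracy means the percentage of simulated replicates whose estimated species-tree topology exactly matches the true one, the scoring could look like this. The nested-tuple tree encoding and the function names are hypothetical and chosen only for illustration.

```python
def clades(tree):
    """Collect every clade (including the root) of a rooted tree.

    Assumption: a tree is a nested tuple of tip names, e.g. (("A", "B"), "C").
    Each clade is returned as a frozenset of the tip names below a node.
    """
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(tips(child) for child in node))
        found.add(below)
        return below

    tips(tree)
    return found


def same_topology(tree_a, tree_b):
    """Two rooted trees share a topology when they imply the same clades."""
    return clades(tree_a) == clades(tree_b)


def percent_accuracy(true_tree, estimated_trees):
    """Percentage of estimates that recover the true topology exactly."""
    if not estimated_trees:
        raise ValueError("need at least one estimated tree")
    hits = sum(same_topology(true_tree, est) for est in estimated_trees)
    return 100.0 * hits / len(estimated_trees)


true_tree = (("A", "B"), ("C", "D"))
estimates = [
    (("B", "A"), ("D", "C")),    # same topology, children reordered
    ((("A", "B"), "C"), "D"),    # different topology
    (("C", "D"), ("A", "B")),    # same topology
    (("A", "C"), ("B", "D")),    # different topology
]
print(percent_accuracy(true_tree, estimates))
```

Comparing clade sets rather than the tuples themselves makes the score insensitive to the order in which children are written.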


The authors' conclusion: The factor having the largest effect on the accuracy of a species-tree estimate is not the method of analysis or sampling design, but is the timing of divergence (sic)
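One way to see why divergence timing matters so much is a standard coalescent result that is not stated in the slides: two lineages in a constant-size diploid population of effective size N fail to coalesce within t generations with probability exp(-t / (2N)). The sketch below applies that formula to durations of 1N and 10N. Treating the whole duration as a single branch is a simplifying assumption made here for illustration, and the function name is hypothetical.

```python
import math


def prob_no_coalescence(t_generations, n_effective):
    """Probability that two lineages have NOT coalesced after t generations.

    Assumes a constant-size, diploid, randomly mating population, so that
    the coalescence time is exponential with mean 2 * n_effective.
    """
    return math.exp(-t_generations / (2.0 * n_effective))


n = 10_000  # hypothetical effective population size
for label, t in (("1N", 1 * n), ("10N", 10 * n)):
    print(label, round(prob_no_coalescence(t, n), 4))
```

Under these assumptions, lineages remain unsorted far more often at 1N than at 10N, which is consistent with discordant gene trees being a larger problem for recent divergences.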

Figure, panel (B): ML-GT STEM, consensus-GT STEM, and *BEAST at 10N and 1N, plotted against sampling effort (individuals:loci).

Conclusion: With small sample sizes, all methods yield similarly (in)accurate species trees, so there is no justification for using computationally intensive approaches in these situations.

Conclusion: Likewise, when the analyzed data evolved down a substantially deeper tree, all methods yield comparably accurate trees regardless of sample size, so the less computationally intensive methods are preferable.

Conclusion: With larger sample sizes and recent speciation, species-tree accuracy differs significantly among the methods. Inference methods based on the full coalescent model (*BEAST, for example) appear to perform best in these situations. In this specific scenario, their results on shorter trees rival those on deeper ones.

In Short:

                Smaller Sample Size    Larger Sample Size
Shorter Tree    Consensus-GT STEM      *BEAST
Deeper Tree     ML-GT STEM (either sample size)
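The summary above can be read as a simple lookup from tree depth and sample size to a suggested method. The sketch below encodes that reading. The function name and the category labels are hypothetical, the slides give no numeric thresholds separating "shorter" from "deeper" or "smaller" from "larger", and applying ML-GT STEM to deeper trees at either sample size is an interpretation of the summary.

```python
def suggest_method(tree_depth, sample_size):
    """Return the method the summary suggests for a given scenario.

    tree_depth:  "shorter" or "deeper"  (hypothetical category labels)
    sample_size: "smaller" or "larger"  (hypothetical category labels)
    """
    if tree_depth not in ("shorter", "deeper"):
        raise ValueError(f"unknown tree depth: {tree_depth!r}")
    if sample_size not in ("smaller", "larger"):
        raise ValueError(f"unknown sample size: {sample_size!r}")

    if tree_depth == "deeper":
        # All methods perform similarly on deep trees, so take the cheapest.
        return "ML-GT STEM"
    if sample_size == "larger":
        # Recent divergence with ample data: full coalescent modeling pays off.
        return "*BEAST"
    # Recent divergence with little data: no method does well, so avoid the
    # cost of a full Bayesian analysis.
    return "Consensus-GT STEM"


for depth in ("shorter", "deeper"):
    for size in ("smaller", "larger"):
        print(depth, size, "->", suggest_method(depth, size))
```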

Questions

1. How did running time compare for the two methods? Did the authors make any effort to adjust the degree of sampling time for one or the other?
2. Which one would you use for an analysis like this in the future, based on what you've read?
3. How would you have improved the authors' simulated dataset?
4. Why do you/the authors think sampling scheme has the opposite effect on species-tree accuracy for deeply diverging versus recently diverging data?
5. A primary motivation of the paper is the way that gene-tree estimation error is treated by the different methods. Did the time of divergence affect the amount of gene-tree estimation error in either the maximum-likelihood gene trees or the consensus gene trees?
6. What do Knowles et al. mean by mutational variance? How is this different from coalescent variance?
7. What is the effect of incomplete gene trees on species-tree estimation?
8. Why did this paper not use summary methods in its analysis?

Other Questions?