Methods in Phylogenetic Inference Chris Castorena Thornton Lab.

1 Methods in Phylogenetic Inference Chris Castorena Thornton Lab

2 What is Phylogenetic Inference? The method used to infer relationships between different species.

3 Example sequences for Lion, Tiger, Bear and Turkey: ACG TTA TTA ACG TGT TTC ACG TGT TTA ACA GGA TTA

4 Method Statistical model of evolutionary process Tiger Bear Lion Turkey

5 Method Statistical model of evolutionary process Tiger Bear Lion Turkey Is this the right tree?

6 True Tree -> Evolution Simulator (statistical model of the evolutionary process) -> simulated sequences (ATTA GCGC, ATAA GCGC, ATTA CCGC, ATAA CCGC) -> Method -> Inferred Tree

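7 The Model
Topology: a tree with tips A, B, C, D.
Internal Branch: When this is long, it’s easy to infer the correct tree; when it’s short, it’s hard to infer the correct tree.
Frequency Vector: $\pi = (\pi_A, \pi_C, \pi_G, \pi_T)$
Transition Matrix: $P = \begin{pmatrix} P_{AA} & P_{CA} & P_{GA} & P_{TA} \\ P_{AC} & P_{CC} & P_{GC} & P_{TC} \\ P_{AG} & P_{CG} & P_{GG} & P_{TG} \\ P_{AT} & P_{CT} & P_{GT} & P_{TT} \end{pmatrix}$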
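A minimal sketch of how a frequency vector and a branch length can determine a transition matrix. The slides do not name a substitution model, so the F81-style formula used here, the function name `f81_transition_matrix`, the rate `mu`, and the example frequencies are all assumptions chosen for illustration. Here `P[i][j]` is the probability that base `i` is replaced by base `j` over the branch.

```python
import math

BASES = "ACGT"

def f81_transition_matrix(freqs, t, mu=1.0):
    """F81-style matrix: P[i][j] = exp(-mu*t) * [i == j] + (1 - exp(-mu*t)) * freqs[j]."""
    decay = math.exp(-mu * t)  # probability that no substitution event occurs
    return [
        [decay * (1.0 if i == j else 0.0) + (1.0 - decay) * freqs[j]
         for j in range(4)]
        for i in range(4)
    ]

# Hypothetical GC-rich frequency vector in the order A, C, G, T.
gc_rich = [0.1, 0.4, 0.4, 0.1]

P_short = f81_transition_matrix(gc_rich, t=0.01)   # short branch: almost no change
P_long = f81_transition_matrix(gc_rich, t=100.0)   # long branch: rows approach freqs
```

On a short branch the matrix is close to the identity, and on a long branch every row converges to the frequency vector.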

8 Current models assume the same process applies to all sites (homogeneity).

9 Problem: In reality, different sites evolve under different evolutionary processes (heterogeneity). There’s no way of knowing which model a particular site is evolving under.

10 Why do sites evolve heterogeneously? Different sites are likely to be selected for a specific property, such as charge, hydrophobicity, or reactions with other molecules. These requirements force different sites to have higher or lower levels of certain bases.
Low GC content: CTTAATATTTGAT ATTAATATTTTAT TAAGATATTGTTT ATTAATATTGAAT
High GC content: GACCCGCCCGCT GGCCTGCGGCCA AATGGGCGGGGG GGTATAGCCCTCG
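As a small illustration, GC content can be computed directly for the example sequences on this slide. The helper name `gc_content` is hypothetical; the sequences are copied from the slide.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)

low_gc_seqs = ["CTTAATATTTGAT", "ATTAATATTTTAT", "TAAGATATTGTTT", "ATTAATATTGAAT"]
high_gc_seqs = ["GACCCGCCCGCT", "GGCCTGCGGCCA", "AATGGGCGGGGG", "GGTATAGCCCTCG"]

low_values = [gc_content(s) for s in low_gc_seqs]
high_values = [gc_content(s) for s in high_gc_seqs]

for seq, value in zip(low_gc_seqs + high_gc_seqs, low_values + high_values):
    print(f"{seq}: {value:.2f}")
```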

11 How does this underlying heterogeneity affect the accuracy of our methods?

12 Simulate data to feed the methods. Each simulated sequence pairs low GC content sites with high GC content sites: ATTA GCGC, ATAA GCGC, ATTA CCGC, ATAA CCGC.
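The simulation step could look roughly like the sketch below. It is not the simulator used in the talk: the F81-style substitution process, the tree shape ((A,B),(C,D)), the branch lengths, the base frequencies, and all function names are assumptions made for illustration. A fraction of the sites is simulated with high-GC frequencies and the rest with low-GC frequencies.

```python
import math
import random

BASES = "ACGT"
LOW_GC = [0.4, 0.1, 0.1, 0.4]    # assumed frequencies for A, C, G, T
HIGH_GC = [0.1, 0.4, 0.4, 0.1]   # assumed frequencies for A, C, G, T

def evolve(base, freqs, t, rng):
    """Evolve one base along a branch of length t (F81-style process)."""
    if rng.random() < math.exp(-t):
        return base  # no substitution event on this branch
    return rng.choices(range(4), weights=freqs)[0]  # redraw from the frequencies

def simulate_site(freqs, terminal, internal, rng):
    """Simulate one column for tips A, B, C, D on the tree ((A,B),(C,D))."""
    x = rng.choices(range(4), weights=freqs)[0]  # state at the A/B ancestor
    y = evolve(x, freqs, internal, rng)          # state at the C/D ancestor
    return [evolve(x, freqs, terminal, rng), evolve(x, freqs, terminal, rng),
            evolve(y, freqs, terminal, rng), evolve(y, freqs, terminal, rng)]

def simulate_alignment(n_sites, frac_high_gc, rng, terminal=0.5, internal=0.05):
    """Return {tip name: sequence}; the first sites are the high-GC class."""
    n_high = round(n_sites * frac_high_gc)
    columns = [
        simulate_site(HIGH_GC if site < n_high else LOW_GC, terminal, internal, rng)
        for site in range(n_sites)
    ]
    return {name: "".join(BASES[col[k]] for col in columns)
            for k, name in enumerate("ABCD")}

alignment = simulate_alignment(200, 0.5, random.Random(0))
```

Setting `frac_high_gc` to 0.5 gives two equally sized site classes; other values change how the sites are split between the two classes.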

13 Analyze the alignments:
1. Maximum Likelihood: Sets parameters at their maximum likelihood values and then chooses the tree with the highest likelihood at those parameter values.
2. Bayesian Analysis: Integrates over parameter values and chooses the tree with the most area under its likelihood curve, i.e. the highest posterior probability.
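The difference between the two selection rules can be shown with a toy sketch. The likelihood values below are made-up numbers on a hypothetical five-point parameter grid, not results from the talk, and the function names are assumptions. The Bayesian rule here assumes a uniform prior over the grid and equal prior weight for each tree.

```python
def choose_ml(likelihoods):
    """Pick the tree whose likelihood curve has the highest peak."""
    return max(likelihoods, key=lambda tree: max(likelihoods[tree]))

def choose_bayes(likelihoods, step):
    """Pick the tree with the largest area under its likelihood curve
    (Riemann sum; uniform prior on the parameter, equal tree priors)."""
    return max(likelihoods, key=lambda tree: sum(likelihoods[tree]) * step)

# Toy likelihood curves evaluated on the same parameter grid (illustrative only).
toy_likelihoods = {
    "tree1": [0.0, 0.0, 0.9, 0.0, 0.0],  # tall, narrow peak
    "tree2": [0.4, 0.5, 0.5, 0.5, 0.4],  # lower, broad curve
}

ml_choice = choose_ml(toy_likelihoods)
bayes_choice = choose_bayes(toy_likelihoods, step=0.25)
print(ml_choice, bayes_choice)  # prints "tree1 tree2"
```

With these numbers the two rules disagree: the peak favors one tree while the integrated area favors the other.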

14 Each method (Bayesian, Maximum Likelihood) is run under two models:
1) True model (control): Incorporates the heterogeneity, i.e. we tell the method which sites have high GC content and which sites don’t.
2) Simple model: Doesn’t incorporate the heterogeneity, i.e. we don’t tell the method which sites have high GC content and which sites don’t.

15 No Heterogeneity 50%

16 Moderate Heterogeneity 20% 80%

17 Extreme Heterogeneity 0% 100%

18 Question: How does heterogeneity affect the accuracy of our methods? Answer: Heterogeneity drastically reduces their accuracy.

19 Interesting observation

20 Question: Does integrating over parameter values in Bayesian analysis induce bias towards the wrong tree?

21 What We Did: Feed each method data with no phylogenetic signal (no reason to pick one tree over another). An unbiased method would choose each of the three trees roughly a third of the time. [Figure: the actual tree (A, B, C, D) and the three trees the methods have to choose from.]
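One way to check the "roughly a third of the time" expectation is a chi-square goodness-of-fit test on how often each tree is chosen. This is a sketch of one possible check, not necessarily the analysis used in the talk; the function name and the counts are hypothetical. With three categories the test has 2 degrees of freedom, for which the p-value is exactly exp(-statistic / 2).

```python
import math

def tree_choice_bias(counts):
    """Chi-square test of three tree counts against equal expected frequencies.

    Returns (statistic, p_value) for 2 degrees of freedom.
    """
    if len(counts) != 3:
        raise ValueError("expected counts for exactly three trees")
    expected = sum(counts) / 3
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    p_value = math.exp(-statistic / 2)  # chi-square survival function, df = 2
    return statistic, p_value

# Hypothetical counts over 100 replicate data sets (illustrative only).
balanced_stat, balanced_p = tree_choice_bias([34, 33, 33])
skewed_stat, skewed_p = tree_choice_bias([70, 15, 15])
```

A small p-value, as for the skewed counts, would suggest the method prefers one tree even though the data carry no signal.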

22 Homogeneous Model [Figure: the actual tree (A, B, C, D) and the three candidate trees.]

23 GC Heterogeneity [Figure: the three candidate trees.]

24 Branch Length Heterogeneity [Figure: the three candidate trees and two trees on A, B, C, D; label: 50%.]

25 Is Bayesian analysis biased? Yes, and this bias increases when there is heterogeneity.

26 Conclusions Unincorporated heterogeneity decreases the accuracy of these methods. Even if this heterogeneity is incorporated, Bayesian methods are biased.

27 Acknowledgements Thornton Lab Spur Program Brian Kolockzkowski Peter O’Day Joe Thornton Beth Roy Jamie Bridgham Geeta Eick Sean Carroll June Keay Jennifer Fox

