Processing & Testing Phylogenetic Trees. Rooting.


1 Processing & Testing Phylogenetic Trees

2 Rooting

5 Rooting
1. Outgroup rooting: based on external information.
2. Midpoint rooting: direct a posteriori use of the ultrametricity assumption.
3. Largest-genetic-variability-group rooting: indirect a posteriori use of the ultrametricity assumption.

6 Rooting with outgroup. Unrooted tree of plant, fungus, and animal: are fungi relatives of animals or plants?

7 Rooting with outgroup. Add an outgroup to the unrooted tree (plant, fungus, animal), e.g., a bacterium.

8 Rooting with outgroup. [Figure: the unrooted tree (plant, fungus, animal) with the bacterial outgroup added, and the resulting rooted tree (bacterium, animal, fungus, plant) with the root and a monophyletic group marked.]

9 Midpoint rooting
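
The idea behind midpoint rooting can be sketched as follows: place the root halfway along the longest leaf-to-leaf path. This is a minimal illustration, not the procedure of any particular program. The edge-list representation, the function names, and the branch lengths are assumptions made for the example, and all branch lengths are assumed to be positive.

```python
def farthest(adjacency, start):
    """Return the node farthest from start, plus distances and parents."""
    distance = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor, length in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + length
                parent[neighbor] = node
                stack.append(neighbor)
    return max(distance, key=distance.get), distance, parent


def midpoint_root(edges):
    """Locate the midpoint of the longest path in an unrooted tree.

    edges: list of (node, node, branch_length) tuples (hypothetical format).
    Returns (u, v, offset): the root lies on edge u-v, offset away from u.
    """
    adjacency = {}
    for u, v, length in edges:
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))
    # Two sweeps find the two most distant leaves a and b.
    a, _, _ = farthest(adjacency, next(iter(adjacency)))
    b, distance, parent = farthest(adjacency, a)
    half = distance[b] / 2.0
    # Walk from b back toward a until the edge containing the midpoint.
    node = b
    while distance[parent[node]] > half:
        node = parent[node]
    return parent[node], node, half - distance[parent[node]]


# Hypothetical unrooted tree: leaves A-D, internal nodes x and y.
edges = [("A", "x", 1.0), ("B", "x", 2.0), ("x", "y", 1.0),
         ("C", "y", 2.0), ("D", "y", 6.0)]
print(midpoint_root(edges))
```

In this toy tree the longest path runs between B and D (length 9), so the root falls on the branch leading to D. The placement is only meaningful to the extent that rates are roughly equal on both sides of the root, which is the ultrametricity assumption named in the list of rooting methods.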

10 Largest variation = Most ancient

11 Species Divergence Times. If we know T_1 and the rate of evolution, then we can infer T_2. If we know T_2 and the rate of evolution, then we can infer T_1.

13 If T_1 is known

14 If T_2 is known
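
The formulas on these slides did not survive in the transcript. As a hedged illustration of the reasoning, assume a constant rate of evolution, so that the distance K between two lineages that split T years ago satisfies K = 2rT (each lineage accumulates r substitutions per site per year). The function names and all numbers below are hypothetical.

```python
def substitution_rate(distance, divergence_time):
    """Rate per site per year per lineage, from a calibrated divergence."""
    return distance / (2.0 * divergence_time)


def divergence_time(distance, rate):
    """Divergence time implied by a distance and a known rate."""
    return distance / (2.0 * rate)


# Hypothetical calibration: T_1 = 50 million years, observed distance 0.10.
k_1, t_1 = 0.10, 50e6
rate = substitution_rate(k_1, t_1)

# Hypothetical second divergence with observed distance 0.25.
k_2 = 0.25
t_2 = divergence_time(k_2, rate)
print(f"rate = {rate:.2e} per site per year, T_2 = {t_2:.3e} years")
```

The same two functions work in the other direction: calibrate the rate from T_2 and its distance, then solve for T_1.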

15 Dating divergence events requires paleontological calibrations. This is a complicated problem.

16 Topological comparisons. Topological comparisons entail measuring the similarity or dissimilarity among tree topologies. The need to compare topologies may arise when dealing with trees that have been inferred from analyses of different sets of data or from different types of analysis of the same data set. When two trees derived from different data sets or different methodologies are identical, they are said to be congruent. Congruence can sometimes be partial, i.e., limited to some parts of the trees, other parts being incongruent.

17 Penny and Hendy's topological distance (d_T). A commonly used measure of dissimilarity between two tree topologies. The measure is based on tree partitioning: d_T = 2c, where c is the number of partitions resulting in different divisions of the OTUs in the two tree topologies under consideration.
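
The partition-based distance can be sketched as follows. This is an illustration under stated assumptions: trees are written as nested tuples of OTU names (a hypothetical representation), both trees contain the same OTUs, and the distance is computed as the number of internal-branch partitions found in one tree but not the other. For two fully bifurcating trees that count equals 2c.

```python
def leaf_set(tree):
    """All OTU names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def partitions(tree):
    """Non-trivial partitions (one per internal branch) of an unrooted tree.

    Each partition is stored as the side that lacks a reference OTU, so the
    result does not depend on where the nested tuple happens to be rooted.
    """
    all_otus = leaf_set(tree)
    reference = min(all_otus)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_otus - clade if reference in clade else clade
        if 1 < len(side) < len(all_otus) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def topological_distance(tree_1, tree_2):
    """Number of partitions present in exactly one of the two trees."""
    return len(partitions(tree_1) ^ partitions(tree_2))


# Hypothetical five-OTU trees differing in one internal branch (c = 1).
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))
print(topological_distance(tree_a, tree_b))
```

Here the two trees share the partition separating D and E from the rest and disagree on the other internal branch, so c = 1 and d_T = 2.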

18 Trees inferred from the analysis of a particular data set are called fundamental trees, i.e., they summarize the phylogenetic information in a data set. Consensus trees are trees that summarize the phylogenetic information in a set of fundamental trees.

19 In a strict consensus tree, all conflicting branching patterns are collapsed into multifurcations. In an X% majority-rule consensus tree, a branching pattern that occurs with a frequency of X% or more is adopted. When X = 100%, the majority-rule consensus tree is identical to the strict consensus tree.
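
The two consensus rules can be sketched by counting how often each group occurs among the fundamental trees. This is a minimal sketch that only reports which groups are retained; it does not assemble the consensus tree itself. Rooted trees are written as nested tuples of OTU names, and the representation and function names are assumptions made for the example.

```python
from collections import Counter


def groups(tree):
    """All groups (clades) defined by the internal nodes of a rooted tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        found.add(clade)
        return clade

    visit(tree)
    return found


def consensus_groups(trees, percent):
    """Groups occurring in at least percent% of the fundamental trees."""
    counts = Counter(group for tree in trees for group in groups(tree))
    return {group for group, n in counts.items()
            if n * 100 >= percent * len(trees)}


# Three hypothetical fundamental trees; two of them group A with B.
trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
]
majority = consensus_groups(trees, 60)
strict = consensus_groups(trees, 100)
print(sorted(sorted(g) for g in majority))
print(sorted(sorted(g) for g in strict))
```

With X = 60 the group (A, B) is kept because it occurs in two of three trees. With X = 100 it is dropped, which corresponds to collapsing A, B, and C into a multifurcation in the strict consensus tree.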

21 A tree is an evolutionary hypothesis

22 Q: How can we ascertain that the methodology we have used yields reliable results? A: We can test the methodology on a phylogeny that is known for certain to be true, and compare the inferred phylogeny with the true phylogeny.

23 Caminalcules are a group of artificial organisms (belonging to the genus Caminalculus) that were invented by Dr. Joseph H. Camin from the University of Kansas. Interested in how taxonomists group species, he designed these creatures to show an evolutionary pattern of divergence and diversification in morphology. There are 29 recent “species” of Caminalculus and 48 fossil forms. The Caminalcules first appeared in print in the journal Systematic Zoology (now Systematic Biology) in 1983, four years after Camin's death in 1979. The first four papers on Caminalcules were written by Robert R. Sokal. Joseph H. Camin (1922–1979)

24 Extant Extinct

25 Assessing tree reliability. Phylogenetic reconstruction is a problem of statistical inference. One must assess the reliability of the inferred phylogeny and its component parts. Questions: (1) How reliable is the tree? (2) Which parts of the tree are reliable? (3) Is this tree significantly better than another one?

26 Bootstrapping. A statistical technique that uses intensive random resampling of data to estimate a statistic whose underlying distribution is unknown.

27 Bootstrapping. Characters are resampled with replacement to create many bootstrap replicate data sets (pseudosamples). Each bootstrap replicate data set is analyzed. The frequency of occurrence of a group (its bootstrap proportion) is a measure of support for the group.
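
The three steps can be sketched as follows. The alignment, the function names, and in particular `closest_pair` are assumptions made for illustration: `closest_pair` is a toy stand-in for a real tree-building method and simply reports the two most similar sequences as a group.

```python
import itertools
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns (characters) with replacement."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][i] for i in columns)
            for name in names}


def bootstrap_proportions(alignment, infer_groups, n_replicates=200, seed=1):
    """Percentage of replicates in which each inferred group occurs."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_replicates):
        replicate = bootstrap_replicate(alignment, rng)
        for group in infer_groups(replicate):
            counts[group] = counts.get(group, 0) + 1
    return {group: 100.0 * n / n_replicates for group, n in counts.items()}


def closest_pair(alignment):
    """Toy stand-in for tree inference: group the two most similar taxa."""
    def mismatches(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    best = min(itertools.combinations(sorted(alignment), 2), key=mismatches)
    return [frozenset(best)]


# Hypothetical four-taxon alignment, 10 sites each.
alignment = {
    "taxon1": "ACGTACGTAC",
    "taxon2": "ACGTACGTTC",
    "taxon3": "TCGAACGATC",
    "taxon4": "TCGAAGGATC",
}
support = bootstrap_proportions(alignment, closest_pair)
for group, percent in sorted(support.items(), key=lambda item: -item[1]):
    print(sorted(group), percent)
```

In a real analysis `infer_groups` would run the same tree-reconstruction method on every replicate and return all groups in the resulting tree.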

29 Bootstrapping - an example. Ciliate SSUrDNA, parsimony bootstrap.

Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8), Tetrahymena (9)

Partition table:
123456789    Freq
-----------------
.**......  100.00
...**....  100.00
.....**..  100.00
...****..  100.00
...******   95.50
.......**   23.33
...****.*   11.83
...*****.    3.83
.*******.    2.50
.**....*.    1.00
.**.....*    1.00

Bootstrap values recovered from the tree figure: 100, 96, 23, 100.

30 Reduction of a phylogenetic tree by the collapsing of internal branches associated with bootstrap values that are lower than a critical value (C). (a) Gene tree for  -tubulin (b) C = 50% (c) C = 90%

31 Tests for two competing trees. All these tests use the null hypothesis that the differences between two trees (A and B) are no greater than expected by chance (from the sampling error).

32 Likelihood Ratio Test. Likelihood of Hypothesis 1 = L_1. Likelihood of Hypothesis 2 = L_2. Δ = 2(ln L_1 - ln L_2). Compare Δ to a χ² distribution or to a simulated distribution.
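
A small worked example of the test statistic is given below. It assumes that the two hypotheses are nested and differ by one free parameter, so that the statistic is compared to a χ² distribution with 1 degree of freedom; for other cases the slide's alternative (a simulated distribution) would be needed. The log-likelihood values and function names are hypothetical.

```python
import math


def lrt_statistic(ln_l1, ln_l2):
    """Test statistic: twice the difference in log-likelihoods."""
    return 2.0 * (ln_l1 - ln_l2)


def chi2_p_value_df1(statistic):
    """Upper-tail probability of a chi-square with 1 degree of freedom.

    For 1 degree of freedom this equals erfc(sqrt(statistic / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical log-likelihoods of the two hypotheses.
ln_l1, ln_l2 = -1000.0, -1003.0
delta = lrt_statistic(ln_l1, ln_l2)
p_value = chi2_p_value_df1(delta)
print(f"statistic = {delta:.2f}, p = {p_value:.4f}")
```

With these numbers the statistic is 6.0, which exceeds the familiar 5% critical value of about 3.84 for 1 degree of freedom.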

33 Reliability of Phylogenetic Methods. Phylogenetic methods can also be evaluated in terms of their general performance, particularly their consistency (do they approach the truth with more data?), efficiency (how quickly can they handle how much data?), and robustness (how sensitive are they to violations of assumptions?).

34 Problems with long branches. With long branches, most methods may yield erroneous trees. For example, the maximum-parsimony method tends to cluster long branches together. This phenomenon is called long-branch attraction; the range of branch lengths in which it occurs is known as the Felsenstein zone.

