16 September 2007 Coalescent Consequences for Consensus Cladograms J. H. Degnan 1, M. Degiorgio 2, D. Bryant 3, and N. A. Rosenberg 1,2 1 Dept. of Human.


1 16 September 2007 Coalescent Consequences for Consensus Cladograms J. H. Degnan 1, M. Degiorgio 2, D. Bryant 3, and N. A. Rosenberg 1,2 1 Dept. of Human Genetics, U. of Michigan 2 Bioinformatics Program, U. of Michigan 3 Dept. of Mathematics, U. of Auckland

2 Outline
• Species trees vs. gene trees
• Consensus tree background
• Asymptotic consensus trees
• Finite sample consensus trees
• Consistency results
• Conclusions

3 Gene trees vary across the genome

4 Why? Incomplete lineage sorting, horizontal gene transfer, sampling, etc.

5 Gene tree discordance
• From one true species tree, we expect different gene trees at different loci as a result of lineage sorting, independently of estimation or sampling error.
• Gene tree discordance depends especially on branch lengths in the species tree, measured in coalescent units: the number of generations t scaled by effective population size, t / (2N).
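To make the dependence on branch length concrete, here is a minimal sketch for the simplest case: three species with species tree ((A,B),C) and one sampled lineage per species. It uses the standard coalescent result that each discordant topology has probability e^(-T)/3, where T is the internal branch length in coalescent units. The function name and the example values of T are illustrative assumptions, not taken from the slides.

```python
import math

def three_taxon_gene_tree_probs(T):
    """Gene tree topology probabilities for species tree ((A,B),C).

    T is the internal branch length in coalescent units, t / (2N).
    Assumes one sampled lineage per species (illustrative setup).
    """
    if T < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-T) / 3.0       # each of ((A,C),B) and ((B,C),A)
    concordant = 1.0 - 2.0 * discordant   # ((A,B),C), matching the species tree
    return {
        "((A,B),C)": concordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }

# Shorter internal branches give more discordance.
for T in (0.1, 0.5, 1.0, 2.0):
    probs = three_taxon_gene_tree_probs(T)
    print(f"T = {T:3.1f}  P(gene tree matches species tree) = {probs['((A,B),C)']:.3f}")
```

As T approaches 0 the three topologies become equally likely, and as T grows the matching topology dominates.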

6

7 Consensus (majority-rule)

8 Asymptotic consensus trees
• Consensus trees are usually statistics: functions of the data, like the sample mean x̄.
• We consider replacing the observed (estimated) gene tree proportions with their theoretical probabilities under the coalescent and determining the resulting consensus tree.
• Motivation: if there is a large number of independent loci, observed clade proportions should approximate their theoretical probabilities.

9 Types of consensus trees
• Strict: only clades that are included in all observed trees are in the consensus tree. In the coalescent model, all clades have probability > 0, so with enough loci conflicting clades will be observed.
• Democratic vote: use the gene tree that occurs most frequently.
• Majority rule: the consensus tree has all clades that were observed in > 50% of trees.
• Greedy: sort clades by their proportions. Accept the most frequently observed clades one at a time, provided each is compatible with the clades already accepted. Do this until you have a fully resolved tree.
• R*: for each set of 3 taxa, find the most commonly occurring triple, e.g., (AB)C, (AC)B, or (BC)A. Build the tree from the most commonly occurring triples.
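The majority-rule and greedy definitions above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: each rooted gene tree is represented as a set of its nontrivial clades, the helper names are hypothetical, and ties in the greedy ordering are broken alphabetically, which is an arbitrary assumption.

```python
from collections import Counter

def tree(*clades):
    """Represent a rooted tree by its nontrivial clades, e.g. tree("AB", "ABC")."""
    return frozenset(frozenset(c) for c in clades)

def clade_proportions(trees):
    """Fraction of trees in which each clade appears."""
    counts = Counter(clade for t in trees for clade in t)
    return {clade: n / len(trees) for clade, n in counts.items()}

def compatible(a, b):
    """Two clades fit on one rooted tree if they are nested or disjoint."""
    return a <= b or b <= a or not (a & b)

def majority_rule(trees):
    """All clades observed in strictly more than 50% of trees."""
    return {c for c, p in clade_proportions(trees).items() if p > 0.5}

def greedy(trees):
    """Accept clades in decreasing frequency if compatible with those accepted."""
    props = clade_proportions(trees)
    accepted = []
    # Ties are broken alphabetically (an arbitrary choice for this sketch).
    for clade in sorted(props, key=lambda c: (-props[c], sorted(c))):
        if all(compatible(clade, other) for other in accepted):
            accepted.append(clade)
    return set(accepted)

# Six hypothetical gene trees on taxa A, B, C, D.
trees = [tree("AB", "ABC")] * 3 + [tree("AB", "CD")] * 2 + [tree("AC", "BD")]
print("majority rule:", [sorted(c) for c in majority_rule(trees)])
print("greedy:       ", [sorted(c) for c in greedy(trees)])
```

In this toy sample the clade {A,B,C} appears in exactly half of the trees, so majority rule leaves it out while greedy accepts it, which shows how greedy consensus can be more resolved.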

10 Unresolved zone for majority-rule and too-greedy zone

11 What about finite samples?
• If you sample 10 loci, you could have:
  • All 10 match the species tree
  • 9 match the species tree, 1 disagrees
  • 8 match the species tree, 2 disagree, etc.
• You can consider gene trees as categories and use multinomial probabilities for the probability of your sample.
• By enumerating all multinomial samples, you can compute the probabilities of every possible consensus tree.
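The enumeration described above can be sketched for the three-taxon case, where there are only three gene tree categories and the majority-rule consensus is either one of the three topologies or unresolved. The function name, the outcome labels, and the assumption that the two discordant topologies are equally likely are illustrative choices, not details from the slides.

```python
from math import comb, exp

def majority_rule_outcome_probs(n_loci, p_match):
    """Three-taxon case: probability of each majority-rule consensus outcome.

    p_match is the probability that a locus has the species-tree topology;
    the two discordant topologies are assumed to split the rest equally.
    """
    q = (1.0 - p_match) / 2.0
    outcomes = {"species tree": 0.0, "wrong tree": 0.0, "unresolved": 0.0}
    for k0 in range(n_loci + 1):              # loci matching the species tree
        for k1 in range(n_loci - k0 + 1):     # loci with the first discordant tree
            k2 = n_loci - k0 - k1             # loci with the second discordant tree
            # Multinomial probability of the sample (k0, k1, k2).
            prob = (comb(n_loci, k0) * comb(n_loci - k0, k1)
                    * p_match**k0 * q**k1 * q**k2)
            if 2 * k0 > n_loci:
                outcomes["species tree"] += prob
            elif 2 * k1 > n_loci or 2 * k2 > n_loci:
                outcomes["wrong tree"] += prob
            else:
                outcomes["unresolved"] += prob
    return outcomes

# Example: 10 loci, internal branch of 1 coalescent unit (hypothetical values).
p = 1.0 - (2.0 / 3.0) * exp(-1.0)
for name, prob in majority_rule_outcome_probs(10, p).items():
    print(f"{name:12s} {prob:.4f}")
```

With more taxa the same idea applies, but the number of gene tree categories and of multinomial samples grows quickly, so full enumeration is only practical for small cases.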

12

13 Are consensus trees inconsistent estimators of species trees?
• Theorem 1. Majority-rule asymptotic consensus trees (MACTs) do not have any clades not on the species tree.
• Theorem 2. Greedy asymptotic consensus trees (GACTs) can be misleading estimators of species trees for the 4-taxon asymmetric tree and for any species tree with n > 4 species.
• Theorem 3. R* asymptotic consensus trees (RACTs) always match the species tree.
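One way to see why Theorem 3 is plausible (a sketch under the assumption of one sampled lineage per species, not necessarily the authors' proof): take any three species whose relationship on the species tree is (AB)C, and let T > 0 be the length, in coalescent units, of the path between the most recent common ancestor of A and B and that of A, B, and C. The standard three-taxon coalescent probabilities then give

```latex
P\big[(AB)C\big] \;=\; 1 - \tfrac{2}{3}e^{-T}
\;>\; \tfrac{1}{3}
\;>\; \tfrac{1}{3}e^{-T}
\;=\; P\big[(AC)B\big] \;=\; P\big[(BC)A\big].
```

So for every set of three taxa, the most probable gene-tree triple is the one displayed by the species tree, which is the property the R* construction relies on.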

14 Conclusions
• Coalescent gene tree probabilities are useful for understanding the asymptotic behavior of consensus trees constructed from independent gene trees.
• R* consensus trees are consistent and more resolved than majority-rule consensus trees.
• Greedy consensus trees can be misleading, but are quicker to approach the species tree than majority-rule or R* when outside of the too-greedy zone.

