

1
**The multispecies coalescent: implications for inferring species trees**

James Degnan 21 February 2008

2
**Outline**

1. Background
   - gene trees vs. species trees
   - coalescence and incomplete lineage sorting
2. Inferring species trees
   - concatenation
   - consensus trees
3. Conclusions

3
**Population Genetics and Phylogenetics**

Population genetics: traditionally used to analyze single populations.

Phylogenetics: what is the best way to infer relationships among populations or species?

Graphic by Mark A. Klinger, Carnegie Museum of Natural History, Pittsburgh.

4
**Desirable properties of species tree estimators**

1. Statistical consistency (sample size = number of genes)
2. Efficiency
3. Robustness to violations of assumptions

5
**Bridging the popgen/phylo divide**

“Incorporation of explicit models of lineage sorting will be needed for continued development of phylogenetic inference near the species level.” --Maddison and Knowles (2006)

“Closer integration of population-genetic factors in phylogenetics, including further insights into gene-tree/species tree, and horizontal gene transfer.” --from Mike Steel’s website, “My pick for five directions in phylogenetics that will grow in the next five years” (2006)

6
**The coalescent process**

[Figure: coalescent process; time axis labeled Past and Present]

7
One population

8
**Multiple populations/species**

[Figure: coalescence in several populations; time axis labeled Present and Past]

9
**Gene tree in a species tree**

10
**Model species tree with gene tree**

[Figure: species tree for taxa A, B, C, D with a gene tree drawn inside it]

The gene tree is a random variable. Its distribution is parameterized by the species tree topology and the species tree's internal branch lengths.

11
**How can we compute probabilities of gene trees given species trees?**

- Under a coalescent model, gene tree probabilities for three species were derived by Nei (1987): the gene tree matches the species tree with probability $1 - \frac{2}{3}e^{-T}$, where $T$ is the internal branch length of the species tree in coalescent units.
- Probabilities that the gene tree matches the species tree topology for 4 and 5 species were given by Pamilo and Nei (1988).
- All 30 species tree/gene tree combinations for 4 species were given by Rosenberg (2002).
- The general case was solved by Degnan and Salter (2005) and implemented in the program COAL, which also allows several individuals to be sampled from each species.
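As a quick check on the three-species result, the sketch below computes the three gene tree probabilities in closed form and compares them with a simple coalescent simulation. It assumes one lineage sampled per species and a species tree ((A,B),C) with internal branch length T in coalescent units. The names `TOPOLOGIES`, `three_taxon_probs` and `simulate_three_taxon` are illustrative; they are not taken from COAL or any other program.

```python
import math
import random

# First entry matches the species tree ((A,B),C).
TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def three_taxon_probs(T):
    """Coalescent gene tree probabilities for species tree ((A,B),C).

    T is the internal branch length in coalescent units (T >= 0).
    Each mismatching topology has probability e^{-T}/3.
    """
    p_mismatch = math.exp(-T) / 3.0
    return {TOPOLOGIES[0]: 1.0 - 2.0 * p_mismatch,
            TOPOLOGIES[1]: p_mismatch,
            TOPOLOGIES[2]: p_mismatch}


def simulate_three_taxon(T, n_genes, seed=0):
    """Monte Carlo estimate of the same probabilities by simulating gene trees."""
    rng = random.Random(seed)
    counts = dict.fromkeys(TOPOLOGIES, 0)
    for _ in range(n_genes):
        # Lineages from A and B coalesce at rate 1 in their ancestral population.
        if rng.expovariate(1.0) < T:
            counts[TOPOLOGIES[0]] += 1
        else:
            # Three lineages reach the root population; each pair is
            # equally likely to be the first to coalesce.
            counts[rng.choice(TOPOLOGIES)] += 1
    return {topo: k / n_genes for topo, k in counts.items()}
```

Setting T = 0 gives each of the three topologies probability 1/3; as T grows, the matching topology dominates.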

12
**This coalescent history: (1,3,3)**

Definition: a coalescent history is a list of the populations in which each coalescent event occurs.

[Figure: gene tree within a species tree for taxa A, B, C, D]

This coalescent history: (1,3,3). Other coalescent histories: (2,3,3), (3,3,3).
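The definition can be turned into a small enumeration. This sketch assumes one lineage per species, trees written as nested tuples, and its own branch numbering: each species tree internal node, taken in postorder, names the branch directly above it, and the last number is the root branch. The figure's numbering is not recoverable, so it may differ from this one. With this numbering, gene tree (((A,B),D),C) in species tree (((A,B),C),D) has exactly three histories, (1,3,3), (2,3,3) and (3,3,3), while the matching gene tree has five. The function names are hypothetical.

```python
def internal_nodes(tree):
    """(taxa, internal_child_indices) for each internal node, in postorder."""
    nodes = []

    def visit(t):
        if isinstance(t, str):  # a leaf (taxon name)
            return frozenset([t]), None
        taxa, kids = frozenset(), []
        for child in t:
            child_taxa, child_index = visit(child)
            taxa |= child_taxa
            if child_index is not None:
                kids.append(child_index)
        nodes.append((taxa, kids))
        return taxa, len(nodes) - 1

    visit(tree)
    return nodes


def coalescent_histories(gene_tree, species_tree):
    """Enumerate coalescent histories, assuming one lineage per species.

    A history lists, for each gene-tree internal node in postorder, the
    species-tree branch (1-based) where that coalescence occurs. The branch
    must lie above every taxon below the node and above the branches already
    used for the node's descendants.
    """
    branches = [taxa for taxa, _ in internal_nodes(species_tree)]
    gene_nodes = internal_nodes(gene_tree)
    histories = []

    def extend(assigned):
        i = len(assigned)
        if i == len(gene_nodes):
            histories.append(tuple(b + 1 for b in assigned))
            return
        taxa, kids = gene_nodes[i]
        for b, clade in enumerate(branches):
            # Branch b is at or above branch c exactly when its clade contains c's.
            if taxa <= clade and all(branches[assigned[k]] <= clade for k in kids):
                extend(assigned + [b])

    extend([])
    return histories
```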

13
**Gene tree probabilities**

14
**Gene tree probabilities**

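$$P(G = g \mid S) = \sum_{h \in H_S(g)} \prod_{b=1}^{n-1} \frac{w_b(h)}{d_b(h)}\, g_{u_b(h)\,v_b(h)}(\lambda_b)$$

- $\sum_{h \in H_S(g)}$: combinatorial enumeration over coalescent histories; its complexity is known only in special cases.
- $\prod_{b=1}^{n-1}$: product over the internal branches of $S$.
- $g_{u_b v_b}(\lambda_b)$: probability that $u_b$ lineages coalesce into $v_b$ lineages along branch $b$, whose length is $\lambda_b$.
- $w_b(h)/d_b(h)$: probability that the coalescences on branch $b$ are consistent with $g$.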
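A gene tree probability is a sum over coalescent histories of products, over the internal branches of S, of two factors: the probability that u lineages coalesce into v lineages within the branch length, and the probability that those coalescences are consistent with g. The first factor has a standard closed form, coded below as `g(i, j, T)`. The sketch then rebuilds the three-species matching probability from its two coalescent histories, using a consistency factor of 1 when A and B coalesce on the internal branch and 1/3 when they reach the root unjoined. The function names are illustrative, and the alternating sum loses precision for many lineages, so this is not a substitute for COAL.

```python
from math import exp, factorial, inf


def g(i, j, T):
    """P(i lineages coalesce into exactly j lineages within time T), 1 <= j <= i.

    T is in coalescent units and may be math.inf (the root branch).
    g_ij(T) = sum_{k=j}^{i} e^{-k(k-1)T/2} (2k-1)(-1)^{k-j}
              / (j! (k-j)! (j+k-1)) * prod_{y=0}^{k-1} (j+y)(i-y)/(i+y)
    """
    if T == inf:
        return 1.0 if j == 1 else 0.0
    total = 0.0
    for k in range(j, i + 1):
        prod = 1.0
        for y in range(k):
            prod *= (j + y) * (i - y) / (i + y)
        total += (exp(-k * (k - 1) * T / 2) * (2 * k - 1) * (-1) ** (k - j)
                  / (factorial(j) * factorial(k - j) * (j + k - 1)) * prod)
    return total


def three_taxon_match(T):
    """P(gene tree ((A,B),C) | species tree ((A,B),C)) as a sum over histories."""
    # History 1: A and B coalesce on the internal branch of length T.
    h1 = g(2, 1, T) * g(2, 1, inf)
    # History 2: A and B enter the root population separately; only one of
    # the three possible first coalescences there (A with B) is consistent.
    h2 = g(2, 2, T) * (1 / 3) * g(3, 1, inf)
    return h1 + h2
```

The result agrees with Nei's expression 1 - (2/3)e^{-T}.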

15
**Data from Ebersberger et al. 2007. Mol. Biol. Evol. 24:2266-2276.**

Theoretical distribution based on parameters from Rannala and Yang, Genetics 164:

[Figure: observed and theoretical gene tree distributions; labels t/N = 1.2 and t/N = 4.2]

16
[Figure: species tree with internal branch lengths labeled y and x]

37
Definition: a gene tree which is more probable than the gene tree matching the species tree is called an anomalous gene tree (Degnan and Rosenberg, 2006). Theorem 1. For the asymmetric species tree topology with four species and for any species tree topology with more than four species, there exist branch lengths such that at least one gene tree is anomalous (Degnan and Rosenberg, 2006).
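To see an anomalous gene tree concretely, the sketch below evaluates two gene tree probabilities for the asymmetric species tree (((A,B),C),D) by summing over coalescent histories by hand. T1 (branch above the A-B ancestor) and T2 (branch above the A-B-C ancestor) are hypothetical names and need not correspond to the x and y used in these slides. Setting the two probabilities equal gives a boundary: the symmetric gene tree ((A,B),(C,D)) is more probable than the matching gene tree whenever T1 < log[(12 - 9e^{-T2} - 2e^{-3T2}) / (18(1 - e^{-T2}))].

```python
from math import exp, log


def g31(T):  # P(3 lineages -> 1 lineage within time T)
    return 1 - 1.5 * exp(-T) + 0.5 * exp(-3 * T)


def g32(T):  # P(3 lineages -> 2 lineages within time T)
    return 1.5 * (exp(-T) - exp(-3 * T))


def p_matching(T1, T2):
    """P(gene tree (((A,B),C),D)) under species tree (((A,B),C),D).

    Terms are labeled by the branches holding the (AB, ABC) coalescences.
    """
    a, b = exp(-T1), exp(-T2)
    return ((1 - a) * (1 - b)         # (1, 2)
            + (1 - a) * b / 3         # (1, root)
            + a * g31(T2) / 3         # (2, 2)
            + a * g32(T2) / 9         # (2, root)
            + a * b ** 3 / 18)        # (root, root)


def p_symmetric(T1, T2):
    """P(gene tree ((A,B),(C,D))) under the same species tree."""
    a, b = exp(-T1), exp(-T2)
    return ((1 - a) * b / 3           # A-B coalesce on branch 1
            + a * g32(T2) / 9         # A-B coalesce on branch 2
            + a * b ** 3 / 9)         # every coalescence in the root


def anomaly_boundary(T2):
    """Largest T1 (for T2 > 0) at which ((A,B),(C,D)) is still anomalous.

    A negative value means no anomalous region for that T2.
    """
    b = exp(-T2)
    return log((12 - 9 * b - 2 * b ** 3) / (18 * (1 - b)))
```

With both branches of length zero the matching tree has probability 1/18 and each symmetric tree 1/9, so very short branches make the symmetric gene tree anomalous.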

38
**Is species tree inference consistent in this setting?**

1. Concatenation?
2. Consensus?

39
**Species tree inference: concatenation**

Species trees are often estimated by concatenating several gene sequences and analyzing them as one (data from Chen and Li, 2001).

| Taxon | Gene 1 |
|---|---|
| Human | CTTGAATAATTTTTAC |
| Chimp | CTTCAATAATTTTTAC |
| Gorilla | TTTGAATAATTTTTAC |
| Orang | CTTGAATAATTTTTAT |

Gene 2: TAGAGTTTCCTTGTGGTG, TAGAGTTTCCTTGTGGTA, CAGAGTTTCCTTGTGGTC

Gene 3: CGGTTT, TGGTTT, CRGTTT
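A minimal sketch of the concatenation step, assuming each gene is already aligned and stored as a dict from taxon name to sequence. Taxa missing from a gene are padded with `?`, and the p-distance skips any site that is not A, C, G or T (such as the ambiguity code R in Gene 3). These conventions, the function names and the toy second gene are assumptions, not taken from Chen and Li (2001) or Kubatko and Degnan (2007).

```python
def concatenate(genes):
    """Join per-gene alignments into one supermatrix (taxon -> sequence).

    genes: list of dicts mapping taxon name -> aligned sequence. A taxon absent
    from a gene is padded with '?' (unknown) for that gene's length.
    """
    taxa = sorted({taxon for gene in genes for taxon in gene})
    parts = {taxon: [] for taxon in taxa}
    for gene in genes:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene must all have the same length")
        (length,) = lengths
        for taxon in taxa:
            parts[taxon].append(gene.get(taxon, "?" * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def p_distance(s1, s2):
    """Proportion of differing sites, skipping sites where either base is not A, C, G or T."""
    sites = [(x, y) for x, y in zip(s1, s2) if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


gene1 = {  # Gene 1 from the slide (data from Chen and Li, 2001)
    "Human": "CTTGAATAATTTTTAC",
    "Chimp": "CTTCAATAATTTTTAC",
    "Gorilla": "TTTGAATAATTTTTAC",
    "Orang": "CTTGAATAATTTTTAT",
}
toy_gene = {"Human": "ACGT", "Chimp": "ACGA"}  # hypothetical second gene
supermatrix = concatenate([gene1, toy_gene])
```

In Gene 1, Human and Chimp differ at 1 of 16 sites (p-distance 0.0625). Concatenation pools all sites as if they shared one tree, which is exactly the assumption that gene tree discordance violates.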

40
**Concatenation and gene tree discordance**

How does concatenation perform when sequences are generated from different gene tree topologies?

[Figure: simulated gene trees, each generating a short sequence block (e.g., CGGTTT, TGGTTA, TAGTTA); the blocks are joined into one concatenated sequence. Species tree: y = 1.0, x = 0.05.]

41
**Trees inferred from concatenated sequences (Kubatko and Degnan, 2007)**

[Figure: trees inferred from concatenated sequences versus number of genes; species tree with y = 1.0, x = 0.05]

42
**Is species tree inference consistent in this setting?**

1. Concatenation? No.
2. Consensus?

43
**Consensus (majority-rule)**

44
**Types of consensus trees**

- Majority rule: the consensus tree contains every clade observed in more than 50% of the trees.
- Greedy: sort clades by their observed proportions. Accept clades one at a time, most frequent first, provided each is compatible with the clades already accepted. Continue until the tree is fully resolved.
- R*: for each set of 3 taxa, find the most commonly occurring rooted triple, e.g., (AB)C, (AC)B, or (BC)A. Build the tree from these most commonly occurring triples. For example, (AB)D and (CD)B are two rooted triples.
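The three rules can be sketched in Python, with gene trees written as nested tuples of taxon names. For R*, the sketch assumes the cluster form of the rule: a clade C is kept exactly when, for every a, b in C and c outside C, the triple (ab)c is strictly the most frequent triple on {a, b, c}. Ties in the greedy ordering are broken alphabetically, an arbitrary choice the slides do not specify, and the brute-force search over clusters is only practical for a handful of taxa. All names are illustrative.

```python
from collections import Counter
from itertools import combinations


def clades(tree):
    """(non-trivial clades, all taxa) for a rooted tree of nested tuples."""
    found = set()

    def visit(t):
        if isinstance(t, str):
            return frozenset([t])
        s = frozenset().union(*(visit(c) for c in t))
        found.add(s)
        return s

    everything = visit(tree)
    found.discard(everything)  # the root clade carries no information
    return found, everything


def compatible(x, y):
    return x <= y or y <= x or not (x & y)


def majority_rule(trees):
    counts = Counter(c for t in trees for c in clades(t)[0])
    return {c for c, k in counts.items() if k > len(trees) / 2}


def greedy(trees):
    counts = Counter(c for t in trees for c in clades(t)[0])
    accepted = []
    # Most frequent first; ties broken by sorted taxon names.
    for c, _ in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        if all(compatible(c, a) for a in accepted):
            accepted.append(c)
    return set(accepted)


def r_star(trees):
    per_tree = [clades(t)[0] for t in trees]
    taxa = sorted(clades(trees[0])[1])
    winner = {}  # 3-taxon set -> the pair in its strictly most frequent triple
    for a, b, c in combinations(taxa, 3):
        votes = Counter()
        for cl in per_tree:
            for pair, out in (((a, b), c), ((a, c), b), ((b, c), a)):
                if any(set(pair) <= x and out not in x for x in cl):
                    votes[frozenset(pair)] += 1
        top = votes.most_common(2)
        if top and (len(top) == 1 or top[0][1] > top[1][1]):
            winner[frozenset((a, b, c))] = top[0][0]
    result = set()
    for size in range(2, len(taxa)):
        for members in combinations(taxa, size):
            C = frozenset(members)
            if all(winner.get(frozenset((x, y, z))) == frozenset((x, y))
                   for x, y in combinations(members, 2)
                   for z in taxa if z not in C):
                result.add(C)
    return result
```

For example, with two copies of (((A,B),C),D), two of ((A,B),(C,D)) and one of (((A,B),D),C), majority rule and R* both return only the clade AB, while greedy also accepts ABC.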

45
**Asymptotic consensus trees**

Consensus trees are usually statistics: functions of the data, like the sample mean x̄.

Definition: an asymptotic consensus tree is the tree obtained by computing the consensus tree from the topology probabilities under the multispecies coalescent model.

Motivation: with a large number of independent loci, the observed proportions of gene trees, clades, and rooted triples should approximate their theoretical probabilities.
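For three species the asymptotic consensus trees can be worked out from the gene tree probabilities P(AB clade) = 1 - (2/3)e^{-T} and P(AC clade) = P(BC clade) = (1/3)e^{-T}; with one lineage per species, each clade is also a rooted triple. Majority rule keeps AB only when 1 - (2/3)e^{-T} > 1/2, that is, T > ln(4/3) ≈ 0.288, so shorter internal branches leave the asymptotic majority-rule tree unresolved, while greedy and R* both return AB for any T > 0. The function name below is hypothetical.

```python
from math import exp, log

# 1 - (2/3)e^{-T} > 1/2  <=>  T > ln(4/3)
MAJORITY_UNRESOLVED_BELOW = log(4 / 3)


def asymptotic_consensus_three_taxon(T):
    """Clades kept by each asymptotic consensus method for species tree ((A,B),C), T > 0."""
    p = {"AB": 1 - 2 / 3 * exp(-T), "AC": exp(-T) / 3, "BC": exp(-T) / 3}
    majority = {c for c, q in p.items() if q > 0.5}
    best = max(p, key=p.get)
    # Greedy: take the most probable clade; any second clade on three taxa conflicts with it.
    greedy = {best}
    # R*: one triple on three taxa; keep it only if it is strictly the most probable.
    rstar = {best} if all(p[best] > q for c, q in p.items() if c != best) else set()
    return {"majority": majority, "greedy": greedy, "rstar": rstar}
```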

46
[Figure: simulated gene trees and their greedy consensus tree]

47
[Figure: greedy consensus tree]

48
[Figure: simulated gene trees with their greedy and R* consensus trees]

49
**Majority-rule: unresolved zone**

50
Too-greedy zone

51
**Is species tree inference consistent in this setting?**

1. Concatenation? No.
2. Consensus? Yes for R*; no for greedy and majority-rule.

52
**Are consensus trees inconsistent estimators of species trees?**

Theorem 2. (i) Majority-rule asymptotic consensus trees (MACTs) contain no clades that are absent from the species tree. (ii) Majority-rule unresolved zones exist for every species tree topology with n ≥ 3 species.

Theorem 3. Greedy asymptotic consensus trees (GACTs) can be misleading estimators of species trees for the 4-species asymmetric tree and for any species tree with n > 4 species.

Theorem 4. R* asymptotic consensus trees (RACTs) always match the species tree.

53
**What about finite samples?**

If you sample 10 loci, you could have:

- all 10 matching the species tree;
- 9 matching the species tree and 1 disagreeing;
- 8 matching the species tree and 2 disagreeing; and so on.

You can treat gene trees as categories and use multinomial probabilities to compute the probability of your sample.
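For three species, R* reduces to picking the most frequent of the three rooted triples, so the probability that it recovers the species tree from a finite sample is a multinomial sum. The sketch assumes independent loci, one lineage per species, the three-species probabilities 1 - (2/3)e^{-T} (matching) and (1/3)e^{-T} (each mismatch), and counts a tie for most frequent as a failure. The function names are hypothetical.

```python
from math import comb, exp


def multinomial_pmf(counts, probs):
    """Probability of `counts` (one count per category) in sum(counts) independent draws."""
    coef, remaining = 1, sum(counts)
    for k in counts:
        coef *= comb(remaining, k)  # choose which draws fall in this category
        remaining -= k
    p = float(coef)
    for k, q in zip(counts, probs):
        p *= q ** k
    return p


def three_taxon_outcomes(n_loci):
    """All (matching, mismatch_1, mismatch_2) count vectors summing to n_loci."""
    for k1 in range(n_loci + 1):
        for k2 in range(n_loci - k1 + 1):
            yield k1, k2, n_loci - k1 - k2


def prob_rstar_correct_three_taxon(T, n_loci):
    """P(the matching triple is strictly the most frequent among n_loci gene trees)."""
    p_match = 1 - (2 / 3) * exp(-T)
    p_other = exp(-T) / 3
    probs = (p_match, p_other, p_other)
    return sum(multinomial_pmf(c, probs)
               for c in three_taxon_outcomes(n_loci)
               if c[0] > c[1] and c[0] > c[2])
```

For instance, the chance that all 10 of 10 loci match the species tree is `multinomial_pmf((10, 0, 0), probs)`, which is simply the matching probability raised to the 10th power.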

54
R* consensus, y = 0.4, x = 0.6

55
**Conclusion**

Coalescent gene tree probabilities can be used to prove or disprove the statistical consistency of species tree estimators.

57
[Figure: R* consensus, y = x = 0.1; probability versus number of genes]
