The Graph of Life Dennis Shasha Joint work with Kenneth Birnbaum Treester system by: Matt Olim.

2 Phylogenetic Reconstruction (careful: the root is bottom-most). Trees are strictly acyclic. Methods: maximum parsimony, maximum likelihood.

3 Character Conflict (one feature/two places)

4 Why Conflict? Example (sunflowers): Helianthus petiolaris × Helianthus annuus, an ancient hybridization event ~100,000 years ago, gave rise to Helianthus paradoxus.

5 H. paradoxus (adapted from Rieseberg et al. 1991). Phylogenetic … Trees?

6 Phylogenomics: take many individual gene trees (technically, orthologs); combine the data, e.g., by sequence concatenation; obtain a single tree from the combined data, hopefully with high confidence! Rokas et al. (Nature 2003): 20 trees are enough (based on 8 yeast species).
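The concatenation step can be sketched in a few lines of Python that build a "supermatrix" from per-gene alignments. This sketch assumes that each gene's orthologs are already aligned. The function name `concatenate` and the '?' padding for a species missing from a gene are illustrative choices, not details from the talk.

```python
def concatenate(alignments):
    """Build a supermatrix from per-gene alignments.

    alignments: list of dicts mapping species name -> aligned sequence for one gene.
    A species missing from a gene is padded with '?' (unknown) so all rows stay equal length.
    """
    species = sorted(set().union(*alignments))
    rows = {sp: [] for sp in species}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("all sequences in one gene alignment must have the same length")
        width = widths.pop()
        for sp in species:
            rows[sp].append(aln.get(sp, "?" * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


# Two toy gene alignments (made-up sequences, for illustration only).
matrix = concatenate([
    {"Scer": "ACGT", "Spar": "ACGA"},
    {"Scer": "TTG", "Spar": "TTC", "Smik": "TAC"},
])
print(matrix)
```

The combined matrix would then go to a tree-building program (parsimony or likelihood) as a single data set.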

7 Observation: Individual Trees Vary. Example from Rokas et al. 2003: multiple gene trees of eight sequenced yeast species.

8 Conflict Smoothed by Combining Data. For this data set: the most parsimonious tree from the concatenated data of 106 individual genes. [Figure: tree over S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, S. castellii, S. kluyveri, C. albicans; support labels 100%.]

9 But then why do the trees vary? Noise (maybe the trees for a given gene aren’t right); hybridization (different species have viable offspring); horizontal gene transfer, e.g., the bacterial orgy; convergent evolution (think cactus, only at the genomic level). Whatever the reason, settling on a consensus tree may throw away much information.

10 Finding Hidden Signals. Begin with several trees for each orthologous gene: not only the most parsimonious ones but the top few. 1. Find trees in descending order of popularity and see whether their genes have interesting commonalities. 2. Generate a network from the popular trees.

11 Data Setup. Using PAUP, we generated the top 10 most parsimonious trees for each orthologous gene present in all eight species. Popularity contest: loop { find the most popular tree; output that tree and its associated genes; remove the genes having that tree } end loop. Unused popularity contest: the same, but don’t remove the genes.
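The popularity-contest loop can be sketched in Python as follows. The sketch makes several assumptions that the talk does not state:

- Each gene's top trees are already available as a set of comparable tree strings (for example, canonical Newick).
- The names `popularity_contest` and `remove` are hypothetical.
- Ties are broken alphabetically.

```python
from collections import Counter


def popularity_contest(gene_trees, remove=True):
    """Rank candidate trees by how many genes include them among their top trees.

    gene_trees: gene name -> set of tree strings (e.g. each gene's top-10 parsimony trees).
    remove=True  : after reporting a tree, drop the genes that have it (the popularity contest).
    remove=False : keep all genes and move on to the next tree (the unused variant).
    Returns a list of (tree, sorted genes having that tree), most popular first.
    """
    remaining = dict(gene_trees)
    reported = set()
    ranking = []
    while remaining:
        counts = Counter(t for trees in remaining.values() for t in trees if t not in reported)
        if not counts:
            break
        # Most popular tree; ties broken alphabetically so the output is deterministic.
        tree, _ = max(sorted(counts.items()), key=lambda item: item[1])
        genes = sorted(g for g, trees in remaining.items() if tree in trees)
        ranking.append((tree, genes))
        reported.add(tree)
        if remove:
            for g in genes:
                del remaining[g]
    return ranking


# Toy input: tree labels stand in for Newick strings.
toy = {"g1": {"T1", "T2"}, "g2": {"T1"}, "g3": {"T2", "T3"}, "g4": {"T2"}}
print(popularity_contest(toy))
print(popularity_contest(toy, remove=False))
```

Removing genes makes each gene count toward only one reported tree. The unused variant lets a gene count toward every tree in its top list.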

12 Findings. Genes following the Rokas consensus tree are normal in every way; there are 378 of them (we expanded the analysis). (((((((Scer,Spar),Smik),Skud),Sbay),Scas),Sklu),Calb) [count 378]. The 46 genes associated with the next tree are somewhat closer to one another than expected, but not quite at the 5% threshold. (((((((Scer,Spar),Smik),Skud),Sbay),Sklu),Scas),Calb) [count 46]. The 13 genes associated with the next tree are well within the 5% threshold of being close to one another, and they lie on only 5 of 17 chromosomes. (((((Skud,Sbay),((Scer,Spar),Smik)),Scas),Sklu),Calb) [count 13].
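The slide does not say which statistic produced the 5% threshold. One simple possibility is a Monte Carlo test of whether a gene set spans unusually few chromosomes. The sketch below implements that test against a hypothetical genome, with equal-sized chromosomes and invented gene names; real chromosome sizes differ. The function `clustering_pvalue` is likewise hypothetical.

```python
import random


def clustering_pvalue(gene_chrom, observed_genes, trials=20000, seed=1):
    """Monte Carlo p-value for 'these genes sit on unusually few chromosomes'.

    gene_chrom: gene -> chromosome for every gene in the genome.
    Returns the fraction of random same-size gene sets that span no more chromosomes
    than the observed set does (with the usual +1 correction).
    """
    rng = random.Random(seed)
    genes = list(gene_chrom)
    k = len(observed_genes)
    observed = len({gene_chrom[g] for g in observed_genes})
    hits = sum(
        len({gene_chrom[g] for g in rng.sample(genes, k)}) <= observed
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)


# Hypothetical genome: 17 chromosomes of 350 genes each.
genome = {f"chr{c}_g{i}": c for c in range(17) for i in range(350)}
# Hypothetical set of 13 genes spread over only 5 chromosomes.
odd = [f"chr{c}_g{i}" for c in range(5) for i in range(3)][:13]
print(clustering_pvalue(genome, odd))
```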

13 Other Odd Properties of the 13. Four of the 13 genes are annotated as ATPase or ATP synthase genes (only 92 out of 6,000 genes have similar annotations). The consensus tree is quite different from the 13-gene tree: (((((((Scer,Spar),Smik),Skud),Sbay),Scas),Sklu),Calb) vs. (((((Skud,Sbay),((Scer,Spar),Smik)),Scas),Sklu),Calb).
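The annotation numbers on this slide can be turned into a probability with a standard one-sided hypergeometric test. The test asks how likely it is that 13 genes drawn at random from about 6,000 include at least 4 of the 92 annotated ones. The choice of test is an assumption; the slide does not say how, or whether, this was tested.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance that a random n-gene subset of N genes contains at least k
    of the K annotated genes (sampling without replacement)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)


expected = 13 * 92 / 6000  # about 0.2 annotated genes expected among 13 random genes
p = hypergeom_upper_tail(4, 13, 92, 6000)
print(f"expected {expected:.2f}, P(X >= 4) = {p:.2e}")
```

Because the genome size is given only as 6,000, the exact value of `p` should be read as an order of magnitude.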

14 [Figure: the consensus tree and the odd-13 tree, each over S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, S. castellii, S. kluyveri, C. albicans.] The odd 13: remnants of an ancient hybridization event? Parallel gene evolution among ATP-related genes and others?

15 The Graph [figure; edge weights 0.3 and 0.7]

16 Network Building. LatTrans (Addario-Berry): models lateral transfer of genetic information; makes some assumptions about mutation rates; transfers are always between siblings. Horizontal gene transfer model (Lake and Rivera): a prokaryotic model that tries to distill a fundamental tree of life, assuming a Markov model. Vs. Doolittle.

17 Network Building 2. Moret, Nakhleh et al. propose "galled networks": networks in which hybridizations don't intersect. They argue that this restriction makes sense because levels of recombination are modest. Our approach: start with reliable gene trees and build a “conservative” species graph, with no statistical assumptions except about the quality of tree branching.

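18 Nomenclature. Gene tree A (gene A and its orthologs): A has children A1 and A2; A1 has children A11 and A12; A12 has children A121 and A122. Internal nodes (A, A1, A12) are missing ancestral gene variants; leaves (A2, A11, A121, A122) are extant gene variants. A1 is the parent of A12.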
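Under this naming scheme a variant's parent is its label with the last digit dropped, so the tree structure can be recovered from the labels alone. The following minimal sketch assumes one-letter gene names and single-digit child indices; the helper names `parent` and `classify` are hypothetical.

```python
def parent(variant):
    """Parent label under the naming scheme: drop the last digit (A12 -> A1).
    Assumes a one-letter gene name and at most nine children per node."""
    return variant[:-1] if len(variant) > 1 else None


def classify(labels):
    """Split a gene tree's labels into extant variants (leaves) and ancestral variants (internal)."""
    labels = set(labels)
    internal = {parent(v) for v in labels} & labels  # '& labels' also discards the root's None
    return labels - internal, internal


extant, ancestral = classify(["A", "A1", "A2", "A11", "A12", "A121", "A122"])
print(sorted(extant), sorted(ancestral))
```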

19 Assumptions (a variant is one of the orthologs of a gene). 1. A variant is likely to arise only once in the tree or network (a convergent phenotype, yes; but not the same sequence). 2. If species X has one parent P, then for each gene A, the variant of A in X must be a direct descendant of the variant in P or equal to that variant (e.g., A12 → A121). 3. If species X has more than one parent, then for each gene A, the variant of A in X must descend from, or be equal to, the variant in exactly one parent.
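Assumptions 2 and 3 can be checked mechanically using prefix tests on variant labels. In this sketch, assumption 3 is read as "each gene is inherited from a single parent". Every parent carrying an ancestral-or-equal variant is listed as a candidate, and one would then be chosen. That reading, the prefix encoding, and the function names are all assumptions.

```python
def descends_or_equal(variant, ancestor):
    """Prefix test on labels: A121 descends from A12, A1 and A."""
    return variant.startswith(ancestor)


def is_child_or_equal(variant, parent_variant):
    """Assumption 2: equal, or exactly one naming step below (A12 -> A121)."""
    return variant == parent_variant or (
        variant.startswith(parent_variant) and len(variant) == len(parent_variant) + 1)


def admissible_parents(species, parents):
    """For each gene, the indices of the parents the species could have inherited it from.

    One parent: assumption 2 (child-or-equal). Several parents: assumption 3, read here as
    'each gene comes from a single parent', so any parent whose variant is ancestral-or-equal
    is a candidate. An empty list means the assumptions fail for that gene.
    """
    rule = is_child_or_equal if len(parents) == 1 else descends_or_equal
    return {gene: [i for i, p in enumerate(parents) if rule(v, p[gene])]
            for gene, v in species.items()}


print(admissible_parents({"A": "A121"}, [{"A": "A12"}]))
print(admissible_parents({"A": "A22", "B": "B11"},
                         [{"A": "A", "B": "B1"}, {"A": "A2", "B": "B2"}]))
```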

20 First: characterize species by tree position. Gene tree A: sp1: A1, sp2: A21, sp3: A22. Gene tree B: sp1: B22, sp2: B21, sp3: B11. Hence sp1(A1 B22), sp2(A21 B21), sp3(A22 B11). In a species tree containing sp1 and sp2, both species must descend from the node N where B2 arises. Further, the split between A1 and A2 must descend from N.

21 Let’s Illustrate Dependencies. Species: sp1(A1 B22), sp2(A21 B21), sp3(A22 B11). From sp1 and sp2: B2 arises before A1 splits from A2, i.e., birth(B2) before birth(A2). From sp2 and sp3: A2 arises before B1 splits from B2, i.e., birth(A2) before birth(B2). The two orderings contradict each other, which shows that a tree is not possible. We choose a tree that is consistent with as many species as possible and then add the remaining species using as few edges as possible. Weights indicate numbers of species.
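The dependency argument can be phrased as an ordering problem: derive "earlier before later" constraints between split events, then check them for a cycle. The following hypothetical formalization encodes "split(X)" as the event at which variant X gives rise to its children, so birth(B2) is split(B). It uses Python's standard `graphlib` module (Python 3.9+) for cycle detection. The rule in `precedence_constraints` is an interpretation of the slide, not the authors' code.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations


def common_prefix(a, b):
    """Longest shared prefix of two variant labels, e.g. ('A21', 'A22') -> 'A2'."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def precedence_constraints(species):
    """species: name -> {gene: variant}. Returns (earlier, later) pairs of split events.

    If two species share a derived variant of gene H, that variant must be born before
    the split at which their variants of any other gene G diverge.
    """
    constraints = set()
    for x, y in combinations(sorted(species), 2):
        vx, vy = species[x], species[y]
        for h in sorted(vx):
            shared = common_prefix(vx[h], vy[h])
            if len(shared) <= len(h):  # only the root variant is shared: no information
                continue
            born = f"split({shared[:-1]})"  # the shared variant arises when its parent splits
            for g in sorted(vx):
                if g != h and vx[g] != vy[g]:
                    constraints.add((born, f"split({common_prefix(vx[g], vy[g])})"))
    return constraints


def admits_tree(constraints):
    """True if all 'earlier before later' constraints can hold at once (no cycle)."""
    graph = {}
    for earlier, later in constraints:
        graph.setdefault(later, set()).add(earlier)
    try:
        TopologicalSorter(graph).prepare()  # raises CycleError on a cycle
    except CycleError:
        return False
    return True


species = {
    "sp1": {"A": "A1", "B": "B22"},
    "sp2": {"A": "A21", "B": "B21"},
    "sp3": {"A": "A22", "B": "B11"},
}
cons = precedence_constraints(species)
print(sorted(cons), admits_tree(cons))
```

Dropping sp3 leaves a single constraint, which is satisfiable. This mirrors the idea of keeping a maximal consistent set of species in the base tree.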

22 One possible graph. [Figure: gene trees A and B as before (sp1: A1, B22; sp2: A21, B21; sp3: A22, B11); ancestral nodes m1 (A B), m2 (A B1), m3 (A2 B2); nodes m3, sp1, sp2, m1, m2 labeled “base tree”; sp3(A22 B11) labeled “stranded taxa”.]

23 The three trees seem quite different: (((((((Scer,Spar),Smik),Skud),Sbay),Scas),Sklu),Calb) (((((((Scer,Spar),Smik),Skud),Sbay),Sklu),Scas),Calb) (((((Skud,Sbay),((Scer,Spar),Smik)),Scas),Sklu),Calb) In particular, Skud seems to move a lot. But our graph showed multiple ancestry for Scas only. Does it make sense?
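One way to quantify how different these three trees are is the rooted Robinson-Foulds distance. This distance counts the clades (leaf sets below an internal node) that occur in one tree but not the other. The sketch below includes a minimal parser for topology-only Newick strings; it does not handle branch lengths or labels. The choice of metric is an illustration, not part of the talk.

```python
def parse_newick(text):
    """Parse a topology-only Newick string (no branch lengths) into nested tuples."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1
            children = [node()]
            while text[pos] == ",":
                pos += 1
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    tree = node()
    if text[pos:] not in ("", ";"):
        raise ValueError("unexpected trailing text")
    return tree


def clades(tree):
    """Leaf sets of all internal nodes of a nested-tuple tree."""
    found = set()

    def walk(t):
        if isinstance(t, str):
            return frozenset([t])
        leaves = frozenset().union(*(walk(c) for c in t))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def rf_distance(a, b):
    """Rooted Robinson-Foulds distance: clades found in exactly one of the two trees."""
    return len(clades(parse_newick(a)) ^ clades(parse_newick(b)))


consensus = "(((((((Scer,Spar),Smik),Skud),Sbay),Scas),Sklu),Calb)"
second = "(((((((Scer,Spar),Smik),Skud),Sbay),Sklu),Scas),Calb)"
odd13 = "(((((Skud,Sbay),((Scer,Spar),Smik)),Scas),Sklu),Calb)"
for name, t in [("second", second), ("odd13", odd13)]:
    print(name, rf_distance(consensus, t))
```

Because RF counts clades, a taxon that moves between neighboring positions changes only the few clades around it.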

24 Observe that Scer, Spar, and Smik always form the same subtree, so let’s replace them by a single node, xxx. Then remove Scas, because we are interested only in whether what remains forms a tree. Here is what we get: ((((((xxx),Skud),Sbay)),Sklu),Calb) ((((((xxx),Skud),Sbay),Sklu)),Calb) (((((Skud,Sbay),(xxx))),Sklu),Calb). Well, maybe.
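The question on this slide can be checked mechanically. A collection of rooted clades fits on a single tree exactly when every pair of clades is either nested or disjoint. The sketch below does three things:

1. It collapses {Scer, Spar, Smik} into xxx and drops Scas from each tree's clades, as the slide does.
2. It pools the clades of the three trees.
3. It lists the overlapping, non-nested pairs.

Writing the trees as nested tuples and the helper names are illustrative choices.

```python
from itertools import combinations

# The three trees from the previous slide, as nested tuples (rooted, topology only).
T1 = ((((((("Scer", "Spar"), "Smik"), "Skud"), "Sbay"), "Scas"), "Sklu"), "Calb")
T2 = ((((((("Scer", "Spar"), "Smik"), "Skud"), "Sbay"), "Sklu"), "Scas"), "Calb")
T3 = ((((("Skud", "Sbay"), (("Scer", "Spar"), "Smik")), "Scas"), "Sklu"), "Calb")


def clades(tree):
    """Leaf sets of all internal nodes of a nested-tuple tree."""
    found = set()

    def walk(t):
        if isinstance(t, str):
            return frozenset([t])
        leaves = frozenset().union(*(walk(c) for c in t))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def collapse_and_prune(clade_set, group, new_name, drop):
    """Rename taxa in `group` to `new_name` (valid because `group` is a clade in every tree),
    delete the taxa in `drop`, and keep only clades with at least two taxa."""
    out = set()
    for c in clade_set:
        renamed = frozenset(new_name if t in group else t for t in c if t not in drop)
        if len(renamed) > 1:
            out.add(renamed)
    return out


def conflicts(clade_set):
    """Pairs of clades that overlap without nesting; no single tree can show both."""
    return [(a, b) for a, b in combinations(clade_set, 2)
            if a & b and not (a <= b or b <= a)]


pooled = set()
for tree in (T1, T2, T3):
    pooled |= collapse_and_prune(clades(tree), {"Scer", "Spar", "Smik"}, "xxx", {"Scas"})
bad = conflicts(pooled)
for a, b in bad:
    print(sorted(a), "vs", sorted(b))
```

An empty list would mean the pruned trees fit on one tree. Any pair that is printed names the clades that keep them from doing so.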

25 Graphing Phylogenies: (1) infer ancestral states for each gene tree; (2) find a tree that includes a maximal set of species (the “conservative” base tree); (3) infer the parents of the remaining species from the ancestors, yielding a graph.
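Step 3 can be sketched as a covering problem. Each gene of a stranded species must come from some node of the base tree whose variant is ancestral to, or equal to, the species' variant, and we want few parents, i.e., few added edges. The heuristic below is hypothetical, not the authors' stated rule:

- For each gene, keep only the candidates carrying the most specific ancestral variant.
- Cover all genes greedily.

The example input reuses the ancestral labels m1, m2, m3 from the "one possible graph" slide.

```python
def descends_or_equal(variant, ancestor):
    """Prefix test on variant labels (A22 descends from A2 and from A)."""
    return variant.startswith(ancestor)


def infer_parents(stranded, candidates):
    """Pick a small set of candidate nodes from which a stranded species inherits every gene.

    stranded:   {gene: variant} for the species left out of the base tree.
    candidates: {node name: {gene: variant}} for nodes of the base tree.
    Heuristic (an assumption): per gene, keep the candidates with the most specific
    ancestral-or-equal variant, then cover all genes greedily.
    """
    best = {}
    for gene, v in stranded.items():
        options = [(len(nv[gene]), name) for name, nv in candidates.items()
                   if descends_or_equal(v, nv[gene])]
        if not options:
            raise ValueError(f"no candidate can supply gene {gene} ({v})")
        deepest = max(depth for depth, _ in options)
        best[gene] = {name for depth, name in options if depth == deepest}

    uncovered, parents = set(best), []
    while uncovered:
        # Node supplying the most still-uncovered genes (ties broken alphabetically).
        node = max(sorted(candidates), key=lambda n: sum(n in best[g] for g in uncovered))
        parents.append(node)
        uncovered -= {g for g in uncovered if node in best[g]}
    return parents


nodes = {"m1": {"A": "A", "B": "B"}, "m2": {"A": "A", "B": "B1"}, "m3": {"A": "A2", "B": "B2"}}
print(infer_parents({"A": "A22", "B": "B11"}, nodes))
```

Greedy set cover is not guaranteed to find the minimum number of parents. For a handful of genes, an exhaustive search over small parent sets would be an alternative.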

26 Graph Version vs. Tree Version [Figure: both over S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, S. castellii, S. kluyveri, C. albicans; support labels 100%.]

27 Start of Interpretation. Of the 108 gene trees, 15 suggest that S. kudriavzevii is a sister species to S. bayanus; another 24 make it a predecessor of S. bayanus. This may indicate a transfer from more ancestral nodes. We need to look at more genes to get conclusive results.

28 Summary. Phylogenomics generates large datasets to overcome conflicting signals in phylogenetic trees, but it may cause us to ignore biological signals. Phylogenetic networks (a Graph of Life) can suggest gene transfers through hybridization or other causes. Basic method: start with gene trees, take reliable bifurcations (over 60%), and combine them into a consensus directed graph that suggests possible paths of gene transfer. The software is general purpose and available.
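The filtering step can be sketched as follows. Clades whose support exceeds 60% are kept and counted across gene trees; those counts could serve as edge weights. The sketch then links each cluster to the minimal clusters that strictly contain it. Whenever two retained clusters overlap without nesting, some node ends up with two parents, which gives a reticulation in the directed graph. Reading "60%" as a per-clade support value and this containment construction are assumptions. They are not a description of the authors' software, and the toy supports are made up.

```python
from collections import Counter


def reliable_clades(gene_trees, threshold=60.0):
    """Count, for each clade, the gene trees that support it above `threshold` percent.

    gene_trees: list of {clade (frozenset of taxa): support percentage}.
    """
    counts = Counter()
    for tree in gene_trees:
        for clade, support in tree.items():
            if support > threshold:
                counts[clade] += 1
    return counts


def cluster_graph(clade_set):
    """Edges parent -> child, where a node's parents are the minimal clusters strictly
    containing it. Overlapping clusters give some node more than one parent."""
    nodes = set(clade_set) | {frozenset([t]) for c in clade_set for t in c}
    edges = set()
    for child in nodes:
        supersets = [c for c in nodes if child < c]
        minimal = [c for c in supersets if not any(d < c for d in supersets)]
        edges.update((p, child) for p in minimal)
    return edges


trees = [
    {frozenset({"xxx", "Skud"}): 80, frozenset({"xxx", "Skud", "Sbay"}): 95},
    {frozenset({"Skud", "Sbay"}): 70, frozenset({"xxx", "Skud", "Sbay"}): 90},
    {frozenset({"Skud", "Sbay"}): 40},  # below threshold: ignored
]
counts = reliable_clades(trees)
edges = cluster_graph(counts)
skud_parents = {p for p, c in edges if c == frozenset({"Skud"})}
print(dict(counts))
print([sorted(p) for p in skud_parents])
```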

29 Cause of Conflict? We examined the 3 best-supported network edges (most gene-tree support). Traces of an ancient hybridization? Possible expectation: blocks of genes with common ancestry. None of the genes contributing to a network edge are clustered on the chromosome in S. cerevisiae, but the 13 oddballs are something of an exception. …The total number of genes is 108, so a synteny analysis is impossible; we will extend this in the future. Convergent evolution? Do the genes comprising network edges have a common function? No, but we are still looking.

30 Major Cyclic Edges [Figure: graph over S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, S. kluyveri, C. albicans, S. castellii; cyclic edges supported by 12, 24, and 15 gene trees. Revised from Rokas et al. 2003.]