Presentation is loading. Please wait.

Presentation is loading. Please wait.

A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks Luay Nakhleh Department of Computer Sciences UT Austin.

Similar presentations


Presentation on theme: "A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks Luay Nakhleh Department of Computer Sciences UT Austin."— Presentation transcript:

1 A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks Luay Nakhleh Department of Computer Sciences UT Austin

2 Who’s Involved –UT CS: Tandy Warnow, Luay Nakhleh –UT BIO: Randy Linder –UNM CS: Bernard Moret

3 Why Networks? Lateral gene transfer (LGT) –Ochman estimated that 755 of 4,288 ORF’s in E.coli were from at least 234 LGT events Hybridization –Estimates that as many as 30% of all plant lineages are the products of hybridization –Fish –Some frogs

4 Phylogenetic Networks Rooted, directed, acyclic graphs that actually model the evolutionary process “tree” nodes and “network” nodes Time constraints

5 Separate Analysis Analyze individual genes separately Reconcile the resulting phylogenies As opposed to combined analysis in which the datasets are combined (via concatenation) and the combined dataset is then analyzed

6 Wayne Maddison’s Observation “What is needed is a method that counts the minimal number of branch moves needed to convert one tree into another, where branch moves are restricted so as not to violate a linear order.” Syst. Biol., 46(3):523-536, 1997.

7 Species Networks ABCDE

8 Gene Tree I in Species Networks ABCDE ABCDE

9 Gene Tree II in Species Networks ABCDE ABCDEABCDE

10 The SPR Operation SPR: Subtree Prune and Regraft Prune a subtree in tree T1 and regraft to another edge (by the same root), thus obtaining another tree T2 The SPR-Distance between two trees T1 and T2 is the minimum number of SPR moves required to transfer T1 to T2

11 SPR Distances Among Gene Trees ABCDE ABCDEABCDE SPR Distance 1

12 Maddison’s Method Given two gene datasets Construct two gene trees T1 and T2 If SPR(T1,T2)=0 –Return a tree If SPR(T1,T2)=1 –Return a network with one reticulation event Open problem: extend to reconstructing a network with m reticulation events

13 Challenges (1) Computational –Computing SPR distances is of unknown computational complexity (probably hard)

14 Solving the Computational Challenge Galled-networks: reticulation events are independent For two gene trees T1 and T2 on n leaves we can –Decide whether SPR(T1,T2)=m in O(mn) time, and –Construct network N from T1 and T2 in O(mn) time

15 Challenges (2) Systematic –Obtaining the correct gene trees in practice is very hard (due to missing data, inaccuracy of tree reconstruction methods, wrong assumptions, etc.)

16 Solving the Systematic Challenge: Our Method SpNet Given the sequences of two genes I & II on a set of species Run MP or ML on gene I and obtain a set U1 of trees, represented by its consensus tree t1 Run MP or ML on gene II and obtain a set U2 of trees, represented by its consensus tree t2 Find binary trees T1 and T2, that refine t1 and t2, respectively, and such that SPR(T1,T2)=1 Build network N from T1 and T2

17 SpNet: Running Time We have a linear-time algorithm for the single hybrid case (implementation and experimental results are available as well) We are working on the general case of arbitrary number of reticulation events

18 Experimental Study Generated random networks on 10 and 20 taxa, with 0, 1, and 2 hybrids Evolved sequences under the GTR+Gamma model of evolution with invariant sites We studies the topological accuracy based on the splits defined by the model and inferred network

19 Evaluation Criteria Detection Quality –How often did the method infer the correct number of hybrids in the model phylogeny? Reconstruction Quality –What is the topological accuracy of the inferred phylogeny?

20 Methods SpNet(i): Our method where we contract i edges NNet: The method of Bryant and Moulton NJ

21 Detection Quality of SpNet Model Phylogeny: 20-taxon Tree

22 Detection Quality of SpNet Model Phylogeny: 20-taxon 1-hybrid network

23 Detection Quality of SpNet Model Phylogeny: 20-taxon 2-hybrid network

24 Reconstruction Quality Model Phylogeny: 20-taxon tree

25 Reconstruction Quality Model Phylogeny: 20-taxon 1-hybrid network

26

27 Conclusions Considering a set of “good” trees rather than a single optimal tree is advantageous in network reconstruction Separate analysis approaches outperform combined analysis approaches

28 Ongoing research Using other techniques for obtaining unresolved trees (e.g., Bayesian analyses, bootstrapping, etc.) Detection vs. reconstruction – visualization and clustering techniques may also be useful (collaboration with St John) Refining unresolved networks DCM-like network reconstruction


Download ppt "A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks Luay Nakhleh Department of Computer Sciences UT Austin."

Similar presentations


Ads by Google