Technion – Israel Institute of Technology



1 Technion – Israel Institute of Technology
Adaptive Fast Convergence
Towards Optimal Reconstruction Guarantees for Phylogenetic Trees
Ilan Gronau
Technion – Israel Institute of Technology, Haifa, Israel
Joint work with Shlomo Moran and Sagi Snir

2 Phylogenetic Reconstruction
[Figure: a “true tree” and a reconstructed tree, each over the taxa A through J.]
Input: one sequence of length k per taxon:
A: TGGAC … ATT
B: TGAAC … ATT
C: GGGAT … ACT
J: TGGAG … TCT
Goal: reconstruct the true tree as accurately as possible.
December 5, 2018 PLGW03 - Cambridge, UK

3 Evaluating Reconstructed Tree
[Figure: the “true tree” next to a reconstructed tree over the taxa A through J, with a “contraction” label.]
False Negatives (FN): edges in the true tree that we don’t reconstruct.
False Positives (FP): edges we reconstruct that aren’t in the true tree.
We’d like to reduce the number of reconstruction errors (FP and FN).
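A minimal sketch of how FP and FN could be counted, assuming each tree is represented by the set of bipartitions (splits) of the taxa induced by its internal edges. The split representation, the names `normalize` and `reconstruction_errors`, and the 5-taxon example are illustrative assumptions, not taken from the talk.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Split = FrozenSet[str]


def normalize(side: Iterable[str], taxa: Iterable[str]) -> Split:
    """Canonical form of a split: the side that excludes the smallest taxon."""
    side, taxa = frozenset(side), frozenset(taxa)
    return taxa - side if min(taxa) in side else side


def reconstruction_errors(true_splits: Set[Split],
                          recon_splits: Set[Split]
                          ) -> Tuple[Set[Split], Set[Split]]:
    """Return (false negatives, false positives) as sets of splits."""
    false_negatives = true_splits - recon_splits   # true edges we missed
    false_positives = recon_splits - true_splits   # edges not in the true tree
    return false_negatives, false_positives


# Hypothetical 5-taxon example (not the tree from the slide).
taxa = "ABCDE"
true_tree = {normalize("AB", taxa), normalize("DE", taxa)}
wrong_tree = {normalize("AB", taxa), normalize("CD", taxa)}
contracted_tree = {normalize("AB", taxa)}  # edge DE|ABC contracted

fn_wrong, fp_wrong = reconstruction_errors(true_tree, wrong_tree)
fn_contr, fp_contr = reconstruction_errors(true_tree, contracted_tree)
```

In this toy example the wrong tree has one FN and one FP, while the contracted tree has one FN and no FP: contracting an edge can lose a true edge but cannot introduce a false one.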

4 The Reconstruction Threshold
[Figure: tree over the taxa A through J with edges e1 to e7; axis labels: weight, seq. length k.]
Input length may be insufficient to reconstruct some edges (the short and deep ones).
Can we guarantee reconstruction of all edges above the threshold?

5 Fast Convergence
Near-optimal “information efficiency”
- P. Erdos, M. Steel, L. Szekely, and T. Warnow. A few logs suffice to build (almost) all trees (I). Random Structures and Algorithms, 14:153–184, 1999.
- D. Huson, S. Nettles, and T. Warnow. Disk-Covering, a fast-converging method for phylogenetic tree reconstruction. Journal of Computational Biology, 6:369–386, 1999.
- T. Warnow, B. Moret, and K. St. John. Absolute convergence: true trees from short sequences. In SODA, pages 186–195, 2001.
- M. Csürös. Fast recovery of evolutionary trees with thousands of nodes. Journal of Computational Biology, 9(2):277–297, 2002.
- E. Mossel. Distorted metrics on trees and phylogenetic forests. ACM Transactions on Computational Biology and Bioinformatics, 4:108–116, 2007.
- C. Daskalakis, C. Hill, A. Jaffe, R. Mihaescu, E. Mossel, and S. Rao. Maximal accurate forests from distance matrices. In RECOMB, pages 281–295, 2006.
And more…
Reconstruct the entire tree (w.h.p.) from sequences whose length is polynomial in the number of taxa.

6 The Reconstruction Threshold: Fast Converging Algorithms
[Figure: tree over the taxa A through J with edges e1 to e7; axis labels: weight, seq. length k.]
Existing FC methods provide guarantees only when the threshold is lower than the weight of the shortest edge.

7 Forest Reconstruction Methods
[Mossel '07] [Daskalakis et al. '06]
Shallow edges are easier to reconstruct.

8 Forest Reconstruction Methods
[Mossel '07] [Daskalakis et al. '06]
Short edges block reconstruction of long edges deeper in the tree.

9 Adaptive Fast Convergence
[Figure: tree over the taxa A through J with edges e1 to e7; axis labels: weight, seq. length k; a level ε is marked and the figure is annotated “Adaptive Fast Convergence”.]
Existing FC methods provide guarantees only when the threshold is lower than the weight of the shortest edge.

10 Incremental Reconstruction
The incremental approach: [WSSB '77] [Csuros '99] [KZZ '03]
- Use directional queries to insert taxa one at a time.
- Directional queries are implemented using a quartet oracle.
- Total time complexity of O(n^2).
[Figure: inserting a taxon into a tree over the taxa A through G; annotation: “fast converging”.]
Short edges (below the reconstruction threshold) lead to false positives.
False positives lead to faulty reconstruction of long edges.
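One way a directional query could be built from a quartet oracle, sketched here for a vertex of degree 3 only. The function `direction`, the stand-in `toy_oracle`, and the choice to represent each direction by a single taxon are assumptions made for illustration; the talk does not give this level of detail.

```python
from typing import Callable, Optional, Tuple

Pair = Tuple[str, str]
QuartetSplit = Tuple[Pair, Pair]
Oracle = Callable[[str, str, str, str], Optional[QuartetSplit]]


def direction(oracle: Oracle, reps: Tuple[str, str, str], x: str) -> Optional[str]:
    """Direction of a new taxon x as seen from a degree-3 vertex.

    reps holds one representative taxon from each of the three subtrees
    around the vertex. Returns the representative whose subtree x falls
    into, or None if the oracle gives no answer.
    """
    a, b, c = reps
    split = oracle(a, b, c, x)
    if split is None:
        return None
    for pair in split:
        if x in pair:
            # x is paired with the representative of its own direction.
            return pair[0] if pair[1] == x else pair[1]
    return None


def toy_oracle(i: str, j: str, k: str, l: str) -> Optional[QuartetSplit]:
    """Hypothetical oracle that knows a single quartet tree: ax | bc."""
    if {i, j, k, l} == {"a", "b", "c", "x"}:
        return (("a", "x"), ("b", "c"))
    return None  # 'fail' on anything else


towards = direction(toy_oracle, ("a", "b", "c"), "x")
no_answer = direction(toy_oracle, ("a", "b", "d"), "x")
```

An incremental method would presumably follow the returned direction to the next vertex and repeat the query until the insertion point is found.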

11 A Reliable Quartet Oracle
The basic building block: a reliable quartet oracle*
*A similar oracle is also used in [Daskalakis et al. '06].
[Figure: a quartet i, j, k, l whose separating path has weight > ε and whose diameter is O(depth).]
- Never returns a false quartet split (may return ‘fail’).
- Returns the correct split if:
  - the separating path is long enough (above the reconstruction threshold), and
  - the quartet is short enough (diameter proportional to the tree depth).
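A sketch of an oracle with this “answer or fail” behaviour, assuming estimated pairwise distances `d` and using the standard four-point condition. This is not necessarily the oracle used in the talk; `reliable_quartet_split`, `symmetric`, `eps`, `max_diameter`, and the toy distances are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]


def reliable_quartet_split(d: Dict[str, Dict[str, float]],
                           i: str, j: str, k: str, l: str,
                           eps: float, max_diameter: float
                           ) -> Optional[Tuple[Pair, Pair]]:
    """Return the quartet split, or None ('fail') when evidence is too weak."""
    quartet = (i, j, k, l)
    # Refuse quartets that are too wide: long distances are estimated poorly.
    if max(d[a][b] for a, b in combinations(quartet, 2)) > max_diameter:
        return None

    def pair_sum(pairing: Tuple[Pair, Pair]) -> float:
        (a, b), (c, e) = pairing
        return d[a][b] + d[c][e]

    pairings = [((i, j), (k, l)), ((i, k), (j, l)), ((i, l), (j, k))]
    best, second, _ = sorted(pairings, key=pair_sum)
    # On an additive metric this gap is twice the separating path weight.
    if pair_sum(second) - pair_sum(best) > 2 * eps:
        return best
    return None


def symmetric(entries: Dict[Pair, float]) -> Dict[str, Dict[str, float]]:
    """Build a symmetric nested distance table from unordered pairs."""
    table: Dict[str, Dict[str, float]] = {}
    for (a, b), w in entries.items():
        table.setdefault(a, {})[b] = w
        table.setdefault(b, {})[a] = w
    return table


# Toy quartet ij|kl: pendant edges of weight 1.0, separating edge 0.5.
d = symmetric({("i", "j"): 2.0, ("k", "l"): 2.0, ("i", "k"): 2.5,
               ("i", "l"): 2.5, ("j", "k"): 2.5, ("j", "l"): 2.5})

ok = reliable_quartet_split(d, "i", "j", "k", "l", eps=0.2, max_diameter=3.0)
short_path = reliable_quartet_split(d, "i", "j", "k", "l", eps=0.6, max_diameter=3.0)
too_wide = reliable_quartet_split(d, "i", "j", "k", "l", eps=0.2, max_diameter=2.0)
```

In this sketch, if every estimated distance is within eps/2 of the true additive distance, a wrong pairing cannot win by more than 2·eps, so the function either fails or returns the true split.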

12 A Reliable Incremental Algorithm
The idea: never reconstruct faulty edges!
- Use a truthful directional oracle (never wrong, may return ‘fail’).
- Insertion zone: leaves point “inwards”; internal vertices give no direction.
- Contract edges already reconstructed, if necessary.
[Figure: inserting a taxon into a tree over the taxa A through G; labels: “contracted short edge”, “contraction”, “must be contracted”.]
False Positives: none (the returned tree is an edge-contraction of the “true tree”).
False Negatives: only short edges (contracted edges are below the reconstruction threshold).
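The two guarantees can be phrased as a check on an output tree. The sketch below assumes trees are given as sets of splits and that the true edge weights are known, as they would be in a simulation; `satisfies_guarantee` and the 6-taxon example are hypothetical.

```python
from typing import Dict, FrozenSet, Set

Split = FrozenSet[str]


def satisfies_guarantee(true_edges: Dict[Split, float],
                        recon_splits: Set[Split],
                        threshold: float) -> bool:
    """True if there are no false positives and every missed edge is short."""
    false_positives = recon_splits - set(true_edges)
    missed_long = {s for s, w in true_edges.items()
                   if s not in recon_splits and w >= threshold}
    return not false_positives and not missed_long


# Hypothetical true tree on taxa A..F; internal edges keyed by one side
# of their split, mapped to their weight.
true_edges = {
    frozenset("AB"): 0.30,
    frozenset("ABC"): 0.01,   # short edge, below the threshold
    frozenset("EF"): 0.25,
}
contracted = {frozenset("AB"), frozenset("EF")}        # short edge contracted
with_faulty_edge = {frozenset("AB"), frozenset("CD")}  # CD is not a true edge
missing_long_edge = {frozenset("AB")}                  # EF is long but missing

good = satisfies_guarantee(true_edges, contracted, threshold=0.1)
bad_fp = satisfies_guarantee(true_edges, with_faulty_edge, threshold=0.1)
bad_fn = satisfies_guarantee(true_edges, missing_long_edge, threshold=0.1)
```

Only the first output passes: it is an edge-contraction of the true tree, and the single edge it lacks is below the threshold.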

13 Main Challenges
Directional oracle on vertices of high degree:
- Correctness: no faulty answers + enough correct answers
- Complexity: using O(deg(v)) quartet queries (sustaining the O(n^2) time complexity of the algorithm)
Querying only quartets of O(depth)-diameter:
- Representing each direction with a “close” taxon
- Dealing with very large contracted subtrees
[Figure: a vertex v in the reconstructed tree and in the true tree, with ε marked.]
More details in: “Fast and Reliable Reconstruction of Phylogenetic Trees with Very Short Edges”, in SODA '08.

14 Towards Optimal Reconstruction Guarantees for Phylogenetic Trees
Further optimizing the reconstruction threshold:
- Reducing the bound on the diameter of the quartets we query
- Allowing reconstruction of short shallow edges (using ideas from forest reconstruction)
Practical issues:
- Improving reliability of the directional oracle
- Using reliable partial reconstruction
[Figure: a quartet i, j, k, l with r = O(depth); tree over the taxa A through J with edges e1 to e7; axis label: seq. length k, with ε marked.]


