Technion – Israel Institute of Technology

Presentation transcript:

Adaptive Fast Convergence: Towards Optimal Reconstruction Guarantees for Phylogenetic Trees
Ilan Gronau, Technion – Israel Institute of Technology, Haifa, Israel
Joint work with Shlomo Moran and Sagi Snir

Phylogenetic Reconstruction
[Figure: a “true tree” on taxa A to J; sequences of length k at its leaves (A: TGGAC … ATT, B: TGAAC … ATT, C: GGGAT … ACT, J: TGGAG … TCT); a reconstructed tree on the same taxa.]
Goal: reconstruct the true tree as accurately as possible.
December 5, 2018 PLGW03 - Cambridge, UK

Evaluating the Reconstructed Tree
[Figure: the “true tree” and a reconstructed tree on taxa A to J, with one edge marked “contraction”.]
False Negatives (FN): edges in the true tree which we don’t reconstruct.
False Positives (FP): edges we reconstruct which aren’t in the true tree.
We’d like to reduce the number of reconstruction errors (FP and FN).
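The two error counts can be illustrated with a small sketch. It assumes, as is common but not stated on the slide, that each internal edge of an unrooted tree is represented by the split (bipartition) of the taxa it induces; the helper names `normalize` and `edge_errors` and the five-taxon example are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Split = FrozenSet[str]


def normalize(side: Iterable[str], taxa: Iterable[str]) -> Split:
    """Represent a split by the side that does NOT contain the smallest taxon,
    so that the two ways of writing the same split compare equal."""
    all_taxa = frozenset(taxa)
    one_side = frozenset(side)
    return all_taxa - one_side if min(all_taxa) in one_side else one_side


def edge_errors(true_splits: Set[Split],
                rec_splits: Set[Split]) -> Tuple[Set[Split], Set[Split]]:
    """Return (false negatives, false positives) as sets of splits."""
    false_negatives = true_splits - rec_splits   # true edges we missed
    false_positives = rec_splits - true_splits   # reconstructed edges that are wrong
    return false_negatives, false_positives


# Hypothetical example on five taxa (internal edges only).
TAXA = "ABCDE"
true_tree = {normalize("AB", TAXA), normalize("DE", TAXA)}
reconstructed = {normalize("AB", TAXA), normalize("CD", TAXA)}

fn, fp = edge_errors(true_tree, reconstructed)
print("FN:", [sorted(s) for s in fn])
print("FP:", [sorted(s) for s in fp])
```

Here the reconstructed tree misses the edge separating D, E from the rest (one FN) and contains an edge separating C, D that the true tree lacks (one FP).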

The Reconstruction Threshold
[Figure: a tree on taxa A to J with edges e1 to e7; axes labeled “weight” and “seq. length k”.]
Input length may be insufficient to reconstruct some edges (short and deep).
Can we guarantee reconstruction of all edges above the threshold?

Fast Convergence
Near-optimal “information efficiency”: reconstruct the entire tree (w.h.p.) from sequences of polynomial length.
- P. Erdős, M. Steel, L. Székely, and T. Warnow. A few logs suffice to build (almost) all trees (I). Random Structures and Algorithms, 14:153–184, 1999.
- D. Huson, S. Nettles, and T. Warnow. Disk-Covering, a fast-converging method for phylogenetic tree reconstruction. Journal of Computational Biology, 6:369–386, 1999.
- T. Warnow, B. Moret, and K. St. John. Absolute convergence: true trees from short sequences. In SODA, pages 186–195, 2001.
- M. Csürös. Fast recovery of evolutionary trees with thousands of nodes. Journal of Computational Biology, 9(2):277–297, 2002.
- E. Mossel. Distorted metrics on trees and phylogenetic forests. ACM Transactions on Computational Biology and Bioinformatics, 4:108–116, 2007.
- C. Daskalakis, C. Hill, A. Jaffe, R. Mihaescu, E. Mossel, and S. Rao. Maximal accurate forests from distance matrices. In RECOMB, pages 281–295, 2006.
- And more…

The Reconstruction Threshold: Fast Converging Algorithms
[Figure: a tree on taxa A to J with edges e1 to e7; axes labeled “weight” and “seq. length k”.]
Existing FC methods provide guarantees only when the threshold is lower than the weight of the shortest edge.

Forest Reconstruction Methods [Mossel `07] [Daskalakis et al `06]
Shallow edges are easier to reconstruct.

Forest Reconstruction Methods [Mossel `07] [Daskalakis et al `06]
Short edges block reconstruction of long edges deeper in the tree.

Adaptive Fast Convergence
[Figure: a tree on taxa A to J with edges e1 to e7; axes labeled “weight” and “seq. length k”, with ε marked and the label “Adaptive Fast Convergence” highlighted.]
Existing FC methods provide guarantees only when the threshold is lower than the weight of the shortest edge.

Incremental Reconstruction
The incremental approach: [WSSB `77] [Csuros `99] [KZZ `03]
- Use directional queries to insert taxa one at a time.
- Directional queries are implemented using a quartet oracle.
- Total time complexity of O(n²).
[Figure: taxa A to G being inserted into a partial tree that contains an edge shorter than ε; slide label: “fast converging”.]
Short edges (below the reconstruction threshold) lead to false positives.
False positives lead to faulty reconstruction of long edges.

A Reliable Quartet Oracle
The basic building block: a reliable quartet oracle (a similar oracle is also used in [Daskalakis et al `06]).
[Figure: a quartet i, j, k, l whose separating path is longer than ε and whose diameter is O(depth).]
- Never returns a false quartet split (may return ‘fail’).
- Returns the correct split if:
  - the separating path is long enough (above the reconstruction threshold);
  - the quartet is short enough (proportional to the tree depth).
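The fail-or-answer behaviour of such an oracle can be sketched with the classical four-point method. This is an illustration only, not the oracle from the talk: the function name `quartet_oracle`, the parameters `eps` and `max_diameter`, and the example distances are assumptions, and the sketch does not model the statistical error of distances estimated from finite sequences, which is what the real reliability guarantee has to account for.

```python
from itertools import combinations


def make_metric(pairs):
    """Build a symmetric nested-dict distance table from {(a, b): distance}."""
    d = {}
    for (a, b), w in pairs.items():
        d.setdefault(a, {})[b] = w
        d.setdefault(b, {})[a] = w
    return d


def quartet_oracle(d, i, j, k, l, eps, max_diameter):
    """Return the split ((a, b), (c, d)) chosen by the four-point method,
    or None ('fail') when the quartet is too wide or the evidence too weak."""
    quartet = (i, j, k, l)
    # Refuse quartets whose diameter exceeds the allowed bound.
    if max(d[a][b] for a, b in combinations(quartet, 2)) > max_diameter:
        return None
    candidates = sorted(
        [
            (d[i][j] + d[k][l], ((i, j), (k, l))),
            (d[i][k] + d[j][l], ((i, k), (j, l))),
            (d[i][l] + d[j][k], ((i, l), (j, k))),
        ],
        key=lambda c: c[0],
    )
    best, runner_up = candidates[0], candidates[1]
    # For an additive metric this gap is twice the separating-path weight.
    separating_path = (runner_up[0] - best[0]) / 2
    if separating_path <= eps:
        return None
    return best[1]


# Hypothetical quartet ij|kl: pendant edges of weight 1, middle edge 0.5.
d = make_metric({
    ("i", "j"): 2.0, ("k", "l"): 2.0,
    ("i", "k"): 2.5, ("i", "l"): 2.5, ("j", "k"): 2.5, ("j", "l"): 2.5,
})
print(quartet_oracle(d, "i", "j", "k", "l", eps=0.2, max_diameter=3.0))
print(quartet_oracle(d, "i", "j", "k", "l", eps=0.6, max_diameter=3.0))
```

The first call answers with the split ij|kl because the separating path (0.5) exceeds `eps`; the second returns `None` because the same path falls below the larger threshold.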

A Reliable Incremental Algorithm
The idea: never reconstruct faulty edges!
- Use a truthful directional oracle (never wrong, may return ‘fail’).
- Insertion zone: leaves point “inwards”; internal vertices give no direction.
- Contract edges already reconstructed, if necessary.
[Figure: a partial tree on taxa A to G before and after contraction; a short edge that must be contracted is marked.]
False Positives: none (the returned tree is an edge-contraction of the “true tree”).
False Negatives: only short edges (contracted edges are below the reconstruction threshold).
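The guarantee stated above can be checked on a toy example. The sketch below does not implement the incremental algorithm; it only models its ideal output, under the assumption that a tree is represented by the splits its internal edges induce, each with a weight. The function name `contract_short_edges`, the threshold `eps`, and the weights are hypothetical.

```python
def contract_short_edges(split_weights, eps):
    """Model an edge-contraction of the true tree: keep only the splits
    whose edge weight exceeds eps, and contract all others."""
    return {split for split, weight in split_weights.items() if weight > eps}


# Hypothetical true tree on taxa A..E: one long and one very short internal edge.
true_tree = {
    frozenset("CDE"): 0.30,   # edge separating A, B from C, D, E
    frozenset("DE"): 0.01,    # very short edge separating D, E from the rest
}
eps = 0.05

returned = contract_short_edges(true_tree, eps)
false_positives = returned - set(true_tree)
false_negatives = set(true_tree) - returned

print("FP:", false_positives)
print("FN weights:", [true_tree[s] for s in false_negatives])
```

Because the returned split set is a subset of the true tree's splits, there can be no false positives, and every missing edge has weight at most `eps`.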

Main Challenges
Directional oracle on vertices of high degree
- Correctness: no faulty answers + enough correct answers
- Complexity: using O(deg(v)) quartet queries (sustaining the O(n²) time complexity of the algorithm)
Querying only quartets of O(depth)-diameter
- Representing each direction with a “close” taxon
- Dealing with very large contracted subtrees
[Figure: a vertex v in the reconstructed tree and the corresponding region around v in the true tree, with ε marked.]
More details in: “Fast and Reliable Reconstruction of Phylogenetic Trees with Very Short Edges”, in SODA `08.

Towards Optimal Reconstruction Guarantees for Phylogenetic Trees
Further optimizing the reconstruction threshold:
- Reducing the bound on the diameter of the quartets we query
- Allowing reconstruction of short shallow edges (using ideas from forest reconstruction)
Practical issues:
- Improving the reliability of the directional oracle
- Using reliable partial reconstruction
[Figure: a quartet i, j, k, l with r = O(depth); a tree on taxa A to J with edges e1 to e7, with ε and seq. length k marked.]
