PLGW01 - September 2007. 1 Inferring Phylogenies from LCA distances (back to the basics of distance-based phylogenetic reconstruction) Ilan Gronau Shlomo.

Presentation transcript:

Inferring Phylogenies from LCA distances (back to the basics of distance-based phylogenetic reconstruction)
Ilan Gronau, Shlomo Moran
Technion, Israel

Distance-Based Phylogenetic Reconstruction
1. Compute distances between all taxon pairs.
2. Find an edge-weighted tree that best describes the distances.

Distance-Based Reconstruction
Basics (sanity check): reconstruction algorithms should be consistent, i.e. reconstruct the true tree from accurate (i.e., additive) distances.
Essential extras:
- Robustness to noise: reconstruct the correct tree (or parts of it) given noisy distances.
- Efficiency: low time/space complexity.

Neighbor Joining Methods
An agglomerative clustering approach:
- A taxon pair i,j is chosen and replaced by a new taxon v.
- i,j are connected to the new taxon v (i.e., they form a cherry in the reconstructed tree).
- The method is applied recursively to the reduced matrix.

The Two Basic Components of NJ Methods
At each iteration the algorithm performs:
1. Selection: select neighboring taxa.
   Consistency: if the input is additive, the selected taxa are cherries in the corresponding tree.
2. Reduction: compute distances from the new taxon.
   Consistency: the reduced matrix should fit the reduced tree. This can usually be achieved in more than one way.

Saitou & Nei's NJ Algorithm (1987)
- Robustness: considered highly reliable in practice
- Time complexity: Θ(n³)
- ~13,000 citations (Science Citation Index)
- Implemented in numerous phylogenetic packages
Questions:
- What makes Saitou & Nei's neighbor-selection criterion so good?
- Is there any simpler consistent neighbor-selection criterion?
[Formula: Saitou & Nei's selection criterion]

Simple Selection Criterion: LCA Distances
In a rooted tree, LCA(i,j) is the distance between the root and the least common ancestor of i,j.
- The taxon pair with the deepest LCA are neighbors.
- So is any pair i,j with a "locally deepest" LCA. For neighbors i,j with parent v: [formula not preserved]
This gives a consistent (and complete) neighbor-selection criterion.
[Figure: rooted tree with root r, taxa i, j, k, and v the parent of i and j]

Deepest LCA Neighbor Joining Algorithm, Phase I: calculate LCA distances
1. Choose a root taxon r.
2. Calculate LCA distances from r using the Farris transform: L(i,j) = ½ ( D(r,i) + D(r,j) - D(i,j) )
[Figure: tree with root taxon r and taxa i, j]
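
As a rough illustration of Phase I, the Farris transform can be sketched in a few lines of Python. The function name `farris_transform` and the toy distance matrix are assumptions made for this example, not part of the talk.

```python
def farris_transform(D, r):
    """LCA-distance matrix L relative to root taxon r.

    D is a symmetric n x n distance matrix (list of lists) and r is the
    index of the taxon chosen as root:
        L[i][j] = (D[r][i] + D[r][j] - D[i][j]) / 2
    """
    n = len(D)
    return [[0.5 * (D[r][i] + D[r][j] - D[i][j]) for j in range(n)]
            for i in range(n)]


# Toy additive matrix (made-up numbers) for the quartet ((a,b),(c,d)):
# every pendant edge has length 1 and the internal edge has length 2.
D = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
L = farris_transform(D, r=0)  # taxon a (index 0) plays the role of the root
```

With a as root, `L[2][3]` is the depth of the common ancestor of c and d (pendant edge of a plus the internal edge), while `L[1][2]` is only the pendant edge of a, so c,d is the deepest pair.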

Deepest LCA Neighbor Joining Algorithm, Phase II: n-1 neighbor-joining iterations
At each iteration:
- Selection: choose a taxon pair i,j such that L(i,j) = max over i'≠j' of L(i',j'), and connect i,j to a new taxon v.
- Reduction: replace i,j with the new taxon v and reduce L: for k≠v, L(v,k) = α L(i,k) + (1-α) L(j,k), where α is a reduction parameter that may be redefined at each iteration.
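
The two phases can be put together in a deliberately naive sketch. It is not the authors' implementation: the name `dlca_naive`, the fixed `alpha`, the numbering of new nodes as n, n+1, ... and the choice to keep the root taxon out of the candidate set (so the last remaining cluster is the one attached to r) are all assumptions made for illustration.

```python
from itertools import combinations


def dlca_naive(D, r=0, alpha=0.5):
    """Naive Theta(n^3) sketch of deepest-LCA neighbor joining.

    Returns a list of joins (i, j, v, depth): clusters i and j were connected
    to the new node v, whose estimated distance from root taxon r is depth.
    """
    n = len(D)
    # Phase I: LCA distances from r (Farris transform) for non-root taxa.
    L = {}
    for i in range(n):
        for j in range(n):
            if i != j and r not in (i, j):
                L[i, j] = 0.5 * (D[r][i] + D[r][j] - D[i][j])
    active = [t for t in range(n) if t != r]
    joins, next_id = [], n
    # Phase II: repeatedly join the pair with the globally deepest LCA.
    while len(active) > 1:
        i, j = max(combinations(active, 2), key=lambda p: L[p])
        v, next_id = next_id, next_id + 1
        joins.append((i, j, v, L[i, j]))
        active = [t for t in active if t not in (i, j)]
        for k in active:
            # Reduction: convex combination of the two replaced rows.
            L[v, k] = L[k, v] = alpha * L[i, k] + (1 - alpha) * L[j, k]
        active.append(v)
    return joins


# Toy additive matrix (made-up): caterpillar tree (0,(1,(2,(3,4)))) in which
# every edge has length 1.
D = [
    [0, 2, 3, 4, 4],
    [2, 0, 3, 4, 4],
    [3, 3, 0, 3, 3],
    [4, 4, 3, 0, 2],
    [4, 4, 3, 2, 0],
]
joins = dlca_naive(D, r=0)
```

On this input the joins come out deepest first: taxa 3 and 4 at depth 3, then taxon 2 with their parent at depth 2, then taxon 1 at depth 1.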

Simple and Optimal Θ(n²) Implementation of DLCA
Calculating LCA distances (the matrix L): Θ(n²) time.
Neighbor joining algorithm: n-1 neighbor-joining iterations:
- The reduction step takes O(n) time per iteration.
- The bottleneck is neighbor selection.
An amortized Θ(n²) implementation of neighbor selection: join a "locally deepest" pair, not necessarily the "globally deepest" pair, using the "Nearest Neighbor Chain" clustering technique [Benzecri 82, Juan 82, Murtagh 84, +].
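
A minimal sketch of the nearest-neighbor-chain idea applied to LCA distances follows. It takes "locally deepest" to mean that each of i, j is the other's deepest partner, which is an assumption about the talk's definition; the function name `dlca_nn_chain`, the dictionary representation of L and the tie-breaking rule are likewise illustrative choices.

```python
def dlca_nn_chain(L, taxa, alpha=0.5):
    """Join locally deepest pairs by following a nearest-neighbor chain.

    L maps ordered pairs (a, b) of non-root taxa to LCA distances and is
    extended in place; taxa lists the non-root taxon ids (integers).
    Returns joins as (i, j, v, depth).
    """
    active = set(taxa)
    next_id = max(taxa) + 1
    chain, joins = [], []
    while len(active) > 1:
        if not chain:
            chain.append(min(active))  # arbitrary start; min() is deterministic
        top = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Deepest partner of `top`; on ties prefer the previous chain element
        # so that the chain cannot cycle.
        best = max((k for k in active if k != top),
                   key=lambda k: (L[top, k], k == prev))
        if best != prev:
            chain.append(best)  # O(n) scan, then extend the chain
            continue
        # top and prev are each other's deepest partner: join them.
        del chain[-2:]
        i, j, v = prev, top, next_id
        next_id += 1
        joins.append((i, j, v, L[i, j]))
        active -= {i, j}
        for k in active:
            L[v, k] = L[k, v] = alpha * L[i, k] + (1 - alpha) * L[j, k]
        active.add(v)
    return joins


# Same made-up caterpillar tree (0,(1,(2,(3,4)))) with unit edge lengths.
D = [
    [0, 2, 3, 4, 4],
    [2, 0, 3, 4, 4],
    [3, 3, 0, 3, 3],
    [4, 4, 3, 0, 2],
    [4, 4, 3, 2, 0],
]
r, taxa = 0, [1, 2, 3, 4]
L = {(i, j): 0.5 * (D[r][i] + D[r][j] - D[i][j])
     for i in taxa for j in taxa if i != j}
joins = dlca_nn_chain(L, taxa)
```

Each loop iteration costs one O(n) scan and either extends the chain or performs a join. Every join removes two chain elements, so the number of iterations is O(n) and the total is O(n²). A convex-combination reduction never makes L(v,k) exceed max{L(i,k), L(j,k)}, which is what keeps the remaining chain valid after a join.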

DLCA: Intermediate Summary
- A simple and intuitive consistent neighbor-selection criterion
- Implemented in optimal time complexity (faster than NJ)
What about the noise?!
Robustness to noise: we consider two theoretical criteria for robustness:
- Reconstruction of "Buneman edges"
- Atteson's "edge-reconstruction radius"

Buneman's Edges [Buneman '71]
An edge e induces a split (P|Q) of the taxon set.
e is a "Buneman edge" (w.r.t. distance matrix D) iff every taxon quartet (i,j,k,l) that "crosses" e (i.e. i,j ∊ P, k,l ∊ Q) agrees with e's split:
D(i,j) + D(k,l) < D(i,k) + D(j,l), D(i,l) + D(j,k)
"Buneman robustness criterion": the algorithm should reconstruct all the Buneman edges.
[Figure: edge e separating taxa i, j (in P) from taxa k, l (in Q)]
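
The quartet condition translates directly into a brute-force check. The sketch below assumes D is a list-of-lists matrix and that P and Q are lists of taxon indices; the name `is_buneman_edge` is made up for the example.

```python
from itertools import combinations


def is_buneman_edge(D, P, Q):
    """True iff every quartet crossing the split (P|Q) agrees with it under D."""
    for i, j in combinations(P, 2):
        for k, l in combinations(Q, 2):
            within = D[i][j] + D[k][l]
            # The sum pairing taxa on the same side must be strictly smallest.
            if not (within < D[i][k] + D[j][l] and within < D[i][l] + D[j][k]):
                return False
    return True


# Toy additive matrix (made-up) for the quartet ((0,1),(2,3)).
D = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
true_split = is_buneman_edge(D, [0, 1], [2, 3])
wrong_split = is_buneman_edge(D, [0, 2], [1, 3])
```

When one side of the split holds a single taxon, no quartet crosses it and the function returns True vacuously. The check costs O(|P|²·|Q|²) and is meant only to make the definition concrete.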

Atteson's Edge-Reconstruction Radius [Atteson '99]
Edge-reconstruction radius: algorithm A has edge-reconstruction radius ε if for each edge e: if ||D - D_T||∞ < ε∙w(e), then A correctly reconstructs e.
Atteson: edge-reconstruction radius ≤ ½.
A satisfies Buneman's criterion ⇒ A has the optimal edge-reconstruction radius of ½.
[Figure: edge e of weight w(e); noise ≤ ε∙w(e) for all distances]
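
To make the definition concrete, the sketch below computes the noise level ||D - D_T||∞ and lists the edges whose reconstruction a radius-ε guarantee would cover. The function name `guaranteed_edges`, the edge labels and all numbers are hypothetical.

```python
def guaranteed_edges(D, D_T, edge_weights, eps=0.5):
    """Return (noise, edges) where edges are those e with noise < eps * w(e)."""
    n = len(D)
    # L-infinity distance between the observed and the true (additive) matrix.
    noise = max(abs(D[i][j] - D_T[i][j]) for i in range(n) for j in range(n))
    return noise, [e for e, w in edge_weights.items() if noise < eps * w]


# True tree ((a,b),(c,d)): pendant edges of weight 1, internal edge of weight 2.
D_T = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
# Observed distances with every entry perturbed by at most 0.25 (made-up).
D = [
    [0, 2.25, 4, 3.75],
    [2.25, 0, 4.25, 4],
    [4, 4.25, 0, 2],
    [3.75, 4, 2, 0],
]
weights = {"a": 1, "b": 1, "c": 1, "d": 1, "internal": 2}
noise, half = guaranteed_edges(D, D_T, weights, eps=0.5)
_, quarter = guaranteed_edges(D, D_T, weights, eps=0.25)
```

With noise 0.25, a radius of ½ covers every edge, while a radius of ¼ covers only the internal edge of weight 2.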

Robustness of NJ and DLCA
NJ:
- Edge-reconstruction radius = ¼ [Atteson '99, Mihaescu et al '06] (hence it does not satisfy the Buneman criterion)
DLCA (using "conservative reductions"):
- Satisfies the Buneman criterion
- Hence it has edge-reconstruction radius = ½
By these criteria, DLCA is also more robust than NJ.
And in practice...?

Testing on Simulated Data
[Figure: tree T → sequences (ATTCG…, ATACG…, ACTGG…, …) → DNAdist from PHYLIP → distance matrix D → DLCA / NJ → tree T']
Compare the topologies of T and T' through RF distance.
Note that DLCA may produce n different trees, one for each root taxon.
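
For readers unfamiliar with the RF (Robinson-Foulds) distance used in the comparison, here is a small sketch that counts the splits present in exactly one of two trees. Representing trees as nested tuples of leaf labels, and the names `splits` and `rf_distance`, are assumptions for this example; the talk does not say which tool computed the distance.

```python
def splits(tree, taxa):
    """Non-trivial bipartitions of `taxa` induced by a nested-tuple tree."""
    taxa = frozenset(taxa)
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        if 1 < len(below) < len(taxa) - 1:
            # Canonical form: the unordered pair of the two sides.
            found.add(frozenset([below, taxa - below]))
        return below

    leaves(tree)
    return found


def rf_distance(t1, t2, taxa):
    """Robinson-Foulds distance: number of splits found in exactly one tree."""
    return len(splits(t1, taxa) ^ splits(t2, taxa))


taxa = "abcde"
t_true = (("a", "b"), "c", ("d", "e"))
t_other = (("a", "c"), "b", ("d", "e"))
t_rooted = (("a", "b"), ("c", ("d", "e")))  # same topology, drawn rooted
d_diff = rf_distance(t_true, t_other, taxa)
d_same = rf_distance(t_true, t_rooted, taxa)
```

Because splits are stored without orientation, where a tree is rooted does not affect the result, which is useful when comparing the n DLCA outputs obtained from different root taxa.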

DLCA vs. Saitou & Nei's NJ
- L(i,k) ← max{L(i,k), L(j,k)}
- L(i,k) ← ½(L(i,k) + L(j,k))
[number not preserved] trees, 1 simulation per tree
Tree source: The Methods and Algorithms in Bioinformatics (MAB) lab, LIRMM.

Robustness of DLCA: a Summary
DLCA is superior to NJ by the Buneman and Atteson criteria, but on average it is inferior to NJ on simulated data.
What is the reason for this "conflict"?
Take another look at Saitou & Nei's selection criterion.

Saitou & Nei's Selection Criterion, expressed by LCA distances
[Formula: Saitou & Nei's selection criterion expressed by LCA distances]
i.e., NJ tends to select taxon pairs whose LCA is deepest on average.
- Averaging "smooths" noise.
- Averaging does not affect worst-case noise. (The bound of ¼ on the reconstruction radius of NJ uses a highly improbable scenario.)
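
One way to see the link between NJ and LCA distances is the following identity, derived from the standard form of Saitou & Nei's criterion. The notation Q(i,j) and L_k(i,j) is introduced here for illustration and may differ from the formulation used in the talk.

```latex
\begin{align*}
% Standard NJ criterion: join the pair (i,j) that minimizes Q(i,j).
Q(i,j) &= (n-2)\,D(i,j) - \sum_{k} D(i,k) - \sum_{k} D(j,k) \\
% LCA distance of i,j when taxon k is used as the root (Farris transform).
L_k(i,j) &= \tfrac{1}{2}\bigl(D(k,i) + D(k,j) - D(i,j)\bigr) \\
% Split off the k = i and k = j terms of the two sums.
Q(i,j) &= \sum_{k \neq i,j} \bigl(D(i,j) - D(i,k) - D(j,k)\bigr) - 2\,D(i,j) \\
       &= -2\Bigl(\sum_{k \neq i,j} L_k(i,j) + D(i,j)\Bigr)
\end{align*}
```

Minimizing Q(i,j) therefore amounts to maximizing the sum of the LCA depths of i,j over all other taxa k taken as roots, plus a D(i,j) term. DLCA, by contrast, uses the LCA depth from a single root.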

Future Directions
Use the pivotal nature of DLCA to achieve better results:
- Pre-processing: use "good" taxa as roots.
- Post-processing: return the "best" tree among the n possible outputs.
Find robustness criteria that explain the robustness of NJ:
- Instead of considering worst-case noise (as Atteson's criterion does), consider stochastic noise.

For more information…
- "Neighbor Joining Algorithms for Inferring Phylogenies via LCA-Distances" (JCB 14(1), pp. 1-15, 2007)
- "Optimal Implementations of UPGMA and Other Common Clustering Algorithms" (to appear in IPL)
Our websites:

Thank You