Robustness to Noise in Distance-Based Phylogenetic Reconstruction Methods. Tutorial #12 © Ilan Gronau.



Distance-Based Phylogenetic Reconstruction

The distance-based approach:
1. Estimate the evolutionary distance between every two species.
2. Reconstruct the phylogenetic tree that (best) fits the resulting dissimilarity matrix.

You saw in class:
- A phylogenetic tree is uniquely defined by its induced metric. (Metrics that can be realized by some tree are called additive.)
- There are efficient methods for reconstructing this tree.

Problems:
- How do we estimate evolutionary distances? We don't discuss this in this course.
- How 'close' do these distances have to be to the 'real' distances? This is the subject of this tutorial.
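To make the notion of an additive metric concrete, the sketch below tests the four-point condition, a standard characterization of tree metrics: for every four taxa, the two largest of the three pairwise sums must be equal. The function name `is_additive`, the tolerance `tol` and the toy 4-taxon matrices are assumptions made for illustration, and the check presumes the input is already a metric (symmetric, zero diagonal, triangle inequality).

```python
from itertools import combinations

def is_additive(D, tol=1e-9):
    """Four-point condition on a metric D given as a list of lists."""
    for i, j, k, l in combinations(range(len(D)), 4):
        # The three ways to split the quartet into two pairs.
        sums = sorted([D[i][j] + D[k][l],
                       D[i][k] + D[j][l],
                       D[i][l] + D[j][k]])
        # Additive metrics: the two largest sums coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Hypothetical toy tree ab|cd: leaf edges of weight 1, internal edge of weight 2.
D_toy = [[0, 2, 4, 4],
         [2, 0, 4, 4],
         [4, 4, 0, 2],
         [4, 4, 2, 0]]

# Same matrix with one noisy entry, d(a,c) = 5.
D_noisy = [row[:] for row in D_toy]
D_noisy[0][2] = D_noisy[2][0] = 5
```

Here `is_additive(D_toy)` holds while `is_additive(D_noisy)` does not, which is the situation the rest of the tutorial addresses: noisy input that is close to, but not exactly, a tree metric.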

The Phylogenetic "Error-Correction Code" [Atteson '99]

A phylogenetic tree is uniquely defined by its induced metric. How dense is this "space"? Can we tolerate some small noise?

All matrices in a ball surrounding each additive metric uniquely define the topology of the tree. The radius of each ball depends on its "center tree" (the weight of its minimal edge).

[Figure: a tree over the taxa a, b, c, d, e, f, g, h with its distance matrix; balls around the additive metrics of trees T1, T2, T3.]

The Phylogenetic "Error-Correction Code" [Atteson '99]

$l_\infty$ norm (worst-case noise): a dissimilarity matrix $D$ is near-additive if there is a binary tree $T$ s.t. $\|D, D_T\|_\infty < \frac{1}{2} w_{\min}(T)$, where $\|D, D_T\|_\infty = \max_{i,j} |D(i,j) - D_T(i,j)|$ and $w_{\min}(T)$ is the weight of the minimal edge of $T$.

Near-additive matrices uniquely define a tree topology.

The balls are tangent: there are matrices $D$ with $\|D, D_{T_1}\|_\infty = \frac{1}{2} w_{\min}(T_1)$ and $\|D, D_{T_2}\|_\infty = \frac{1}{2} w_{\min}(T_2)$, where $T_1$ and $T_2$ have different topologies! (You show this in HW 5.)

We show how to reconstruct the correct topology given near-additive dissimilarities $D$.
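The near-additivity test against one candidate tree can be sketched as follows. The names `linf_distance` and `is_near_additive_wrt` and the toy matrices are assumptions for illustration; the definition asks whether some binary tree exists, whereas this sketch only checks a tree metric and minimal edge weight supplied by the caller.

```python
def linf_distance(D, D_T):
    """Largest entrywise difference between two square matrices."""
    n = len(D)
    return max(abs(D[i][j] - D_T[i][j]) for i in range(n) for j in range(n))

def is_near_additive_wrt(D, D_T, w_min):
    """True if D lies strictly inside the ball of radius w_min / 2 around D_T."""
    return linf_distance(D, D_T) < 0.5 * w_min

# Hypothetical tree ab|cd: leaf edges of weight 1, internal edge of weight 2,
# so the minimal edge weight is 1.
D_T = [[0, 2, 4, 4],
       [2, 0, 4, 4],
       [4, 4, 0, 2],
       [4, 4, 2, 0]]

def perturb(D, i, j, delta):
    """Copy of D with the symmetric entry (i, j) shifted by delta."""
    out = [row[:] for row in D]
    out[i][j] = out[j][i] = D[i][j] + delta
    return out

D_small = perturb(D_T, 0, 2, 0.25)   # inside the ball
D_edge = perturb(D_T, 0, 2, 0.5)     # on the boundary, so not near-additive
```

The strict inequality matters: `D_edge` sits exactly at distance ½·w_min and is rejected, matching the tangent-ball picture in which boundary matrices can belong to two topologies.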

Deepest LCA Neighbor Joining

Input: a dissimilarity matrix $D$ over $S$.
Output: a phylogenetic tree over $S$.

a) Choose a root $r \in S$.
b) Calculate LCA-depths from $r$: $L(i,j) = \frac{1}{2}\big(D(r,i) + D(r,j) - D(i,j)\big)$.

Stopping condition: if $L = [w]$ (a single remaining element $x$), return the tree $T$ made of the single edge $(r, x)$ of weight $w$.
Otherwise:
1. Choose a 'mutually deepest' pair $(i,j)$: $L(i,j) = \max_{k \ne i} L(i,k) = \max_{k \ne j} L(k,j)$.
2. Replace $i,j$ with a new element $v$, and reduce $L$ (convex reduction):
   $L(v,v) = L(i,j)$
   For $k \ne v$: $L(v,k) = \alpha L(i,k) + (1-\alpha) L(j,k)$, with $0 \le \alpha \le 1$.
3. Recursively execute the algorithm on the reduced matrix.
4. Add $i,j$ as daughter nodes of $v$ with edges of weight $w(v,i) = \max\{0, L(i,i) - L(i,j)\}$ and $w(v,j) = \max\{0, L(j,j) - L(i,j)\}$.
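The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tutorial's own code: the function name `dlca`, the `labels` argument, the internal node names `v0, v1, ...`, the default `alpha = 0.5`, and the choice of the largest off-diagonal entry as the mutually deepest pair are all illustrative. The recursion is unrolled into a loop, which is possible because the edge weights of step 4 depend only on the matrix at the moment of the merge. The 5-taxon matrix is a hypothetical additive example.

```python
from itertools import combinations

def dlca(D, labels, root, alpha=0.5):
    """Sketch of DLCA; returns a list of (parent, child, weight) edges."""
    pos = {x: n for n, x in enumerate(labels)}
    r = pos[root]
    nodes = [x for x in labels if x != root]
    # Stage (b): LCA depths from r; the diagonal L[x][x] equals D(r, x).
    L = {x: {y: 0.5 * (D[r][pos[x]] + D[r][pos[y]] - D[pos[x]][pos[y]])
             for y in nodes} for x in nodes}
    edges = []
    while len(nodes) > 1:
        # Step 1: L is symmetric, so its largest off-diagonal entry is
        # always a mutually deepest pair (one valid choice among several).
        i, j = max(combinations(nodes, 2), key=lambda p: L[p[0]][p[1]])
        # Step 2: replace i, j by a new element v (convex reduction).
        v = f"v{len(edges) // 2}"   # hypothetical internal node name
        rest = [k for k in nodes if k not in (i, j)]
        L[v] = {v: L[i][j]}
        for k in rest:
            L[v][k] = L[k][v] = alpha * L[i][k] + (1 - alpha) * L[j][k]
        # Step 4: attach i and j below v, clamping weights at zero.
        edges.append((v, i, max(0.0, L[i][i] - L[i][j])))
        edges.append((v, j, max(0.0, L[j][j] - L[i][j])))
        nodes = rest + [v]          # step 3: continue on the reduced matrix
    # Stopping condition: one element x left, joined to the root.
    x = nodes[0]
    edges.append((root, x, L[x][x]))
    return edges

labels = ["A", "B", "C", "D", "E"]
D = [[0, 10, 7, 12, 7],
     [10, 0, 7, 14, 9],
     [7, 7, 0, 11, 6],
     [12, 14, 11, 0, 7],
     [7, 9, 6, 7, 0]]
tree = dlca(D, labels, "E")
```

On this input the first merge is (B, C), then A joins their parent, then D, and the last remaining node is attached to the root E by an edge of weight 1. Since the input is additive, the path lengths in the returned tree reproduce D exactly.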

Deepest LCA Neighbor Joining

Sketch of consistency proof (shown in class):
If $D$ is additive, consistent with tree $T$, then:
- $L = LCA(D, r)$ contains the distances of all taxon-pair LCAs from $r$.
- A 'mutually deepest' taxon-pair $(i,j)$ is a neighbor-pair (a cherry).
- The reduction computes the 'real' LCA-depths corresponding to $v$, the parent of $(i,j)$:
  - $L(v,v) = L(i,j)$ ($v$ is the LCA of $i$ and $j$).
  - For $k \ne v$: $L(v,k) = L(i,k) = L(j,k)$.

Deepest LCA Neighbor Joining - Example

D is additive:

D:   A   B   C   D   E
A    0  10   7  12   7
B        0   7  14   9
C            0  11   6
D                0   7
E                    0

With root E, $L = LCA(D, E)$:

L:   A   B   C   D   row maximum (off-diagonal)
A    7   3   3   1   3
B    3   9   4   1   4
C    3   4   6   1   4
D    1   1   1   7   1

[Figure: the tree over A, B, C, D, E with its root and the internal nodes labeled B/C and A/B/C.]

(B,C) is the only mutually deepest pair.
We can tolerate noise smaller than ±½.
In general, we can tolerate any noise that maintains the off-diagonal maximum in every row.
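The example can be replayed with a few lines of Python. This is a sketch under assumptions: the helper names `lca_depths` and `mutually_deepest_pairs` are illustrative, and the matrix `D` is chosen so that taking E as the root reproduces the L matrix of the example.

```python
def lca_depths(D, r):
    """L(i,j) = (D(r,i) + D(r,j) - D(i,j)) / 2 over all taxa except r."""
    others = [i for i in range(len(D)) if i != r]
    return [[0.5 * (D[r][i] + D[r][j] - D[i][j]) for j in others]
            for i in others]

def mutually_deepest_pairs(L):
    """Pairs (i, j) whose entry is the off-diagonal maximum of both rows."""
    m = len(L)
    row_max = [max(L[i][k] for k in range(m) if k != i) for i in range(m)]
    return [(i, j) for i in range(m) for j in range(i + 1, m)
            if L[i][j] == row_max[i] and L[i][j] == row_max[j]]

# Taxa order: A, B, C, D, E (assumed values, consistent with the L matrix).
D = [[0, 10, 7, 12, 7],
     [10, 0, 7, 14, 9],
     [7, 7, 0, 11, 6],
     [12, 14, 11, 0, 7],
     [7, 9, 6, 7, 0]]

L = lca_depths(D, r=4)               # root E
pairs = mutually_deepest_pairs(L)    # indices refer to A, B, C, D
```

The only pair returned is `(1, 2)`, that is (B, C). A and D do have row maxima, but their deepest partners prefer someone else, so those pairs are not mutual.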

Robustness of DLCA

Theorem: If $\|D, D_T\|_\infty < \frac{1}{2} w_{\min}(T)$, then the tree returned by DLCA on input $D$ has the same topology as $T$ (for any selection of root).

Let $L$ be the matrix calculated in stage (b) ($L = LCA(D, r)$).
Let $L_T$ be the "true" LCA matrix ($L_T = LCA(D_T, r)$).

1. We show that $L$ weakly preserves the order of each row in $L_T$: $L_T(i,j) > L_T(i,k) \Rightarrow L(i,j) > L(i,k)$.
2. We prove by induction that this implies that the recursive procedure outputs a tree with the same topology as $T$.

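Robustness of DLCA (cont)

1. If $\|D, D_T\|_\infty < \frac{1}{2} w_{\min}(T)$, then $L$ weakly preserves the order of each row in $L_T$ ($L_T(i,j) > L_T(i,k) \Rightarrow L(i,j) > L(i,k)$).

$$
\begin{aligned}
& L_T(i,j) > L_T(i,k) \\
\Rightarrow\ & \tfrac{1}{2}\big(D_T(r,i) + D_T(r,j) - D_T(i,j)\big) > \tfrac{1}{2}\big(D_T(r,i) + D_T(r,k) - D_T(i,k)\big) \\
\Rightarrow\ & D_T(r,j) - D_T(i,j) > D_T(r,k) - D_T(i,k) \\
\Rightarrow\ & D_T(r,j) + D_T(i,k) > D_T(r,k) + D_T(i,j) \\
\Rightarrow\ & D_T(r,j) + D_T(i,k) \ge D_T(r,k) + D_T(i,j) + 2 w_{\min}(T) && \text{(4-point condition)} \\
\Rightarrow\ & D(r,j) + D(i,k) > D(r,k) + D(i,j) && \text{(since } \|D, D_T\|_\infty < \tfrac{1}{2} w_{\min}(T)\text{)} \\
\Rightarrow\ & D(r,j) - D(i,j) > D(r,k) - D(i,k) \\
\Rightarrow\ & \tfrac{1}{2}\big(D(r,i) + D(r,j) - D(i,j)\big) > \tfrac{1}{2}\big(D(r,i) + D(r,k) - D(i,k)\big) \\
\Rightarrow\ & L(i,j) > L(i,k)
\end{aligned}
$$

The 4-point step uses the tree itself: the two sides differ by $2w$, where $w \ge w_{\min}(T)$ is the weight of the path separating $\{r,k\}$ from $\{i,j\}$ in $T$. The noise step holds because each of the four entries changes by less than $\frac{1}{2} w_{\min}(T)$, so the gap of at least $2 w_{\min}(T)$ shrinks by less than $2 w_{\min}(T)$ and stays positive.

[Figure: the tree $T$ restricted to $r, k, i, j$, with $\{r,k\}$ separated from $\{i,j\}$ by a path of weight $w \ge w_{\min}(T)$.]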
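The claim can be probed numerically. The sketch below perturbs an additive matrix by bounded random noise and checks the row-order property on off-diagonal entries; the helper names, the matrix `D_T` (a hypothetical tree metric with minimal edge weight 1, root taken as the last taxon) and the noise bound 0.49 are assumptions for illustration. It is an empirical check, not a substitute for the derivation.

```python
import random

def lca_depths(D, r):
    """L(i,j) = (D(r,i) + D(r,j) - D(i,j)) / 2 over all taxa except r."""
    others = [i for i in range(len(D)) if i != r]
    return [[0.5 * (D[r][i] + D[r][j] - D[i][j]) for j in others]
            for i in others]

def preserves_row_order(L_T, L):
    """Check L_T(i,j) > L_T(i,k)  =>  L(i,j) > L(i,k) off the diagonal."""
    m = len(L_T)
    for i in range(m):
        cols = [c for c in range(m) if c != i]
        for j in cols:
            for k in cols:
                if L_T[i][j] > L_T[i][k] and not L[i][j] > L[i][k]:
                    return False
    return True

def add_noise(D_T, bound, rng):
    """Symmetric copy of D_T with each entry shifted by at most `bound`."""
    n = len(D_T)
    D = [row[:] for row in D_T]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = D_T[i][j] + rng.uniform(-bound, bound)
    return D

# Hypothetical additive matrix whose tree has minimal edge weight 1.
D_T = [[0, 10, 7, 12, 7],
       [10, 0, 7, 14, 9],
       [7, 7, 0, 11, 6],
       [12, 14, 11, 0, 7],
       [7, 9, 6, 7, 0]]
L_T = lca_depths(D_T, r=4)

rng = random.Random(0)
all_ok = all(preserves_row_order(L_T, lca_depths(add_noise(D_T, 0.49, rng), 4))
             for _ in range(200))
```

With the bound below ½·w_min every trial preserves the order, as the derivation predicts. A single entry shifted by more than the bound, for example raising the distance between the second and third taxa by 3, is enough to break it.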

Robustness of DLCA (cont)

2. If $L$ weakly preserves the order of each row in $L_T$, then the recursive procedure returns a tree with the same topology as $T$.

Assume: $L_T(i,j) > L_T(i,k) \Rightarrow L(i,j) > L(i,k)$. The proof is by induction; the base case is immediate.

a) The pair $(i',j')$ chosen in step (1) is a neighbor-pair in $T$:
   $(i',j')$ is a mutually deepest pair in $L$
   $\Rightarrow$ for every $k \ne i',j'$: $\max\{L(i',k), L(j',k)\} \le L(i',j')$
   $\Rightarrow$ for every $k \ne i',j'$: $\max\{L_T(i',k), L_T(j',k)\} \le L_T(i',j')$ (by the contrapositive of the assumption)
   $\Rightarrow$ $i'$ and $j'$ are neighbors in $T$ (shown in class).

Robustness of DLCA (cont)

2. (cont) Assume: $L_T(i,j) > L_T(i,k) \Rightarrow L(i,j) > L(i,k)$.

b) The reduced matrix $L'$ calculated in step (2) weakly preserves the order of each row in the reduced $L'_T$.
   Assume $L'_T(i,j) > L'_T(i,k)$.
   - If $i,j,k \ne v$ (the new vertex), then $L'(i,j) > L'(i,k)$ by the induction hypothesis.
   - If $i = v$, then $L'_T(v,j) = L_T(i',j) = L_T(j',j)$ and $L'_T(v,k) = L_T(i',k) = L_T(j',k)$
     $\Rightarrow \min\{L_T(i',j), L_T(j',j)\} > \max\{L_T(i',k), L_T(j',k)\}$
     $\Rightarrow \min\{L(i',j), L(j',j)\} > \max\{L(i',k), L(j',k)\}$
     $\Rightarrow L'(v,j) > L'(v,k)$ (convex reduction).
   - The cases $j = v$ and $k = v$ can be shown similarly.

Robustness of DLCA (cont)

2. (cont) Assume: $L_T(i,j) > L_T(i,k) \Rightarrow L(i,j) > L(i,k)$.

c) The induction hypothesis implies that the tree (over $S \setminus \{i',j'\} \cup \{v\}$) returned by the recursive call in step (3) has the same topology as $T$ (with $i',j'$ replaced by $v$).
d) In step (4) we add $i'$ and $j'$ as children of $v$, and the resulting tree has the same topology as $T$.

Q.E.D.

Robustness of Other Algorithms

Many other algorithms also reconstruct the correct topology given near-additive input:
- Other neighbor joining algorithms: Saitou and Nei's NJ, AddTree …
- All quartet-based algorithms (you show this in HW 5)

Atteson defines two reconstruction radii:
- An algorithm A has $l_\infty$-radius of $\varepsilon$ iff it is guaranteed to return the binary tree $T$ given $D$ s.t. $\|D, D_T\|_\infty < \varepsilon \cdot w_{\min}(T)$.
- An algorithm A has edge $l_\infty$-radius of $\varepsilon$ iff it correctly reconstructs all edges of $T$ of weight $> \frac{1}{\varepsilon} \|D, D_T\|_\infty$.

edge $l_\infty$-radius $\le$ $l_\infty$-radius $\le \frac{1}{2}$

Generalized Robustness

An algorithm A has edge $l_\infty$-radius of $\varepsilon$ iff it correctly reconstructs all edges of $T$ of weight $> \frac{1}{\varepsilon} \|D, D_T\|_\infty$.

- DLCA has optimal edge $l_\infty$-radius of ½.
- NJ has edge $l_\infty$-radius of ¼.

Typically, NJ reconstructs more edges than DLCA. Why is this?
Worst-case noise ($\|D, D_T\|_\infty$) is typically much larger than average-case noise.
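The edge-radius definition translates into a simple threshold on edge weights. The sketch below lists the edges whose reconstruction is guaranteed for a given noise level; the function name `guaranteed_edges`, the edge names and the weights are hypothetical, and the result is only the worst-case guarantee, not a prediction of what an algorithm recovers in practice.

```python
def guaranteed_edges(edge_weights, noise, radius):
    """Edges of weight > noise / radius, i.e. those covered by the guarantee."""
    threshold = noise / radius
    return sorted(e for e, w in edge_weights.items() if w > threshold)

# Hypothetical edge weights of a tree T and a worst-case noise level.
edge_weights = {"e1": 1.0, "e2": 2.0, "e3": 3.0}
noise = 0.6   # stands for the l-infinity distance between D and D_T

dlca_edges = guaranteed_edges(edge_weights, noise, radius=0.5)   # threshold 1.2
nj_edges = guaranteed_edges(edge_weights, noise, radius=0.25)    # threshold 2.4
```

With these numbers the guarantee covers `e2` and `e3` for radius ½ but only `e3` for radius ¼. The guarantee is driven by the single worst entry of the noise, which is why an algorithm with a smaller radius can still recover more edges on typical inputs.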