Phylogenetic Trees (2), Lecture 12. Based on: Durbin et al., Sections 7.3, 7.8; Gusfield, Algorithms on Strings, Trees, and Sequences, Section 17.


2 Recall: The Four Points Condition
Theorem: A set M of L objects is additive iff any subset of four objects can be labeled i,j,k,l so that:
d(i,k) + d(j,l) = d(i,l) + d(k,j) ≥ d(i,j) + d(k,l)
We call {{i,j},{k,l}} the “split” of {i,j,k,l}.
The four points condition does not provide an algorithm to construct a tree from a distance matrix, or to decide that there is no such tree (i.e., that the set is not additive). The first methods for constructing trees for additive sets used neighbor joining methods.
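The condition can at least be checked directly on a distance matrix. The following is a minimal sketch, not part of the original slides: it tests every subset of four objects and requires that the two largest of the three pairwise sums coincide. The function names and the tolerance parameter are illustrative choices; the matrix D is the four-leaf example (a, b, c, d) used later in this lecture, and BAD is a hypothetical perturbation of it.

```python
from itertools import combinations

def four_point_ok(d, quad, tol=1e-9):
    """Check the four points condition on one set of four objects."""
    i, j, k, l = quad
    sums = sorted([
        d[i][j] + d[k][l],
        d[i][k] + d[j][l],
        d[i][l] + d[j][k],
    ])
    # The two largest of the three sums must be equal (up to tol).
    return abs(sums[2] - sums[1]) <= tol

def is_additive(d, tol=1e-9):
    """True iff every subset of four objects satisfies the condition."""
    return all(four_point_ok(d, q, tol)
               for q in combinations(range(len(d)), 4))

# Example distance matrix on leaves a, b, c, d (from the later slides).
D = [[0, 3, 9, 7],
     [3, 0, 8, 6],
     [9, 8, 0, 6],
     [7, 6, 6, 0]]
# Hypothetical perturbation: d(a,c) raised from 9 to 10.
BAD = [[0, 3, 10, 7],
       [3, 0, 8, 6],
       [10, 8, 0, 6],
       [7, 6, 6, 0]]

print(is_additive(D))    # True
print(is_additive(BAD))  # False
```

This check takes time proportional to the number of 4-subsets, and, as the slide says, it only decides additivity; it does not build the tree.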

3 Constructing additive trees: The neighbor joining problem
Let i, j be neighboring leaves in a tree, let k be their parent, and let m be any other vertex. The formula
d(k,m) = (d(i,m) + d(j,m) - d(i,j)) / 2
shows that we can compute the distances of k to all other leaves. This suggests the following method to construct a tree from a distance matrix:
1. Find neighboring leaves i,j in the tree.
2. Replace i,j by their parent k and recursively construct a tree T for the smaller set.
3. Add i,j as children of k in T.

4 Neighbor Finding
How can we find, from distances alone, a pair of nodes which are neighboring leaves? Closest nodes aren’t necessarily neighboring leaves.
[Figure: a tree with leaves A, B, C, D]
Next we show one way to find neighbors from distances.

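5 Neighbor Finding: Saitou&Nei method
Theorem (Saitou&Nei): Assume all edge weights are positive. If D(i,j) is minimal (among all pairs of leaves), then i and j are neighboring leaves in the tree.
Here, as used in the proof, D(i,j) = (L-2)d(i,j) - (r_i + r_j), where r_i is the sum of the distances from i to all other leaves.
[Figure: leaves i, j and m, path vertices k and l, subtrees T1 and T2]
The proof is rather involved, and will be skipped.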
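A small numeric illustration may help here. The sketch below is not from the slides: it uses a hypothetical four-leaf tree in which A and B hang from one internal vertex (edges of weight 1 and 4), C and D hang from another (weights 1 and 4), and the two internal vertices are joined by an edge of weight 1. The criterion is taken to be D(i,j) = (L-2)d(i,j) - (r_i + r_j), as it appears in the proof slides; the function names are illustrative.

```python
from itertools import combinations

def closest_pair(d):
    """Pair of leaves at minimum distance."""
    return min(combinations(range(len(d)), 2),
               key=lambda p: d[p[0]][p[1]])

def saitou_nei_pair(d):
    """Pair of leaves minimizing D(i,j) = (L-2)d(i,j) - (r_i + r_j)."""
    n = len(d)
    r = [sum(row) for row in d]  # r_i = sum of distances from leaf i

    def criterion(pair):
        i, j = pair
        return (n - 2) * d[i][j] - (r[i] + r[j])

    return min(combinations(range(n), 2), key=criterion)

# Leaf-to-leaf distances of the hypothetical tree, order A, B, C, D.
d = [[0, 5, 3, 6],
     [5, 0, 6, 9],
     [3, 6, 0, 5],
     [6, 9, 5, 0]]

print(closest_pair(d))     # (0, 2): A and C are closest but not neighbors
print(saitou_nei_pair(d))  # (0, 1): A and B are neighboring leaves
```

In this example (A,B) and (C,D) tie at the minimum value of the criterion; `min` simply returns the first of them.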

6 Saitou&Nei proof (to be skipped)
Notations used in the proof:
p(i,j) (also written P(i,j)) = the path from vertex i to vertex j.
For a vertex i and an edge e=(i’,j’): N_i(e) = |{k : e is on p(i,k)}|.
Example [Figure: a tree with leaves A, B, C, D, internal vertices E, F and edges e1, e2, e3]:
P(D,C) = (e1,e2,e3) = (D,E,F,C)
N_D(e1) = 3, N_D(e2) = 2, N_D(e3) = 1, N_C(e1) = 1

7 Saitou&Nei proof
[Figure: i, j, k, l and the rest of T]

8 Saitou&Nei proof
Proof of Theorem: Assume for contradiction that D(i,j) is minimized for i,j which are not neighboring leaves. Let (i,l,...,k,j) be the path from i to j. Let T1 and T2 be the subtrees rooted at k and l which do not contain edges from P(i,j) (see figure).
Notation: |T| = #(leaves in T).
[Figure: the path i, l, ..., k, j with subtrees T1 and T2]

9 Saitou&Nei proof
Case 1: i or j has a neighboring leaf. WLOG j has a neighboring leaf m.
A. D(i,j) - D(m,j) = (L-2)(d(i,j) - d(j,m)) - (r_i + r_j) + (r_m + r_j) = (L-2)(d(i,k) - d(k,m)) + r_m - r_i
B. r_m - r_i ≥ (L-2)(d(k,m) - d(i,l)) + (4-L)d(k,l)
(since for each edge e ∈ P(k,l), N_m(e) ≥ 2 and N_i(e) ≤ L-2)
Substituting B in A, and using d(i,k) - d(i,l) = d(k,l):
D(i,j) - D(m,j) ≥ (L-2)(d(i,k) - d(i,l)) + (4-L)d(k,l) = 2d(k,l) > 0,
contradicting the minimality assumption.
[Figure: i, j, k, l, m and the subtree T2]

10 Saitou&Nei proof
Case 2: Not case 1. Then both T1 and T2 contain 2 neighboring leaves. WLOG |T2| ≥ |T1|. Let n,m be neighboring leaves in T1. We shall prove that D(m,n) < D(i,j), which will again contradict the minimality assumption.
[Figure: i, j, k, l, the neighboring leaves m, n with their parent p, and the subtrees T1 and T2]

11 Saitou&Nei proof
A. 0 ≤ D(m,n) - D(i,j) = (L-2)(d(m,n) - d(i,j)) + (r_i + r_j) - (r_m + r_n)
B. r_j - r_m < (L-2)(d(j,k) - d(m,p)) + (|T1| - |T2|)d(k,p)
C. r_i - r_n < (L-2)(d(i,k) - d(n,p)) + (|T1| - |T2|)d(l,p)
Adding B and C, noting that d(l,p) > d(k,p):
D. (r_i + r_j) - (r_m + r_n) < (L-2)(d(i,j) - d(n,m)) + 2(|T1| - |T2|)d(k,p)
Substituting D in the right hand side of A:
D(m,n) - D(i,j) < 2(|T1| - |T2|)d(k,p) ≤ 0, as claimed.
[Figure: i, j, k, l, m, n, p and the subtrees T1 and T2]

12 A simpler neighbor finding method
Select an arbitrary node r.
• For each pair of labeled nodes (i,j) let C(i,j) be defined by the following figure:
[Figure: r, i and j, with C(i,j) marked]
Claim (from final exam, Winter 02-3): Let i, j be such that C(i,j) is maximized. Then i and j are neighboring leaves.

13 Neighbor Joining Algorithm
• Set M to contain all leaves, and select a root r. |M| = L.
• If L = 2, return a tree of two vertices.
Iteration:
• Choose i,j such that C(i,j) is maximal.
• Create a new vertex k, and set d(k,m) = (d(i,m) + d(j,m) - d(i,j)) / 2 for every other vertex m in M.
• Remove i,j, and add k to M.
• Recursively construct a tree on the smaller set, then add i,j as children of k, at distances d(i,k) and d(j,k).
[Figure: i, j, their parent k, and another vertex m]
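The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's reference implementation: the figure defining C(i,j) is not available, so C(i,j) is assumed to be the length shared by the paths from r to i and from r to j, that is (d(r,i) + d(r,j) - d(i,j)) / 2; the root r is assumed to be one of the leaves; the input is assumed to be additive; and the names `neighbor_joining`, `k0`, `k1`, ... are hypothetical. The recursion is unrolled into a loop that records the edges as they are created.

```python
from itertools import combinations

def neighbor_joining(dist, root):
    """dist: dict of dicts of leaf distances. Returns weighted edges."""
    # Work on a copy so the caller's matrix is left untouched.
    d = {u: dict(row) for u, row in dist.items()}
    nodes = list(d)
    edges = []
    fresh = 0
    while len(nodes) > 2:
        # Assumed definition of C(i,j): shared length of root->i, root->j.
        def c(pair):
            i, j = pair
            return (d[root][i] + d[root][j] - d[i][j]) / 2

        candidates = [x for x in nodes if x != root]
        i, j = max(combinations(candidates, 2), key=c)
        k = f"k{fresh}"  # hypothetical name for the new parent vertex
        fresh += 1
        # Lengths of the edges from i and j to their parent k.
        d_ik = (d[i][j] + d[root][i] - d[root][j]) / 2
        d_jk = d[i][j] - d_ik
        edges += [(i, k, d_ik), (j, k, d_jk)]
        # Distances from k to every remaining vertex m.
        d[k] = {k: 0.0}
        for m in nodes:
            if m not in (i, j):
                d[k][m] = d[m][k] = (d[i][m] + d[j][m] - d[i][j]) / 2
        nodes = [m for m in nodes if m not in (i, j)] + [k]
    u, v = nodes  # base case: a tree of two vertices
    edges.append((u, v, d[u][v]))
    return edges

labels = "abcd"
rows = [[0, 3, 9, 7], [3, 0, 8, 6], [9, 8, 0, 6], [7, 6, 6, 0]]
dist = {u: dict(zip(labels, row)) for u, row in zip(labels, rows)}
tree = neighbor_joining(dist, "a")
print(tree)
```

On this four-leaf example (the matrix D from the later slides, rooted at a) the sketch joins c and d first, then b with their parent, and finally attaches a. Scanning all pairs in every iteration corresponds to the naive implementation.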

14 Complexity of Neighbor Joining Algorithm
Naive implementation:
Initialization: Θ(L^2) to compute the C(i,j)’s.
Each iteration:
• O(L) to update C(i,k) for the new node k and every remaining node i.
• O(L^2) to find the maximal C(i,j).
Total of O(L^3).

15 Complexity of Neighbor Joining Algorithm
Using a heap to store the C(i,j)’s:
Initialization: Θ(L^2) to compute and heapify the C(i,j)’s.
Each iteration:
• O(1) to find the maximal C(i,j).
• O(L log L) to delete {C(m,i), C(m,j)} and add C(m,k) for all vertices m.
Total of O(L^2 log L).
(Implementation details are omitted.)

16 Ultrametric trees
A more recent (and more efficient) way of constructing and identifying additive trees.
Idea: Reduce the problem to constructing trees by the “heights” of the internal nodes. For leaves i,j, D(i,j) represents the “height” of the common ancestor of i and j.
[Figure: a tree with leaves A, E, D, C, B]

17 Ultrametric Trees
Definition: T is an ultrametric tree for a symmetric positive real matrix D if:
1. The leaves of T correspond to the rows and columns of D.
2. Internal nodes have at least two children, and the Least Common Ancestor of i and j is labeled by D(i,j).
3. The labels decrease along paths from the root to the leaves.
Example matrix D:
    A  B  C  D  E
A   0  8  8  5  3
B      0  3  8  8
C         0  8  8
D            0  5
E               0
[Figure: the corresponding tree with leaves A, E, D, C, B]

18 Centrality of Ultrametric Trees
We will study later the following question: Given a symmetric positive real matrix D, is there an ultrametric tree T for D?
But first we show that an algorithm that constructs ultrametric trees from a matrix (or decides that no such tree exists) can be used to construct trees for additive sets and to solve other related problems.

19 Transforming Ultrametric Trees to Weighted Trees
Use the labels to define weights for all internal edges in the natural way. For this, consider the labels of leaves to be 0. We get an additive ultrametric tree whose height is the label of the root.
Note that in this tree all leaves are at the same height. This is why it is called ultrametric.
[Figure: the weighted tree with leaves E, D, C, B, A]
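A short sketch of this transformation, under assumptions that are not in the slides: the tree is encoded as nested `(label, children)` tuples with leaves as plain strings, and the "natural" edge weight is taken to be the parent's label minus the child's label. The example tree is one that is consistent with the A..E matrix of the ultrametric tree definition; the function names are illustrative.

```python
def label_of(node):
    """Leaves are plain strings and carry the label 0."""
    return 0 if isinstance(node, str) else node[0]

def leaf_depths(node, depth=0, out=None):
    """Weighted distance from the root to every leaf."""
    if out is None:
        out = {}
    if isinstance(node, str):
        out[node] = depth
    else:
        label, children = node
        for child in children:
            # Edge weight = label of parent minus label of child.
            leaf_depths(child, depth + label - label_of(child), out)
    return out

# Root labeled 8; {A, E} meet at 3, join D at 5; {B, C} meet at 3.
tree = (8, [(5, [(3, ["A", "E"]), "D"]), (3, ["B", "C"])])

depths = leaf_depths(tree)
print(depths)  # every leaf is at depth 8, the label of the root
```

Since every leaf ends up at the same depth, the distance between two leaves in the weighted tree is twice the label of their Least Common Ancestor.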

20 Transforming Weighted Trees to Ultrametric Trees
A weighted tree T can be transformed to an ultrametric tree T’ as follows:
Step 1: Pick a node k as a root, and “hang” the tree at k.
[Figure: the tree on leaves a, b, c, d, and the same tree hung at k=a]

21 Transforming Weighted Trees to Ultrametric Trees
Step 2: Let M = max_i d(i,k). M is taken to be the height of T’. Label the root by M, and label each internal node j by M-d(k,j).
[Figure: the tree on leaves a, b, c, d hung at k=a, with M=9]

22 Transforming Weighted Trees to Ultrametric Trees
Step 3: “Stretch” edges of leaves so that they are all at distance M from the root.
[Figure: the tree on leaves a, b, c, d with M=9, with leaf edges stretched by (9), (6), (2) and (0)]

23 Reconstructing Weighted Trees from Ultrametric Trees
The weight of an internal edge is the difference between the labels of its endpoints. The weight of an edge to a leaf i is obtained by subtracting M-d(k,i) from its current weight.
[Figure: the ultrametric tree and the reconstructed weighted tree on leaves a, b, c, d, with M = 9]

24 Solving the Additive Tree Problem by the Ultrametric Problem: Outline
We solve the additive tree problem by reducing it to the ultrametric problem as follows:
1. Given an input matrix D = D(i,j) of distances, transform it to a matrix D’ = D’(i,j), where D’(i,j) is the height of the LCA of i and j in the corresponding ultrametric tree T’.
2. Construct the ultrametric tree, T’, for D’.
3. Reconstruct the additive tree T from T’.

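25 How D’ is constructed from D
D’(i,j) should be the height of the Least Common Ancestor of i and j in T’, the ultrametric tree hung at k. Thus, D’(i,j) = M - d(k,m), where m is the Least Common Ancestor of i and j, and d(k,m) is computed by:
d(k,m) = (d(k,i) + d(k,j) - d(i,j)) / 2
[Figure: the tree on leaves a, b, c, d]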

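26 The transformation of D to D’
Distance matrix D (tree T):
    a  b  c  d
a      3  9  7
b         8  6
c            6
d
Ultrametric matrix D’ (tree T’, hung at a, M=9):
    a  b  c  d
a      9  9  9
b         7  7
c            4
d
[Figure: the weighted tree T and the ultrametric tree T’ on leaves a, b, c, d]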
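The transformation can be reproduced in a few lines. The sketch below assumes that d(k,m), the distance from the root k to the point where the paths to i and j separate, equals (d(k,i) + d(k,j) - d(i,j)) / 2, which holds for additive distances; the function name is illustrative. Applied to the matrix D of this slide with k = a, it yields the matrix D’ shown.

```python
def to_ultrametric_matrix(d, k):
    """Compute D'(i,j) = M - d(k,m) for a tree hung at leaf k."""
    n = len(d)
    big_m = max(d[k])  # M = max_i d(i,k), the height of T'
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Distance from k to the LCA m of i and j.
                d_km = (d[k][i] + d[k][j] - d[i][j]) / 2
                dp[i][j] = big_m - d_km
    return dp

# Distance matrix D on leaves a, b, c, d.
D = [[0, 3, 9, 7],
     [3, 0, 8, 6],
     [9, 8, 0, 6],
     [7, 6, 6, 0]]

D_prime = to_ultrametric_matrix(D, 0)  # hang the tree at a
for row in D_prime:
    print(row)
```

Every entry in the row and column of a equals M = 9, because a is the root and so the Least Common Ancestor of a and any other leaf is the root itself.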

27 Identifying Ultrametric Trees
Definition: A symmetric matrix D is ultrametric iff for every three indices i, j, k: D(i,j) ≤ max {D(i,k), D(j,k)} (i.e., there is a tie for the maximum value among D(i,j), D(i,k) and D(j,k)).
Theorem: D has an ultrametric tree iff it is ultrametric.
Proof: Next lecture.
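The definition translates directly into a cubic-time check. The sketch below is illustrative (the function name is not from the slides); it is applied to the A..E example matrix from the definition of ultrametric trees and to the additive distance matrix D on a, b, c, d, which is not ultrametric.

```python
from itertools import permutations

def is_ultrametric(d):
    """True iff D(i,j) <= max(D(i,k), D(j,k)) for all distinct i, j, k."""
    n = len(d)
    return all(d[i][j] <= max(d[i][k], d[j][k])
               for i, j, k in permutations(range(n), 3))

# The A..E example matrix, written out symmetrically.
U = [[0, 8, 8, 5, 3],
     [8, 0, 3, 8, 8],
     [8, 3, 0, 8, 8],
     [5, 8, 8, 0, 5],
     [3, 8, 8, 5, 0]]
# The additive matrix on a, b, c, d: d(a,c) = 9 > max(d(a,b), d(c,b)) = 8.
D = [[0, 3, 9, 7],
     [3, 0, 8, 6],
     [9, 8, 0, 6],
     [7, 6, 6, 0]]

print(is_ultrametric(U))  # True
print(is_ultrametric(D))  # False
```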