Distance-Based Phylogenetic Reconstruction (Part II). Tutorial #11. © Ilan Gronau.



Phylogenetic Reconstruction
We would like to study the evolutionary history of species.

Distance-Based Reconstruction
Given ML pairwise (evolutionary) distances between species, find the edge-weighted tree that best describes this metric.
The input: a distance matrix $D$ satisfying
- $D(i,i) = 0$
- $D(i,j) = D(j,i)$
- [ $D(i,j) \le D(i,k) + D(k,j)$ ]
The output: an edge-weighted tree $T$.
- If $D$ is additive, then $D_T = D$.
- Otherwise, return a tree best 'fitting' the input $D$.
Note: ML-estimated pairwise distances are usually not additive, but they are 'close' to some additive metric.
[Figure: example distance matrix over the taxa Bear, Raccoon, Weasel, Seal, Dog]
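The three input requirements listed above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `is_distance_matrix`, the tolerance `tol`, and both example matrices are illustrative assumptions.

```python
from itertools import permutations

def is_distance_matrix(D, check_triangle=True, tol=1e-9):
    """Check zero diagonal, symmetry and (optionally) the triangle inequality."""
    n = len(D)
    for i in range(n):
        if abs(D[i][i]) > tol:                  # D(i,i) = 0
            return False
        for j in range(n):
            if abs(D[i][j] - D[j][i]) > tol:    # D(i,j) = D(j,i)
                return False
    if check_triangle:
        # D(i,j) <= D(i,k) + D(k,j) for all distinct i, j, k
        for i, j, k in permutations(range(n), 3):
            if D[i][j] > D[i][k] + D[k][j] + tol:
                return False
    return True

# Hypothetical 3-taxon examples (not taken from the slides).
D_ok = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
D_bad = [[0, 10, 1], [10, 0, 1], [1, 1, 0]]   # 10 > 1 + 1

print(is_distance_matrix(D_ok))                          # True
print(is_distance_matrix(D_bad))                         # False
print(is_distance_matrix(D_bad, check_triangle=False))   # True
```

The triangle inequality is optional here because the slide lists it in brackets.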

Neighbor-Joining Algorithms
Agglomerative (bottom-up) approach:
1. Find a pair of neighboring taxa $i,j$.
2. Connect them to a new internal vertex $v$ (define edge weights).
3. Remove $i,j$ from the taxon set and add $v$ (define distances from $v$).
4. Return to (1).
When only 2 taxa are left, connect them.
Neighbors: taxa connected by a 2-edge path.
Consistency: given an additive metric $D_T$:
- We always choose a pair of neighbors in $T$ (stage 1).
- The reduced distance matrix is consistent with the reduced tree (stage 3).
By induction, we eventually reconstruct $T$.

UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
UPGMA algorithm:
1. Find a pair of taxa $i,j$ of minimal distance.
2. Connect them to a new internal vertex $v$.
3. Remove $i,j$ from the taxon set and add $v$, with $D(v,k) = \alpha D(i,k) + (1-\alpha) D(j,k)$, where $\alpha$ and $1-\alpha$ are proportional to the number of 'original' taxa that $i$ and $j$ represent.
4. Return to (1).
When only 2 taxa are left, connect them.
Consistency? Given an additive metric $D_T$, do we always choose a pair of neighbors in $T$?
Example (row a not shown):
     a    b    c    d
a
b         0    3    15
c              0    14
d                   0
[Figure: tree over the leaves a, b, c, d]
UPGMA chooses b,c. The closest taxon is not necessarily a neighbor.
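The UPGMA steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own code. The function name `upgma_merges` is hypothetical, and so is the example matrix: it is the metric of a tree with leaf edges a:13, b:1, c:1, d:13 and an internal edge of weight 1 separating {a,b} from {c,d}. That tree agrees with the three entries that survive in the table above, but the row for a is an assumption.

```python
def upgma_merges(D, names):
    """Return the clusters merged by UPGMA, in order, as tuples of original taxa."""
    n = len(names)
    clusters = {i: (names[i],) for i in range(n)}      # cluster id -> original taxa
    dist = {(i, j): D[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merges = []
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)                 # stage 1: closest pair
        ni, nj = len(clusters[i]), len(clusters[j])
        alpha = ni / (ni + nj)                         # weight by cluster size
        new = {}
        for k in clusters:
            if k in (i, j):
                continue
            dik = dist[(min(i, k), max(i, k))]
            djk = dist[(min(j, k), max(j, k))]
            new[(k, next_id)] = alpha * dik + (1 - alpha) * djk   # stage 3
        merged = clusters.pop(i) + clusters.pop(j)
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        dist.update(new)
        clusters[next_id] = merged
        merges.append(merged)
        next_id += 1
    return merges

# Assumed additive metric: a,b are neighbors and c,d are neighbors in the tree.
names = ["a", "b", "c", "d"]
D = [[0, 14, 15, 27],
     [14, 0, 3, 15],
     [15, 3, 0, 14],
     [27, 15, 14, 0]]

print(upgma_merges(D, names)[0])   # ('b', 'c')
```

On this input the first merge is the pair b,c, which is not a pair of neighbors in the assumed tree, so UPGMA is not consistent on additive metrics in general.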

Ultrametric Trees
Edge-weighted trees that have a point (the root) equidistant from all leaves.
Additive metrics consistent with an ultrametric tree are called ultrametrics.
A distance matrix is ultrametric iff it obeys the 3-point condition:
"Any subset of three taxa can be labelled $i,j,k$ such that $d(i,j) \le d(j,k) = d(i,k)$."
[Figure: ultrametric tree drawn along a time axis]
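The 3-point condition says that, in every triple, the two largest of the three pairwise distances are equal. A minimal sketch of that check follows; the function name `is_ultrametric`, the tolerance, and both example matrices are assumptions made for illustration.

```python
from itertools import combinations

def is_ultrametric(D, tol=1e-9):
    """3-point condition: in every triple the two largest distances coincide."""
    n = len(D)
    for i, j, k in combinations(range(n), 3):
        _, mid, largest = sorted((D[i][j], D[i][k], D[j][k]))
        if abs(largest - mid) > tol:
            return False
    return True

# Hypothetical examples: the first comes from a rooted tree with all leaves
# at the same depth, the second is additive but not ultrametric.
U = [[0, 2, 6],
     [2, 0, 6],
     [6, 6, 0]]
A = [[0, 14, 15, 27],
     [14, 0, 3, 15],
     [15, 3, 0, 14],
     [27, 15, 14, 0]]

print(is_ultrametric(U))   # True
print(is_ultrametric(A))   # False
```

The check examines all triples, so it takes cubic time in the number of taxa.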

UPGMA
Additional notes:
In the reduction formula, $D(v,k)$ can be set to any value within the interval defined by $D(i,k)$ and $D(j,k)$.
- In particular, $D(v,k) = \frac{1}{2}(D(i,k) + D(j,k))$ gives the WPGMA algorithm.
- If we use $D(v,k) = \min\{D(i,k), D(j,k)\}$, we get the 'closest' ultrametric from below (the unique subdominant ultrametric).
Run-time analysis:
- Naive implementation: $\Theta(n^3)$
- Keeping a sorted version of each row of $D$: $\Theta(n^2 \log n)$
- The third variant (the $\min$ rule) can be executed in $\Theta(n^2)$
Reminder, UPGMA steps:
1. Find a pair of taxa $i,j$ of minimal distance.
2. Connect them to a new internal vertex $v$.
3. Remove $i,j$ from the taxon set and add $v$, with $D(v,k) = \alpha D(i,k) + (1-\alpha) D(j,k)$.

Next Time …
Consistent distance-based reconstruction: given an additive metric $D$, find the unique tree $T$ such that $D_T = D$.
Reminder: a metric is additive iff it obeys the 4-point condition:
"Any subset of four taxa can be labelled $i,j,k,l$ such that $d(i,j) + d(k,l) \le d(i,l) + d(j,k) = d(i,k) + d(j,l)$."
[Figure: distance matrices, additive matrices, ultrametric matrices]
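The 4-point condition can be tested the same way: for every quartet, the two largest of the three pairwise sums must be equal. The sketch below assumes the input is already a metric. The function name `is_additive` and the example matrices are hypothetical; the first matrix is the metric of a tree with leaf edges a:13, b:1, c:1, d:13 and one internal edge of weight 1.

```python
from itertools import combinations

def is_additive(D, tol=1e-9):
    """4-point condition: in every quartet the two largest pair sums coincide."""
    n = len(D)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted((D[i][j] + D[k][l],
                       D[i][k] + D[j][l],
                       D[i][l] + D[j][k]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Tree metric (assumed example): the sums are 28, 30, 30.
A = [[0, 14, 15, 27],
     [14, 0, 3, 15],
     [15, 3, 0, 14],
     [27, 15, 14, 0]]

# Same matrix with D(a,d) perturbed from 27 to 25: the sums become 28, 30, 28.
P = [[0, 14, 15, 25],
     [14, 0, 3, 15],
     [15, 3, 0, 14],
     [25, 15, 14, 0]]

print(is_additive(A))   # True
print(is_additive(P))   # False
```

A small perturbation of one entry is enough to break additivity, which is why estimated distances are rarely exactly additive.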

Saitou & Nei's Neighbor Joining
S&N algorithm ($n$ is the current number of taxa):
1. Find a pair of taxa $i,j$ maximizing $Q(i,j) = r(i) + r(j) - (n-2)D(i,j)$, where $r(i) = \sum_k D(i,k)$.
2. Connect them to a new internal vertex $v$ with edges of weights: [formula missing]
3. Remove $i,j$ from the taxon set and add $v$, with $D(v,k) = \frac{1}{2}(D(i,k) + D(j,k) - D(i,j))$.
4. Return to (1).
When only 2 taxa are left, connect them (with an edge of length $D(i,j)$).
[Figure: taxa $i$ and $j$ joined at the new vertex $v$, and a third taxon $k$]
If $D$ is additive (consistent with some tree $T$), then (shown in class):
- $Q(i,j)$ is maximized for neighbor pairs.
- If $i,j$ are neighbors, then stages 2 and 3 are consistent.
Conclusion: in such a case, given $D$, NJ returns $T$.
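The S&N steps can be sketched as follows. This is an illustration, not the tutorial's code. The edge-weight formula for stage 2 did not survive in the slide text, so the sketch assumes the standard Saitou-Nei weights $w(i,v) = \frac{1}{2}\left(D(i,j) + \frac{r(i) - r(j)}{n-2}\right)$ and $w(j,v) = D(i,j) - w(i,v)$. The function name `neighbor_joining`, the vertex labels, and the example matrix (the metric of a tree with leaf edges a:13, b:1, c:1, d:13 and an internal edge of weight 1) are hypothetical.

```python
def neighbor_joining(D, names):
    """Return the weighted edges (u, v, w) of the tree built by S&N neighbor joining."""
    nodes = list(names)
    dist = {(a, b): float(D[i][j])
            for i, a in enumerate(names) for j, b in enumerate(names)}
    edges = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[a, k] for k in nodes) for a in nodes}   # row sums
        # Stage 1: pick the pair maximizing Q(i,j) = r(i) + r(j) - (n-2) D(i,j).
        pairs = ((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:])
        i, j = max(pairs, key=lambda p: r[p[0]] + r[p[1]] - (n - 2) * dist[p])
        v = f"({i},{j})"
        # Stage 2: edge weights (standard S&N formula, assumed here).
        wi = 0.5 * (dist[i, j] + (r[i] - r[j]) / (n - 2))
        wj = dist[i, j] - wi
        edges += [(i, v, wi), (j, v, wj)]
        # Stage 3: reduce the matrix.
        dist[v, v] = 0.0
        for k in nodes:
            if k not in (i, j):
                dist[v, k] = dist[k, v] = 0.5 * (dist[i, k] + dist[j, k] - dist[i, j])
        nodes = [k for k in nodes if k not in (i, j)] + [v]
    a, b = nodes
    edges.append((a, b, dist[a, b]))        # connect the last two taxa
    return edges

names = ["a", "b", "c", "d"]
D = [[0, 14, 15, 27],
     [14, 0, 3, 15],
     [15, 3, 0, 14],
     [27, 15, 14, 0]]

for edge in neighbor_joining(D, names):
    print(edge)
```

On this additive input the sketch joins a with b and c with d, recovering the assumed edge weights, even though b,c is the closest pair.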

Saitou & Nei's Neighbor Joining: Complexity Analysis
Run-time analysis:
- In each iteration we need to recalculate $r(\cdot)$ for all taxa.
- The $Q(\cdot,\cdot)$ values are 'scrambled' in each iteration.
- Stage (1) takes $O(n^2)$.
- Total complexity: $O(n^3)$.
- No known way to speed this up significantly.
Note: there are consistent reconstruction algorithms that run in $O(n^2)$ or even $O(n \log n)$ time.
Reminder, S&N steps:
1. Find a pair of taxa $i,j$ maximizing $Q(i,j) = r(i) + r(j) - (n-2)D(i,j)$.
2. Connect them to a new internal vertex $v$ with edges of weights: [formula missing]
3. Remove $i,j$ from the taxon set and add $v$, with $D(v,k) = \frac{1}{2}(D(i,k) + D(j,k) - D(i,j))$.

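S&N's NJ on Non-Additive Data
Example:
[Table: distance matrix $D$ over Bear (B), Raccoon (R), Weasel (W), Seal (S), Dog (D)]
Pair sums for the quartet B, R, W, S:
- $D(B,R) + D(W,S) = 68$
- $D(B,W) + D(R,S) = 78$
- $D(B,S) + D(R,W) = 71$
The two largest sums differ (78 vs. 71), so the 4-point condition fails: $D$ is not additive.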

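S&N's NJ Example: 1st iteration
[Figure: distance matrix $D$ over B, R, W, S, D (surviving entry: $D(S,D) = 50$); matrix $Q$ (surviving entry: $Q(S,D) = 198$); vector $r$; partial tree in which Bear and Dog are joined at a new vertex B-D, with edge weights 6 and 26]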

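S&N's NJ Example: 2nd iteration
[Figure: reduced distance matrix $D$ over B-D, R, W, S; matrix $Q$ (surviving entry: $Q(W,S) = 136$); vector $r$; partial tree in which B-D and Raccoon are joined at a new vertex B-D-R]
Calculate the difference from the old values to the new ones.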

S&N's NJ Example: 3rd iteration
[Figure: reduced distance matrix $D$ over B-D-R, W, S (surviving entry: $D(W,S) = 44$); matrix $Q$; vector $r$; tree with new vertex W-S and an edge of weight 1.5]
Reconstruct the unique tree over 3 taxa.

How Good Is The Tree?
We observe the perturbations from the input matrix to the one implied by the output tree.
[Figure: the NJ output tree over Bear, Dog, Raccoon, Weasel, Seal (internal vertices B-D, B-D-R, W-S); the input matrix $D$; the tree metric $D_T$; the perturbation matrix $|D - D_T|$]
How good is this?

How Good Is The Tree?
Compare with other algorithms:
[Figure: the NJ tree (internal vertices B-D, B-D-R, W-S) and the UPGMA tree (clusters BR, labelled 13, then BRS, BRSW, BRSWD) over Bear, Raccoon, Weasel, Seal, Dog, with the perturbation matrices $|D - D_{T_1}|$ and $|D - D_{T_2}|$]

Can we do better?
Given a distance matrix $D$, find an edge-weighted tree $T$ that minimizes $\|D, D_T\|_p$.
- For $p = 1, 2, \infty$, this task was shown to be NP-hard.
- For $p = 1, 2$, this task was shown to be NP-hard for ultrametric trees as well.
- For $p = \infty$:
  - this task is easy for ultrametric trees (an $O(n^2)$ algorithm);
  - there is a 3-approximation algorithm for general trees.
No algorithm gives any good guarantees for non-additive data.
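The objective $\|D, D_T\|_p$ can be evaluated for a given pair of matrices as sketched below. The slide does not say which entries the norm ranges over; this sketch assumes it is taken over unordered pairs $i < j$. The function name `lp_distance` and both matrices are hypothetical.

```python
def lp_distance(D, DT, p):
    """L_p distance between D and the tree metric DT, over unordered pairs i < j."""
    n = len(D)
    diffs = [abs(D[i][j] - DT[i][j]) for i in range(n) for j in range(i + 1, n)]
    if p == float("inf"):
        return max(diffs)                    # largest single perturbation
    return sum(d ** p for d in diffs) ** (1 / p)

# Hypothetical input matrix and tree metric; the perturbations are 0, 1, 2.
D = [[0, 3, 5],
     [3, 0, 4],
     [5, 4, 0]]
DT = [[0, 3, 4],
      [3, 0, 6],
      [4, 6, 0]]

print(lp_distance(D, DT, 1))              # 3.0
print(lp_distance(D, DT, 2))              # square root of 5
print(lp_distance(D, DT, float("inf")))   # 2
```

Evaluating the objective for a fixed tree is cheap; the hardness results above concern finding the tree that minimizes it.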