9/1/2005 1 Ultrametric phylogenies By Sivan Yogev Based on Chapter 11 from “Inferring Phylogenies” by J. Felsenstein.



Introduction - additive trees
In the last lecture we saw the concept of distance-based phylogenetic trees, where d(i,j) is the distance between the objects indexed i and j.
In particular, we discussed additive sets, in which:
• For each i: d(i,i) = 0, and for each j ≠ i: d(i,j) > 0
• For each i,j: d(i,j) = d(j,i)
• For each i,j,k: d(i,k) ≤ d(i,j) + d(j,k) [triangle inequality]
• Any subset of four objects can be labelled i,j,k,l such that d(i,j) + d(k,l) ≤ d(i,l) + d(j,k) = d(i,k) + d(j,l) [four-point condition]
An additive set defines a tree, and every tree defines an additive distance matrix between its leaves.
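The four-point condition can be checked mechanically. The following is a minimal sketch, assuming the distances are given as a symmetric list-of-lists matrix; the function name, the tolerance parameter, and the example tree (two pairs of leaves joined by an internal edge of length 5) are illustrative assumptions, not part of the lecture.

```python
from itertools import combinations


def satisfies_four_point(d, tol=1e-9):
    """Return True if every four objects satisfy the four-point condition.

    d is a symmetric matrix (list of lists) of pairwise distances.
    """
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        # The three ways to split four objects into two pairs.
        sums = sorted([
            d[i][j] + d[k][l],
            d[i][k] + d[j][l],
            d[i][l] + d[j][k],
        ])
        # The two largest sums must be equal; the smallest is <= both.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical tree: leaves 0 and 1 hang off one internal node with branch
# lengths 1 and 2, leaves 2 and 3 hang off another with branch lengths 3
# and 4, and the two internal nodes are joined by an edge of length 5.
tree_distances = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]

# The same matrix with d(0,3) changed, so that no tree can realise it.
non_additive = [row[:] for row in tree_distances]
non_additive[0][3] = non_additive[3][0] = 12

print(satisfies_four_point(tree_distances))
print(satisfies_four_point(non_additive))
```

Going over all quadruples costs O(n^4), so this is only meant to illustrate the definition.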

Molecular clocks
Let us assume that “stable” mutations in the genome occur at a uniform rate over long time periods.
This defines a “molecular clock”: each mutation stands for a constant period of time.
We can therefore approximate the time since any two taxa diverged from their last common ancestor by the number of differences between their genomes in conserved regions.
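As a rough illustration of the clock idea, the sketch below counts differences between two aligned sequences and converts them into a time estimate. The sequences, the `years_per_mutation` parameter, and the division by two (differences accumulate along both lineages since the split) are assumptions made for the example; multiple mutations at the same site are ignored.

```python
def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def estimated_divergence_time(seq_a, seq_b, years_per_mutation):
    """Estimate the time since the last common ancestor under a strict clock.

    Assumption: each observed difference is one mutation, and the mutations
    are split evenly between the two lineages, hence the division by two.
    """
    return count_differences(seq_a, seq_b) * years_per_mutation / 2


# Hypothetical conserved-region fragments from two taxa.
taxon_a = "ACGTACGTAC"
taxon_b = "ACGTTCGTAA"

print(count_differences(taxon_a, taxon_b))
print(estimated_divergence_time(taxon_a, taxon_b, years_per_mutation=1000))
```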

Ultrametric trees
Given a group of taxa with pairwise distances, suppose we assume the “molecular clock” model and wish to find the evolutionary tree. Then the number of mutations from the last common ancestor to every taxon should be similar.
This means that the distance from the root of the evolutionary tree to each leaf is the same.
Such a tree is called an ultrametric tree.
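The defining property (equal root-to-leaf distance for every leaf) is easy to test on a rooted tree. A minimal sketch follows, assuming a hypothetical representation in which a leaf is a string and an internal node is a list of `(child, branch_length)` pairs.

```python
def leaf_depths(tree, depth=0.0):
    """Yield (leaf_name, distance_from_root) for every leaf of the tree."""
    if isinstance(tree, str):
        # A leaf is represented by its name.
        yield tree, depth
        return
    for child, branch_length in tree:
        yield from leaf_depths(child, depth + branch_length)


def is_ultrametric_tree(tree, tol=1e-9):
    """True if all leaves are at the same distance from the root."""
    depths = [d for _, d in leaf_depths(tree)]
    return max(depths) - min(depths) <= tol


# Hypothetical trees: in the first every leaf is at distance 3 from the root,
# in the second leaf B sits on a longer branch.
clock_tree = [([("A", 1.0), ("B", 1.0)], 2.0), ("C", 3.0)]
non_clock_tree = [([("A", 1.0), ("B", 2.0)], 2.0), ("C", 3.0)]

print(is_ultrametric_tree(clock_tree))
print(is_ultrametric_tree(non_clock_tree))
```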

Ultrametric trees (cont.)
Given a set of objects with pairwise distances, we want to know whether the set is ultrametric.
For ultrametric sets, these conditions hold:
• For each i: d(i,i) = 0, and for each j ≠ i: d(i,j) > 0
• For each i,j: d(i,j) = d(j,i)
• For each i,j,k: d(i,k) ≤ max{d(i,j), d(j,k)} [ultrametric condition]
The last condition can be replaced by this one:
• Any subset of three objects can be labelled i,j,k such that d(i,j) ≤ d(j,k) = d(i,k)
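A short derivation of why the ultrametric condition implies the three-object formulation, written out as a sketch (the slides state the equivalence without proof). The converse direction is immediate: if d(i,j) ≤ d(j,k) = d(i,k), then each of the three distances is at most the maximum of the other two.

```latex
\begin{align*}
&\text{Assume } d(a,c) \le \max\{d(a,b),\, d(b,c)\} \text{ for all } a,b,c, \\
&\text{and label three objects } i,j,k \text{ so that } d(i,j) \text{ is the smallest of the three distances.} \\
&d(i,k) \le \max\{d(i,j),\, d(j,k)\} = d(j,k), \\
&d(j,k) \le \max\{d(j,i),\, d(i,k)\} = d(i,k), \\
&\text{hence } d(i,j) \le d(j,k) = d(i,k).
\end{align*}
```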

Ultrametric trees (cont.)
An ultrametric set is also additive.
The converse is not always true.
[Figure: nested sets, with ultrametric matrices inside additive matrices inside distance matrices]

Ultrametric decision
Given a set of n objects with distances, we want to determine whether the set is ultrametric.
The naïve approach: go over all triplets and check whether the ultrametric condition holds.
Complexity: O(n^3)
More efficient algorithms exist (Gusfield gives a simple O(n^2 log n) algorithm and a more sophisticated O(n^2) algorithm, with partial proofs).
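The naïve triplet check can be sketched as follows. This is an illustration of the O(n^3) approach only, not Gusfield's faster algorithms; the function name, the tolerance, and the example matrices are assumptions.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Naive O(n^3) test of whether a distance matrix is ultrametric."""
    n = len(d)
    # Zero diagonal, positive and symmetric off-diagonal entries.
    for i in range(n):
        if d[i][i] != 0:
            return False
    for i, j in combinations(range(n), 2):
        if d[i][j] <= 0 or abs(d[i][j] - d[j][i]) > tol:
            return False
    # In every triplet the two largest distances must be equal.
    for i, j, k in combinations(range(n), 3):
        smallest, middle, largest = sorted([d[i][j], d[i][k], d[j][k]])
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical examples: the first matrix is ultrametric; the second is
# additive (any three objects with these distances fit on a tree) but not
# ultrametric, because its two largest distances differ.
ultrametric_matrix = [
    [0, 2, 6],
    [2, 0, 6],
    [6, 6, 0],
]
additive_only_matrix = [
    [0, 3, 9],
    [3, 0, 10],
    [9, 10, 0],
]

print(is_ultrametric(ultrametric_matrix))
print(is_ultrametric(additive_only_matrix))
```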

Approximations
However, for most biological data there is no exact “ultrametric solution”.
This means that some heuristic is needed.
The most popular method is UPGMA, which stands for Unweighted Pair Group Method using Arithmetic mean.
It was introduced by Sokal and Michener (1958).

UPGMA
Input: a set of n objects, with a distance between every two objects.
Output: an ultrametric tree with the given objects as leaves.
The main data structures used by the algorithm are a graph G=(V,E), which contains trees with the objects as leaves, and a distance matrix between the roots of every two trees in the graph.

UPGMA (cont.)
Initialization: each object is in a separate tree, with distances given by the input.
We will use an example of 5 mammal species.
[Table: pairwise distance matrix over Bear, Raccoon, Weasel, Seal, Dog]
[Figure: five single-leaf trees: Bear, Raccoon, Weasel, Seal, Dog]

UPGMA (cont.)
We iterate until there is only one tree. At each iteration we perform:
• Find the two trees x and y with minimal distance d(x,y)
• Add a new node, and connect the roots of x and y to this node. The result is a new tree z. The height of the root of z is d(x,y)/2
• Compute the distance between z and the other remaining trees (without x and y)

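UPGMA (cont.)
First iteration: Bear and Raccoon are joined into a new tree BR, whose root has height 13.
[Table: pairwise distance matrix over Bear, Raccoon, Weasel, Seal, Dog]
[Figure: trees BR (root height 13), Weasel, Seal, Dog]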

UPGMA (cont.)
Update computation: denote the number of leaves in the tree x by n_x. Then for each tree t ≠ x,y we set:
\[ d(z,t) = \frac{n_x \, d(x,t) + n_y \, d(y,t)}{n_x + n_y} \]
[Tables: distance matrix before the update (Bear, Raccoon, Weasel, Seal, Dog) and after it (BR, Weasel, Seal, Dog)]
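The update step is a size-weighted average of the two old distances, which makes the new distance equal to the plain average over all leaf pairs. A small sketch with made-up numbers (the leaves a, b, c, e and their distances are hypothetical):

```python
def upgma_update(d_xt, d_yt, n_x, n_y):
    """Distance from the merged tree z (x joined with y) to another tree t."""
    return (n_x * d_xt + n_y * d_yt) / (n_x + n_y)


# Hypothetical leaf-to-leaf distances: x = {a, b}, y = {c}, t = {e}.
d_ae, d_be, d_ce = 9.0, 11.0, 4.0

d_xt = (d_ae + d_be) / 2  # average over the two leaves of x
d_yt = d_ce               # y has a single leaf

d_zt = upgma_update(d_xt, d_yt, n_x=2, n_y=1)

# Weighting by cluster size gives every leaf pair the same weight.
plain_average = (d_ae + d_be + d_ce) / 3

print(d_zt, plain_average)
```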

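UPGMA (cont.)
Second iteration: BR and Seal are joined into a new tree BRS. The branch from the root of BR up to the root of BRS has length 5.25.
[Table: distance matrix over BR, Weasel, Seal, Dog]
[Figure: trees BRS (containing BR at height 13), Weasel, Dog]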

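UPGMA (cont.)
Third iteration: BRS and Weasel are joined into a new tree BRSW. The branch from the root of BRS up to the root of BRSW has length 1.75.
[Table: distance matrix over BRS, Weasel, Dog]
[Figure: trees BRSW (containing BR and BRS), Dog]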

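UPGMA (cont.)
Fourth (and last) iteration: BRSW and Dog are joined into the final tree BRSWD. The branch from the root of BRSW up to the root of BRSWD has length 2.625.
[Table: distance matrix over BRSW, Dog]
[Figure: final tree over Bear, Raccoon, Weasel, Seal, Dog, with internal nodes BR, BRS, BRSW, BRSWD]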
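The whole procedure can be sketched in a few lines. This is a simple O(n^3) illustration, not an optimised implementation. The numeric distance matrix did not survive in this transcript, so the values below are an assumption: they were chosen to be consistent with the heights and branch lengths shown on the slides (13, 5.25, 1.75, 2.625). The single-letter labels and the function name are also illustrative.

```python
def upgma(names, dist):
    """Run UPGMA and return the list of merges as (x, y, z, root_height).

    names: leaf labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}
    d = dict(dist)
    active = list(names)
    merges = []
    while len(active) > 1:
        # Find the two trees with minimal distance.
        x, y = min(
            ((a, b) for idx, a in enumerate(active) for b in active[idx + 1:]),
            key=lambda pair: d[frozenset(pair)],
        )
        # Label the new tree by concatenation, larger tree first.
        z = x + y if sizes[x] >= sizes[y] else y + x
        height = d[frozenset((x, y))] / 2
        sizes[z] = sizes[x] + sizes[y]
        active = [t for t in active if t not in (x, y)]
        # Size-weighted average distance to every remaining tree.
        for t in active:
            d[frozenset((z, t))] = (
                sizes[x] * d[frozenset((x, t))] + sizes[y] * d[frozenset((y, t))]
            ) / sizes[z]
        active.append(z)
        merges.append((x, y, z, height))
    return merges


# ASSUMED distances (B = Bear, R = Raccoon, W = Weasel, S = Seal, D = Dog);
# the original table is not preserved in the transcript.
raw = {
    ("B", "R"): 26, ("B", "W"): 34, ("B", "S"): 29, ("B", "D"): 32,
    ("R", "W"): 42, ("R", "S"): 44, ("R", "D"): 48,
    ("W", "S"): 44, ("W", "D"): 51, ("S", "D"): 50,
}
dist = {frozenset(pair): value for pair, value in raw.items()}

merges = upgma(["B", "R", "W", "S", "D"], dist)
labels = [z for _, _, z, _ in merges]
heights = [h for _, _, _, h in merges]
# Each merge here extends the previous cluster, so consecutive height
# differences are the internal branch lengths.
branches = [heights[0]] + [b - a for a, b in zip(heights, heights[1:])]

for label, height in zip(labels, heights):
    print(label, height)
```

With these assumed inputs the merge order is BR, BRS, BRSW, BRSWD, matching the four iterations above.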

UPGMA - complexity
A simple implementation takes n-1 iterations, and in each iteration finding the minimal distance takes O(n^2), for a total complexity of O(n^3).
We can instead keep a list of the smallest distance in each row. Finding the minimal distance then takes O(n), and updating the list is also O(n) at each iteration. Therefore, the total complexity is O(n^2).

Ultrametric evaluation
UPGMA gives us an ultrametric tree. Is this tree the best possible?
That depends on how we measure the quality of an approximated tree for a given matrix.
Let U(i,j) be the distance in the ultrametric tree U between the objects indexed i and j.
The L∞ norm is defined by:
\[ L_\infty(d, U) = \max_{i,j} \, |d(i,j) - U(i,j)| \]

Ultrametric evaluation (cont.)
There is an O(n^2) algorithm for finding the ultrametric tree U with minimal L∞ norm (Farach, Kannan and Warnow, 1995).
Is this tree the best possible? It would be better to include all distances.
The L1 norm is defined by:
\[ L_1(d, U) = \sum_{i,j} |d(i,j) - U(i,j)| \]
Finding U with minimal L1 norm is NP-hard! (Day, 1987)
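Both quality measures are straightforward to evaluate for a given candidate tree. The sketch below takes the L∞ norm as the largest absolute difference between input and tree distances and the L1 norm as the sum of absolute differences (the usual reading of these norms). It sums over unordered pairs i < j, which is an assumption: summing over ordered pairs would simply double the L1 value. The matrices are hypothetical.

```python
from itertools import combinations


def l_inf_norm(d, u):
    """Largest absolute difference between input and tree distances."""
    n = len(d)
    return max(abs(d[i][j] - u[i][j]) for i, j in combinations(range(n), 2))


def l1_norm(d, u):
    """Sum of absolute differences over all unordered pairs."""
    n = len(d)
    return sum(abs(d[i][j] - u[i][j]) for i, j in combinations(range(n), 2))


# Hypothetical input distances (not ultrametric) and a candidate
# ultrametric approximation of them.
d = [
    [0, 3, 9],
    [3, 0, 10],
    [9, 10, 0],
]
u = [
    [0, 3, 9.5],
    [3, 0, 9.5],
    [9.5, 9.5, 0],
]

print(l_inf_norm(d, u))
print(l1_norm(d, u))
```

The L∞ norm only reports the worst pair, while the L1 norm accumulates the error of every pair, which is why the slides describe it as including all distances.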