Lecture 7 – Algorithmic Approaches Justification: Any estimate of a phylogenetic tree has a large variance. Therefore, any tree that we can demonstrate.

Slides:

Advertisements

Similar presentations

Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.

Advertisements

Phylogenetic Tree A Phylogeny (Phylogenetic tree) or Evolutionary tree represents the evolutionary relationships among a set of organisms or groups of.

An Introduction to Phylogenetic Methods

1 Dan Graur Methods of Tree Reconstruction. 2 3.

Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.

Lecture 13 CS5661 Phylogenetics Motivation Concepts Algorithms.

BALANCED MINIMUM EVOLUTION. DISTANCE BASED PHYLOGENETIC RECONSTRUCTION 1. Compute distance matrix D. 2. Find binary tree using just D. Balanced Minimum.

Parsimony based phylogenetic trees Sushmita Roy BMI/CS 576 Sep 30 th, 2014.

1 General Phylogenetics Points that will be covered in this presentation Tree TerminologyTree Terminology General Points About Phylogenetic TreesGeneral.

Phylogenetic reconstruction

Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.

Molecular Evolution Revised 29/12/06

Tree Reconstruction.

UPGMA Algorithm.  Main idea: Group the taxa into clusters and repeatedly merge the closest two clusters until one cluster remains  Algorithm  Add a.

. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.

MAE 552 – Heuristic Optimization Lecture 27 April 3, 2002

Building phylogenetic trees Jurgen Mourik & Richard Vogelaars Utrecht University.

Distance methods. UPGMA: similar to hierarchical clustering but not additive Neighbor-joining: more sophisticated and additive What is additivity?

In addition to maximum parsimony (MP) and likelihood methods, pairwise distance methods form the third large group of methods to infer evolutionary trees.

Phylogenetic reconstruction

CISC667, F05, Lec15, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (II) Distance-based methods.

Phylogenetic Trees Tutorial 6. Measuring distance Bottom-up algorithm (Neighbor Joining) –Distance based algorithm –Relative distance based Phylogenetic.

Lecture 24 Inferring molecular phylogeny Distance methods

Phylogenetic Trees Tutorial 6. Measuring distance Bottom-up algorithm (Neighbor Joining) –Distance based algorithm –Relative distance based Phylogenetic.

Building Phylogenies Parsimony 2.

Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.

Phylogenetic trees Sushmita Roy BMI/CS 576

Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,

Phylogenetic Analysis. 2 Introduction Intension –Using powerful algorithms to reconstruct the evolutionary history of all know organisms. Phylogenetic.

Terminology of phylogenetic trees

Molecular phylogenetics

Why Models of Sequence Evolution Matter Number of differences between each pair of taxa vs. genetic distance between those two taxa. The x-axis is a proxy.

Molecular evidence for endosymbiosis Perform blastp to investigate sequence similarity among domains of life Found yeast nuclear genes exhibit more sequence.

Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)

1 Dan Graur Molecular Phylogenetics Molecular phylogenetic approaches: 1. distance-matrix (based on distance measures) 2. character-state.

COMPUTATIONAL MODELS FOR PHYLOGENETIC ANALYSIS K. R. PARDASANI DEPTT OF APPLIED MATHEMATICS MAULANA AZAD NATIONAL INSTITUTE OF TECHNOLOGY (MANIT) BHOPAL.

Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.

Molecular phylogenetics 1 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections

Phylogenetic trees School B&I TCD Bioinformatics May 2010.

BINF6201/8201 Molecular phylogenetic methods

Phylogenetics and Coalescence Lab 9 October 24, 2012.

Bioinformatics 2011 Molecular Evolution Revised 29/12/06.

OUTLINE Phylogeny UPGMA Neighbor Joining Method Phylogeny Understanding life through time, over long periods of past time, the connections between all.

Phylogenetic Trees  Importance of phylogenetic trees  What is the phylogenetic analysis  Example of cladistics  Assumptions in cladistics  Frequently.

Building phylogenetic trees. Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances  UPGMA method (+ an example)

Introduction to Phylogenetics

Calculating branch lengths from distances. ABC A B C----- a b c.

Using traveling salesman problem algorithms for evolutionary tree construction Chantal Korostensky and Gaston H. Gonnet Presentation by: Ben Snider.

Fabio Pardi PhD student in Goldman Group European Bioinformatics Institute and University of Cambridge, UK Joint work with: Barbara Holland, Mike Hendy,

Why do trees?. Phylogeny 101 OTUsoperational taxonomic units: species, populations, individuals Nodes internal (often ancestors) Nodes external (terminal,

Maximum Likelihood Given competing explanations for a particular observation, which explanation should we choose? Maximum likelihood methodologies suggest.

Phylogeny Ch. 7 & 8.

Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.

Tutorial 5 Phylogenetic Trees.

1 CAP5510 – Bioinformatics Phylogeny Tamer Kahveci CISE Department University of Florida.

Distance-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.

Hierarchical clustering approaches for high-throughput data Colin Dewey BMI/CS 576 Fall 2015.

Distance-based methods for phylogenetic tree reconstruction Colin Dewey BMI/CS 576 Fall 2015.

CSCE555 Bioinformatics Lecture 13 Phylogenetics II Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:

Phylogenetic basis of systematics

Inferring a phylogeny is an estimation procedure.

Clustering methods Tree building methods for distance-based trees

Multiple Alignment and Phylogenetic Trees

Inferring phylogenetic trees: Distance and maximum likelihood methods

Phylogenetic Trees.

Why Models of Sequence Evolution Matter

#30 - Phylogenetics Distance-Based Methods

Lecture 7 – Algorithmic Approaches

Presentation transcript:

Lecture 7 – Algorithmic Approaches Justification: Any estimate of a phylogenetic tree has a large variance. Therefore, any tree that we can demonstrate to be optimal (under some criterion) is not guaranteed to be the true tree. A tree produced by a fast algorithm may be just as close to the true tree as the optimal tree. It makes little sense to waste time & resources searching for the optimal tree. In my view, this is true under some limited contexts (e.g., identifying a tree for evaluation of models), but it’s probably a bad idea to make it one’s default position.

UPGMA - Phenetic clustering Steps: 1) Identify the most similar pair of taxa (i.e., that pair with the smallest D i,j ). 2) Merge them into a composite OTU (called i/j), & place each at the end of a branch (D i,u & D j,u ) of length D i,j /2 3) Create a new matrix with the distances to composite taxon i/j calculated as the average of all pairwise comparisons: D k,i/j = (n i D k,i + n j D k,j ) / (n i + n j ), where n i is the number of taxa in the cluster. 4) Iterate until all taxa have been added.

UPGMA - Phenetic clustering OTU-1 OTU-2OTU-3 OTU-4 OTU-5 OTU OTU OTU OTU OTU OTU-1/2OTU-3OTU-4OTU-5 OTU-1/ OTU OTU OTU Italics are average distances.Plain text was taken from original matrix.

UPGMA - Phenetic clustering OTU-1/2OTU-3OTU-4OTU-5 OTU-1/ OTU OTU OTU Again, we erect a new matrix with composite OTU-1/2/5. The same matrix:

UPGMA - Phenetic clustering Now in generating the new matrix, we take the unweighted average of the distances (each pairwise distance among members of the composite OTU is weighted equally). OTU-1/2/5OTU-3OTU-4 OTU-1/2/ OTU OTU D 1/2/5,3 = [(0.255 * 2) ] / 3 = D 1/2/5,4 = [(0.325 * 2) ] / 3 = 0.360

UPGMA - Phenetic clustering OTU-1/2/5 OTU-3/4 OTU-1/2/ OTU-3/ And D 1/2/5,3/4 = [(0.30 * 3) + (0.36 * 3)] / 5 = Thus, the distance from the root to any tip is / 2 = A UPGMA tree will be a good estimate of the phylogeny if, and only if, the evolution of the group has been very clock like (rates don’t change at all along branches) and if distances are perfectly additive

Star Decomposition (e.g., Neighbor Joining)

Star Decomposition – Neighbor Joining This method starts by converting the raw distance matrix to corrected distance matrix. 1. First, for each taxon i, the net divergence, r i, from all other is calculated. 2. The corrected matrix is given by: M i,j = d i,j – [r i / (n - 2)] – [r j / (n - 2)] 3. The minimum M i,j (the shortest corrected distance) identifies the two taxa to be united to an ancestral node (u). Branch lengths are calculated. 4. A new matrix is calculated by replacing taxa i and j with their ancestral node u. 5. If more than two nodes remain, reset n = n – 1 and return to step 1. If only two nodes remain, v i,j = d i,j.

Star Decomposition – Neighbor Joining A worked example using the same matrix we used for UPGMA 1. First, for each taxon i, the net divergence, r i, from all other is calculated. 2. The corrected matrix is given by: M i,j = d i,j – [r i / (n - 2)] – [r j / (n - 2)] OTU-1OTU-2OTU-3OTU-4OTU-5 r i r i /(n-2) The minimum M i,j (the shortest corrected distance) identifies the two taxa to be united to an ancestral node (u).

Star Decomposition – Neighbor Joining 3. This new node u has three branches emanating from it: one to terminal i, one to terminal j, and one to the rest of the tree. v 3,u = d 3,4 / 2 + (r 3 /3 – r 4 /3 ) / 2 = 0.28 / 2 + (0.393 – 0.453) / 2 = 0.11 v 4,u = d 3,4 - v 3,u = 0.28 – 0.11 = 0.17

Star Decomposition – Neighbor Joining 4. A new matrix is generated, with ancestral taxon u replacing i & j (its terminals, 3 & 4). d k,u = ( d i,k + d j,k – d i,j ) /2 3 4 u k d k,3 d k,4 d 3,4 d 1,u = ( d 1,3 + d 1,4 – d 3,4 ) /2 = ( – 0.28) / 2 = 0.12 d 2,u = (d 2,3 + d 2,4 – d 3,4 ) /2 = ( – 0.28) / 2 = 0.18 d 5,u = (d 5,3 + d 5,4 – d 3,4 ) /2 = ( – 0.28) / 2 = 0.27 OTU-1OTU-2OTU-5Node u r i r i /(n – 2) u

Star Decomposition – Neighbor Joining 5. So the star is decomposed further by uniting node u and OTU-1 with an ancestral node w, and the two branch lengths are calculated as follows: v 1,w = d 1,u / 2 + (r 1 /2 – r u /2) / 2 = 0.12 / 2 + (0.260 – 0.285) / 2 = v u,w = d 1,u – v 1,w = 0.12 – = We have three nodes now, w, 2, & 5, so we iterate.

Star Decomposition – Neighbor Joining Set n = 3 & calculate distances from node w to the remaining OTU’s 2 and 5. Just as we did in step 3, we re-calculate the distance matrix: d 2,w = ( d 1,2 + d u,2 – d 1,u ) /2 = ( – 0.12) / 2 = d 5,w = (d 1,5 + d u,5 – d 1,u ) /2 = ( – 0.12) / 2 = OTU-2OTU-5Node w r i r i /(n – 2) w v y,2 = d 2,w / 2 + (r 2 /1 – r w /1) / 2 = / 2 + (0.325 – 0.310) / 2 = v y,w = – = 0.050

Star Decomposition – Neighbor Joining d 5,y = ( d w,5 + d 2,5 – d 2,w ) /2 = ( – 0.115) / 2 = = v 5,y Finally, we calculate the distance from node y to OTU –5:

The UPGMA and NJ trees are different for this matrix. NJ is not phenetic. It is explicitly calculating ancestors and parsing similarity into ancestral vs. derived. Star Decomposition – Neighbor Joining An important modification to the neighbor-joining algorithm, BIONJ incorporates variances and covariances of d i,j The number of calculations decreases as we decompose the star tree. Neighbor joining is often interpreted as an approximation to the ME solution.

Stepwise Addition Start with three taxon tree, compute its score under some criterion (usually an ML or MP), and add a fourth in the manner that incurs the smallest decrease in optimality. Characters A B C D E There are 3 possible places to add taxon D, indicated by numbers 1 – 3. The optimality score is computed for each of these:

Stepwise Addition Option 1 is chosen because it has the shortest in length (i.e., is optimal).

Stepwise Addition Again, characters are optimized on each of these 5 possibilities: Here, we’ve used stepwise addition build a tree(s) using parsimony. The number of calculations increases as we proceed.

Stepwise Addition Addition sequence is critical. The method is starting-point dependent, so different addition sequences can produce different stepwise addition trees. Options: As is – In the order that the taxa appear in the matrix. Shortest – One may check all triplets, choose that which has the shortest tree and, at each step, assess all candidates, and choose that which adds least length. Random – One may use random addition sequences and replicate a number of times. For clean data, there should be only small differences in the stepwise addition tree, but for most real data there will be multiple stepwise addition trees. We’ll make use of this in searching tree space.

Quartet Methods For 5 taxa, there are 5 quartets: {1,2,3,4}{1,2,3,5}{1,2,4,5}{1,3,4,5}{2,3,4,5} ((1,2)3,4) ((1,3)2,4) ((1,4)2,3) ((1,2)3,5) ((1,3)2,5) ((1,5)2,3) ((1,2)4,5) ((1,4)2,5) ((1,5)2,4) ((1,3)4,5) ((1,4)3,5) ((1,5)3,4) ((2,3)4,5) ((2,4)3,5) ((2,5)3,4) There is a final quartet puzzling step that assembles optimal quartets.