Distance-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.

Slides:



Advertisements
Similar presentations
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Advertisements

Phylogenetic Tree A Phylogeny (Phylogenetic tree) or Evolutionary tree represents the evolutionary relationships among a set of organisms or groups of.
. Class 9: Phylogenetic Trees. The Tree of Life Evolution u Many theories of evolution u Basic idea: l speciation events lead to creation of different.
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Lecture 13 CS5661 Phylogenetics Motivation Concepts Algorithms.
Parsimony based phylogenetic trees Sushmita Roy BMI/CS 576 Sep 30 th, 2014.
Phylogenetics - Distance-Based Methods CIS 667 March 11, 2204.
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
The Saitou&Nei Neighbor Joining Algorithm ©Shlomo Moran & Ilan Gronau.
Molecular Evolution Revised 29/12/06
From Ernst Haeckel, 1891 The Tree of Life.  Classical approach considers morphological features  number of legs, lengths of legs, etc.  Modern approach.
UPGMA Algorithm.  Main idea: Group the taxa into clusters and repeatedly merge the closest two clusters until one cluster remains  Algorithm  Add a.
Lecture 7 – Algorithmic Approaches Justification: Any estimate of a phylogenetic tree has a large variance. Therefore, any tree that we can demonstrate.
. Computational Genomics 5a Distance Based Trees Reconstruction (cont.) Modified by Benny Chor, from slides by Shlomo Moran and Ydo Wexler (IIT)
Building phylogenetic trees Jurgen Mourik & Richard Vogelaars Utrecht University.
Distance methods. UPGMA: similar to hierarchical clustering but not additive Neighbor-joining: more sophisticated and additive What is additivity?
In addition to maximum parsimony (MP) and likelihood methods, pairwise distance methods form the third large group of methods to infer evolutionary trees.
The Tree of Life From Ernst Haeckel, 1891.
CISC667, F05, Lec15, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (II) Distance-based methods.
Phylogenetic Trees Tutorial 6. Measuring distance Bottom-up algorithm (Neighbor Joining) –Distance based algorithm –Relative distance based Phylogenetic.
. Class 9: Phylogenetic Trees. The Tree of Life D’après Ernst Haeckel, 1891.
Phylogenetic Trees Tutorial 6. Measuring distance Bottom-up algorithm (Neighbor Joining) –Distance based algorithm –Relative distance based Phylogenetic.
Distance-Based Phylogenetic Reconstruction Tutorial #8 © Ilan Gronau, edited by Itai Sharon.
Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.
Phylogenetic Tree Construction and Related Problems Bioinformatics.
Phylogenetic trees Tutorial 6. Distance based methods UPGMA Neighbor Joining Tools Mega phylogeny.fr DrewTree Phylogenetic Trees.
. Phylogenetic Trees (2) Lecture 12 Based on: Durbin et al Section 7.3, 7.8, Gusfield: Algorithms on Strings, Trees, and Sequences Section 17.
Phylogenetic trees Sushmita Roy BMI/CS 576
9/1/ Ultrametric phylogenies By Sivan Yogev Based on Chapter 11 from “Inferring Phylogenies” by J. Felsenstein.
Phylogenetic Analysis. 2 Introduction Intension –Using powerful algorithms to reconstruct the evolutionary history of all know organisms. Phylogenetic.
Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)
1 Dan Graur Molecular Phylogenetics Molecular phylogenetic approaches: 1. distance-matrix (based on distance measures) 2. character-state.
PHYLOGENETIC TREES Dwyane George February 24,
1 Chapter 7 Building Phylogenetic Trees. 2 Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances –UPGMA method.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Computational Biology, Part D Phylogenetic Trees Ramamoorthi Ravi/Robert F. Murphy Copyright  2000, All rights reserved.
BINF6201/8201 Molecular phylogenetic methods
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
The Neighbor Joining Tree-Reconstruction Technique Lecture 13 ©Shlomo Moran & Ilan Gronau.
OUTLINE Phylogeny UPGMA Neighbor Joining Method Phylogeny Understanding life through time, over long periods of past time, the connections between all.
Phylogenetic Trees Tutorial 5. Agenda How to construct a tree using Neighbor Joining algorithm Phylogeny.fr tool Cool story of the day: Horizontal gene.
Building phylogenetic trees. Contents Phylogeny Phylogenetic trees How to make a phylogenetic tree from pairwise distances  UPGMA method (+ an example)
Introduction to Phylogenetics
Evolutionary tree reconstruction (Chapter 10). Early Evolutionary Studies Anatomical features were the dominant criteria used to derive evolutionary relationships.
394C, Spring 2013 Sept 4, 2013 Tandy Warnow. DNA Sequence Evolution AAGACTT TGGACTTAAGGCCT -3 mil yrs -2 mil yrs -1 mil yrs today AGGGCATTAGCCCTAGCACTT.
Algorithms in Computational Biology11Department of Mathematics & Computer Science Algorithms in Computational Biology Building Phylogenetic Trees.
Phylogenetic Analysis Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics Figures from Higgs & Attwood.
Comp. Genomics Recitation 8 Phylogeny. Outline Phylogeny: Distance based Probabilistic Parsimony.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2015.
Phylogeny Ch. 7 & 8.
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
Tutorial 5 Phylogenetic Trees.
1 CAP5510 – Bioinformatics Phylogeny Tamer Kahveci CISE Department University of Florida.
598AGB Basics Tandy Warnow. DNA Sequence Evolution AAGACTT TGGACTTAAGGCCT -3 mil yrs -2 mil yrs -1 mil yrs today AGGGCATTAGCCCTAGCACTT AAGGCCTTGGACTT.
Part 9 Phylogenetic Trees
Hierarchical clustering approaches for high-throughput data Colin Dewey BMI/CS 576 Fall 2015.
Distance-based methods for phylogenetic tree reconstruction Colin Dewey BMI/CS 576 Fall 2015.
CSCE555 Bioinformatics Lecture 13 Phylogenetics II Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:
dij(T) - the length of a path between leaves i and j
Inferring a phylogeny is an estimation procedure.
Clustering methods Tree building methods for distance-based trees
Multiple Alignment and Phylogenetic Trees
Hierarchical clustering approaches for high-throughput data
The Tree of Life From Ernst Haeckel, 1891.
Phylogenetic Trees.
#30 - Phylogenetics Distance-Based Methods
Lecture 7 – Algorithmic Approaches
Phylogeny.
Presentation transcript:

Distance-Based Approaches to Inferring Phylogenetic Trees BMI/CS Colin Dewey Fall 2010

Distance Based Approaches given: an matrix where is the distance between taxa i and j do: build an edge-weighted tree such that the distances between leaves i and j correspond to ABCDE A08853 B0388 C088 D05 E0 AEDBC

Where do we get distances? Commonly obtained from sequence alignments f ij = #mismatches / (#matches + #mismatches) in alignment of sequence i with sequence j dist(i,j) = f ij Or, to correct for multiple substitutions at a single position: dist Jukes-Cantor (i,j) = -3/4 ln(1- 4/3 f ij )

Distance Metrics properties of a distance metric

The Molecular Clock Assumption & Ultrametric Data the molecular clock assumption: divergence of sequences is assumed to occur at the same rate at all points in the tree this assumption is not generally true: selection pressures vary across time periods, organisms, genes within an organism, regions within a gene if it does hold, then the data is said to be ultrametric

The Molecular Clock Assumption & Ultrametric Data ultrametric data: for any triplet of sequences, i, j, k, the distances are either all equal, or two are equal and the remaining one is smaller ABCDE A08853 B0388 C088 D05 E0 AEDBC

The UPGMA Method Unweighted Pair Group Method using Arithmetic Averages given ultrametric data, UPGMA will reconstruct the tree T that is consistent with the data basic idea: –iteratively pick two taxa/clusters and merge them –create new node in tree for merged cluster distance between clusters and of taxa is defined as (avg. distance between pairs of taxa from each cluster)

UPGMA Algorithm assign each taxon to its own cluster define one leaf for each taxon; place it at height 0 while more than two clusters –determine two clusters i, j with smallest –define a new cluster –define a node k with children i and j; place it at height –replace clusters i and j with k –compute distance between k and other clusters join last two clusters, i and j, by root at height

UPGMA given a new cluster formed by merging and we can calculate the distance between and any other cluster as follows

UPGMA Example ABCDE A08853 B0388 C088 D05 E0 AEDBC AEBCD 0885 B038 C08 D0 AEDBC initial state after one merge

UPGMA Example Cont. AEDBC AEBCD AE085 BC08 D0 AEDBC AED08 BC0 AEDBC AEDBC after two merges after three merges final state

Neighbor Joining unlike UPGMA –doesn’t make molecular clock assumption –produces unrooted trees does assume additivity: distance between pair of leaves is sum of lengths of edges connecting them like UPGMA, constructs a tree by iteratively joining subtrees two key differences –how pair of subtrees to be merged is selected on each iteration –how distances are updated after each merge

Picking Pairs of Nodes to Join in NJ at each step, we pick a pair of nodes to join; should we pick a pair with minimal ? suppose the real tree looks like this and we’re picking the first pair of nodes to join? A B C D wrong decision to join A and B: need to consider distance of pair to other leaves

Picking Pairs of Nodes to Join in NJ to avoid this, pick pair to join based on [Saitou & Nei ’87; Studier & Keppler ’88] where L is the set of leaves

Updating Distances in Neighbor Joining given a new internal node k, the distance to another node m is given by: i j k m

Updating Distances in Neighbor Joining can calculate the distance from a leaf to its parent node in the same way i j k m

Updating Distances in Neighbor Joining we can generalize this so that we take into account the distance to all other leaves where and L is the set of leaves this is more robust if data aren’t strictly additive

Neighbor Joining Algorithm define the tree T = set of leaf nodes L = T while more than two subtrees in T –pick the pair i, j in L with minimal –add to T a new node k joining i and j –determine new distances –remove i and j from L and insert k (treat it like a leaf) join two remaining subtrees, i and j with edge of length

Testing for Additivity for every set of four leaves, i, j, k, and l, two of the distances, and must be equal and not less than the third

Rooting Trees finding a root in an unrooted tree is sometimes accomplished by using an outgroup outgroup: a species known to be more distantly related to remaining species than they are to each other edge joining the outgroup to the rest of the tree is best candidate for root position candidate root outgroup

Comments on Distance Based Methods if the given distance data is ultrametric (and these distances represent real distances), then UPGMA will identify the correct tree if the data is additive (and these distances represent real distances), then neighbor joining will identify the correct tree otherwise, the methods may not recover the correct tree, but they may still be reasonable heuristics