Building Phylogenies: Distance-Based Methods

Presentation transcript:

Building Phylogenies Distance-Based Methods

Methods
– Distance-based
– Parsimony
– Maximum likelihood

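Distance Matrices
[Slide table, partially lost in extraction: a lower-triangular distance matrix over taxa a, b, c, d, shown with a tree on leaves a, b, c, d. Surviving rows: a: 0; b: 6 0; c: 7 3 0. The row for d did not survive.]
A distance matrix is additive if there is a tree that fits it exactly.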

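Ultrametric Matrices
[Slide table, partially lost in extraction: a lower-triangular distance matrix over taxa a, b, c, d, shown with a tree on leaves a, b, c, d. Surviving rows: a: 0; b: 2 0; c: 6 6 0. The row for d did not survive.]
Ultrametric = additive + molecular clock assumption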
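Both properties can be tested directly on a matrix. The sketch below uses the standard three-point condition (ultrametric) and four-point condition (additive); neither condition is stated on the slides, and the two example matrices are hypothetical, not the ones from the slides. It assumes the input is symmetric with a zero diagonal.

```python
from itertools import combinations

def is_ultrametric(D, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(D)), 3):
        s = sorted([D[i][j], D[i][k], D[j][k]])
        if abs(s[2] - s[1]) > tol:
            return False
    return True

def is_additive(D, tol=1e-9):
    """Four-point condition: of the three pairwise sums, the two largest are equal."""
    for i, j, k, l in combinations(range(len(D)), 4):
        s = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
        if abs(s[2] - s[1]) > tol:
            return False
    return True

# Hypothetical ultrametric matrix over taxa a, b, c, d.
U = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]

# Hypothetical additive matrix from a tree with unequal branch lengths
# (a:1, b:5 | internal edge 1 | c:2, d:3); it is not ultrametric.
A = [[0, 6, 4, 5],
     [6, 0, 8, 9],
     [4, 8, 0, 5],
     [5, 9, 5, 0]]
```

Every ultrametric matrix is also additive, which is why U passes both checks while A passes only the second.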

Methods
– Fitch-Margoliash
– UPGMA
– Neighbor-joining
– Many others

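Least squares trees
Minimize over all trees the weighted sum of squared errors
$$SS = \sum_{i<j} w_{ij}\,(D_{ij} - d_{ij})^2,$$
where $D_{ij}$ is the observed distance between taxa i and j and $d_{ij}$ is the path length between them in the tree.
Choice of weights $w_{ij}$:
– Uniform: $w_{ij} = 1$
– Fitch-Margoliash: $w_{ij} = 1/D_{ij}^2$
– Others...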
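As a rough illustration of the criterion, the sketch below scores one candidate tree against the observed distances, using the usual weighted sum of squared differences between each observed distance and the corresponding path length in the tree. The function name, the `weights` argument, and the example numbers are hypothetical; searching over trees and fitting branch lengths is not shown.

```python
def least_squares_score(D, d, weights="uniform"):
    """Weighted sum of squared errors between observed distances D
    and tree path lengths d, summed over all pairs i < j.

    weights="uniform" uses w_ij = 1;
    weights="fitch-margoliash" uses w_ij = 1 / D_ij**2.
    """
    n = len(D)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if weights == "uniform":
                w = 1.0
            elif weights == "fitch-margoliash":
                w = 1.0 / D[i][j] ** 2
            else:
                raise ValueError("unknown weighting scheme")
            total += w * (D[i][j] - d[i][j]) ** 2
    return total

# Hypothetical observed distances and path lengths of one candidate tree.
D_obs = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
d_tree = [[0, 3, 4], [3, 0, 4], [4, 4, 0]]
```

The Fitch-Margoliash weighting penalizes the same absolute error less when the observed distance is large, which reflects the expectation that large distances are measured less precisely.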

Sarich's (1969) immunological distances

Least squares tree for Sarich’s data

Clustering Methods
E.g., UPGMA and Neighbor-Joining
A cluster is a set of taxa
Interspecies distances translate into intercluster distances
Clusters are repeatedly merged
– “Closest” clusters merged first
– Distances are recomputed after merging

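UPGMA
Unweighted pair group method using arithmetic averages
The distance between clusters $C_i$ and $C_j$ is the average distance between their members:
$$D_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} D_{pq}$$
After merging $C_i$ and $C_j$ to create cluster $C_k$, define the distance from k to every other cluster r as
$$D_{kr} = \frac{|C_i|\,D_{ir} + |C_j|\,D_{jr}}{|C_i| + |C_j|}$$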

UPGMA: Initialization
1. Assign each sequence i to its own cluster $C_i$
2. Define one leaf (tip) of the tree for each sequence and place it at height 0

UPGMA: Iteration
Repeat until only two clusters remain:
1. Choose the two clusters i and j with smallest $D_{ij}$
2. Create a new cluster k, where $C_k = C_i \cup C_j$
3. Compute $D_{kr}$ for all r
4. Define a new node k with children i and j, and place it at height $D_{ij}/2$
5. Add k to the current clusters and delete i and j
Termination: let i and j be the remaining clusters. Place the root at height $D_{ij}/2$.
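The initialization and iteration steps can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name, the nested-tuple tree representation `(left, right, height)`, and the example matrix are all hypothetical. It uses the size-weighted average update for the distance from a merged cluster to every other cluster, and it runs the loop until one cluster remains, which folds the final root placement into the same merge step.

```python
def upgma(D, names):
    """Build a UPGMA tree from a symmetric distance matrix D.

    Leaves are returned as their names; internal nodes as
    (left_subtree, right_subtree, height).
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}      # id -> (subtree, size)
    dist = {(i, j): D[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Step 1: pick the closest pair of clusters.
        (i, j), dij = min(dist.items(), key=lambda kv: kv[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        del dist[(i, j)]
        k = next_id
        next_id += 1
        # Step 3: size-weighted average distance from k to every other cluster r.
        for r in clusters:
            d_ir = dist.pop((min(i, r), max(i, r)))
            d_jr = dist.pop((min(j, r), max(j, r)))
            dist[(r, k)] = (size_i * d_ir + size_j * d_jr) / (size_i + size_j)
        # Steps 2, 4, 5: new node k with children i and j at height D_ij / 2.
        clusters[k] = ((tree_i, tree_j, dij / 2), size_i + size_j)
    (tree, _), = clusters.values()
    return tree

# Hypothetical ultrametric matrix over taxa a, b, c, d.
names = ["a", "b", "c", "d"]
U = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]
tree = upgma(U, names)
```

On this input the sketch merges a and b at height 1, joins c at height 3, and places the root (joining d) at height 5.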

UPGMA Example

UPGMA tree for Sarich’s data

A pitfall of UPGMA
The algorithm produces an ultrametric tree: the distance from the root to any leaf is the same.
UPGMA assumes a constant molecular clock: all species accumulate mutations (evolve) at the same rate.

UPGMA fails when molecular clock assumption doesn’t hold

Neighbor Joining
Saitou and Nei, Molecular Biology and Evolution 4 (1987)
Idea: Find a pair of leaves that are close to each other but far from other leaves
– Implicitly finds a pair of neighboring leaves
Advantages:
– Works well for additive and other nonadditive matrices
– Does not have the molecular clock assumption

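Long branches must be handled carefully!
[Figure: tree with short and long branches; the leaf labels were lost in extraction.]
Two leaves that are not neighbors in the tree are closer to each other than to either of the other two leaves.
The obvious approach (merge the closest pair) produces incorrect clusters!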

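Compensating for long edges
Introduce “correction terms”: for each of the n current leaves,
$$u_i = \sum_{k \ne i} \frac{D_{ik}}{n - 2}$$
(roughly the average distance from i to the other taxa).
“Corrected” distances: $D_{ij} - u_i - u_j$
Distances are reduced for pairs that are far away from all other species: they may be close to each other.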
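A small numeric example may help show why the correction matters. The sketch below assumes the standard neighbor-joining correction term, the sum of distances from a leaf to all others divided by n - 2, and uses a hypothetical four-leaf tree in which A and B are neighbors, C and D are neighbors, A and C sit on short branches (length 1), B and D on long branches (length 4), and the internal edge has length 1.

```python
def corrected_distances(D):
    """Return {(i, j): D_ij - u_i - u_j} with u_i = sum_k D_ik / (n - 2)."""
    n = len(D)
    u = [sum(row) / (n - 2) for row in D]
    return {(i, j): D[i][j] - u[i] - u[j]
            for i in range(n) for j in range(i + 1, n)}

# Path lengths in the hypothetical tree; order of taxa: A, B, C, D.
D = [[0, 5, 3, 6],
     [5, 0, 6, 9],
     [3, 6, 0, 5],
     [6, 9, 5, 0]]

raw = {(i, j): D[i][j] for i in range(4) for j in range(i + 1, 4)}
closest_raw = min(raw, key=raw.get)              # pair with smallest raw distance
corr = corrected_distances(D)
best = min(corr.values())
closest_corrected = sorted(p for p, v in corr.items() if v == best)
```

The smallest raw distance is between A and C, which are not neighbors, while the smallest corrected distances belong to the true neighbor pairs (A, B) and (C, D).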

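Neighbor-joining
Repeat the following until only two leaves remain (using the correction terms $u_i$):
1. Choose i, j such that $D_{ij} - u_i - u_j$ is minimum
2. Define a new leaf k whose distances to i and j are
$$D_{ik} = \tfrac{1}{2}\,(D_{ij} + u_i - u_j), \qquad D_{jk} = \tfrac{1}{2}\,(D_{ij} + u_j - u_i)$$
3. Compute the distance from k to every other leaf r:
$$D_{kr} = \tfrac{1}{2}\,(D_{ir} + D_{jr} - D_{ij})$$
4. Delete i and j
Termination: connect the 2 remaining leaves by a branch of length $D_{ij}$.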
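The loop above can be sketched as follows. This is an illustrative version under stated assumptions: it uses the standard neighbor-joining formulas (correction term equal to the sum of distances divided by n - 2, branch lengths of half of D_ij plus or minus the difference of correction terms, and new distances of half of D_ir + D_jr - D_ij). The function name, the edge-list output, and the internal node labels `node0`, `node1`, ... are hypothetical, and taxon names are assumed not to collide with those labels.

```python
def neighbor_joining(D, names):
    """Return an unrooted tree as a list of edges (node, node, branch_length)."""
    nodes = list(names)
    dist = {a: {b: float(D[i][j]) for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Correction term for each current leaf.
        u = {a: sum(dist[a][b] for b in nodes if b != a) / (n - 2) for a in nodes}
        # Step 1: pair minimizing the corrected distance D_ij - u_i - u_j.
        pairs = [(a, b) for idx, a in enumerate(nodes) for b in nodes[idx + 1:]]
        i, j = min(pairs, key=lambda p: dist[p[0]][p[1]] - u[p[0]] - u[p[1]])
        d_ij = dist[i][j]
        # Step 2: new node k and its branch lengths to i and j.
        k = f"node{next_id}"
        next_id += 1
        edges.append((i, k, 0.5 * (d_ij + u[i] - u[j])))
        edges.append((j, k, 0.5 * (d_ij + u[j] - u[i])))
        # Step 3: distance from k to every remaining leaf r.
        dist[k] = {k: 0.0}
        for r in nodes:
            if r not in (i, j):
                d_kr = 0.5 * (dist[i][r] + dist[j][r] - d_ij)
                dist[k][r] = d_kr
                dist[r][k] = d_kr
        # Step 4: delete i and j, keep k as a leaf for later iterations.
        nodes = [r for r in nodes if r not in (i, j)] + [k]
    a, b = nodes
    edges.append((a, b, dist[a][b]))
    return edges

# Hypothetical additive matrix: A and C on short branches (1), B and D on
# long branches (4), internal edge of length 1; true neighbors are (A, B), (C, D).
names = ["A", "B", "C", "D"]
D = [[0, 5, 3, 6],
     [5, 0, 6, 9],
     [3, 6, 0, 5],
     [6, 9, 5, 0]]
edges = neighbor_joining(D, names)
```

On this additive input the sketch recovers the branch lengths of the generating tree exactly, even though the closest raw pair (A, C) is not a pair of neighbors.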

NJ tree for Sarich’s data

Computing distance matrices
Based on sequence alignment
Various possibilities:
– Distance = average number of differences
– Try different PAM matrices; distance = index of matrix that gives highest score
– Feng and Doolittle: based on alignment scores, roughly the ratio to the maximum possible score (see text)
Read, e.g., PHYLIP documentation: on.edu/phylip/general.html
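The first option, the average number of differences, is straightforward to compute from aligned sequences. The sketch below is one possible reading of it: the fraction of differing sites between two aligned sequences. Skipping any column where either sequence has a gap (`-`) is an assumption made here, not something the slides specify, and the function names and example alignment are hypothetical.

```python
def p_distance(s, t):
    """Fraction of aligned, ungapped sites at which s and t differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(s, t) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in sites) / len(sites)

def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances for a list of aligned sequences."""
    n = len(seqs)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = p_distance(seqs[i], seqs[j])
    return D

# Hypothetical aligned sequences.
alignment = ["ACGTACGT", "ACGTACGA", "AC-TTCGA"]
D = distance_matrix(alignment)
```

The resulting matrix can be passed to any of the distance-based tree builders, although raw proportions of differences tend to underestimate divergence for distant sequences.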

Distance correction
The amount of evolutionary change is not linearly related to time.
Over a long period of time, a series of substitutions may bring us back to where we started.
Percentage difference may underestimate evolutionary time.

Jukes-Cantor Model

Correcting for multiple substitutions in the JC model
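The slide's formula did not survive extraction. The standard Jukes-Cantor correction for nucleotide sequences converts an observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - (4/3) p). A minimal sketch, with a hypothetical function name:

```python
import math

def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites.

    The correction is undefined for p >= 0.75, the expected proportion of
    differences between two unrelated nucleotide sequences under this model.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

d_small = jukes_cantor_distance(0.05)   # close to p: few multiple hits
d_large = jukes_cantor_distance(0.30)   # noticeably larger than p
```

For small p the corrected distance is close to p, and the gap widens as p grows, which matches the point that percentage difference underestimates evolutionary change for distant sequences.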

Many other models!