Inferring phylogenetic trees: Distance and maximum likelihood methods


Inferring phylogenetic trees: Distance and maximum likelihood methods GENOME 373: Genomic Informatics Prof. William Stafford Noble

Outline: Distance methods (Fitch-Margoliash, neighbor joining, UPGMA); maximum likelihood.

One-minute responses
Is the parsimony model biologically accurate? No. Parsimony ignores back-mutation, parallel mutation, etc.
The following tree can have a score of 2 or 3, correct? Correct. However, the idea of parsimony is to select the tree with the smallest number of mutations along the tree.
Is it biologically acceptable to make the assumptions of the JC model? No. The assumptions are made for statistical reasons: essentially, we often don’t know the proper values for the more parameter-rich models.
What other considerations can be taken to get a better tree? The most important ones are site-by-site variation in mutation rate, and dependencies between adjacent sites.
Is there any way to check whether the tree obtained is significant? You can check whether individual branches are significant using something called “bootstrap analysis.”
Still unclear how to use these trees in a biological way. Primarily, these trees are used to understand evolutionary history.
Will we be using any of the phylogeny software in this class? No.

One-minute responses
What’s a real event that is your “oracle” that tells you the true evolutionary history of substitutions for Jukes-Cantor? There is no oracle, and luckily, you don’t need one in order for Jukes-Cantor to work.
It was difficult to understand how you were computing parsimony scores at first.

Distance methods (Fitch-Margoliash, neighbor-joining, UPGMA): multiple sequence alignment → pairwise distance matrix → phylogenetic tree.

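Star topology: The sum of all branches is $S^* = a + b + c + d + e$. Each pairwise distance is the sum of two branches (e.g., $d_{AB} = a + b$), so summing the ten distinct pairwise distances in the matrix counts each branch four times (branch $a$ appears in $d_{AB}$, $d_{AC}$, $d_{AD}$ and $d_{AE}$). Hence, the sum of all distances in the matrix is $4S^*$.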

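Adding one branch: With leaves A and B joined by a new internal branch $f$, the sum of branches is $S = a + b + c + d + e + f = \frac{d_{AC} + d_{AD} + d_{AE} + d_{BC} + d_{BD} + d_{BE}}{6} + \frac{d_{AB}}{2} + \frac{d_{CD} + d_{CE} + d_{DE}}{3}$.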
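The formula for S can be checked numerically. The sketch below is only an illustration: the function names and the branch lengths are made up, and it assumes the distances are exact path lengths on the tree in which A and B share one internal node and C, D and E share the other.

```python
from itertools import combinations

def tree_distances(a, b, c, d, e, f):
    """Path-length distances on the tree where A and B hang off one
    internal node, C, D and E hang off the other, and the two internal
    nodes are connected by a branch of length f."""
    leaf_branch = {"A": a, "B": b, "C": c, "D": d, "E": e}
    left_side = {"A", "B"}
    dist = {}
    for x, y in combinations("ABCDE", 2):
        # The internal branch f is crossed only when x and y are on opposite sides.
        crosses = (x in left_side) != (y in left_side)
        dist[x + y] = leaf_branch[x] + leaf_branch[y] + (f if crosses else 0.0)
    return dist

def branch_sum_from_distances(dist):
    """Sum of branches S computed from the distance matrix alone."""
    cross = sum(dist[p] for p in ("AC", "AD", "AE", "BC", "BD", "BE"))
    rest = dist["CD"] + dist["CE"] + dist["DE"]
    return cross / 6 + dist["AB"] / 2 + rest / 3

# Made-up branch lengths a..f; their sum is 21.
example = tree_distances(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
print(branch_sum_from_distances(example))
```

With exact path-length distances the value computed from the matrix equals a + b + c + d + e + f; with real, noisy distances it is an estimate of that sum.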

Neighbor joining: Add one branch to the star topology and compute the difference between $S^*$ and $S$. Repeat for each pair of leaves in the tree. Choose the pair that yields the largest difference (the closest neighbors). Join that pair. Repeat until all pairs are joined.
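The pair-selection step described above can be sketched as follows. This covers only one round (choosing which pair to join), not the full algorithm. The generalization of the five-leaf formula to N leaves is the standard Saitou-Nei form and is an assumption relative to the slides, as are the function names and the example distances.

```python
from itertools import combinations

def pair_distance(dist, x, y):
    """Look up a symmetric distance stored under either key order."""
    return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

def star_sum(dist, names):
    """S*: every branch of the star is counted N - 1 times in the matrix sum."""
    total = sum(pair_distance(dist, x, y) for x, y in combinations(names, 2))
    return total / (len(names) - 1)

def joined_sum(dist, names, i, j):
    """S when leaves i and j are joined and the rest stay in a star."""
    n = len(names)
    others = [k for k in names if k not in (i, j)]
    cross = sum(pair_distance(dist, i, k) + pair_distance(dist, j, k) for k in others)
    rest = sum(pair_distance(dist, k, l) for k, l in combinations(others, 2))
    return cross / (2 * (n - 2)) + pair_distance(dist, i, j) / 2 + rest / (n - 2)

def closest_neighbors(dist, names):
    """Pair with the largest difference S* - S."""
    s_star = star_sum(dist, names)
    return max(combinations(names, 2),
               key=lambda pair: s_star - joined_sum(dist, names, *pair))

# Made-up distances for five leaves.
NAMES = ["A", "B", "C", "D", "E"]
DIST = {("A", "B"): 5, ("A", "C"): 11, ("A", "D"): 10, ("A", "E"): 11,
        ("B", "C"): 12, ("B", "D"): 11, ("B", "E"): 12,
        ("C", "D"): 9, ("C", "E"): 10, ("D", "E"): 3}
print(closest_neighbors(DIST, NAMES))
```

In a full run, the chosen pair would be replaced by a single node with updated distances and the selection repeated on the smaller matrix.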

UPGMA: Unweighted pair group method with arithmetic mean. It is a form of agglomerative hierarchical clustering (average linkage). Basic idea: iteratively connect the two most closely related sequences.

UPGMA: [Pairwise distance matrix over Scer, Spar, Smik, Sbay, Skud, Scas and Sklu; flattened entries: 30 40 32 323 253 31 26 17 201 229 25 35 290 219 298 227 316 243 322 300 315 95 226]

UPGMA: Find the smallest off-diagonal element in the matrix. [Pairwise distance matrix over Scer, Spar, Smik, Sbay, Skud, Scas and Sklu; flattened entries: 30 40 32 323 253 31 26 17 201 229 25 35 290 219 37 298 227 316 243 322 300 315 95 226]

UPGMA: Average the rows and columns of the two closest sequences to obtain the distances from the merged pair to every other sequence. [Pairwise distance matrix over Scer, Spar, Smik, Sbay, Skud, Scas and Sklu; flattened entries: 30 40 32 323 253 31 26 17 201 229 25 35 290 219 37 298 227 316 243 322 300 315 95 226]

UPGMA: [Distance matrix after averaging; labels: Scer, Spar, Smik, Sbay, Skud, Scas, Sklu; flattened entries: 30 36 323 253 31 21.5 201 229 31.5 32.5 294 222.5 316 243 322 300 315 95]

UPGMA: Each merger creates a subtree, here one joining Smik and Sbay. [Distance matrix over Scer, Spar, Smik-Sbay, Skud, Scas and Sklu; flattened entries: 30 36 323 253 31 21.5 201 229 31.5 32.5 294 222.5 316 243 322 300 315 95]
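The UPGMA steps above (find the smallest off-diagonal element, average the two rows and columns, record the subtree) can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the nested-tuple tree representation and the toy distances are made up, and the average is weighted by cluster size, as UPGMA requires. When two single sequences are merged, that reduces to the plain average of the two rows (for example, distances 40 and 32 average to 36).

```python
from itertools import combinations

def upgma(names, distances):
    """Return the UPGMA tree as nested tuples.

    distances maps frozenset({x, y}) to the distance between x and y.
    """
    dist = dict(distances)                    # work on a copy
    clusters = {name: 1 for name in names}    # cluster -> number of leaves
    while len(clusters) > 1:
        # Smallest off-diagonal element of the current matrix.
        x, y = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        size_x, size_y = clusters.pop(x), clusters.pop(y)
        merged = (x, y)                       # each merger creates a subtree
        for k in clusters:
            d_xk = dist.pop(frozenset((x, k)))
            d_yk = dist.pop(frozenset((y, k)))
            # Size-weighted average, so every leaf contributes equally.
            dist[frozenset((merged, k))] = (size_x * d_xk + size_y * d_yk) / (size_x + size_y)
        del dist[frozenset((x, y))]
        clusters[merged] = size_x + size_y
    return next(iter(clusters))

# Made-up distances for four hypothetical sequences W, X, Y, Z.
TOY = {frozenset(("W", "X")): 2.0, frozenset(("W", "Y")): 6.0,
       frozenset(("W", "Z")): 10.0, frozenset(("X", "Y")): 6.0,
       frozenset(("X", "Z")): 10.0, frozenset(("Y", "Z")): 8.0}
print(upgma(["W", "X", "Y", "Z"], TOY))
```

The sketch returns only the topology; branch heights (half the distance at each merger) could be recorded alongside each merged tuple.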

Maximum likelihood
for each possible tree
    for each column of the alignment
        compute the likelihood of the column, given the tree
return the tree with the highest likelihood
Similar to parsimony, but capable of using a model of evolution. Computationally expensive. DNAML is the Phylip program for maximum likelihood. FastDNAML is a fast clone (http://geta.life.uiuc.edu/~gary/programs/fastDNAml.html).

Computing the likelihood: [Figure: alignment ACGCGTTGGG, ACGCAATGAA, ACACAGGGAA and a tree with leaves T, T, A, G.] What is Pr(column | tree, model), the probability of observing this column, given this tree and an assumed model of evolution?

Computing the likelihood: [Figure: three copies of the tree with leaves T, A, G, each with a different assignment of bases to the internal nodes.] Solution: Enumerate all possible assignments to the internal nodes. Compute the probability of each assigned tree, and sum.

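Computing the likelihood: [Figure: the alignment ACGCGTTGGG, ACGCAATGAA, ACACAGGGAA and the tree with its internal nodes assigned: root A, internal nodes T and A, leaves T, T, A, G.] What is Pr(column | tree, model), the probability of observing this column, given this assigned tree and an assumed model of evolution?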

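Computing the likelihood: The probability of observing a substitution from A to T on a branch of length $m$ is given by the evolutionary model. The probability of the ancestral observation being A is just $\pi_A$, one of the four probabilities $\pi_A, \pi_C, \pi_G, \pi_T$. [Figure: assigned tree with a branch of length $m$ from the root A to the internal node T.]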
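As one concrete case, the Jukes-Cantor model mentioned earlier gives these branch probabilities in closed form. The sketch below assumes that the branch length m is measured in expected substitutions per site and that all four bases are equally likely at the root; the names `jc_transition_prob` and `JC_PRIOR` are made up for illustration.

```python
import math

# Under Jukes-Cantor every base is equally likely at the root.
JC_PRIOR = {base: 0.25 for base in "ACGT"}

def jc_transition_prob(x, y, m):
    """Probability that base x is observed as base y after a branch of length m."""
    decay = math.exp(-4.0 * m / 3.0)
    if x == y:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay

# Probability of A at the ancestor and T after a branch of length 0.1.
print(JC_PRIOR["A"] * jc_transition_prob("A", "T", 0.1))
```

For a branch of length zero the base cannot change, and for very long branches every base approaches probability 1/4 regardless of the starting base.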

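Computing the likelihood: The desired probability is the product of the probabilities of the branches: $L(\text{tree}) = L_0 \cdot L_1 \cdot L_2 \cdot L_3 \cdot L_4 \cdot L_5 \cdot L_6$. [Figure: assigned tree (root A; internal nodes T and A; leaves T, T, A, G) annotated with $\pi_A, \pi_C, \pi_G, \pi_T$ and the terms $L_0$ to $L_6$.]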

Computing the likelihood: The probability of the tree is the sum of the probabilities of the individual assigned trees: $L(\text{tree}) = L(\text{tree1}) + L(\text{tree2}) + L(\text{tree3}) + \dots$ [Figure: assigned trees tree1, tree2 and tree3, which differ in the bases assigned to the internal nodes.]

Maximum likelihood revisited
for each possible tree
    for each column of the alignment
        for each assignment of internal nodes
            for each branch
                compute the probability of that branch
            assigned tree probability ← multiply branch probabilities
        column probability ← sum assigned tree probabilities
    tree probability ← multiply column probabilities
return the tree with the highest probability

Maximum likelihood revisited: Two rules underlie the loops above. Multiply probabilities of independent events (the branches of an assigned tree, the columns of the alignment). Add probabilities of mutually exclusive events (the alternative assignments of internal nodes).
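The inner loops of this procedure can be sketched for a single fixed tree as below. Everything specific here is an assumption made for illustration: the Jukes-Cantor model is used as the evolutionary model, the four-leaf tree ((s1, s2), (s3, s4)) and its branch lengths are made up, and the function names are hypothetical. The outer loop over all possible trees is omitted; it would call `tree_likelihood` once per candidate tree and keep the best one.

```python
import math
from itertools import product

BASES = "ACGT"

def jc_prob(x, y, m):
    """Jukes-Cantor probability of base x becoming y on a branch of length m."""
    decay = math.exp(-4.0 * m / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

# Hypothetical rooted tree ((s1, s2), (s3, s4)): (parent, child, branch length).
EDGES = [("root", "u", 0.1), ("root", "v", 0.1),
         ("u", "s1", 0.2), ("u", "s2", 0.2),
         ("v", "s3", 0.3), ("v", "s4", 0.3)]
INTERNAL = ["root", "u", "v"]
LEAVES = ["s1", "s2", "s3", "s4"]

def column_likelihood(leaf_bases, edges=EDGES, internal=INTERNAL):
    """Sum over all assignments of bases to the internal nodes."""
    total = 0.0
    for assignment in product(BASES, repeat=len(internal)):
        bases = {**leaf_bases, **dict(zip(internal, assignment))}
        prob = 0.25                      # probability of the root base under JC
        for parent, child, length in edges:
            prob *= jc_prob(bases[parent], bases[child], length)  # independent branches
        total += prob                    # mutually exclusive assignments
    return total

def tree_likelihood(sequences, edges=EDGES, internal=INTERNAL):
    """Multiply column likelihoods; sequences maps leaf name -> aligned sequence."""
    names = list(sequences)
    likelihood = 1.0
    for column in zip(*(sequences[name] for name in names)):
        likelihood *= column_likelihood(dict(zip(names, column)), edges, internal)
    return likelihood

# One column with the leaf bases T, T, A, G.
print(column_likelihood({"s1": "T", "s2": "T", "s3": "A", "s4": "G"}))
```

Enumerating every assignment costs 4 to the power of the number of internal nodes per column, which is one reason the method is computationally expensive. For long alignments, summing log-probabilities instead of multiplying avoids numerical underflow.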

Overview: Parsimony; distance methods (computing distances; finding the tree with Fitch-Margoliash, neighbor-joining or UPGMA); maximum likelihood.