#31 - Phylogenetics Character-Based Methods

Presentation transcript:

#31 - Phylogenetics Character-Based Methods BCB 444/544 11/05/07 Lecture 31 Phylogenetics – Character-Based Methods #31_Nov05 BCB 444/544 F07 ISU Terribilini #31- Phylogenetics - Character-Based Methods BCB 444/544 Fall 07 Dobbs

Required Reading (before lecture)
- Fri Oct 30 - Lecture 30: Phylogenetics – Distance-Based Methods
  - Chp 11 - pp 142 – 169
- Mon Nov 5 - Lecture 31: Phylogenetics – Parsimony and ML
- Wed Nov 7 - Lecture 32: Machine Learning
- Fri Nov 9 - Lecture 33: Functional and Comparative Genomics
  - Chp 17 and Chp 18

Assignments & Announcements
- Mon Oct 29 - HW#5
  - HW#5 = Hands-on exercises with phylogenetics and tree-building software
  - Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

BCB 544 Only: New Homework Assignment
- 544 Extra#2
- Due: √ PART 1 - ASAP; PART 2 - meeting prior to 5 PM Fri Nov 2
- Part 1 - Brief outline of project, emailed to Drena & Michael; after response/approval, then:
- Part 2 - More detailed outline of project
  - Read a few papers and summarize status of problem
  - Schedule meeting with Drena & Michael to discuss ideas

Seminars this Week
- BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
- Nov 7 Wed - BBMB Seminar, 4:10 in 1414 MBB
  - Sharon Roth Dent, MD Anderson Cancer Center: Role of chromatin and chromatin modifying proteins in regulating gene expression
- Nov 8 Thurs - BBMB Seminar, 4:10 in 1414 MBB
  - Jianzhi George Zhang, U. Michigan: Evolution of new functions for proteins
- Nov 9 Fri - BCB Faculty Seminar, 2:10 in 102 SciI
  - Amy Andreotti, ISU: Something about NMR

Chp 11 – Phylogenetic Tree Construction Methods and Programs
- SECTION IV MOLECULAR PHYLOGENETICS
- Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs
  - Distance-Based Methods
  - Character-Based Methods
  - Phylogenetic Tree Evaluation
  - Phylogenetic Programs

Tree Construction
- Two main categories of tree building methods:
  - Distance-based: overall similarity between sequences
  - Character-based: consider the entire MSA

Summary of Distance-Based Methods
- Clustering-based methods:
  - Computationally very fast and can handle large datasets that other methods cannot
  - Not guaranteed to find the best tree
- Optimality-based methods:
  - Better overall accuracies
  - Computationally slow
- All distance-based methods lose all sequence information and cannot infer the most likely state at an internal node

Character-Based Methods
- Based directly on the sequence characters in the MSA rather than overall distances
- Count mutational events accumulated on sequences
- Evolutionary dynamics of each character can be studied and ancestral sequences inferred
- Two popular approaches:
  - Parsimony
  - Maximum Likelihood (ML)

Parsimony
- Parsimony is based on Occam’s Razor: the simplest explanation is most likely correct
- Goal: Find the tree that allows evolution of the sequences with the fewest changes

Parsimony
- Parsimony score of a tree: the smallest (weighted) number of steps required by the tree
- Two parsimony problems:
  - Large Parsimony problem: Find the tree with the lowest parsimony score
  - Small Parsimony problem: Given a tree, find its parsimony score
- Use the small parsimony problem to solve the large parsimony problem

Algorithms for Small Parsimony
- Fitch’s algorithm:
  - Based on set operations
  - Evolutionary steps have the same weight
- Sankoff’s algorithm:
  - Based on dynamic programming
  - Allows steps to have different weights
- Both algorithms compute the minimum (weighted) number of steps a tree requires at a given site

Fitch’s Algorithm Example
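The slide's worked figure is not in the transcript, so here is a minimal sketch of Fitch's set-based rule. It assumes a rooted binary tree written as nested tuples of leaf names; the names `fitch`, `parsimony_score`, the tree and the toy alignment are all illustrative, not taken from the lecture. At each internal node the state set is the intersection of the children's sets if that is non-empty, and otherwise their union, at a cost of one step.

```python
def fitch(tree, states):
    """Return (state_set, steps) for one site.

    tree   -- a leaf name (str) or a (left, right) tuple of subtrees
    states -- dict mapping each leaf name to its character at this site
    """
    if isinstance(tree, str):                      # leaf: its observed state, no steps
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:                                     # children agree: no extra step
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1  # disagreement: one step


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch step counts over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical toy data: four taxa, three aligned sites.
tree = (("A", "B"), ("C", "D"))
alignment = {"A": "AAG", "B": "AGG", "C": "CGT", "D": "CGT"}
print(parsimony_score(tree, alignment))  # 3
```

Each site is scored independently and the scores are added, which is why the small parsimony problem can be solved one column at a time.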

Sankoff’s Algorithm
- Allows for different weights for different evolutionary steps
- Transitions (A <-> G or C <-> T) are more probable than transversions, so give a lower weight to transitions

Sankoff’s Algorithm Example

Sankoff’s Algorithm Traceback
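The example and traceback figures are missing from the transcript, so the following is a rough sketch of Sankoff's dynamic programming for one site. The weights (transition = 1, transversion = 2), the nested-tuple tree and the function names are assumptions made for illustration. For each node it computes, for every base, the minimum weighted cost of the subtree given that base at the node. The traceback then picks the best base for each child given its parent.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def step_cost(a, b, transition=1, transversion=2):
    """Weighted cost of changing a into b (assumed weights, not from the slides)."""
    if a == b:
        return 0
    same_class = (a in PURINES) == (b in PURINES)   # purine<->purine or pyrimidine<->pyrimidine
    return transition if same_class else transversion


def sankoff(tree, states):
    """Return {base: min cost of this subtree if its root carries that base}."""
    if isinstance(tree, str):                       # leaf: only the observed base is allowed
        return {b: (0 if b == states[tree] else math.inf) for b in BASES}
    child_tables = [sankoff(child, states) for child in tree]
    return {
        b: sum(min(step_cost(b, c) + table[c] for c in BASES) for table in child_tables)
        for b in BASES
    }


def traceback(tree, states, base):
    """Given the base chosen at this node, choose the cheapest base for each child."""
    if isinstance(tree, str):
        return states[tree]
    children = []
    for child in tree:
        table = sankoff(child, states)              # recomputed for brevity, not efficiency
        best = min(BASES, key=lambda c: step_cost(base, c) + table[c])
        children.append(traceback(child, states, best))
    return (base, tuple(children))


tree = (("A", "B"), ("C", "D"))
site = {"A": "A", "B": "G", "C": "C", "D": "T"}
root_table = sankoff(tree, site)
root_base = min(BASES, key=root_table.get)
print(min(root_table.values()))                     # 4
print(traceback(tree, site, root_base))
```

With unit weights this site would need three steps. Under the assumed weights it costs 4, because one of the required changes has to be a transversion.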

Searching for a Most Parsimonious Tree
- Solving the large parsimony problem requires searching all possible trees (or does it?)
  - Exhaustive search (exact)
  - Branch-and-Bound (exact)
  - Heuristic search methods (not exact)

Exhaustive Search
- Build the only possible unrooted tree for three taxa (the three taxa can be chosen at random)
- Try all possible places to add the fourth taxon and score each tree
- Try all places to add the fifth taxon to the trees and score again
- …

Why Finding a True Tree is Difficult
- The number of possible trees grows faster than exponentially with the number of species (or sequences), n
- Number of rooted trees: $N_R = \dfrac{(2n-3)!}{2^{\,n-2}\,(n-2)!}$
- Number of unrooted trees: $N_U = \dfrac{(2n-5)!}{2^{\,n-3}\,(n-3)!}$
- To find the best tree, you must explore all possibilities (or must you?)
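To get a feel for how quickly these counts grow, the two formulas can be evaluated directly. This is a small sketch; the function names are illustrative, and the formulas are read as having denominators 2^(n-2)·(n-2)! and 2^(n-3)·(n-3)!.

```python
from math import factorial


def num_rooted_trees(n):
    """Rooted binary trees on n labeled taxa (n >= 2)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def num_unrooted_trees(n):
    """Unrooted binary trees on n labeled taxa (n >= 3)."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

With four taxa there are only 3 unrooted trees, with ten taxa there are already more than two million, and with twenty the count is far beyond what an exhaustive search could visit.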

Adding the Fourth Taxon

Adding the Fifth Taxon

Branch and Bound
- Similar to exhaustive search, except that we maintain the score of the best tree obtained so far
- If the score of the current tree exceeds the current best score, backtrack and take the next available path
- Main idea: the parsimony score of a tree can only increase (or stay the same) as we add another taxon

Branch and Bound
- When a tip of the search tree is reached, the tree is either optimal (and retained) or suboptimal (and rejected)
- When all paths leading from the initial 3-taxon tree have been explored, the algorithm terminates, and all most parsimonious trees will have been identified

Branch and Bound

Branch and Bound
- One way to find a reasonable initial bound quickly:
  - Use UPGMA or NJ to build a complete tree
  - Calculate the parsimony score of this tree and use it as the initial bound in our search (it is an upper bound on the score of the most parsimonious tree)
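The branch-and-bound idea can be sketched as follows, under several assumptions that are not from the lecture. Unrooted trees are represented by rooting at the first taxon, trees are nested tuples, scoring uses unit-weight Fitch counts, and the names `insertions` and `branch_and_bound` are illustrative. The optional `upper_bound` argument plays the role of the score of a quickly built tree (for example from NJ). A partial tree is abandoned as soon as its score exceeds the best complete score seen so far.

```python
import math


def fitch_cost(tree, states):
    """(state set, steps) for one site; tree is a leaf name or a (left, right) tuple."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch_cost(tree[0], states), fitch_cost(tree[1], states)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)


def parsimony_score(tree, alignment):
    """Total Fitch steps over all sites, using only the taxa present in the tree."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch_cost(tree, {t: s[i] for t, s in alignment.items()})[1]
               for i in range(n_sites))


def insertions(tree, taxon):
    """Yield every tree obtained by attaching `taxon` to one edge of `tree`."""
    yield (tree, taxon)                              # edge above this node
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def branch_and_bound(alignment, upper_bound=math.inf):
    """Return (best score, list of all most parsimonious trees); needs >= 3 taxa."""
    taxa = sorted(alignment)
    best = {"score": upper_bound, "trees": []}

    def extend(subtree, next_index):
        partial = (taxa[0], subtree)
        score = parsimony_score(partial, alignment)
        if score > best["score"]:                    # bound: adding taxa cannot lower the score
            return
        if next_index == len(taxa):                  # complete tree reached
            if score < best["score"]:
                best["score"], best["trees"] = score, []
            best["trees"].append(partial)
            return
        for bigger in insertions(subtree, taxa[next_index]):
            extend(bigger, next_index + 1)

    extend((taxa[1], taxa[2]), 3)                    # the single 3-taxon tree
    return best["score"], best["trees"]


alignment = {"A": "AAG", "B": "AGG", "C": "CGT", "D": "CGT"}
print(branch_and_bound(alignment))
```

Ties with the current best score are kept rather than pruned, so every most parsimonious tree is reported, as the slides describe.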

Heuristic Search
- Shortcuts have been designed to reduce the search space
- Idea: Build a tree quickly (by NJ or some other fast method) and rearrange parts of it to explore some of the possible trees
- Branch swapping:
  - Nearest neighbor interchange
  - Subtree pruning and regrafting
  - Tree bisection and reconnection

Nearest-Neighbor Interchange

Subtree Pruning and Regrafting

Tree Bisection and Reconnection

Stepwise Addition – Another Heuristic
- A greedy method:
  - Start with a 3-taxon tree
  - Add one taxon at a time
  - Keep only the best tree found so far
- No guarantee of optimality, but may provide a good starting point for a search

Maximum Likelihood Method
- ML is based on a Markov model of evolution
  - Observed: the species labeling the leaves
  - Hidden: the ancestral states
  - Transition probabilities: the mutation probabilities
- Assumptions:
  - Only mutations are allowed
  - Sites are independent

Models of Evolution at a Site
- Transition probability matrix: $M = [m_{ij}]$, $i, j \in \{A, C, T, G\}$
- where $m_{ij} = \mathrm{Prob}(i \to j \text{ mutation in 1 time unit})$
- Branches may have different lengths

The Probability of an Assignment
- [Figure: tree with a state assigned to every node; only the labels C and T survive in the transcript]
- Probability = $m_{TG} \cdot m_{GA} \cdot m_{GG} \cdot m_{TT} \cdot m_{TC} \cdot m_{TT}$

Ancestral Reconstruction: Most Likely Assignment
- [Figure: tree with internal nodes X, Y, Z and leaves A, G, C, T]
- $L^* = \max_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$
- Compute using the Viterbi algorithm

L* = X,Y,Z {mXY · mYA · mYG · mXZ · mZC · mZT} Likelihood of a Tree X Y Z A G C T L* = X,Y,Z {mXY · mYA · mYG · mXZ · mZC · mZT} Compute using forward algorithm BCB 444/544 F07 ISU Terribilini #31- Phylogenetics - Character-Based Methods

Maximum Likelihood Comments
- ML is robust
- ML converges to the correct answer as more data is added
- Can be put in a Bayesian statistical framework to obtain a distribution of possible phylogenies
- ML can be slow

Phylogenetic Tree Evaluation
- Bootstrapping
- Jackknifing
- Bayesian Simulation
- Statistical difference tests (are two trees significantly different?)
  - Kishino-Hasegawa Test (paired t-test)
  - Shimodaira-Hasegawa Test (χ² test)

Bootstrapping
- A bootstrap sample is obtained by sampling sites randomly with replacement
- Obtain a data matrix with the same number of taxa and number of characters as the original one
- Construct trees for the samples
- For each branch in the original tree, compute the fraction of bootstrap samples in which that branch appears
  - Assigns a bootstrap support value to each branch
- Idea: If a grouping has a lot of support, it will be supported by at least some positions in most of the bootstrap samples
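The resampling step can be sketched in a few lines. Here `bootstrap_sample` draws alignment columns with replacement. `closest_pair` is a deliberately crude, hypothetical stand-in for a real tree-building method: it reports only the single pair of taxa with the fewest differences. `bootstrap_support` reports the fraction of replicates in which a given grouping reappears. All names and data are illustrative.

```python
import random
from itertools import combinations


def bootstrap_sample(alignment, rng):
    """Resample columns with replacement; same taxa and same length as the original."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def closest_pair(alignment):
    """Toy stand-in for tree building: the pair of taxa with the fewest mismatches."""
    def mismatches(a, b):
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return frozenset(min(combinations(sorted(alignment), 2), key=lambda p: mismatches(*p)))


def bootstrap_support(alignment, grouping, replicates=200, seed=0):
    """Fraction of bootstrap replicates in which `grouping` is recovered."""
    rng = random.Random(seed)                        # fixed seed for reproducibility
    hits = sum(
        closest_pair(bootstrap_sample(alignment, rng)) == frozenset(grouping)
        for _ in range(replicates)
    )
    return hits / replicates


alignment = {"A": "AAGTC", "B": "AGGTA", "C": "CGTTC", "D": "CGTTC"}
print(bootstrap_support(alignment, ("C", "D")))
```

In practice the stand-in would be replaced by the same tree-building method used on the original data, and support would be tallied for every branch of the original tree.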

Bootstrapping Comments
- Bootstrapping doesn’t really assess the accuracy of a tree; it only indicates the consistency of the data
- To get reliable statistics, bootstrapping needs to be done on your tree 500 – 1000 times. This is a big problem if your tree took a few days to construct

Jackknifing
- Another resampling technique
  - Randomly delete half of the sites in the dataset
  - Construct a new tree with this smaller dataset and see how often taxa are grouped
- Advantage: sites aren’t duplicated
- Disadvantage: again, really only measuring consistency of the data

Bayesian Simulation
- Using a Bayesian ML method to produce a tree automatically calculates the probability of many trees during the search
- Most trees sampled in the Bayesian ML search are near an optimal tree

Phylogenetic Programs
- Huge list at: http://evolution.genetics.washington.edu/phylip/software.html
- PAUP*: one of the most popular programs; commercial, Mac and Unix only, nice user interface
- PHYLIP: free, multiplatform, a bit difficult to use, but web servers make it easier
- WebPhylip: another interface for PHYLIP online

Phylogenetic Programs
- TREE-PUZZLE: uses a heuristic to allow ML on large datasets; also available as a web server
- PHYML: web based, uses genetic algorithm
- MrBayes: Bayesian program, fast and can handle large datasets; multiplatform download
- BAMBE: web-based Bayesian program

Final Comments on Phylogenetics
- No method is perfect
- Different methods make very different assumptions
- If multiple methods using different assumptions come up with similar results, we should trust the results more than any single method