Lecture 13, CS566: Phylogenetics (Motivation, Concepts, Algorithms)

Motivation
– Life arose just once: "Thou art my brethren but a few sequences removed"
– Phylogenetic trees = topologies of evolutionary relationships between sequences and, possibly, species: "Is it man or cow that is the true heir of the fabulous treasures of the woolly mammoth dynasty?"
– A phylogenetic tree serves as the guide tree for multiple sequence alignment (déjà vu)

Concepts
– Mutation and evolution
   – Mutations that persist over generations = evolution
– Tree, not a lattice
   – Each species arose just once
– Species phylogeny is often not the same as sequence phylogeny
   – Sequences evolve at different rates: within a single species, between different species, and within a single sequence
   – Horizontal transfer is quite common, especially in bacteria ("Napster's been around for ages")

Concepts
– Molecular clock assumption
   – Sequences drift apart at a constant rate
   – Also known as: edge length proportional to time
   – Also known as: ultrametricity. For any 3 sequences, either all three pairwise distances are equal, or two are equal and the third is smaller
   – If it holds, all path lengths from the root to the leaf nodes are equal
– Additivity
   – The chosen distance behaves as a true tree distance: the sum of edge lengths along the path between 2 sequences equals the distance between those 2 sequences
   – Satisfying the triangle inequality is necessary but not sufficient; the exact requirement is the four-point condition (for any 4 sequences i, j, k, l, the two largest of d(i,j)+d(k,l), d(i,k)+d(j,l) and d(i,l)+d(j,k) are equal)
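Both assumptions can be tested directly on a distance matrix. The sketch below assumes D is a symmetric list of lists with a zero diagonal; the function names, the tolerance `tol` and the two small matrices are hypothetical illustrations, not part of the lecture. Ultrametricity is checked with the three-point condition, additivity with the triangle inequality plus the four-point condition.

```python
from itertools import combinations

def is_ultrametric(D, tol=1e-9):
    """Three-point check: in every triple of distances, the two largest are equal."""
    n = len(D)
    for i, j, k in combinations(range(n), 3):
        _, mid, largest = sorted((D[i][j], D[i][k], D[j][k]))
        if largest - mid > tol:
            return False
    return True

def is_additive(D, tol=1e-9):
    """Triangle inequality on every triple, then the four-point condition on every quadruple."""
    n = len(D)
    for i, j, k in combinations(range(n), 3):
        small, mid, largest = sorted((D[i][j], D[i][k], D[j][k]))
        if largest > small + mid + tol:
            return False
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted((D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]))
        if sums[2] - sums[1] > tol:  # the two largest sums must be equal
            return False
    return True

# Hypothetical 4-sequence matrix that satisfies the molecular clock
CLOCK = [[0, 2, 6, 6],
         [2, 0, 6, 6],
         [6, 6, 0, 4],
         [6, 6, 4, 0]]
# Hypothetical 5-sequence matrix that is additive but not ultrametric
ADDITIVE = [[0, 5, 9, 9, 8],
            [5, 0, 10, 10, 9],
            [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3],
            [8, 9, 7, 3, 0]]

print(is_ultrametric(CLOCK), is_additive(CLOCK))        # True True
print(is_ultrametric(ADDITIVE), is_additive(ADDITIVE))  # False True
```

Every ultrametric matrix is also additive, which is why the clock assumption is the stronger of the two.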

Concepts
– Heuristic forays into intractable space: the number of possible trees grows super-exponentially with the number of sequences
– Start with pairwise "distances"
– Path length = distance (~ evolutionary time)
– Work from the leaves toward the root to generate the tree (the opposite of binary tree generation, which proceeds from the root down)
– "It's easier to be rootless than to be rooted"
– Binary trees are used to approximate higher-order (multifurcating) trees
– Edges do not imply direct links (missing links, incomplete data); they are only a representation of sequence evolution
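To see why exhaustive search quickly becomes intractable, the number of distinct binary trees can be counted with the standard double-factorial formulas: (2n-5)!! unrooted and (2n-3)!! rooted trees for n labelled sequences. A small sketch; the function names are illustrative only.

```python
def odd_double_factorial(m):
    """m!! for odd m (1 * 3 * 5 * ... * m); returns 1 for m <= 1."""
    result = 1
    for k in range(3, m + 1, 2):
        result *= k
    return result

def count_unrooted_binary_trees(n):
    """Distinct unrooted binary trees on n >= 2 labelled leaves: (2n-5)!!"""
    return odd_double_factorial(2 * n - 5)

def count_rooted_binary_trees(n):
    """Distinct rooted binary trees on n >= 2 labelled leaves: (2n-3)!!"""
    return odd_double_factorial(2 * n - 3)

for n in (4, 10, 20):
    print(n, count_unrooted_binary_trees(n), count_rooted_binary_trees(n))
```

For 4 sequences there are only 3 unrooted trees, but for 20 sequences the rooted count is already about 8.2 × 10^21, far beyond exhaustive evaluation.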

Algorithms
– Parsimony (character-based)
– Distance-based methods
   – Neighbor joining
   – UPGMA
– Maximum likelihood
[Diagram arrow: "Increasing sequence similarity"]

Algorithms
– UPGMA (Unweighted Pair Group Method with Arithmetic averages)
   – Caveat if the molecular clock does not apply: "If my cousin looks more like me than my brother, he must be my lost brother, and perhaps my brother my cousin?"
– Neighbor joining
   – "Give me additive distances, and I shall give thee a tree, even if some sequences morph faster than others"
– Parsimony
   – "It's just a bruise, not Kaposi's sarcoma!"
– Maximum likelihood
   – "Given the facts, Watson, the answer is elementary!"

UPGMA
– Easiest to use if the molecular clock and additivity assumptions are valid
– Initially, no. of clusters = no. of sequences = no. of leaf nodes
– Inter-cluster distance = average pairwise distance between the members of the two clusters
– While (no. of clusters > 1):
   – Merge the pair of closest clusters (at distance d) through a new internal node placed at height d/2, i.e., at path distance d/2 from every leaf in both clusters
– Caveat: satisfies the minimal distance requirement, but may produce spurious topologies when the constant-rate (molecular clock) assumption does not hold
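The UPGMA loop can be sketched in Python as follows. This is a minimal illustration, assuming a symmetric distance matrix and ignoring ties; the function name, the Newick-style output string and the example matrix are assumptions of this sketch. Each new node sits at height d/2, so a child's branch length is d/2 minus the child cluster's own height, and the distance from a merged cluster to any other cluster is the size-weighted mean, which equals the average pairwise distance.

```python
def upgma(names, D):
    """UPGMA on a symmetric distance matrix D (list of lists); returns a Newick string."""
    # cluster id -> (number of leaves, height of the cluster's root, Newick subtree)
    clusters = {i: (1, 0.0, name) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): D[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # closest pair of current clusters
        i, j = min(((a, b) for a in clusters for b in clusters if a < b),
                   key=lambda pair: dist[frozenset(pair)])
        height = dist[frozenset((i, j))] / 2  # new node at height d/2
        size_i, h_i, tree_i = clusters.pop(i)
        size_j, h_j, tree_j = clusters.pop(j)
        # branch length = new height minus the child's own height
        subtree = f"({tree_i}:{height - h_i:g},{tree_j}:{height - h_j:g})"
        for k in clusters:
            # size-weighted mean = average of all pairwise leaf distances
            dist[frozenset((k, next_id))] = (
                size_i * dist[frozenset((i, k))] + size_j * dist[frozenset((j, k))]
            ) / (size_i + size_j)
        clusters[next_id] = (size_i + size_j, height, subtree)
        next_id += 1
    return next(iter(clusters.values()))[2] + ";"

# Hypothetical clock-like (ultrametric) distances
D = [[0, 2, 6, 6],
     [2, 0, 6, 6],
     [6, 6, 0, 4],
     [6, 6, 4, 0]]
print(upgma(["A", "B", "C", "D"], D))  # ((A:1,B:1):2,(C:2,D:2):1);
```

Because the example matrix is ultrametric, every leaf ends up at the same distance (3) from the root; on non-clock data the same code would still return a tree, which is exactly the spurious-topology caveat.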

Parsimony
– Parsimony ("miserliness in model space"): pick the simplest explanation that fits the facts. "If I hear a blood-curdling scream, it's just one of my sons trying to kill the other, not an invasion by aliens!"
– Every possible tree is evaluated in terms of the total number of steps needed to convert each sequence into another
   – Practical for only a few sequences
– A high percentage of similarity is a prerequisite
   – Neither identical nor "completely different" sequence positions are useful
   – Each difference should represent a single step (WYSIWYG), not a "full circle" or "non-shortest route"
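Which positions are useful can be decided mechanically with the usual "parsimony-informative" rule: a position counts only if at least two different residues each occur in at least two sequences. Constant positions, positions where a single sequence differs, and positions where all residues differ cost the same number of steps on every tree, so they cannot discriminate between trees. A minimal sketch (the function name is hypothetical), applied to the four-sequence example alignment from this lecture:

```python
from collections import Counter

def informative_sites(seqs):
    """1-based positions where at least two residues each occur in at least two sequences."""
    positions = []
    for pos, column in enumerate(zip(*seqs), start=1):
        counts = Counter(column)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            positions.append(pos)
    return positions

# The four-sequence example alignment used in this lecture
ALIGNMENT = ["ACCEFAHIKLKNPR",
             "ACCEFGHILLLNPR",
             "ACDEFGHIKLINPK",
             "AADEFGHILLNNPK"]
print(informative_sites(ALIGNMENT))  # [3, 9, 14]
```

Position 11 (K, L, I, N) is the "completely different" case; positions 2 and 6 each have a single odd sequence out.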

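Parsimony
Example alignment (the three informative positions, 3, 9 and 14, are marked *):
1. ACCEFAHIKLKNPR
2. ACCEFGHILLLNPR
3. ACDEFGHIKLINPK
4. AADEFGHILLNNPK
     *     *    *
Candidate tree for position 3 (figure): sequences 1 and 2 (both C) join at an internal node labelled C, and sequences 3 and 4 (both D) join at an internal node labelled D, so this tree needs a single C/D substitution, on the branch between the two internal nodes.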
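Scoring a fixed tree one position at a time is commonly done with Fitch's algorithm, which the slide does not name; the sketch below assumes trees written as nested tuples and is only an illustration. It scores the three possible unrooted trees for the example sequences, each rooted on its central edge (for unweighted parsimony the score does not depend on where the root is placed).

```python
def fitch(tree, states):
    """Fitch small-parsimony pass on a rooted binary tree of nested tuples.

    Returns (candidate state set at this node, substitutions counted below it)."""
    if isinstance(tree, str):  # leaf: its observed residue
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, seqs):
    """Total substitutions over all positions; seqs maps leaf name -> sequence."""
    length = len(next(iter(seqs.values())))
    return sum(fitch(tree, {name: s[p] for name, s in seqs.items()})[1]
               for p in range(length))

SEQS = {"1": "ACCEFAHIKLKNPR", "2": "ACCEFGHILLLNPR",
        "3": "ACDEFGHIKLINPK", "4": "AADEFGHILLNNPK"}
# The 3 unrooted trees for 4 sequences, each rooted on its central edge
TREES = {"((1,2),(3,4))": (("1", "2"), ("3", "4")),
         "((1,3),(2,4))": (("1", "3"), ("2", "4")),
         "((1,4),(2,3))": (("1", "4"), ("2", "3"))}
for label, tree in TREES.items():
    print(label, parsimony_score(tree, SEQS))
# ((1,2),(3,4)) 9
# ((1,3),(2,4)) 10
# ((1,4),(2,3)) 11
```

The tree grouping 1 with 2 and 3 with 4 needs the fewest substitutions; the uninformative positions add the same 5 steps to every tree, so only positions 3, 9 and 14 change the ranking.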

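Parsimony
– For 4 sequences there are 3 possible unrooted trees; at each of the 3 informative positions all 3 trees are compared (3 sets of 3 trees)
– The tree with the lowest total number of substitutions is selected
– Refinements:
   – Branch and bound: abandon a partial tree as soon as its score exceeds that of the current minimal-score tree (adding further sequences can never lower the score)
   – Heuristics that evaluate representative branch patterns instead of every tree
   – Non-boolean costs: transversion > transition, OR use of amino-acid substitution matrices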
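Non-boolean costs are usually handled with Sankoff's dynamic-programming algorithm, a technique the slide does not name. This sketch assumes DNA with a hypothetical cost of 1 for a transition and 2 for a transversion; an amino-acid substitution matrix would slot into `substitution_cost` the same way. The single site is a made-up example.

```python
BASES = "ACGT"
PURINES = {"A", "G"}
INF = float("inf")

def substitution_cost(x, y):
    """Assumed costs: 0 for a match, 1 for a transition, 2 for a transversion."""
    if x == y:
        return 0
    if (x in PURINES) == (y in PURINES):  # A<->G or C<->T
        return 1
    return 2

def sankoff(tree, states):
    """Return {base: minimum cost of the subtree if this node carries that base}."""
    if isinstance(tree, str):  # leaf: only its observed base is allowed
        return {b: 0 if b == states[tree] else INF for b in BASES}
    left, right = (sankoff(child, states) for child in tree)
    return {b: min(substitution_cost(b, x) + left[x] for x in BASES)
               + min(substitution_cost(b, y) + right[y] for y in BASES)
            for b in BASES}

def weighted_score(tree, states):
    return min(sankoff(tree, states).values())

# Hypothetical single DNA site where all four sequences differ
site = {"1": "A", "2": "G", "3": "C", "4": "T"}
print(weighted_score((("1", "2"), ("3", "4")), site))  # 4
print(weighted_score((("1", "3"), ("2", "4")), site))  # 5
```

With boolean costs all three trees would need 3 steps at this site; the weighted costs prefer pairing A with G and C with T, because those two changes are transitions.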

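Neighbor Joining
– Generates an unrooted tree, allowing for unequal branch lengths (unequal rates of evolution)
– Given: a distance matrix for the sequences
– Steps (repeat 1-3 until all branches are generated):
   1. Take the closest sequences i, j. (Standard Saitou-Nei neighbor joining does not use the raw closest pair: it joins the pair minimizing Q(i, j) = (n - 2) * d(i, j) - R(i) - R(j), where n is the current number of nodes and R(x) is the sum of x's distances to all other nodes.)
   2. Find the branch lengths to i and j by treating all remaining sequences as one composite C:
      a. Calculate the average i-C and j-C distances
      b. Calculate the branch lengths b(i) = (d(i, j) + d(i, C) - d(j, C)) / 2 and b(j) = d(i, j) - b(i)
   3. Treat ij as a composite sequence and generate a new distance table (in standard neighbor joining, d(ij, k) = (d(i, k) + d(j, k) - d(i, j)) / 2)
   4. Generate multiple trees by starting with different pairs (optional; standard neighbor joining is a single deterministic pass)
   5. Compare the resulting trees in terms of best fit to the original distance matrix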
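A compact Python sketch of the procedure follows. It uses the standard Saitou-Nei choice of pair (the Q-criterion in the comments) rather than the raw closest pair, computes branch lengths with the composite-C formula, and does not implement the optional steps 4-5. The function name, the generated node labels and the 5-sequence matrix (additive, constructed from a known tree) are hypothetical.

```python
from itertools import combinations

def neighbor_joining(names, D):
    """Saitou-Nei neighbor joining. Returns a list of (node, node, branch_length) edges."""
    d = {}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            d[a, b] = D[i][j]
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(d[x, y] for y in nodes if y != x) for x in nodes}
        # pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        # with C = all other nodes: b(i) = (d(i,j) + avg d(i,C) - avg d(j,C)) / 2
        b_i = d[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        b_j = d[i, j] - b_i
        u = f"node{counter}"
        counter += 1
        edges += [(u, i, b_i), (u, j, b_j)]
        # new distance table: composite node u replaces i and j
        for k in nodes:
            if k not in (i, j):
                d[u, k] = d[k, u] = (d[i, k] + d[j, k] - d[i, j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a, b]))  # join the last two nodes
    return edges

# Hypothetical additive distances for five sequences
NAMES = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
for edge in neighbor_joining(NAMES, D):
    print(edge)
```

Because the example matrix is additive, the recovered leaf branches (a 2, b 3, c 4, d 2, e 1) and the two internal edges reproduce every pairwise distance exactly, even though the leaves are at unequal distances from any root.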

Rooting trees
– Based on a "proxy ancestor"
   – Include a distant relative (the "outgroup") as the proxy ancestor
   – Add the outgroup as the last node
   – The point of attachment of the outgroup represents the root
– Diameter center (midpoint rooting)
   – Place the root at the center of the longest path through the tree
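Diameter-center rooting can be sketched with two "farthest node" sweeps: the farthest node from any start is one end of the longest path, and the farthest node from that end is the other. The sketch assumes an unrooted tree given as a hypothetical edge list with positive branch lengths; all names are illustrative.

```python
from collections import defaultdict

def _farthest(adj, start):
    """Distances and parents from start along the unique tree paths, plus the farthest node."""
    dist, parent, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        node = stack.pop()
        for nb, length in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + length
                parent[nb] = node
                stack.append(nb)
    return max(dist, key=dist.get), dist, parent

def midpoint_root(edges):
    """Locate the centre of the longest leaf-to-leaf path (the tree's diameter).

    Returns (p, q, offset, diameter): the root lies on edge p-q, offset units from p."""
    adj = defaultdict(list)
    for x, y, length in edges:
        adj[x].append((y, length))
        adj[y].append((x, length))
    end_a, _, _ = _farthest(adj, edges[0][0])    # one end of the diameter
    end_b, dist, parent = _farthest(adj, end_a)  # the other end
    half = dist[end_b] / 2
    node = end_b
    while dist[parent[node]] > half:  # walk back towards end_a
        node = parent[node]
    p = parent[node]
    return p, node, half - dist[p], dist[end_b]

# Hypothetical unrooted tree: leaves a-e, internal nodes u, v, w
TREE = [("u", "a", 2), ("u", "b", 3), ("u", "v", 3), ("v", "c", 4),
        ("v", "w", 2), ("w", "d", 2), ("w", "e", 1)]
print(midpoint_root(TREE))  # ('v', 'u', 1.0, 10.0)
```

Here the longest path runs from c to b (length 10), so the root falls on the u-v edge, 1 unit from v. Unlike outgroup rooting, this needs no extra sequence, but it implicitly assumes roughly clock-like rates.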

Summary
– Parsimony- and ML-based approaches are computationally intensive; scalability is poor
– Neighbor joining is adequate if the additivity assumption is valid
– UPGMA is adequate if both the molecular clock and additivity assumptions are valid for the given set of sequences

Summary
– Phylogenetics is useful for understanding sequence evolution
– Phylogenetics makes sense for
   – sequences with a high percentage of sequence identity
   – sequences not subject to "selection"
– The sequence tree is not the same as the species tree