Phylogenies and the Tree of Life


Phylogenies and the Tree of Life
Basic Principles of Phylogenetics
Parsimony - Distance - Likelihood
Topologies - Super Trees - Testing
Networks
Challenges
Empirical Investigations:
Molecular Clock
Biochemical rates
Selection Strength
Tree shapes
Branching Patterns
Rootings
Open Questions

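Central Principles of Phylogeny Reconstruction
Sequences shown: TTCAGT, TCCAGT, GCCAAT
Parsimony: [Figure: tree on s1, s2, s3, s4 with substitution counts 1 and 2 on its branches.] Total Weight: 3
Distance: [Figure: tree on s1, s2, s3, s4 fitted to pairwise distances, with branch lengths 0.4, 0.6, 0.3, 0.7, 1.5.]
Likelihood: [Figure: tree on s1, s2, s3, s4 with parameter estimates.] L = 3.1 × 10^{-7}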

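From Distance to Phylogenies
What is the relationship of a, b, c, d & e?

      a    b    c    d    e
a     -   22   10   22   22
b     7    -   22   16   14
c     7    8    -   22   22
d    12   13    9    -   16
e    13   14   10   13    -

Upper triangle: molecular clock (these distances are ultrametric).
Lower triangle: no molecular clock (these distances are additive but not ultrametric).
[Figure: the trees on a, b, c, d, e reconstructed from each set of distances.]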
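The difference between the two triangles of the matrix can be checked mechanically. The sketch below is an illustration, not part of the slides: it applies the standard three-point (ultrametric) and four-point (additive) conditions to the distances shown. The names UPPER, LOWER, is_ultrametric and is_additive are chosen here for the example.

```python
from itertools import combinations

TAXA = "abcde"

# Upper triangle of the slide's matrix, keyed by sorted taxon pair.
UPPER = {"ab": 22, "ac": 10, "ad": 22, "ae": 22, "bc": 22,
         "bd": 16, "be": 14, "cd": 22, "ce": 22, "de": 16}
# Lower triangle of the slide's matrix.
LOWER = {"ab": 7, "ac": 7, "bc": 8, "ad": 12, "bd": 13,
         "cd": 9, "ae": 13, "be": 14, "ce": 10, "de": 13}


def d(m, x, y):
    """Look up a symmetric distance between taxa x and y."""
    return m["".join(sorted(x + y))]


def is_ultrametric(m, taxa=TAXA):
    """Three-point condition: in every triple the two largest distances tie."""
    for x, y, z in combinations(taxa, 3):
        s = sorted([d(m, x, y), d(m, x, z), d(m, y, z)])
        if s[1] != s[2]:
            return False
    return True


def is_additive(m, taxa=TAXA):
    """Four-point condition: in every quartet the two largest pair sums tie."""
    for w, x, y, z in combinations(taxa, 4):
        s = sorted([d(m, w, x) + d(m, y, z),
                    d(m, w, y) + d(m, x, z),
                    d(m, w, z) + d(m, x, y)])
        if s[1] != s[2]:
            return False
    return True
```

Calling is_ultrametric(UPPER) and is_ultrametric(LOWER) separates the clock-like distances from the others, while is_additive tells whether a set of distances fits some tree exactly.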

Enumerating Trees: Unrooted & valency 3
[Figure: the unrooted trees with 3, 4 and 5 labelled leaves.]
Recursion: T_n = (2n-5) T_{n-1}
Initialisation: T_1 = T_2 = T_3 = 1

n      4   5    6    7      8          9         10          15           20
T_n    3  15  105  945  10395  1.4 × 10^5  2.0 × 10^6  7.9 × 10^12  2.2 × 10^20
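The recursion is easy to evaluate directly. A minimal sketch follows; the function name count_unrooted_trees is chosen here for illustration and is not from the slides.

```python
def count_unrooted_trees(n):
    """Number of unrooted trees with n labelled leaves and all internal
    nodes of valency 3, using T_n = (2n-5) * T_{n-1}, T_1 = T_2 = T_3 = 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    count = 1
    for k in range(4, n + 1):
        count *= 2 * k - 5
    return count


if __name__ == "__main__":
    for n in (4, 5, 6, 7, 8, 9, 10, 15, 20):
        print(n, count_unrooted_trees(n))
```

Python integers have arbitrary precision, so the exact counts for n = 15 and n = 20 are returned rather than floating-point approximations.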

Heuristic Searches in Tree Space
Nearest Neighbour Interchange: [Figure: the three arrangements of subtrees T1, T2, T3, T4 around an internal branch.]
Subtree regrafting: [Figure: tree with subtrees T3, T4 and leaves s1-s6, before and after the move.]
Subtree rerooting and regrafting: [Figure: tree with subtrees T3, T4 and leaves s1-s6, before and after the move.]

Assignment to internal nodes: The simple way.
[Figure: tree with leaves C, A, T, G and unknown internal nodes.]
What is the cheapest assignment of nucleotides to internal nodes, given some (symmetric) distance function d(N1,N2)?
If there are k leaves, there are k-2 internal nodes and 4^{k-2} possible assignments of nucleotides. For k=22, this is more than 10^{12}.

5S RNA Alignment & Phylogeny
Hein, 1990
[Figure: phylogeny with leaves 9 11 10 6 8 7 5 4 3 1 2 17 16 15 14 13 12.]
Transitions 2, transversions 5
Total weight 843.

10 tatt-ctggtgtcccaggcgtagaggaaccacaccgatccatctcgaacttggtggtgaaactctgccgcggt--aaccaatact-cg-gg-gggggccct-gcggaaaaatagctcgatgccagga--ta
17 t--t-ctggtgtcccaggcgtagaggaaccacaccaatccatcccgaacttggtggtgaaactctgctgcggt--ga-cgatact-tg-gg-gggagcccg-atggaaaaatagctcgatgccagga--t-
9 t--t-ctggtgtctcaggcgtggaggaaccacaccaatccatcccgaacttggtggtgaaactctattgcggt--ga-cgatactgta-gg-ggaagcccg-atggaaaaatagctcgacgccagga--t-
14 t----ctggtggccatggcgtagaggaaacaccccatcccataccgaactcggcagttaagctctgctgcgcc--ga-tggtact-tg-gg-gggagcccg-ctgggaaaataggacgctgccag-a--t-
3 t----ctggtgatgatggcggaggggacacacccgttcccataccgaacacggccgttaagccctccagcgcc--aa-tggtact-tgctc-cgcagggag-ccgggagagtaggacgtcgccag-g--c-
11 t----ctggtggcgatggcgaagaggacacacccgttcccataccgaacacggcagttaagctctccagcgcc--ga-tggtact-tg-gg-ggcagtccg-ctgggagagtaggacgctgccag-g--c-
4 t----ctggtggcgatagcgagaaggtcacacccgttcccataccgaacacggaagttaagcttctcagcgcc--ga-tggtagt-ta-gg-ggctgtccc-ctgtgagagtaggacgctgccag-g--c-
15 g----cctgcggccatagcaccgtgaaagcaccccatcccat-ccgaactcggcagttaagcacggttgcgcccaga-tagtact-tg-ggtgggagaccgcctgggaaacctggatgctgcaag-c--t-
8 g----cctacggccatcccaccctggtaacgcccgatctcgt-ctgatctcggaagctaagcagggtcgggcctggt-tagtact-tg-gatgggagacctcctgggaataccgggtgctgtagg-ct-t-
12 g----cctacggccataccaccctgaaagcaccccatcccgt-ccgatctgggaagttaagcagggttgagcccagt-tagtact-tg-gatgggagaccgcctgggaatcctgggtgctgtagg-c--t-
7 g----cttacgaccatatcacgttgaatgcacgccatcccgt-ccgatctggcaagttaagcaacgttgagtccagt-tagtact-tg-gatcggagacggcctgggaatcctggatgttgtaag-c--t-
16 g----cctacggccatagcaccctgaaagcaccccatcccgt-ccgatctgggaagttaagcagggttgcgcccagt-tagtact-tg-ggtgggagaccgcctgggaatcctgggtgctgtagg-c--t-
1 a----tccacggccataggactctgaaagcactgcatcccgt-ccgatctgcaaagttaaccagagtaccgcccagt-tagtacc-ac-ggtgggggaccacgcgggaatcctgggtgctgt-gg-t--t-
18 a----tccacggccataggactctgaaagcaccgcatcccgt-ccgatctgcgaagttaaacagagtaccgcccagt-tagtacc-ac-ggtgggggaccacatgggaatcctgggtgctgt-gg-t--t-
2 a----tccacggccataggactgtgaaagcaccgcatcccgt-ctgatctgcgcagttaaacacagtgccgcctagt-tagtacc-at-ggtgggggaccacatgggaatcctgggtgctgt-gg-t--t-
5 g---tggtgcggtcataccagcgctaatgcaccggatcccat-cagaactccgcagttaagcgcgcttgggccagaa-cagtact-gg-gatgggtgacctcccgggaagtcctggtgccgcacc-c--c-
13 g----ggtgcggtcataccagcgttaatgcaccggatcccat-cagaactccgcagttaagcgcgcttgggccagcc-tagtact-ag-gatgggtgacctcctgggaagtcctgatgctgcacc-c--t-
6 g----ggtgcgatcataccagcgttaatgcaccggatcccat-cagaactccgcagttaagcgcgcttgggttggag-tagtact-ag-gatgggtgacctcctgggaagtcctaatattgcacc-c-tt-

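Cost of a history - minimizing over internal states
[Figure: a node with possible states A, C, G, T above its left and right subtrees, whose roots also range over A, C, G, T.]
Example term in the minimisation: d(C,G) + w_C(left subtree)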

Cost of a history - leaves (initialisation).
Initialisation at the leaves: Cost(N) = 0 if N is at the leaf, otherwise infinity.
[Figure: leaves G and A; the empty subtree below each leaf has cost 0.]

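Fitch-Hartigan-Sankoff Algorithm
Substitution costs: 2 for a transition (A-G, C-T), 5 for a transversion.
Cost vectors are written in the order (A,C,G,T); * means infinity.
Root: (9,7,7,7)
Internal node: (10,2,10,2). Each entry is the cost of the cheapest tree hanging from this node given that nucleotide at the node; the second entry, 2, is the cost given there is a “C” at this node.
Leaves: (*,0,*,*), (*,*,*,0) and (*,*,0,*), i.e. C, T and G.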
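The cost vectors on this slide can be reproduced with a short recursive implementation of the Sankoff recursion: the cost of state s at a node is, summed over its children, the minimum over child states t of d(s,t) plus the child's cost for t. The sketch below assumes transition cost 2 and transversion cost 5 and the tree shape ((C,T),G), both inferred from the numbers on the slides rather than stated there; the names sub_cost and sankoff are illustrative.

```python
INF = float("inf")
STATES = "ACGT"
PURINES = {"A", "G"}


def sub_cost(x, y, transition=2, transversion=5):
    """Assumed cost scheme: 0 for no change, 2 within purines or within
    pyrimidines (transition), 5 between the two classes (transversion)."""
    if x == y:
        return 0
    same_class = (x in PURINES) == (y in PURINES)
    return transition if same_class else transversion


def sankoff(tree):
    """Return {state: cheapest cost of the subtree given that state here}.

    A leaf is a one-letter string; an internal node is a tuple of children.
    """
    if isinstance(tree, str):
        # Initialisation: cost 0 for the observed nucleotide, else infinity.
        return {s: (0 if s == tree else INF) for s in STATES}
    total = {s: 0 for s in STATES}
    for child in tree:
        child_cost = sankoff(child)
        for s in STATES:
            total[s] += min(sub_cost(s, t) + child_cost[t] for t in STATES)
    return total


if __name__ == "__main__":
    print(sankoff(("C", "T")))          # internal node above leaves C and T
    print(sankoff((("C", "T"), "G")))   # root
```

The work is linear in the number of nodes (times 16 state pairs per branch), in contrast to enumerating all assignments of nucleotides to internal nodes.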

The Felsenstein Zone
Felsenstein-Cavender (1979)
[Figure: the true tree and the reconstructed tree on s1, s2, s3, s4.]
Patterns (16, only 8 shown; one row per sequence, one column per pattern):
0 1 0 0 0 0 0 0
0 0 1 0 0 1 0 1
0 0 0 1 0 1 1 0
0 0 0 0 1 0 1 1
(Note: should be after stoch. proc.)

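Bootstrapping
Felsenstein (1985)
[Figure: the columns of the alignment (top row ATCTGTAGTCT, remaining rows shown as ??????????) are resampled with replacement; the row 10230101201 gives the number of times each column is drawn. A tree on sequences 1, 2, 3, 4 is rebuilt from each of 500 resampled data sets.]
(Find example)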
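The resampling step can be sketched in a few lines. This is an illustration under stated assumptions: only the first alignment row (ATCTGTAGTCT) comes from the slide, the other two rows are hypothetical, and the tree-building step applied to each replicate is left out. The function name bootstrap_alignment is chosen here.

```python
import random

# First row is from the slide; the other rows are hypothetical examples.
ALIGNMENT = ["ATCTGTAGTCT",
             "ATCTGTAGTCA",
             "ACCTGTTGTCT"]


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    Returns the replicate alignment and, for each original column,
    the number of times it was drawn.
    """
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    counts = [picks.count(i) for i in range(n_cols)]
    replicate = ["".join(row[i] for i in picks) for row in alignment]
    return replicate, counts


if __name__ == "__main__":
    rng = random.Random(1985)  # fixed seed only for reproducibility
    replicate, counts = bootstrap_alignment(ALIGNMENT, rng)
    print("".join(str(c) for c in counts))
    for row in replicate:
        print(row)
```

Repeating this many times (500 in the figure), rebuilding a tree from each replicate, and recording how often each grouping reappears gives the bootstrap support values.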

Assignment to internal nodes: The simple way.
[Figure: tree with leaves C, A, T, G and unknown internal nodes.]
If the branch lengths and the evolutionary process are known, what is the probability of the nucleotides at the leaves?
Alignment (the highlighted column is set apart):
Cctacggccatacca a ccctgaaagcaccccatcccgt
Cttacgaccatatca c cgttgaatgcacgccatcccgt
Cctacggccatagca c ccctgaaagcaccccatcccgt
Cccacggccatagga c ctctgaaagcactgcatcccgt
Tccacggccatagga a ctctgaaagcaccgcatcccgt
Ttccacggccatagg c actgtgaaagcaccgcatcccg
Tggtgcggtcatacc g agcgctaatgcaccggatccca
Ggtgcggtcatacca t gcgttaatgcaccggatcccat

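Probability of leaf observations - summing over internal states
[Figure: a node with possible states A, C, G, T above its left and right subtrees, whose roots also range over A, C, G, T.]
Example term in the sum: P(C→G) · P_C(left subtree)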
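Replacing "minimum" by "sum" and "cost" by "probability" in the parsimony recursion gives the pruning computation of the likelihood at one site. The sketch below is a hedged illustration: the Jukes-Cantor substitution model, the uniform root distribution and the three-leaf tree with branch lengths 0.1, 0.2, 0.3, 0.4 are assumptions made for the example, and the function names are hypothetical.

```python
import math
from itertools import product

STATES = "ACGT"


def jc_prob(x, y, t):
    """Jukes-Cantor probability of x -> y along a branch of length t
    (t in expected substitutions per site). Assumed model."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def partial(tree):
    """Return {state: P(leaves below this node | state at this node)}.

    A leaf is a one-letter string; an internal node is a tuple of
    (child, branch_length) pairs.
    """
    if isinstance(tree, str):
        return {s: 1.0 if s == tree else 0.0 for s in STATES}
    result = {s: 1.0 for s in STATES}
    for child, t in tree:
        below = partial(child)
        for s in STATES:
            result[s] *= sum(jc_prob(s, c, t) * below[c] for c in STATES)
    return result


def site_likelihood(tree):
    """Sum over root states, weighting each by 1/4 (uniform root)."""
    root = partial(tree)
    return sum(0.25 * root[s] for s in STATES)


def make_tree(x, y, z):
    """Hypothetical three-leaf tree ((x:0.1, y:0.2):0.3, z:0.4)."""
    return ((((x, 0.1), (y, 0.2)), 0.3), (z, 0.4))


# Sanity check: the probabilities of all 4^3 leaf patterns add up to 1.
TOTAL = sum(site_likelihood(make_tree(*p)) for p in product(STATES, repeat=3))
```

For a full alignment the site likelihoods are multiplied over columns (in practice their logarithms are added), which is the quantity maximised over branch lengths and model parameters.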

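Output from Likelihood Method.
No Molecular Clock [Figure: unrooted tree on s1, s2, s3, s4, s5.]
2n-3 lengths estimated: 6.9 ± 1.3, 11.4 ± 1.9, 3.9 ± 0.8, 10.9 ± 2.1, 9.9 ± 1.2, 11.6 ± 2.1, 4.1 ± 0.7
Molecular Clock [Figure: rooted tree on s1, s2, s3, s4, s5; the axis runs from the duplication times to now and measures the amount of evolution.]
n-1 heights estimated: 23 ± 5.2, 12 ± 2.2, 11.1 ± 1.8, 5.9 ± 1.2
Likelihood: 7.9 × 10^{-14}, parameter estimates 0.31 and 0.18
Likelihood: 6.2 × 10^{-12}, parameter estimates 0.34 and 0.16
The larger likelihood belongs to the no-clock model, which contains the clock model as a special case.
Test: 2[ln(6.2 × 10^{-12}) - ln(7.9 × 10^{-14})] is approximately χ²-distributed with (n-2) degrees of freedom.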
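The comparison of the two likelihoods can be carried out as a standard likelihood ratio test. The sketch below is an illustration, not the author's calculation: it assumes that the larger likelihood (6.2 × 10^{-12}) belongs to the no-clock model, since that model contains the clock model as a special case, and that n = 5 species as in the figure. The critical values are the usual 5% points of the chi-square distribution.

```python
import math


def lrt_statistic(lik_general, lik_constrained):
    """Twice the log-likelihood difference between nested models."""
    return 2.0 * (math.log(lik_general) - math.log(lik_constrained))


N_SPECIES = 5                      # s1..s5 in the figure
LIK_NO_CLOCK = 6.2e-12             # assumed to be the general model
LIK_CLOCK = 7.9e-14                # assumed to be the constrained model

# Parameters: 2n-3 branch lengths without a clock, n-1 heights with one.
DF = (2 * N_SPECIES - 3) - (N_SPECIES - 1)   # equals n - 2

# Upper 5% points of the chi-square distribution for small df.
CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}

STAT = lrt_statistic(LIK_NO_CLOCK, LIK_CLOCK)
REJECT_CLOCK = STAT > CRITICAL_5PCT[DF]

if __name__ == "__main__":
    print(f"statistic = {STAT:.3f}, df = {DF}, reject clock at 5%: {REJECT_CLOCK}")
```

Under these assumptions the statistic is about 8.7 on 3 degrees of freedom, which exceeds the 5% critical value of 7.815.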

The Molecular Clock
First noted by Zuckerkandl & Pauling (1964) as an empirical fact.
How can one detect it?
Known Ancestor, a, at Time t: [Figure: s1 and s2 descending from a.]
Unknown Ancestors: [Figure: tree on s1, s2, s3 with unknown ancestors.]

Rootings
Purpose
1) To give time direction in the phylogeny & most ancient point
2) To be able to define concepts such as monophyletic group.
Methods
1) Outgroup: Enhance the data set with a sequence from a species definitely distant to all of them. It will be joined at the root of the original data.
2) Midpoint: Find the midpoint of the longest path in the tree.
3) Assume Molecular Clock.

Rooting the 3 kingdoms
3 billion years ago: no reliable clock - no outgroup
Given 2 sets of homologous proteins, i.e. MDH & LDH, can the archaea, prokaryotes and eukaryotes be rooted?
[Figure: separate trees on E, P, A for LDH and for MDH, each with an unknown root, and a combined LDH/MDH tree.]

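The generation/year-time clock
Langley-Fitch, 1973
[Figure: unrooted tree on s1, s2, s3 with branch lengths l1, l2, l3.]
Absolute Time Clock: {l1 = l2 < l3}. [Figure: rooted tree on s1, s2, s3 with l1 = l2; l3 is divided by some rooting technique.]
Generation Time Clock: [Figure: Elephant and Mouse, 100 Myr; labels: Absolute Time Clock, Generation Time, variable, constant.]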

The generation/year-time clock
Langley-Fitch, 1973
[Figure: tree on s1, s2, s3, labelled "Any Tree" and "Generation Time Clock".]
Can the generation time clock be tested?
Assume a data set: 3 species, 2 sequences each.
[Figure: two trees on s1, s2, s3, one per sequence.]

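The generation/year-time clock
Langley-Fitch, 1973
Degrees of freedom (dg) for one sequence per species:
Clock tree on s1, s2, s3 (l1 = l2, l3): dg: 2. For k species: dg: k-1
Unconstrained tree on s1, s2, s3 (l1, l2, l3): dg: 3. For k species: dg: 2k-3
Generation time clock with t sequences per species (one tree with lengths l1, l2, l3, the next with c*l1, c*l2, c*l3):
k=3, t=2: dg = 4
k, t: dg = (2k-3) + (t-1)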
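The parameter counts on this slide can be collected in a few helper functions. This is a sketch under an assumption: the multi-sequence count is taken as (2k-3) + (t-1), one shared set of branch lengths plus one scaling constant c per additional sequence, which is the form that reproduces the slide's value of 4 for k = 3, t = 2. The last function, the degrees of freedom of a test against fully independent trees, is an addition made here and does not appear on the slide.

```python
def branch_parameters(k, clock=False):
    """Free parameters for one tree on k species:
    k-1 node heights with a clock, 2k-3 branch lengths without."""
    return k - 1 if clock else 2 * k - 3


def generation_clock_parameters(k, t):
    """Assumed count for t sequences sharing proportional branch lengths:
    2k-3 lengths plus t-1 scaling constants."""
    return (2 * k - 3) + (t - 1)


def independent_trees_parameters(k, t):
    """Each of the t sequences gets its own 2k-3 branch lengths."""
    return t * (2 * k - 3)


def generation_clock_test_df(k, t):
    """Hypothetical test df: independent trees versus proportional trees."""
    return independent_trees_parameters(k, t) - generation_clock_parameters(k, t)


if __name__ == "__main__":
    print(branch_parameters(3), branch_parameters(3, clock=True))
    print(generation_clock_parameters(3, 2), generation_clock_test_df(3, 2))
```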

α- & β-globin, cytochrome c, fibrinopeptide A & generation time clock
Langley-Fitch, 1973
Fibrinopeptide A phylogeny: [Figure: tree with leaves Human, Gorilla, Donkey, Gibbon, Monkey, Rabbit, Cow, Rat, Pig, Horse, Goat, Llama, Sheep, Dog.]
Relative rates:
α-globin 0.342
β-globin 0.452
cytochrome c 0.069
fibrinopeptide A 0.137

Almost Clocks
References:
M.J. Sanderson (1997) “A Nonparametric Approach to Estimating Divergence Times in the Absence of Rate Constancy” Mol. Biol. Evol. 14(12):1218-31
J.L. Thorne et al. (1998) “Estimating the Rate of Evolution of the Rate of Evolution” Mol. Biol. Evol. 15(12):1647-57
J.P. Huelsenbeck et al. (2000) “A Compound Poisson Process for Relaxing the Molecular Clock” Genetics 154:1879-92
I Smoothing a non-clock tree onto a clock tree (Sanderson).
II Rate of Evolution of the Rate of Evolution (Thorne et al.). The rate of evolution can change at each bifurcation.
III Relaxed Molecular Clock (Huelsenbeck et al.). At random points in time, the rate is multiplied by a gamma-distributed random variable.
Comment: Makes perfect sense. Testing no clock versus a perfect clock is choosing between two unrealistic extremes.

Spannoids
[Figure: the same labelled points connected as a spanning tree, a Steiner tree, a 1-Spannoid and a 2-Spannoid.]
Advantage: Decomposes large trees into small trees
Questions:
How to find optimal spannoid?
How well do they approximate?

Profiloids and Staroids
[Figure: a profile HMM over sequences s1, s2, ..., sk; an ideal large phylogeny; a phylogeny of profiles (a staroid) relating HMM1, HMM2, HMM3.]
Questions:
Parameter changes on edges relating HMMs
Choosing Optimal Staroids