Cao et al. (2000), Gene Phylogeny of Mammals: a good example where molecular sequences have led to a big improvement in our understanding of evolution.

rRNA structure is conserved in evolution. The sequence changes slowly. Therefore it can be used to tell us about the earliest branches in the tree of life.

69 mammals with complete mitochondrial genomes. Used two models simultaneously. Total of 3571 sites: 1637 single sites plus paired sites. Hudelot et al. 2003

Afrotheria / Laurasiatheria

Phylogenetic Methods
Distances
Clustering Methods
Likelihood Methods
Parsimony
Recommended books:
P. G. Higgs and T. K. Attwood, Bioinformatics and Molecular Evolution
R. Page and A. Holmes, Molecular evolution: a phylogenetic approach
W. H. Li, Molecular evolution

Part of sequence alignment of Mitochondrial Small Sub-Unit rRNA Full gene is length ~ Primate species with mouse as outgroup

From the alignment, construct pairwise distances.
Species 1: AAGTCTTAGCGCGAT
Species 2: ACGTCGTATCGCGAT
Differences (*) occur at positions 2, 6 and 9.
D = 3/15 = 0.2, where D = fraction of differences between sequences.
BUT D is not an additive distance: D does not increase linearly with time.
Ancestor G, descendants C and A: 2 substitutions happened, only 1 is visible.
Ancestor G, descendants A and A: 2 substitutions happened, nothing visible.
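The fraction of differences D on this slide can be checked with a short Python sketch. The function name `p_distance` is an illustrative choice, not something from the slides, and the sketch assumes the two sequences are already aligned and gap-free.

```python
def p_distance(seq1, seq2):
    """Fraction of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Count the columns where the two bases disagree.
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)


# The two sequences from the slide: 3 differences over 15 sites.
D = p_distance("AAGTCTTAGCGCGAT", "ACGTCGTATCGCGAT")
print(D)  # 0.2
```

Because D only counts visible differences, it underestimates the true number of substitutions once multiple hits at the same site become common.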

Models of Sequence Evolution
$P_{ij}(t)$ = probability of being in state j at time t given that the ancestor was in state i at time 0.
States label the bases A, C, G and T.
$r_{ij}$ is the rate of substitution from state i to state j.
[Diagram: ancestor in state i, descendant in state j after time t.]

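Jukes-Cantor Model
All substitution rates are equal. All base frequencies are 1/4.
Two sequences that diverged a time t ago are separated by a total time 2t.
Mean number of substitutions per site: d increases linearly with time.
The observed fraction of differences D saturates: $D = \frac{3}{4}\left(1 - e^{-4d/3}\right)$, so $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right)$.
[Graph of D against d: D follows the line d = D for small d and levels off at D = 3/4.]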
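As a minimal sketch, the standard Jukes-Cantor correction $d = -\frac{3}{4}\ln(1 - \frac{4}{3}D)$ converts an observed fraction of differences D into an estimated number of substitutions per site d. The function name `jc_distance` is illustrative and not taken from the slides.

```python
import math


def jc_distance(D):
    """Jukes-Cantor distance d for an observed fraction of differences D."""
    # D approaches 3/4 as sequences become random with respect to each other,
    # so the correction is undefined at or beyond that value.
    if not 0.0 <= D < 0.75:
        raise ValueError("D must satisfy 0 <= D < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * D / 3.0)


# For the pair of sequences with D = 0.2 the corrected distance is slightly larger.
print(round(jc_distance(0.2), 4))  # 0.2326
```

For small D the correction is negligible (d is close to D); it grows rapidly as D approaches 3/4.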

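The HKY model has a more general substitution rate matrix (rows: from; columns: to, in the order A, C, G, T):

$$
\begin{pmatrix}
* & \pi_C & \kappa\pi_G & \pi_T \\
\pi_A & * & \pi_G & \kappa\pi_T \\
\kappa\pi_A & \pi_C & * & \pi_T \\
\pi_A & \kappa\pi_C & \pi_G & *
\end{pmatrix}
$$

The frequencies of the four bases are $\pi_A, \pi_C, \pi_G, \pi_T$.
$\kappa$ is the transition-transversion rate parameter.
* means minus the sum of the other elements on the row.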
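The HKY rate matrix can be built in a few lines. This is a sketch under the usual convention, assumed here, that the rate from base i to base j is the frequency of j, multiplied by the transition-transversion parameter when the change is a transition (A<->G or C<->T). The names `hky_rate_matrix`, `pi` and `kappa` are illustrative, and no overall rate scaling is applied.

```python
BASES = "ACGT"
# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rate_matrix(pi, kappa):
    """Return q[i][j], the HKY substitution rate from base i to base j."""
    q = {}
    for i in BASES:
        row = {}
        for j in BASES:
            if i != j:
                factor = kappa if (i, j) in TRANSITIONS else 1.0
                row[j] = factor * pi[j]
        # Diagonal entry: minus the sum of the other elements on the row.
        row[i] = -sum(row.values())
        q[i] = row
    return q


# Hypothetical base frequencies and kappa, chosen only for illustration.
q = hky_rate_matrix({"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}, kappa=2.0)
for i in BASES:
    print(i, [round(q[i][j], 2) for j in BASES])
```

With equal base frequencies and kappa = 1 every off-diagonal rate is the same, which recovers the Jukes-Cantor model.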

[Table: part of the Jukes-Cantor distance matrix for the primates example, with rows and columns Baboon, Gibbon, Orang, Gorilla, PygmyChimp, Chimp, Human.]
Use as input to clustering methods.
Mouse-Primates distances ~ 0.3

Distance Matrix Methods
Follow a clustering procedure on the distance matrix:
1. Join the closest 2 clusters.
2. Recalculate distances between all the clusters.
3. Repeat 1 and 2 until all species are connected in a single cluster.
Initially each species is a cluster on its own. The clusters get bigger during the process, until they are all connected in a single tree.
The Neighbour-Joining method is a commonly used clustering method.

Neighbour-Joining method
Take two neighbouring nodes i and j and replace them by a single new node n.
$d_{in} + d_{nk} = d_{ik}$; $d_{jn} + d_{nk} = d_{jk}$; $d_{in} + d_{jn} = d_{ij}$;
therefore $d_{nk} = (d_{ik} + d_{jk} - d_{ij})/2$, which applies for every other node k.
Define $r_i = \frac{1}{N-2}\sum_k d_{ik}$, where N is the current number of nodes.
Let $d_{in} = (d_{ij} + r_i - r_j)/2$; $d_{jn} = (d_{ij} + r_j - r_i)/2$.
Rule: choose i and j for which $D_{ij}$ is smallest, where $D_{ij} = d_{ij} - r_i - r_j$.
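One Neighbour-Joining step can be sketched as follows. The sketch assumes that r_i is the sum of distances from node i to all other nodes divided by N - 2 (the usual NJ normalisation), and it stores distances in a dictionary keyed by node pairs. The function name `nj_step` and the four-taxon distances are hypothetical; the distances were generated from a tree with branch lengths A = 1, B = 2, C = 3, D = 4 and an internal branch of 1.

```python
from itertools import combinations


def nj_step(d, nodes):
    """Pick the pair (i, j) to join and compute the new branch lengths and distances."""
    n = len(nodes)

    def dist(a, b):
        # Distances are symmetric; accept either key order.
        return d[(a, b)] if (a, b) in d else d[(b, a)]

    # r_i: sum of distances from i to every other node, divided by N - 2.
    r = {i: sum(dist(i, k) for k in nodes if k != i) / (n - 2) for i in nodes}
    # Rule: join the pair with the smallest D_ij = d_ij - r_i - r_j.
    i, j = min(combinations(nodes, 2),
               key=lambda p: dist(*p) - r[p[0]] - r[p[1]])
    # Branch lengths from i and j to the new node n.
    d_in = (dist(i, j) + r[i] - r[j]) / 2
    d_jn = (dist(i, j) + r[j] - r[i]) / 2
    # Distances from the new node n to every remaining node k.
    d_nk = {k: (dist(i, k) + dist(j, k) - dist(i, j)) / 2
            for k in nodes if k not in (i, j)}
    return (i, j), (d_in, d_jn), d_nk


# Hypothetical additive distances for four taxa.
d = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
     ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 7}
pair, lengths, new_dists = nj_step(d, ["A", "B", "C", "D"])
print(pair, lengths, new_dists)
```

On additive input like this the step recovers the original branch lengths for the joined pair; repeating it on the reduced matrix builds the whole tree.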

The NJ method produces an unrooted, additive tree.
Additive means that the distance between two species equals the distance summed along the branches connecting them.
[Figure: the unrooted NJ tree for Baboon, Mouse, Lemur, Tarsier, SakiMonkey, Marmoset, Gibbon, Orangutan, Gorilla, PygmyChimp, Chimp and Human (scale bar 0.1), and the same tree rooted using the Mouse as outgroup, with node labels 47 and 100.]

The Maximum Likelihood Criterion
Calculate the likelihood of observing the data on a given tree. Choose the tree for which the likelihood is the highest.
[Diagrams: a node x joined to leaves A and G by branches of lengths t1 and t2; a node x joined to nodes y and z by branches of lengths t1 and t2.]
The total likelihood for a site can be calculated recursively.
Likelihood is a function of tree topology, branch lengths, and parameters in the substitution rate matrix. All of these can be optimized.
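The recursive calculation of the likelihood for one site can be sketched as below. Several things here are assumptions made for illustration: the Jukes-Cantor model is used for the substitution probabilities, branch lengths are measured in expected substitutions per site, and a tree is written as nested lists of (child, branch_length) pairs with leaves given as base letters. The function names are hypothetical.

```python
import math

BASES = "ACGT"


def jc_prob(i, j, d):
    """Jukes-Cantor probability of base j after branch length d, starting from base i."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial(node):
    """Return {base: P(data below node | node is in that base)}."""
    if isinstance(node, str):
        # A leaf: the observed base has probability 1, all others 0.
        return {b: 1.0 if b == node else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, d in node:
        below = partial(child)
        for b in BASES:
            # Sum over the possible states of the child node.
            result[b] *= sum(jc_prob(b, c, d) * below[c] for c in BASES)
    return result


def site_likelihood(root):
    """Likelihood of one site, averaging over root states with frequency 1/4 each."""
    return sum(0.25 * p for p in partial(root).values())


# Node x with leaves A and G, using hypothetical branch lengths 0.1 and 0.2.
print(site_likelihood([("A", 0.1), ("G", 0.2)]))
```

The likelihood of a whole alignment is the product of the site likelihoods (in practice the sum of their logarithms), and this quantity is then maximised over topologies, branch lengths and model parameters.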

Using ML to rank the alternative trees for the primates example
Columns: Tree; log L difference; S.E.; Significantly worse [log L difference and S.E. values missing]
NJ tree: no
(Human,(Gorilla,(Chimp,PygmyCh))): best tree
((Human,Gorilla),(Chimp,PygmyCh)): no
(Lemur,(Tarsier,Other Primates)): no
(Tarsier,(Lemur,Other Primates)): no
(Gorilla,(Chimp,(Human,PygmyCh))): yes
Orang and Gorilla form a clade: no
Gorilla branches earlier than Orang: no
Gibbon and Baboon form a clade: yes
The NJ tree has: (Gorilla,(Human,(Chimp,PygmyCh))) and ((Lemur,Tarsier),Other Primates)
[Figure: the NJ tree rooted on Mouse, scale bar 0.1.]

The Parsimony Criterion
Try to explain the data in the simplest possible way with the fewest arbitrary assumptions.
Used initially with morphological characters.
Suppose C and D possess a character (1) that is absent in A and B (0).
[Figure: candidate trees 1, 2 and 3, with leaves ordered A(0), B(0), C(1), D(1) and A(0), C(1), B(0), D(1); + marks a gain of the character and * a loss.]
In 1, the character evolves only once (+).
In 2, the character evolves once (+) and is lost once (*).
In 3, the character evolves twice independently (++).
The first is the simplest explanation, therefore tree 1 is to be preferred by the parsimony criterion.

Parsimony with molecular data
[Figure: two trees for a site with bases A(T), B(T), C(T), D(G), E(G); mutations marked *.]
1. Requires one mutation.
2. Requires two mutations.
By parsimony, 1 is to be preferred to 2.
[Figure: two trees for a site with bases A(T), B(T), C(T), D(T), E(G); one mutation marked * on each.]
This site is non-informative. Whatever the arrangement of species, only one mutation is required.
To be informative, a site must have at least two bases present at least twice.
The best tree is the one that minimizes the overall number of mutations at all sites.
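The rule for informative sites can be written as a short check on one alignment column. The function name `is_parsimony_informative` is an illustrative choice, and the column is assumed to be a string with one base per species.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two different bases each occur in at least two species."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


# The two sites from the slide, for species A, B, C, D, E.
print(is_parsimony_informative("TTTGG"))  # True
print(is_parsimony_informative("TTTTG"))  # False
```

Filtering an alignment down to its informative columns does not change which tree is most parsimonious, since every other column requires the same number of mutations on any tree.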

Searching Tree Space
Require a way of generating trees to be tested by Maximum Likelihood or Parsimony:
Nearest neighbour interchange
Subtree pruning and regrafting
No. of distinct tree topologies for N species: unrooted $U_N = (2N-5)\,U_{N-1}$; rooted $R_N = (2N-3)\,R_{N-1}$.
Conclusion: there are huge numbers of trees, even for relatively small numbers of species. Therefore you cannot look at them all.
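The two recursions can be evaluated directly to see how fast the number of topologies grows. The sketch assumes the usual starting values for bifurcating trees, U_3 = 1 and R_2 = 1; the function names are illustrative.

```python
def unrooted_count(n):
    """Number of unrooted bifurcating topologies for n species (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 species")
    count = 1  # U_3 = 1
    for k in range(4, n + 1):
        count *= 2 * k - 5  # U_k = (2k - 5) * U_(k-1)
    return count


def rooted_count(n):
    """Number of rooted bifurcating topologies for n species (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 species")
    count = 1  # R_2 = 1
    for k in range(3, n + 1):
        count *= 2 * k - 3  # R_k = (2k - 3) * R_(k-1)
    return count


for n in (4, 5, 10, 20):
    print(n, unrooted_count(n), rooted_count(n))
```

Already at 10 species there are more than two million unrooted topologies, which is why heuristic searches such as nearest neighbour interchange are used instead of exhaustive enumeration.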

Group III - Supraprimates

Dating from phylogenies: the molecular clock
[Figure: a tree in which one divergence is known to be 100 m.y. old; from the branch lengths, another must be approx. 50 m.y. and another approx. 150 m.y.]
Can lead to controversies because the clock does not always run at a constant rate.
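Under a strict molecular clock, dates scale in proportion to depth in the tree, so one calibration point fixes all the others. The sketch below uses hypothetical node depths (in substitutions per site) chosen only to reproduce the 50, 100 and 150 m.y. pattern on the slide; the function name `clock_date` is also illustrative.

```python
def clock_date(depth, calib_depth, calib_age_my):
    """Age of a node in m.y., assuming a constant rate fixed by one calibration node."""
    return calib_age_my * depth / calib_depth


# Hypothetical depths: the calibration node (known to be 100 m.y.) has depth 0.10.
calib_depth, calib_age = 0.10, 100.0
for depth in (0.05, 0.10, 0.15):
    print(depth, round(clock_date(depth, calib_depth, calib_age)))
```

If the rate varies between lineages, the proportionality breaks down and the inferred dates can be badly off, which is the source of the controversies mentioned on the slide.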

Mammalian orders (primates, rodents, carnivores, bats, ...): molecular dates tend to be earlier (100 m.y.) than those coming from the fossil record (65 m.y.).

Animal phyla: molecular phylogenies are resolving the relationships between phyla that were not understood from morphology.
There is still uncertainty about dates: molecular dates suggest about 1 b.y.; the earliest fossil evidence is 560 m.y.
[Figures: old morphological phylogeny; new molecular phylogeny.]

Molecular dating of key events in early evolution (dates in billions of years ago)
Puts LUCA just prior to 4, and the origin of eukaryotes at 2.7. The origin of cyanobacteria is at 2.5, whereas they are claimed to be present in the fossil record at 3.5.