MCB 372 #12: Tree, Quartets and Supermatrix Approaches Collaborators: Olga Zhaxybayeva (Dalhousie) Jinling Huang (ECU) Tim Harlow (UConn) Pascal Lapierre.

Presentation transcript:

MCB 372 #12: Tree, Quartets and Supermatrix Approaches
J. Peter Gogarten, University of Connecticut, Dept. of Molecular and Cell Biology
Collaborators: Olga Zhaxybayeva (Dalhousie), Jinling Huang (ECU), Tim Harlow (UConn), Pascal Lapierre (UConn), Greg Fournier (UConn)
Funded through the NASA Exobiology and AISR Programs, and NSF Microbial Genetics
Image: Edvard Munch, The Dance of Life (1900)

In the Felsenstein Zone, "long branches attract"
[Figure: "true" tree versus inferred tree for taxa A, B, C and D.]

Protpars reconstructions
[Figure: two reconstructed four-taxon trees, leaves labelled B, A, C, D and D, A, C, B.]

ML reconstructions with alignment step

Long branch attraction artifact
100% bootstrap support for bipartition (AD)(CB): the two longest branches join together.
What could you do to investigate whether this is a possible explanation? Use only slow positions, or use an algorithm that corrects for ASRV (among-site rate variation).
[Figure: true tree and reconstructed tree, each with leaves labelled seq. from A, seq. from B, seq. from C and seq. from D.]
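One way to act on the "use only slow positions" suggestion is to drop the most variable alignment columns before rebuilding the tree. The sketch below is only an illustration: it uses the number of distinct residues per column as a crude stand-in for site rate, which is an assumption and not the method used in the lecture. The function names, the `max_states` cutoff and the toy alignment are all hypothetical.

```python
def column_variability(alignment):
    """Return the number of distinct residues (gaps ignored) in each column.

    `alignment` maps sequence names to aligned strings of equal length.
    Distinct-residue count is a crude proxy for site rate (assumption).
    """
    length = len(next(iter(alignment.values())))
    scores = []
    for i in range(length):
        residues = {seq[i] for seq in alignment.values() if seq[i] != "-"}
        scores.append(len(residues))
    return scores


def keep_slow_positions(alignment, max_states=2):
    """Keep only columns with at most `max_states` distinct residues."""
    scores = column_variability(alignment)
    keep = [i for i, score in enumerate(scores) if score <= max_states]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Invented toy alignment, for illustration only.
toy = {
    "A": "MKVLAT",
    "B": "MKILGS",
    "C": "MRVLCT",
    "D": "MKVLDQ",
}
slow = keep_slow_positions(toy, max_states=2)
```

If the (AD)(CB) grouping disappears once the fast columns are removed, that is consistent with long branch attraction, although it does not prove it.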

Consensus of all trees from all bootstrap samples
Taxa: 1 F_nodosum, 2 P_mobilis, 3 Thermosipho, 4 T_lettingae, 5 T_maritima, 6 T_petrophila, 7 T_RQ2
[Tree figure; leaf order: P_mobilis, T_petrophila, T_RQ2, T_maritima, T_lettingae, Thermosipho, F_nodosum]

Consensus of all consensus trees
Taxa: 1 F_nodosum, 2 P_mobilis, 3 Thermosipho, 4 T_lettingae, 5 T_maritima, 6 T_petrophila, 7 T_RQ2
[Tree figure; leaf order: T_petrophila, T_RQ2, T_maritima, T_lettingae, Thermosipho, F_nodosum, P_mobilis]

Consensus of all collapsed (<95%) consensus trees
[Tree figure; leaf order: P_mobilis, T_petrophila, T_RQ2, T_maritima, T_lettingae, Thermosipho, F_nodosum]
If you still have difficulties with trees, do the tree tests at
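Collapsing branches below 95% support before building a consensus can be sketched by treating each tree as a set of bipartitions and counting how often each one occurs. This is a minimal illustration, not the program used for the slides. The taxon names come from the slide, but the bootstrap trees are invented toy input and the function names are hypothetical.

```python
from collections import Counter

TAXA = ["F_nodosum", "P_mobilis", "Thermosipho", "T_lettingae",
        "T_maritima", "T_petrophila", "T_RQ2"]


def canonical_split(side, taxa):
    """Represent a bipartition by the side that excludes the reference taxon.

    The reference is simply the alphabetically first taxon, so a split and
    its complement map to the same frozenset.
    """
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side


def split_support(trees, taxa):
    """Fraction of trees that contain each bipartition."""
    counts = Counter()
    for tree in trees:
        for split in {canonical_split(s, taxa) for s in tree}:
            counts[split] += 1
    return {split: n / len(trees) for split, n in counts.items()}


def supported_splits(trees, taxa, threshold=0.95):
    """Bipartitions whose support reaches the threshold; the rest collapse."""
    return {s for s, f in split_support(trees, taxa).items() if f >= threshold}


# Invented toy bootstrap trees, each listed as its non-trivial bipartitions.
tree_a = [{"T_maritima", "T_RQ2", "T_petrophila"}, {"T_RQ2", "T_petrophila"}]
tree_b = [{"T_maritima", "T_RQ2", "T_petrophila"}, {"T_maritima", "T_RQ2"}]
trees = [tree_a] * 6 + [tree_b] * 4

support = split_support(trees, TAXA)
kept = supported_splits(trees, TAXA, threshold=0.95)
```

In this toy input only the split grouping T_maritima, T_RQ2 and T_petrophila survives the cutoff; the two conflicting inner groupings collapse into a polytomy.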

Metagenome and pan-genome
[Figure: Edwards-Venn cogwheel (A.W.F. Edwards 1998) with data from Welch et al., 2002. Labels: core, strain-specific, pan-genome, metagenome.]

Genomic Islands Binnewies, Motro et al., Funct. Integr. Genomics (2006) 6: 165–185

Gene frequency in a typical genome
- Pick a random gene from any of the 293 genomes
- Search in how many genomes this gene is present
- Sampling of 15,000 genes
$F(x) = \sum_n A_n \exp(K_n x)$
Terms labelled: (Character genes) (Accessory pool) (Extended Core)

Kézdy-Swinbourne Plot
If $f(x) = K + A \exp(-kx)$, then $f(x+\Delta x) = K + A \exp(-k(x+\Delta x))$.
Through elimination of $A$: $f(x+\Delta x) = \exp(-k\,\Delta x)\, f(x) + K'$, where $K' = K\,(1 - \exp(-k\,\Delta x))$.
And for $x \to \infty$: $f(x) \to K$ and $f(x+\Delta x) \to K$.
[Plot axes: novel genes after looking in x genomes; novel genes after looking in x + ∆x genomes. Only values with x ≥ 80 genomes were included.]
Even after comparing to a very large (infinite) number of bacterial genomes, on average, each new genome will contain about 230 genes that do not have a homolog in the other genomes (~230 novel genes per genome).
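Under the model on this slide, plotting f(x+∆x) against f(x) gives a straight line with slope exp(-k∆x), and the asymptote K is the point where that line meets the diagonal, that is intercept / (1 - slope). The sketch below fits that line by ordinary least squares. It is only an illustration: the data are synthetic, generated with K = 230 to echo the slide, while A = 500 and k = 0.05 are arbitrary assumed values, and the function name is hypothetical.

```python
import math


def kezdy_swinbourne(values, lag=1):
    """Estimate the asymptote K from f sampled at equally spaced x.

    Regresses f(x + lag) on f(x) and returns (slope, intercept, K),
    where K = intercept / (1 - slope).
    """
    xs = values[:-lag]
    ys = values[lag:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, intercept / (1 - slope)


# Synthetic data only: K = 230 echoes the slide, A and k are assumptions.
synthetic = [230 + 500 * math.exp(-0.05 * x) for x in range(80, 200)]
slope, intercept, K = kezdy_swinbourne(synthetic, lag=5)
```

The appeal of this construction is that K can be read off without fitting the exponential directly; with noisy real counts the estimate becomes sensitive to the slope as it approaches 1.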

Gene frequency in individual genomes
Extended Core: ca. 70%
Character Genes: ca. 5%
Accessory Pool (mainly genes acquired from the mobilome): ca. 25%
Approximate number of genes sampled in 200 bacterial genomes: 25,160 core genes; 453,781 extended core genes; 156,259 accessory genes
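The approximate percentages can be checked against the gene counts quoted on this slide. The short calculation below uses only those three counts; the dictionary keys are just labels chosen for the example.

```python
# Gene counts as quoted on the slide (200 bacterial genomes).
counts = {"core": 25160, "extended core": 453781, "accessory": 156259}

total = sum(counts.values())
shares = {name: 100 * n / total for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The shares come out close to 4%, 71% and 25%, in line with the rounded figures of ca. 5%, ca. 70% and ca. 25%.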

The Phylogenetic Position of Thermotoga
(a) according to concordant genes, (b) according to all genes and according to 16S, (c) according to phylogenetically discordant genes
Gophna, U., Doolittle, W.F. & Charlebois, R.L.: Weighted genome trees: refinements and applications. J. Bacteriol. (2005)

From: Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet May;6(5):

Supertree vs. Supermatrix
Schematic of MRP supertree (left) and parsimony supermatrix (right) approaches to the analysis of three data sets.
Clade C+D is supported by all three separate data sets, but not by the supermatrix. Synapomorphies for clade C+D are highlighted in pink.
Clade A+B+C is not supported by separate analyses of the three data sets, but is supported by the supermatrix. Synapomorphies for clade A+B+C are highlighted in blue.
E is the outgroup used to root the tree.
From: Alan de Queiroz, John Gatesy: The supermatrix approach to systematics. Trends Ecol Evol Jan;22(1):34-41
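The MRP (matrix representation with parsimony) step of the supertree approach can be sketched as follows: every clade of every source tree becomes one binary character, scored 1 for taxa inside the clade, 0 for other taxa in that tree, and ? for taxa missing from that tree. The sketch below only builds the matrix; the parsimony search on it is left out. The two source trees are invented toy input and do not reproduce the figure, and the function name is hypothetical.

```python
def mrp_matrix(source_trees):
    """Build an MRP character matrix.

    `source_trees` is a list of (taxa_in_tree, clades) pairs, where both
    the taxon set and each clade are Python sets of taxon names.
    Returns a dict mapping each taxon to its string of characters.
    """
    all_taxa = sorted(set().union(*(taxa for taxa, _ in source_trees)))
    matrix = {taxon: [] for taxon in all_taxa}
    for taxa, clades in source_trees:
        for clade in clades:
            for taxon in all_taxa:
                if taxon not in taxa:
                    matrix[taxon].append("?")   # taxon absent from this tree
                elif taxon in clade:
                    matrix[taxon].append("1")   # taxon inside the clade
                else:
                    matrix[taxon].append("0")   # taxon in the tree, outside the clade
    return {taxon: "".join(chars) for taxon, chars in matrix.items()}


# Invented toy source trees; E plays the role of the outgroup.
tree1 = ({"A", "B", "C", "D", "E"}, [{"C", "D"}, {"B", "C", "D"}])
tree2 = ({"A", "C", "D", "E"}, [{"C", "D"}])
matrix = mrp_matrix([tree1, tree2])
```

A supermatrix, by contrast, concatenates the original character data of all data sets, which is why the two approaches can support different clades.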

A) Template tree
B) Generate 100 datasets using Evolver with a certain amount of HGTs
C) Calculate 1 tree using the concatenated dataset, or 100 individual trees
D) Calculate quartet-based tree using Quartet Suite
Repeated 100 times…
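The quartet decomposition behind step D can be sketched by asking, for every set of four taxa, which of the three possible pairings a tree supports. This is a minimal illustration and not the Quartet Suite implementation: trees are given as lists of bipartitions, and the function names and the toy tree are hypothetical.

```python
from itertools import combinations


def induced_quartet(four, splits):
    """Return the pairing ((a, b), (c, d)) that the tree induces on four taxa.

    `splits` lists one side of each non-trivial bipartition of the tree.
    Returns None if no bipartition resolves the quartet.
    """
    a, b, c, d = four
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    for side in splits:
        for left, right in pairings:
            left_in = [taxon in side for taxon in left]
            right_in = [taxon in side for taxon in right]
            # The split resolves the quartet if it puts one pair on each side.
            if (all(left_in) and not any(right_in)) or \
               (all(right_in) and not any(left_in)):
                return (left, right)
    return None


def all_quartets(taxa, splits):
    """Map every four-taxon subset to its induced pairing."""
    return {four: induced_quartet(four, splits)
            for four in combinations(sorted(taxa), 4)}


# Invented toy tree ((A,B),C,(D,E)), written as its two non-trivial splits.
quartets = all_quartets("ABCDE", [{"A", "B"}, {"D", "E"}])
```

Tallying these pairings over the 100 individual gene trees gives per-quartet support values from which a quartet-based supertree could be assembled.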

Supermatrix versus quartet-based supertree
[Figure; inset: simulated phylogeny]