Presentation is loading. Please wait.

Presentation is loading. Please wait.

MCB 372 #12: Tree, Quartets and Supermatrix Approaches Collaborators: Olga Zhaxybayeva (Dalhousie) Jinling Huang (ECU) Tim Harlow (UConn) Pascal Lapierre.

Similar presentations


Presentation on theme: "MCB 372 #12: Tree, Quartets and Supermatrix Approaches Collaborators: Olga Zhaxybayeva (Dalhousie) Jinling Huang (ECU) Tim Harlow (UConn) Pascal Lapierre."— Presentation transcript:

1 MCB 372 #12: Tree, Quartets and Supermatrix Approaches Collaborators: Olga Zhaxybayeva (Dalhousie) Jinling Huang (ECU) Tim Harlow (UConn) Pascal Lapierre (UConn) Greg Fournier (UConn) Funded through the NASA Exobiology and AISR Programs, and NSF Microbial Genetics Edvard Munch, The Dance of Life (1900) J. Peter Gogarten University of Connecticut Dept. of Molecular and Cell Biology

2 In the Felsenstein Zone “long branches attract” 0.8 0.1 0.8 0.1 C B D A AB CD “true” tree inferred tree A B C D

3 Protpars reconstructions B A C D D A C B

4 ML reconstructions with alignment step

5 long branch attraction artifact 100% bootstrap support for bipartition (AD)(CB) the two longest branches join together What could you do to investigate if this is a possible explanation? use only slow positions, use an algorithm that corrects for ASRV seq. from B seq. from A seq. from C seq. from D seq. from B seq. from A seq. from C seq. from D True Tree:

6 Consensus of all trees from all bootstrap samples 2 P_mobilis 3 Thermosipho 1 F_nodosum 4 T_lettingae 5 T_maritima 7 T_RQ2 6 T_petrophila +----------------------------------2 P_mobilis | | +------6 T_petrophila | +134.0-| | +675.0-| +------7 T_RQ2 | | | | +-60.0-| +-------------5 T_maritima | | | +------| +--------------------4 T_lettingae | | +------3 Thermosipho +--------------206.0-| +------1 F_nodosum

7 Consensus of all consensus trees 2 P_mobilis 3 Thermosipho 1 F_nodosum 4 T_lettingae 5 T_maritima 7 T_RQ2 6 T_petrophila +------6 T_petrophila +318.0-| +745.0-| +------7T_RQ2 | | +330.0-| +-------------5T_maritima | | +------| +--------------------4 T_lettingae | | | | +------3 Thermosipho | +--------------562.0-| | +------1 F_nodosum | +----------------------------------2 P_mobilis

8 Consensus of all collapsed (<95%) consensus trees +----------------------------------2 P_mobilis | | +------6 T_petrophila | +134.0-| | +675.0-| +------7 T_RQ2 | | | | +-60.0-| +-------------5 T_maritima | | | +------| +--------------------4 T_lettingae | | +------3 Thermosipho +--------------206.0-| +------1 F_nodosum If you still have difficulties with tree do the tree tests at http://www.sciencemag.org/cgi/content/full/310/5750/979/DC1

9 METAGENOME Welch et al, 2002 A.W.F. Edwards 1998 Edwards-Venn cogwheel core Strain-specific Pan-genome + + +

10 Genomic Islands Binnewies, Motro et al., Funct. Integr. Genomics (2006) 6: 165–185

11 Gene frequency in a typical genome -Pick a random gene from any of the 293 genomes -Search in how many genomes this gene is present -Sampling of 15,000 genes F(x) = sum [ A n *exp (K n *x) ] (Character genes)(Accessory pool)(Extended Core)

12 Kézdy-Swinbourne Plot If f(x)=K+A exp(-kx), then f(x+∆x)=K+A exp(-k(x+∆x)). Through elimination of A: f(x+∆x)=exp(-k ∆x) f(x) + K’ And for x , f(x)  K, f(x+∆x)  K Novel genes after looking in x genomes Novel genes after looking in x + ∆x genomes only values with x ≥ 80 genomes were included Even after comparing to a very large (infinite) number of bacterial genomes, on average, each new genome will contain about 230 genes that do not have a homolog in the other genomes. ~230 novel genes per genome

13 ca.70% ca. 5% ca. 25% Gene frequency in individual genomes Extended Core Character Genes Accessory Pool (mainly genes acquired form the mobilome) Approximate number of genes sampled in 200 bacterial genomes: 25,160 core genes 453,781 extended core genes 156,259 accessory genes

14 The Phylogenetic Position of Thermotoga (a) concordant genes, (b) all genes & according to 16S (c) according to phylogenetically discordant genes Gophna, U., Doolittle, W.F. & Charlebois, R.L.: Weighted genome trees: refinements and applications. J. Bacteriol. (2005)

15 From: Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 2005 May;6(5):361-75.

16 Supertree vs. Supermatrix Schematic of MRP supertree (left) and parsimony supermatrix (right) approaches to the analysis of three data sets. Clade C+D is supported by all three separate data sets, but not by the supermatrix. Synapomorphies for clade C+D are highlighted in pink. Clade A+B+C is not supported by separate analyses of the three data sets, but is supported by the supermatrix. Synapomorphies for clade A+B+C are highlighted in blue. E is the outgroup used to root the tree. From: Alan de Queiroz John Gatesy: The supermatrix approach to systematics Trends Ecol Evol. 2007 Jan;22(1):34-41

17 A) Template tree B) Generate 100 datasets using Evolver with certain amount of HGTs C) Calculate 1 tree using the concatenated dataset or 100 individual trees D) Calculate Quartet based tree using Quartet Suite Repeated 100 times…

18 Supermatrix versus Quartet based Supertree inset: simulated phylogeny


Download ppt "MCB 372 #12: Tree, Quartets and Supermatrix Approaches Collaborators: Olga Zhaxybayeva (Dalhousie) Jinling Huang (ECU) Tim Harlow (UConn) Pascal Lapierre."

Similar presentations


Ads by Google