MCB 3421 class 25. student evaluations Please follow this link to the on-line surveys that are open for you this semester.

The gradualist point of view: Evolution occurs within populations where the fittest organisms have a selective advantage. Over time the advantageous genes become fixed in a population, and the population gradually changes. See Wikipedia on the modern synthesis.

Processes that MIGHT go beyond inheritance with variation and selection?
- Horizontal gene transfer and recombination
- Polyploidization (botany, vertebrate evolution)
- Fusion and cooperation of organisms (kefir, lichen, also the eukaryotic cell)
- Targeted mutations (?), genetic memory (?) (see Foster's and Hall's reviews on directed/adaptive mutations, and a counterpoint)
- Random genetic drift
- Mutationism
- Gratuitous complexity
- Selfish genes (who/what is the subject of evolution??)
- Evolutionary capacitors
- Hopeless monsters (in analogy to Goldschmidt's hopeful monsters)

Other ways to detect positive selection:
- Selective sweeps -> fewer alleles present in the population (see contributions from archaic humans, for example)
- Repeated episodes of positive selection -> high dN
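The sweep signature (fewer alleles in the population) can be measured directly from aligned population samples. This minimal Python sketch counts distinct haplotypes and computes nucleotide diversity (π, the mean number of pairwise differences per site). π is a standard summary added here for illustration rather than something named on the slide, the sequences are invented, and the dN criterion is not covered.

```python
from itertools import combinations


def allele_count(seqs: list[str]) -> int:
    """Number of distinct haplotypes (alleles) in the sample."""
    return len(set(seqs))


def nucleotide_diversity(seqs: list[str]) -> float:
    """Nei's pi: average pairwise differences per site.

    Assumes equal-length, gap-free aligned sequences from one population.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical samples of one 8-bp window before and after a selective sweep
before = ["ACGTACGT", "ACGTTCGT", "ACCTACGA", "TCGTACGT"]
after = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
print(allele_count(before), nucleotide_diversity(before))  # 4 0.25
print(allele_count(after), nucleotide_diversity(after))    # 2 0.0625
```

In real data π would be computed in sliding windows along a chromosome, and a sweep shows up as a local trough relative to the genome-wide background.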


Finding transferred genes: screening in the wet lab and in the computer


TaxPlot at NCBI

Other approaches to find transferred genes:
- Gene presence/absence data for closely related genomes (for additional genes)
- Phylogenetic conflict (for homologous replacement; e.g. quartet decomposition spectra, see Figs. 1 and 2)
- Composition-based analyses (for very recent transfers)
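A composition-based screen can be as simple as flagging genes whose GC content departs from the genome average. This Python sketch assumes a plain z-score test on GC fraction with a hypothetical cutoff `z_cutoff=2.0`; published composition methods typically use richer signals (codon usage, oligonucleotide frequencies), and short genes give noisy estimates.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Fraction of G and C in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def composition_outliers(genes: dict[str, str], z_cutoff: float = 2.0) -> list[str]:
    """Gene IDs whose GC content lies more than z_cutoff standard deviations
    from the genome mean (a crude screen for recently transferred genes)."""
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    mu, sd = mean(gc.values()), pstdev(gc.values())
    if sd == 0:
        return []
    return [name for name, value in gc.items() if abs(value - mu) / sd > z_cutoff]


# Invented toy "genome": five genes at 50% GC and one GC-rich gene
genes = {
    "g1": "ATGCATGC", "g2": "AATTGGCC", "g3": "ACGTACGT",
    "g4": "GCGCATAT", "g5": "TTAACCGG", "g6": "GGCCGCGA",
}
print(composition_outliers(genes))  # ['g6']
```

Because recently transferred genes drift toward the recipient's composition over time (amelioration), such a screen mainly detects recent transfers, consistent with the slide's caveat.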

Phylogenetic information present in genomes: decomposition of phylogenetic data
- Break the information into small quanta (bipartitions or embedded quartets)
- Analyze the resulting spectra to detect transferred genes and the plurality consensus

BIPARTITION OF A PHYLOGENETIC TREE
Bipartition (or split): a division of a phylogenetic tree into two parts that are connected by a single branch. It divides a dataset into two groups, but it does not consider the relationships within each of the two groups.
[Figure: an example tree with one illustrated bipartition (labeled 95), plus bipartitions written in star/dot notation that are compatible or incompatible with it, e.g. "Orange vs Rest" and "Yellow vs Rest".]
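To make the definition concrete, this Python sketch extracts the non-trivial bipartitions of a tree and tests whether two bipartitions are compatible. It assumes trees written as nested tuples of taxon names (an illustrative format, not a standard file format); a bipartition is stored as the set of taxa on the side without a fixed reference taxon, so {"D", "E"} means "D, E vs Rest". Two splits are compatible exactly when at least one of the four pairwise intersections of their sides is empty.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees


def leaves(tree: Tree) -> frozenset:
    """All taxon names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree: Tree) -> set:
    """Non-trivial splits (both sides have at least 2 taxa) of an unrooted tree."""
    taxa = leaves(tree)
    ref = min(taxa)  # reference taxon used to pick a canonical side
    splits = set()

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        for child in node:
            side = leaves(child)  # taxa below the branch leading to `child`
            if 1 < len(side) < len(taxa) - 1:
                splits.add(side if ref not in side else taxa - side)
            walk(child)

    walk(tree)
    return splits


def compatible(split1: frozenset, split2: frozenset, taxa: frozenset) -> bool:
    """Splits can coexist on one tree iff one of the four side intersections is empty."""
    return any(not (x & y) for x in (split1, taxa - split1)
               for y in (split2, taxa - split2))


tree = ((("A", "B"), "C"), ("D", "E"), "F")
taxa = leaves(tree)
print(sorted(sorted(s) for s in bipartitions(tree)))
# [['C', 'D', 'E', 'F'], ['D', 'E'], ['D', 'E', 'F']]
print(compatible(frozenset("AB"), frozenset("DE"), taxa))  # True
print(compatible(frozenset("AB"), frozenset("AC"), taxa))  # False
```

All bipartitions extracted from a single tree are pairwise compatible; conflict only appears when splits from different gene trees are compared.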

“Lento”-plot of 34 supported bipartitions (out of 4082 possible) for 13 gamma-proteobacterial genomes (258 putative orthologs): E. coli, Buchnera, Haemophilus, Pasteurella, Salmonella, Yersinia pestis (2 strains), Vibrio, Xanthomonas (2 sp.), Pseudomonas, Wigglesworthia. There are 13,749,310,575 possible unrooted tree topologies for 13 genomes.
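Both counts on this slide follow from closed formulas. The number of ways to cut n taxa into two groups is 2^(n-1); removing the n single-taxon splits and the one split with an empty side leaves 2^(n-1) - n - 1 non-trivial bipartitions. The number of unrooted, fully resolved topologies is the double factorial (2n-5)!!. This Python sketch reproduces the slide's numbers for 13 genomes and the 501 quoted for 10 cyanobacteria.

```python
def n_bipartitions(n: int) -> int:
    """Possible non-trivial bipartitions of n taxa: 2**(n-1) - n - 1."""
    return 2 ** (n - 1) - n - 1


def n_unrooted_topologies(n: int) -> int:
    """Unrooted binary tree topologies for n >= 3 taxa: (2n-5)!! = 3 * 5 * ... * (2n-5)."""
    result = 1
    for k in range(3, 2 * n - 4, 2):
        result *= k
    return result


print(n_bipartitions(13), n_unrooted_topologies(13))  # 4082 13749310575
print(n_bipartitions(10))                             # 501
```

The contrast between the two numbers is the point of the decomposition approach: a bipartition spectrum has only thousands of categories to tally, whereas whole-tree topologies number in the billions.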

“Lento”-plot of supported bipartitions (out of 501 possible) for 10 cyanobacteria, based on 678 sets of orthologous genes: Anabaena, Trichodesmium, Synechocystis sp., Prochlorococcus marinus (3 strains), marine Synechococcus, Thermosynechococcus elongatus, Gloeobacter, Nostoc punctiforme. [y-axis: number of datasets] From: Zhaxybayeva, Lapierre and Gogarten, Trends in Genetics, 2004, 20(5):

[Figure: an embedded quartet of taxa A, B, C, D within trees of increasing size, N = 4 (0), N = 5 (1), N = 8 (4), N = 13 (9), N = 23 (19), N = 53 (49); scale bar 0.01.] From: Mao F, Williams D, Zhaxybayeva O, Poptsova M, Lapierre P, Gogarten JP, Xu Y (2012) BMC Bioinformatics 13:123, doi: /

Methodology:
1. Input tree -> Seq-Gen: aligned simulated AA sequences (200, 500 and 1000 AA; WAG, Cat=4, Alpha=1)
2. Seqboot: 100 bootstraps
3. ML tree calculation: FastTree (WAG, Cat=4)
4. Consense: extract bipartitions for each individual tree
5. Extract the highest bootstrap support separating AB><CD
6. Count how many trees support the embedded quartet AB><CD
7. Repeat 100 times
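Step 2 (Seqboot) generates nonparametric bootstrap pseudo-samples by resampling alignment columns with replacement. This Python sketch shows only that resampling idea on a toy protein alignment stored as a dict; it is not Seqboot itself and does not write PHYLIP files. Tree building (FastTree) and consensus (Consense) remain external steps.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One pseudo-sample: draw column indices with replacement, same length as the input."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


def bootstrap_replicates(alignment: dict[str, str], n_reps: int = 100,
                         seed: int = 1) -> list[dict[str, str]]:
    """n_reps pseudo-samples drawn from one seeded generator, so runs are reproducible."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_reps)]


aln = {"A": "MKVLA", "B": "MKILA", "C": "MRVLS", "D": "LRVIS"}  # toy alignment
reps = bootstrap_replicates(aln, n_reps=100)
print(reps[0])  # every taxon keeps its row; whole columns are resampled together
```

Resampling whole columns (rather than characters per taxon) is essential: it keeps the characters of one alignment position together, which is what makes each pseudo-sample a plausible alternative dataset.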

Results: [Figure panels: maximum bootstrap support value for the bipartition separating (AB) and (CD); maximum bootstrap support value for the embedded quartet (AB),(CD).]
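The two panels differ in what counts as support. An exact bipartition requires A and B alone on one side of a branch, while an embedded quartet AB|CD only requires some branch separating {A, B} from {C, D}, with any other taxa free to fall on either side. This Python sketch, using nested-tuple trees and three invented example trees, shows why adding taxa can destroy the bipartition while the embedded quartet survives.

```python
from typing import Union

Tree = Union[str, tuple]


def leaves(tree: Tree) -> frozenset:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def edge_sides(tree: Tree) -> list:
    """The clade below every branch of the tree (one side of each split)."""
    sides = []

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        for child in node:
            sides.append(leaves(child))
            walk(child)

    walk(tree)
    return sides


def has_bipartition(tree: Tree, group: set) -> bool:
    """Exact split: `group` on one side, all remaining taxa on the other."""
    taxa, group = leaves(tree), frozenset(group)
    return any(side == group or taxa - side == group for side in edge_sides(tree))


def has_embedded_quartet(tree: Tree, pair1: set, pair2: set) -> bool:
    """Some branch separates pair1 from pair2; other taxa may be anywhere."""
    taxa, p1, p2 = leaves(tree), frozenset(pair1), frozenset(pair2)
    return any((p1 <= side and p2 <= taxa - side) or (p2 <= side and p1 <= taxa - side)
               for side in edge_sides(tree))


trees = [
    (("A", "B"), ("C", "D"), "E"),     # AB|CDE: both criteria met
    ((("A", "E"), "B"), ("C", "D")),   # E nested inside the AB group
    ((("A", "C"), "B"), ("D", "E")),   # conflicts with AB|CD
]
print(sum(has_bipartition(t, {"A", "B"}) for t in trees))                   # 1
print(sum(has_embedded_quartet(t, {"A", "B"}, {"C", "D"}) for t in trees))  # 2
```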

Bootstrap support values for embedded quartets
[Figure: a tree calculated from one pseudo-sample, generated by bootstrapping from an alignment of one gene family present in 11 genomes, with the embedded quartet for genomes 1, 4, 9, and 10 highlighted. This bootstrap sample supports the topology ((1,4),9,10).]
Quartet spectral analysis of genomes iterates over three loops:
- Repeat for all bootstrap samples.
- Repeat for all possible embedded quartets.
- Repeat for all gene families.
From: Zhaxybayeva et al. 2006, Genome Research, 16(9):
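The three loops can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' Quartet Suite code: each gene family is given as a list of bootstrap trees (nested tuples) whose leaves are genome names, families contain no in-paralogs, and all bootstrap trees of a family contain the same genomes. A quartet topology is stored as a frozenset of its two pairs.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Union

Tree = Union[str, tuple]


def leaves(tree: Tree) -> frozenset:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> list:
    """Taxa below every branch of the tree."""
    out = []

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        for child in node:
            out.append(leaves(child))
            walk(child)

    walk(tree)
    return out


def quartet_topology(tree: Tree, quartet) -> Union[frozenset, None]:
    """Embedded topology {{a,b},{c,d}}, or None if the tree leaves it unresolved."""
    q = frozenset(quartet)
    for clade in clades(tree):
        inside = clade & q
        if len(inside) == 2:  # this branch separates one pair from the other
            return frozenset([inside, q - inside])
    return None


def quartet_spectrum(families: dict) -> dict:
    """{quartet: {family: {topology: fraction of bootstrap trees supporting it}}}"""
    spectrum = defaultdict(dict)
    for family, boot_trees in families.items():              # all gene families
        taxa = sorted(leaves(boot_trees[0]))
        for quartet in combinations(taxa, 4):                 # all embedded quartets
            counts = Counter()
            for tree in boot_trees:                           # all bootstrap samples
                topo = quartet_topology(tree, quartet)
                if topo is not None:
                    counts[topo] += 1
            spectrum[quartet][family] = {t: c / len(boot_trees) for t, c in counts.items()}
    return dict(spectrum)


families = {  # invented bootstrap trees for two gene families
    "fam1": [(("A", "B"), ("C", "D"), "E"), (("A", "B"), "C", ("D", "E"))],
    "fam2": [(("A", "C"), ("B", "D"), "E"), (("A", "B"), ("C", "D"), "E"),
             ((("A", "B"), "E"), ("C", "D"))],
}
spec = quartet_spectrum(families)
```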

Illustration of one component of a quartet spectral analysis: summary of phylogenetic information for one genome quartet for all gene families. [Figure: total number of gene families containing the species quartet; number of gene families supporting the same topology as the plurality (colored according to bootstrap support level); number of gene families supporting one of the two alternative quartet topologies.]
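The three quantities in this figure can be tallied for one quartet as in the Python sketch below. Assumptions, all hypothetical: each family votes for its best-supported topology, topologies are plain labels such as "AB|CD", and the support thresholds 0.5, 0.7 and 0.9 stand in for the figure's color levels.

```python
from collections import Counter


def summarize_quartet(per_family: dict, levels=(0.5, 0.7, 0.9)) -> dict:
    """per_family: {gene family: {quartet topology: bootstrap support in [0, 1]}}."""
    best = {}  # each family votes for its best-supported topology
    for family, support in per_family.items():
        if support:
            topo = max(support, key=support.get)
            best[family] = (topo, support[topo])
    if not best:
        return {"total": len(per_family), "plurality": None,
                "plurality_by_support": {}, "alternative": 0}
    plurality = Counter(t for t, _ in best.values()).most_common(1)[0][0]
    by_level = Counter()
    for topo, value in best.values():
        if topo == plurality:
            # bin by the highest threshold reached (0.0 = below every threshold)
            by_level[max((x for x in levels if value >= x), default=0.0)] += 1
    return {
        "total": len(per_family),
        "plurality": plurality,
        "plurality_by_support": dict(by_level),
        "alternative": sum(1 for t, _ in best.values() if t != plurality),
    }


per_family = {  # invented bootstrap supports for one genome quartet
    "f1": {"AB|CD": 0.95},
    "f2": {"AB|CD": 0.75, "AC|BD": 0.25},
    "f3": {"AC|BD": 0.60, "AB|CD": 0.40},
    "f4": {"AB|CD": 0.55, "AD|BC": 0.45},
}
print(summarize_quartet(per_family))
```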

Quartet decomposition analysis of 19 Prochlorococcus and marine Synechococcus genomes. Quartets with a very short internal branch or very long external branches, as well as those resolved by less than 30% of gene families, were excluded from the analyses to minimize artifacts of phylogenetic reconstruction.

Plurality consensus calculated as a supertree (MRP) from quartets in the plurality topology.
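MRP (Matrix Representation with Parsimony) turns each input quartet (or split) into one binary character, which a parsimony program then analyzes as if it were ordinary data. This Python sketch builds only the matrix, using the common coding of 1 for one side of the quartet, 0 for the other side, and "?" for taxa not in that quartet; the quartets are invented and the parsimony search itself is not shown.

```python
def mrp_matrix(quartets: list) -> dict[str, str]:
    """MRP coding: one character per quartet ((a, b), (c, d)).

    a and b are coded 1, c and d are coded 0, all other taxa are '?'.
    """
    taxa = sorted({t for p1, p2 in quartets for t in (*p1, *p2)})
    rows = {t: [] for t in taxa}
    for p1, p2 in quartets:
        for t in taxa:
            rows[t].append("1" if t in p1 else "0" if t in p2 else "?")
    return {t: "".join(chars) for t, chars in rows.items()}


quartets = [(("A", "B"), ("C", "D")), (("A", "B"), ("C", "E")), (("C", "D"), ("B", "E"))]
for taxon, row in mrp_matrix(quartets).items():
    print(taxon, row)
# A 11?
# B 110
# C 001
# D 0?1
# E ?00
```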

Plurality neighbor-net, calculated as a supertree from the MRP matrix using SplitsTree 4.0, from all quartets significantly supported by all individual gene families (1812) without in-paralogs.

From: Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet May;6(5):

Supertree vs. Supermatrix: Schematic of MRP supertree (left) and parsimony supermatrix (right) approaches to the analysis of three data sets. Clade C+D is supported by all three separate data sets, but not by the supermatrix. Synapomorphies for clade C+D are highlighted in pink. Clade A+B+C is not supported by separate analyses of the three data sets, but is supported by the supermatrix. Synapomorphies for clade A+B+C are highlighted in blue. E is the outgroup used to root the tree. From: Alan de Queiroz and John Gatesy: The supermatrix approach to systematics. Trends Ecol Evol Jan;22(1):34-41
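Where a supertree combines trees, a supermatrix combines the data: per-gene alignments are concatenated, and taxa missing from a gene are padded with missing-data characters. This Python sketch shows that assembly step on two invented gene alignments stored as dicts, using "?" as the missing-data symbol (an assumption; some programs also accept "-" or "N").

```python
def concatenate(alignments: list[dict[str, str]], missing: str = "?") -> dict[str, str]:
    """Supermatrix: concatenate per-gene alignments, padding absent taxa with `missing`."""
    taxa = sorted({t for aln in alignments for t in aln})
    parts = {t: [] for t in taxa}
    for aln in alignments:
        length = len(next(iter(aln.values())))  # columns in this gene's alignment
        for t in taxa:
            parts[t].append(aln.get(t, missing * length))
    return {t: "".join(p) for t, p in parts.items()}


gene1 = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
gene2 = {"A": "GG", "C": "GA", "D": "TA"}
for taxon, row in concatenate([gene1, gene2]).items():
    print(taxon, row)
# A ACGTGG
# B ACGA??
# C TCGAGA
# D ????TA
```

A single tree is then inferred from the combined matrix, which is why character support spread across genes (like clade A+B+C in the schematic) can emerge only in the supermatrix.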

A) Template tree
B) Generate 100 datasets using Evolver with a certain amount of HGT
C) Calculate 1 tree from the concatenated dataset, or 100 individual trees
D) Calculate a quartet-based tree using Quartet Suite
Repeated 100 times…

Supermatrix versus quartet-based supertree [inset: simulated phylogeny]

Note: Using the same genome seed (random number) will reproduce the same genome history. From: Lapierre P, Lasek-Nesselquist E, and Gogarten JP (2012) The impact of HGT on phylogenomic reconstruction methods. Brief Bioinform [first published online August 20, 2012] doi: /bib/bbs050
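The seed note reflects a general property of pseudo-random number generators: the same seed yields the same sequence of draws, and therefore the same simulated history. This Python toy draws (donor, recipient) pairs for hypothetical transfer events; the function name and parameters are invented for illustration and are not the HGT EvolSimulator's interface.

```python
import random


def simulate_transfers(genomes: list[str], n_events: int, seed: int) -> list[tuple]:
    """Toy HGT history: n_events (donor, recipient) pairs from a seeded generator."""
    rng = random.Random(seed)  # private generator: the seed fully determines the output
    return [tuple(rng.sample(genomes, 2)) for _ in range(n_events)]


genomes = ["g1", "g2", "g3", "g4"]
print(simulate_transfers(genomes, 3, seed=7) == simulate_transfers(genomes, 3, seed=7))  # True
```

Using a dedicated `random.Random` instance rather than the module-level generator keeps other code from consuming draws and silently changing the simulated history.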

HGT EvolSimulator Results

See for more information. What is the bottom line?

Odysseus facing Scylla and Charybdis (Odysseus vor Scilla und Charybdis), Johann Heinrich Füssli. From: e:Johann_Heinrich_F%C3%BCssli_054.jpg

Examples:
- B1 is an ortholog to C1 and to A1.
- C2 is a paralog to C3 and to B1; BUT
- A1 is an ortholog to both B1 and B2, and to C1, C2, and C3.
From: Walter Fitch (2000): Homology: a personal view on some of the problems, TIG 16 (5)
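Fitch's definitions depend on the last common ancestor of two genes: a speciation node makes them orthologs, a duplication node makes them paralogs. This Python sketch labels every gene pair in a rooted, binary gene tree, inferring duplications with the species-overlap rule (a node is a duplication if its two child subtrees share a species). The tree is a hypothetical one chosen to agree with the examples above, and the convention that the first letter of a gene name is its species is an assumption.

```python
from typing import Union

Tree = Union[str, tuple]  # leaves like "B1": species letter followed by copy number


def species(gene: str) -> str:
    return gene[0]  # hypothetical naming convention


def genes(tree: Tree) -> list[str]:
    if isinstance(tree, str):
        return [tree]
    return [g for child in tree for g in genes(child)]


def classify_pairs(tree: Tree) -> dict:
    """{frozenset({gene1, gene2}): 'ortholog' | 'paralog'} for a rooted binary tree."""
    relation = {}

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        left, right = node  # raises ValueError for non-binary nodes
        lg, rg = genes(left), genes(right)
        overlap = {species(g) for g in lg} & {species(g) for g in rg}
        kind = "paralog" if overlap else "ortholog"  # duplication vs speciation
        for a in lg:          # this node is the last common ancestor of a and b
            for b in rg:
                relation[frozenset((a, b))] = kind
        walk(left)
        walk(right)

    walk(tree)
    return relation


tree = ("A1", (("B1", "C1"), ("B2", ("C2", "C3"))))
rel = classify_pairs(tree)
print(rel[frozenset(("B1", "C1"))], rel[frozenset(("C2", "B1"))])  # ortholog paralog
```

Species overlap is a simple heuristic; with gene loss or an incorrect gene tree it can mislabel nodes, which is one reason automated ortholog assignment is hard.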

Types of Paralogs: In- and Outparalogs
"… all genes in the HA* set are co-orthologous to all genes in the WA* set. The genes HA* are hence ‘inparalogs’ to each other when comparing human to worm. By contrast, the genes HB and HA* are ‘outparalogs’ when comparing human with worm. However, HB and HA*, and WB and WA* are inparalogs when comparing with yeast, because the animal–yeast split predates the HA*–HB duplication." From: Sonnhammer and Koonin: Orthology, paralogy and proposed classification for paralog, TIG 18 (12) 2002,

Selection of Orthologous Gene Families (COG: Clusters of Orthologous Groups)
All automated methods for assembling sets of orthologous genes are based on sequence similarities:
- BLAST hits (SCOP database)
- Triangular circular BLAST significant hits
- Sequence identity of 30% and greater
- Similarity complemented by HMM-profile analysis (Pfam database)
- Reciprocal BLAST hit method

Strict reciprocal BLAST hit method: ’2’ often fails in the presence of paralogs. [Figure: an example containing 1 gene family, from which the strict reciprocal BLAST hit method recovers 0 gene families.]
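The failure mode can be reproduced on a toy case. This Python sketch requires a strict family to contain one gene per genome with every pair being reciprocal best hits. Genome Z carries two recent paralogs (z1, z2), and the invented scores make x1 prefer z2 while y1 prefers z1, so no consistent family exists even though one family is clearly present. Hits are assumed to be pre-parsed (query, subject, score) rows per ordered genome pair; parsing real BLAST output is not shown.

```python
from itertools import combinations, product


def best_hits(rows) -> dict:
    """rows: (query, subject, score); returns query -> highest-scoring subject."""
    best = {}
    for query, subject, score in rows:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def strict_rbh_families(genomes: dict, hits: dict) -> list:
    """genomes: {genome: [genes]}; hits[(g1, g2)]: rows with queries from g1, subjects from g2.
    Brute force over one-gene-per-genome combinations (toy-sized inputs only)."""
    best = {pair: best_hits(rows) for pair, rows in hits.items()}
    names = list(genomes)

    def is_rbh(a, ga, b, gb):
        return best[(ga, gb)].get(a) == b and best[(gb, ga)].get(b) == a

    return [combo for combo in product(*(genomes[g] for g in names))
            if all(is_rbh(combo[i], names[i], combo[j], names[j])
                   for i, j in combinations(range(len(names)), 2))]


genomes = {"X": ["x1"], "Y": ["y1"], "Z": ["z1", "z2"]}
hits = {  # invented bit scores
    ("X", "Y"): [("x1", "y1", 95)], ("Y", "X"): [("y1", "x1", 95)],
    ("X", "Z"): [("x1", "z1", 89), ("x1", "z2", 90)],
    ("Z", "X"): [("z1", "x1", 89), ("z2", "x1", 90)],
    ("Y", "Z"): [("y1", "z1", 92), ("y1", "z2", 91)],
    ("Z", "Y"): [("z1", "y1", 92), ("z2", "y1", 91)],
}
print(strict_rbh_families(genomes, hits))  # []
```

Removing the paralog z2 from the data restores the family (x1, y1, z1), which is the motivation for tree-based methods such as BranchClust.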

Families of ATP-synthases [Figure: a phylogenetic tree of ATP-synthase subunits from Escherichia coli, Bacillus subtilis, Methanosarcina mazei and Sulfolobus solfataricus, partitioned into the family of ATP-A, the family of ATP-B and the family of ATP-F.]

BranchClust Algorithm [Figure: BLAST hits among a dataset of N genomes (genome 1, genome 2, genome 3, …, genome i, …, genome N) are assembled into a superfamily tree.]


BranchClust Algorithm Data Flow
1. Download n complete genomes (ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria) in FASTA format (*.faa).
2. Put all n genomes in one database.
3. Search all ORFs against the database consisting of the n genomes.
4. Parse the BLAST output with the requirement that all members of a superfamily have an E-value better than a cut-off -> superfamilies.
5. Align with ClustalW.
6. Reconstruct the superfamily tree (ClustalW: quick distance method; PhyML: maximum likelihood).
7. Parse with BranchClust -> gene families.
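The last step cuts a superfamily tree into gene families. This Python sketch is a deliberately simplified illustration of that general idea, not the published BranchClust implementation (which is in Perl): it returns maximal clades in which no genome occurs twice and at least `min_genomes` genomes are represented. The leaf format "genome|gene" and the `min_genomes` parameter are hypothetical, and the tree loosely mirrors the ATP-synthase example with invented labels.

```python
from typing import Union

Tree = Union[str, tuple]  # leaves are "genome|gene_id"


def genome_of(leaf: str) -> str:
    return leaf.split("|", 1)[0]


def leaf_list(tree: Tree) -> list[str]:
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaf_list(child)]


def single_copy_clades(tree: Tree, min_genomes: int) -> list[list[str]]:
    """Maximal clades with at most one gene per genome and >= min_genomes genomes."""
    found = []

    def walk(node: Tree) -> None:
        members = leaf_list(node)
        genomes = [genome_of(m) for m in members]
        if len(genomes) == len(set(genomes)):  # no genome duplicated in this clade
            if len(genomes) >= min_genomes:
                found.append(sorted(members))
            return                             # maximal: do not split further
        for child in node:                     # a genome repeats: descend into children
            walk(child)

    walk(tree)
    return found


superfamily = ((("Eco|a1", "Bsu|a2"), ("Mma|a3", "Sso|a4")),
               (("Eco|b1", "Bsu|b2"), ("Mma|b3", "Sso|b4")))
for family in single_copy_clades(superfamily, min_genomes=3):
    print(family)
# ['Bsu|a2', 'Eco|a1', 'Mma|a3', 'Sso|a4']
# ['Bsu|b2', 'Eco|b1', 'Mma|b3', 'Sso|b4']
```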

BranchClust Algorithm Implementation and Usage
The BranchClust algorithm is implemented in Perl with the use of the BioPerl module for parsing trees and is freely available.
Required:
1. BioPerl module for parsing trees, Bio::TreeIO.
2. The taxa recognition file gi_numbers.out must be present in the current directory. For information on how to create this file, read the Taxa recognition file section on the web site.
3. Blastall from NCBI needs to be installed.