National Center for Biotechnology Information Evolution of eukaryotic genomes: remarkable conservation and massive loss of genes and introns Eugene V.

Presentation transcript:

National Center for Biotechnology Information Evolution of eukaryotic genomes: remarkable conservation and massive loss of genes and introns Eugene V. Koonin National Center for Biotechnology Information, NIH, Bethesda, MD

“In my own subjects, genetics and molecular biology, research has become so directed toward medical problems and the needs of the pharmaceutical companies that most people do not recognize that the most challenging intellectual problem of all time, the reconstruction of our biological past, can now be tackled with some hope of success.” Sydney Brenner, Science 282, (20 Nov 1998)

Comprehensive evolutionary classification of genes from sequenced genomes

Ancient conserved eukaryotic genes

Current status of evolutionary classification of proteins from 7 complete eukaryotic genomes. Proteins fall into three classes: in KOGs, in LSEs (lineage-specific expansions), and singletons. Tatusov et al., BMC Bioinformatics, 2003 Sep 11;4(1):41.

Breakdown of eukaryotic proteins into KOGs, LSEs and singletons. Current status of evolutionary classification of proteins from 7 complete genomes.

Define a phyletic pattern

Phyletic patterns of eukaryotic KOGs. [Figure categories: All; All-Ec; Animals-Fungi; Plant+fungi; Plant+animals; All animals; All fungi; Other patterns]
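A phyletic pattern can be read as the presence/absence profile of a KOG across the compared genomes. The following is a minimal sketch of that idea, not the authors' code: the species order, the KOG names and the memberships are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Species order is an assumption made for this sketch; the codes follow the slides.
SPECIES = ["At", "Ce", "Dm", "Hs", "Sc", "Sp", "Ec"]


def phyletic_pattern(present, species=SPECIES):
    """Return a presence/absence string with one character per species."""
    return "".join("1" if s in present else "0" for s in species)


def pattern_counts(kogs, species=SPECIES):
    """Count how many KOGs share each phyletic pattern."""
    return Counter(phyletic_pattern(members, species) for members in kogs.values())


# Hypothetical KOG memberships (not real data).
kogs = {
    "KOG_a": {"At", "Ce", "Dm", "Hs", "Sc", "Sp", "Ec"},  # pattern "All"
    "KOG_b": {"At", "Ce", "Dm", "Hs", "Sc", "Sp"},        # pattern "All-Ec"
    "KOG_c": {"Ce", "Dm", "Hs"},                          # pattern "All animals"
    "KOG_d": {"Ce", "Dm", "Hs"},
}
counts = pattern_counts(kogs)
```

Grouping KOGs by the resulting string gives the kind of breakdown listed in the figure categories (All, All-Ec, All animals and so on).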

Phyletic patterns of KOGs and phenotypic effect of knockouts: S. cerevisiae. [Figure: scale 0% to 100%; legend includes non-essential] Essential genes tend not to be lost during evolution.

Phyletic patterns of KOGs and phenotypic effect of knockouts: C. elegans. [Figure: scale 0% to 100%; legend includes non-essential] Essential genes tend not to be lost during evolution.

The traditional application of the evolutionary parsimony principle: given the distribution of a set of binary characters in a set of species, construct the shortest tree (the maximum parsimony tree). [Figure: two trees with leaves ordered A, B, C, D and A, D, B, C]

However, parsimony can be used with equal ease to address the reverse task: given the distribution of a set of binary characters in a set of species AND the *true* tree topology, construct the most parsimonious scenario of evolution (which, of course, might include many more events than the overall most economical scenario). [Figure: tree with leaves A, B, C, D]

Maximum parsimony (Dollo) tree for eukaryotes based on the phyletic patterns of KOGs. [Figure: tree with leaves Ec, Sc, Sp, Ce, Dm, Hs, At; label 100%]

The phylogenetic parsimony tree built on the basis of KOG phyletic patterns did not follow the species tree. However, the parsimony principle can be applied in the opposite direction: given a species tree topology, construct the most parsimonious scenario for the evolution of the eukaryotic gene repertoire, that is, map gene (KOG) gain and loss events onto the tree branches. [Figure legend: 1/0, 0/1; gain, loss]
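Mapping gain and loss events onto a fixed tree can be sketched for a single binary character under the Dollo assumption (one gain, any number of losses). This is an illustrative sketch only: the nested-tuple tree encoding, the function names and the example topology are assumptions, not the procedure or the tree used in the talk.

```python
def leaves(node):
    """Return the set of leaf names under a node (leaf = str, internal node = tuple)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def dollo_events(tree, present):
    """Place one gain and the implied losses of a binary character on a fixed tree.

    Dollo assumption: the character is gained once, at the last common ancestor
    of all species that carry it, and afterwards can only be lost.
    Returns (gain_clade, loss_clades); each clade is a frozenset of leaf names.
    """
    present = set(present) & leaves(tree)
    if not present:
        return None, []
    # Descend to the smallest clade containing every species with the character.
    node = tree
    while not isinstance(node, str):
        inside = [child for child in node if leaves(child) & present]
        if len(inside) != 1:
            break
        node = inside[0]
    gain = frozenset(leaves(node))
    losses = []

    def walk(n):
        # A loss sits on the branch leading to each maximal clade lacking the character.
        if not (leaves(n) & present):
            losses.append(frozenset(leaves(n)))
        elif not isinstance(n, str):
            for child in n:
                walk(child)

    walk(node)
    return gain, losses


# Hypothetical topology, used only to exercise the function.
tree = (((("Hs", "Dm"), "Ce"), ("Sc", "Sp")), "At")
gain, losses = dollo_events(tree, {"Hs", "Ce", "Sc"})
```

In this toy case the gain falls on the ancestor of the animal and fungal leaves, with separate losses on the Dm and Sp branches. Summing such events over all KOGs would give per-branch gain and loss counts.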

The most parsimonious scenario of gene loss and birth in eukaryotic evolution and ancestral gene sets. [Figure: species tree with leaves Dm, Hs, Ce, Sc, Sp, At, Ec; branches annotated with gene gain and gene loss] Koonin et al Genome Biol. 5: R7.

Exon/intron structure of eukaryotic genes. Eukaryotic nuclear, protein-coding genes usually contain multiple spliceosomal introns that are spliced out of pre-mRNAs by an RNA-protein complex, the spliceosome. [Diagram: exon1, intron (GU...AG), exon2]
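As a toy illustration of the exon/intron layout described above, the sketch below removes introns from a pre-mRNA string and checks the GU...AG boundaries shown in the diagram. The sequence and coordinates are invented, and the function is only a string-level illustration, not a model of the spliceosome.

```python
def splice(pre_mrna, introns):
    """Remove introns given as 0-based, half-open (start, end) intervals.

    Raises ValueError if an intron does not start with GU and end with AG.
    """
    exons, cursor = [], 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[cursor:start])  # exon preceding this intron
        cursor = end
    exons.append(pre_mrna[cursor:])  # final exon
    return "".join(exons)


# Invented example: exon1 + intron + exon2.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "GAAUAA"
mature = splice(pre_mrna, [(6, 18)])
```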

Evolution of introns and the exonic structure of eukaryotic genes
Tempo and mode of intron evolution remain poorly understood.
- When did introns invade eukaryotic genes: prior to the origin of eukaryotes (introns early), early in eukaryotic evolution, or late?
- The common ancestor of animals, plants and fungi: intron-rich or intron-poor?
- What fraction of introns is conserved over long evolutionary spans?

Origin of introns. The "intron-early" hypothesis suggests that introns existed before the divergence of prokaryotes and eukaryotes (W. Gilbert). The "intron-late" hypothesis posits that introns were inserted into eukaryotic genes after this divergence (T. Cavalier-Smith, Doolittles, J. Palmer). [Figure labels: loss and sliding; gain and loss]

Mechanisms of intron evolution
Three mechanisms of intron evolution have been invoked by proponents of both theories:
- intron loss
- intron gain
- intron sliding

Mechanisms of intron evolution: intron loss
- Complete loss of introns: re-integration of reverse-transcribed mRNAs into the genome
- Loss of one or a few introns: recombination/gene conversion between cDNAs and genomic sequences (Feiber et al )

Mechanisms of intron evolution: intron gain (?). A common event.

Mechanisms of intron evolution
Why is our understanding of intron evolution so limited?
- Lack of information on exon/intron structure of orthologous genes
Can we use completely sequenced genomes?
- This is a great source of information, but they are not necessarily easy to work with.

Analysis of introns in completely sequenced genomes
We used sets of orthologous genes (KOG database) that contained a member from each of 8 eukaryotic genomes:
- Human (HS)
- Fly (DM)
- Mosquito (AG)
- Worm (CE)
- Plant (Arabidopsis) (AT)
- Baker’s yeast (SC)
- Fission yeast (SP)
- Malaria Plasmodium (PF)

Pipeline for analysis of evolution of intron-exon structure
- KOG analysis (8 species)
- Multiple alignment (MAP)
- Identification of conserved blocks
- Projection of introns on alignment
- Extraction of intron positions from genomes

HS …ATGTCGATCGTGCTCGTCGTACTCTCGTAC…
DM …ATGTGGATCGTGCTCGTCGTACTCTCGTAC…
CE …ATGTGGATTGTGCTCGTCGTACTCTCGTAC…
AT …ATGTTGATGGTGCTCGTCGTACTCTCGTAC…
SC …ATGTTGATTGTGCTCGTCGTACTCTCGTAC…
SP …ATGTTGATT---CTCGTCGTACTCTCGTAC…
All positions with gaps were deleted to ensure robustness of the analysis, but we also analyzed the complete alignments.
Conserved introns: found in two or more species. Non-conserved introns: found in one species only.
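The projection step and the conserved/non-conserved split can be sketched as follows. This is a hedged illustration rather than the pipeline used in the talk: it assumes each intron is recorded as the 0-based index, in the ungapped sequence, of the residue that follows it, and the alignment fragment and intron positions are hypothetical.

```python
def project_introns(aligned, intron_positions):
    """Map intron positions in the ungapped sequence to alignment columns."""
    wanted = set(intron_positions)
    columns = set()
    ungapped = 0  # index of the current residue in the ungapped sequence
    for col, ch in enumerate(aligned):
        if ch == "-":
            continue
        if ungapped in wanted:
            columns.add(col)
        ungapped += 1
    return columns


def gap_free_columns(alignment):
    """Return the columns in which no sequence has a gap."""
    length = len(next(iter(alignment.values())))
    return {c for c in range(length)
            if all(seq[c] != "-" for seq in alignment.values())}


def classify(alignment, introns):
    """Split intron columns into conserved (2+ species) and single-species ones."""
    keep = gap_free_columns(alignment)
    support = {}
    for species, seq in alignment.items():
        for col in project_introns(seq, introns.get(species, ())) & keep:
            support.setdefault(col, set()).add(species)
    conserved = {c: s for c, s in support.items() if len(s) >= 2}
    unique = {c: s for c, s in support.items() if len(s) == 1}
    return conserved, unique


# Hypothetical alignment fragment and intron positions.
alignment = {
    "HS": "ATGTCGATCGTGCTC",
    "AT": "ATGTTGATGGTGCTC",
    "SP": "ATGTTGATT---CTC",
}
introns = {"HS": [3, 10], "AT": [3], "SP": [9]}
conserved, unique = classify(alignment, introns)
```

Here the HS intron at column 10 is discarded because SP has a gap in that column, which mirrors the deletion of gapped positions described above.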

Statistical analysis: shuffling of intron positions, Monte Carlo simulation
HS …ATGTCGATCGTGCTCGTCGTACTCTCGTAC…
DM …ATGTGGATCGTGCTCGTCGTACTCTCGTAC…
CE …ATGTGGATTGTGCTCGTCGTACTCTCGTAC…
AT …ATGTTGATGGTGCTCGTCGTACTCTCGTAC…
SC …ATGTTGATTGTGCTCGTCGTACTCTCGTAC…
SP …ATGTTGATTGTCCTCGTCGTACTCTCGTAC…
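One way to picture the shuffling test is to redistribute each species' introns at random over the alignment columns and ask how often chance alone produces at least as many shared positions as observed. The sketch below assumes uniform placement over all columns and a simple add-one p-value estimate; the slide does not specify the shuffling scheme, so both choices, like the toy data, are assumptions.

```python
import random


def shared_positions(position_sets):
    """Count positions that occur in two or more species."""
    seen, shared = set(), set()
    for positions in position_sets:
        shared |= seen & positions
        seen |= positions
    return len(shared)


def shuffle_test(position_sets, n_sites, n_trials=1000, seed=0):
    """Estimate how often random placement yields >= the observed shared count."""
    rng = random.Random(seed)
    observed = shared_positions(position_sets)
    hits = 0
    for _ in range(n_trials):
        # Keep each species' intron count, but draw the positions at random.
        shuffled = [set(rng.sample(range(n_sites), len(p))) for p in position_sets]
        if shared_positions(shuffled) >= observed:
            hits += 1
    # Add-one estimate keeps the p-value strictly above zero.
    return observed, (hits + 1) / (n_trials + 1)


# Hypothetical intron positions for three species over 100 alignment columns.
observed, p_value = shuffle_test([{3, 10, 40}, {3, 40, 77}, {3, 55}], n_sites=100)
```

A small p-value would indicate more shared intron positions than expected from independent random insertion.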

CONSERVATION OF INTRON POSITIONS IN 8 EUKARYOTIC SPECIES

/6930 Conservation of intron positions among eukaryotes. Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV. Curr Biol Sep 2;13(17):

Example: KOG0473, ribosomal protein L37. The alignment with mapped intron positions is converted to a matrix of intron presence/absence.

Conserved intron positions: phylogenetic signal. Example: KOG TCP-1a subunit of chaperonin complex, the only intron among 684 genes conserved in 7 species. Matrices for all analyzed genes were concatenated and employed to build a single tree (KOGs, 7236 intron positions).

Phylogenetic tree of crown group eukaryotes based on conservation of intron positions: parsimony. The topology of this tree is a bit unexpected...

The phylogenetic parsimony tree built on the basis of the pattern of intron conservation did not follow the species tree. However, the parsimony principle can be applied in the opposite direction: given a species tree topology, construct the most parsimonious scenario for the evolution of eukaryotic gene structure, that is, the distribution of intron gain and loss events over the tree branches. [Figure legend: 1/0, 0/1; gain, loss]

Parsimonious evolutionary scenario for the most realistic topology of the eukaryotic tree. [Figure: species tree with leaves Dm, Ag, Hs, Ce, Sc, Sp, At, Pf; branches annotated with intron loss and intron gain] Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV. Curr Biol Sep 2;13(17):

There seems to have been virtually no intron gain and limited intron loss during mammalian evolution. [Figure: tree of human, mouse, rat and fish; ~100 Mya; ~100 introns lost, ~0 introns gained] Roy SW, Fedorov A, Gilbert W. Large-scale comparison of intron positions in mammalian genes shows intron loss but no gain. Proc Natl Acad Sci U S A Jun 10;100(12): A. S. Kondrashov, personal communication

A conundrum of intron evolution:
- practically no intron gain during (at least) ~100 mln yrs of mammalian evolution
- apparent massive gain during evolution of animal phyla (e.g., chordates), ~ mln yr scale
Are major transitions in eukaryotic evolution associated with bursts of intron insertion?

Koonin, 2004, Cell Cycle 3, 280

Gain/loss of genes and gain/loss of introns in conserved genes occur in parallel in eukaryotic evolution, probably as a manifestation of the same general lineage-specific trends.
‘…by magnifying the power of random genetic drift, reduced population size provides a permissive environment for the proliferation of various genomic features that would otherwise be eliminated by purifying selection.’ Lynch, M., Conery, J.S. (2003) The Origins of Genome Complexity. Science 302,

Comparing old and new introns: gaining insight into the origin of introns. Sverdlov, Babenko, Rogozin, Koonin. Curr. Biol. (2003); Gene (2004, in press)

Distribution of old and new introns along the gene length. All genomes pooled.

Distribution of old and new introns along the gene length. S. pombe, an intron-poor genome: nearly identical distributions of old and new introns.

Distribution of old and new introns along the gene length. H. sapiens, an intron-rich genome: enrichment for new introns in the 3’-region.
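Distributions of this kind can be tabulated by scaling each intron position by the length of its gene and binning the result from the 5’ end to the 3’ end. The sketch below is only an illustration of that bookkeeping: the bin count, the (position, gene length) input format and the numbers are all hypothetical.

```python
def relative_position_histogram(introns, n_bins=5):
    """Return the fraction of introns per bin along the gene, 5' to 3'.

    introns: iterable of (position, gene_length) pairs in the same units.
    """
    counts = [0] * n_bins
    total = 0
    for position, gene_length in introns:
        # Integer arithmetic avoids floating-point edge effects at bin borders.
        idx = min(position * n_bins // gene_length, n_bins - 1)
        counts[idx] += 1
        total += 1
    return [c / total for c in counts] if total else counts


# Hypothetical intron positions (not real data).
old_introns = [(10, 100), (50, 100), (90, 100)]
new_introns = [(80, 100), (95, 100), (180, 200), (30, 100)]
old_hist = relative_position_histogram(old_introns)
new_hist = relative_position_histogram(new_introns)
```

In this invented example most of the new introns land in the last bin, which is the shape of signal described for the 3’-region in an intron-rich genome.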

A reverse-transcription-based model of intron insertion: almost the same as for intron loss (Fink, 1987), but includes an error of reverse transcription. [Diagram labels: reverse transcription; duplication; mRNA 5’ to 3’ with poly(A) tail; genomic DNA; homologous recombination; new intron (GT...AG)]
Introns seem to be preferentially lost AND inserted near the 3’-end of the coding region. Could there be similar mechanisms for intron loss AND insertion?
The role of duplication in the origin of alternative exons has been demonstrated (Kondrashov, F.A., Koonin, E.V. Hum. Molec. Genet., 2001; Letunic, I. et al., Hum. Molec. Genet., 2002).

Conclusions
- Evolutionary classification of genes from sequenced genomes (orthologs and paralogs) allows us to address genome-wide evolutionary trends by applying rather straightforward adaptations of known phylogenetic approaches.
- Introns invaded protein-coding genes very early in the evolution of eukaryotes, prior to the origin of multicellular forms, and many of these ancient introns survive to this day.
- Remarkable conservation of ancestral introns in some eukaryotic lineages, with as many as 25-30% of the introns in humans and Arabidopsis being apparently inherited from the common ancestor of animals, fungi and plants, and ~30% of Plasmodium introns conserved in the crown group.
- Even the earliest ancestral eukaryotes seem to have had many genes and introns.

Conclusions
- Massive gene and intron loss occurred on multiple, independent occasions during eukaryotic evolution, especially in fungi, but also in arthropods and nematodes (and probably many more lineages).
- Classification of introns by age allows one to follow the evolution of splice signals and of intron sequences themselves, and might even suggest mechanisms of intron insertion.
- Lineage-specific expansion of paralogous gene families is accompanied by substantial loss and even more extensive acquisition of introns.
- Loss and gain of introns and genes occur in parallel, reflecting the same lineage-specific trends in genome evolution, perhaps largely owing to dramatic changes in characteristic population sizes, which entail changes in selection strength.

Acknowledgments
Igor Rogozin (NCBI), Yuri Wolf (NCBI), Boris Mirkin (Birkbeck College, London), Alexander Sorokin (NCBI), Alexander Sverdlov (NCBI, now Columbia U), Vladimir Babenko (NCBI), Fyodor Kondrashov (NCBI, now UC Davis), Alexei Kondrashov (NCBI)
The COG group (NCBI): Natalie D. Fedorova, John D. Jackson, Aviva R. Jacobs, Dmitri M. Krylov, Kira S. Makarova, Raja Mazumder (1), Sergei L. Mekhedov, Anastasia N. Nikolskaya (1), B. Sridhar Rao, Sergei Smirnov, Alexander V. Sverdlov, Roman L. Tatusov, Sona Vasudevan, Jodie J. Yin, Darren A. Natale (1)
(1) Currently PIR, Georgetown University