Origins and Evolution of Novel Genes

Origins and Evolution of Novel Genes Aoife McLysaght Trinity College Dublin

Novelty: novel genes increase complexity, perform novel functions, and may have a role in speciation. Consider the difficulty of a single gene acquiring a new function that does not hinder the original function, compared with the evolution of a new function at an entirely new locus.

Promoter exon intron exon

Bricolage Long et al., 2003

Q: Where do new genes come from? A: Other genes. Genes don’t appear by chance from random DNA (or at least, they only do so very rarely). Routes: polyploidy (whole genome duplication); aneuploidy (chromosomal duplication); partial chromosome duplication; gene duplication; partial gene duplication.

Gene Duplication: creates new genes and generates multigene families and multidomain genes. Unequal crossing over yields a gene duplication and a corresponding deletion; the length of the duplication/deletion depends on the extent of misalignment. Unequal crossing over is facilitated by the presence of repeated sequences, so tandem duplicates can lead to more tandem duplication.

Exon/domain shuffling Gene structure Protein domains Domain complexity increases with organismal complexity Rubin et al., Science, 2000

Duplicability. Survivorship/maintenance of gene duplicates may depend on: protein function (higher duplicability of metabolic genes in yeast; Marland et al, 2004); network centrality (more highly connected proteins have lower duplicability in yeast but higher duplicability in human); evolutionary rate (higher duplication of slowly evolving genes; Davis and Petrov, 2004); dosage balance (dosage-balanced genes are retained after whole genome duplication (WGD) but are unlikely to experience small-scale duplication (SSD)).

Fate of Duplicated Genes: Examples. Neofunctionalisation: GLUD2 in primates has a new role in neurotransmitter flux; thrombin (cleaves fibrinogen during clotting) and trypsin (digestive enzyme) are derived from a complete gene duplication; lactate dehydrogenase can be converted into malate dehydrogenase with a single amino acid replacement (out of a total protein length of 317 amino acids). Subfunctionalisation: the SIR3 and ORC1 gene pair in yeast have divergent functions, but a single ancestral-type protein from another yeast has both functions. Dosage increase: for Esterase B in mosquito, increased gene dosage confers greater pesticide resistance. Functional compensation: many duplicated genes shelter the organism from deleterious mutations in the other copy (shown in yeast and worm).

Functional compensation of duplicate genes Essentiality of duplicated genes Liao BY and Zhang. Trends in Genetics (2007) Duplicate genes usually overlap in function. Nematode Sequence divergence of duplicated genes correlates with their capacity for back up function. Conant GC and Wagner A. Proc. R. Soc. Lond. B (2004)

Polyploidisation: a global increase in genome content through the addition of one or more complete chromosome sets (the entire genome is duplicated). 2 copies of every chromosome: diploid; 3: triploid (sterile); 4: tetraploid; 6: hexaploid.

Examples of Paleopolyploids Yeast Arabidopsis Wheat Fish Ancestral vertebrate (2R)

Loss or retention of genes duplicated by WGD (ohnologs) Most duplicates are subsequently lost Biased retention of certain classes of genes Retained duplicates are enriched for: Developmental genes Transcription factors Metabolic genes Protein complex membership

Dosage-balance hypothesis: dosage-balanced genes are not robust to gene loss and gene duplication. [Figure: a pathway of Genes A to E, shown twice]

Whole genome duplication and dosage-balanced genes. WGD duplicates all genes simultaneously and therefore does not perturb relative dosages. Whereas SSD of dosage-balanced genes is likely to be deleterious, WGD should be neutral. Furthermore, once duplicated by WGD they are unlikely to be lost. [Figure: a pathway of Genes A to E, shown twice]
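The contrast between WGD and SSD on this slide can be made concrete with a toy calculation. This is only a minimal sketch: the five-gene pathway, the starting copy number of 2, and the helper names `relative_dosage`, `wgd` and `ssd` are all illustrative assumptions, not part of the talk.

```python
from fractions import Fraction

def relative_dosage(copies):
    """Copy number of each gene relative to the lowest copy number in the pathway."""
    base = min(copies.values())
    return {gene: Fraction(n, base) for gene, n in copies.items()}

def wgd(copies):
    """Whole genome duplication: every gene doubles at the same time."""
    return {gene: 2 * n for gene, n in copies.items()}

def ssd(copies, gene):
    """Small-scale duplication: only the named gene doubles."""
    out = dict(copies)
    out[gene] *= 2
    return out

# Hypothetical dosage-balanced pathway, two copies of each gene.
pathway = {g: 2 for g in ["A", "B", "C", "D", "E"]}

after_wgd = relative_dosage(wgd(pathway))     # all ratios stay at 1
after_ssd = relative_dosage(ssd(pathway, "C"))  # gene C is now at twice the dosage of the rest
print(after_wgd)
print(after_ssd)
```

Under these assumptions the ratios after WGD are identical to the starting ratios, while SSD of a single gene changes its dosage relative to every partner in the pathway.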

De novo origins Conversion of 3’ UTR into coding sequence Incorporation of transposable elements into coding sequence

De novo origin of whole protein-coding genes Origin of an open reading frame (ORF) from ancestrally non-coding sequence Single-base substitutions or small indels that remove a stop codon Acquisition of expression activity Considered to be very rare events

New genes in Drosophila. Levine et al. 2006, PNAS: five de novo originated genes found in Drosophila melanogaster. Begun et al. 2007, Genetics: 11 genes that likely appeared in D. yakuba or the D. yakuba / D. erecta ancestor were identified using testis-derived ESTs; testis-biased expression; often X-linked. Zhou et al., 2008, Genome Research: 9 genes (some overlap with previous papers); estimate that 12% of new genes arose de novo.

New genes in Saccharomyces. Cai et al. 2008, Genetics: BSC4 identified as a de novo gene in S. cerevisiae (132 aa). DNA similarity but no ORF in the closely-related yeasts S. paradoxus, S. mikatae and S. bayanus. Transcribed in these other yeast lineages: origin of a protein-coding gene from an RNA gene. Deletion of DUN1 or RPN4 is lethal if BSC4 is also deleted. PeptideAtlas evidence supports translation. Purifying selection. Possibly involved in the DNA repair pathway.

De novo origin of mouse-specific gene Heinen et al., 2009, Current Biology Non-coding RNA gene 3 exons, alternatively spliced Specifically expressed in post-meiotic cells of the testis Indel mutations in 5’ regulatory region Possible selective sweep

Novel primate genes

Human-Chimp Divergence 99% identity of alignable sequence High colinearity of gene order What is the genetic distinction? Regulatory differences? Differential gene duplication and loss? 40-45Mb of species-specific euchromatic sequence Unique genes?

Differential gene duplication and loss Demuth et al, 2006 Hahn et al, 2007

Genome Quality Issues. [Figure: genomic locations, in human (Hs) and chimp (Pt), of the EnsEMBL family containing the Centaurin Gamma 2 gene, marked as within or out of synteny blocks; human Chr10 genes 225 - 295. Hahn et al 2007 Genetics]

De novo origins of monkey genes Toll-Riera et al., (2009) MBE Examined “primate orphans” Protein-coding Present in human and macaque but absent in older lineages

This study: Have new genes arisen de novo recently in the human lineage?

Unique human genes? An all-against-all BLASTP search identified 644 human genes with no match in the chimp genome. These are candidate novel genes; examine them in great detail.

Genome Quality Issues: several spurious/trivial causes of apparent gene gain. Human genome annotation error: the candidate novel gene is spurious. Sequence gaps: the gene is present but unsequenced. Chimp genome annotation error: the gene is sequenced but unannotated.

Strategy: synteny-based approach. Gene order is conserved between close taxa. Regions of conserved gene order are likely to be ancestral. The expected location of a gene can be identified and carefully examined. [Figure: a human gene and its expected location ("?") in chimp]

Synteny Blocks Blocks with conserved gene order built using unambiguous orthologs: String of orthologs no more than 10 genes apart in either genome. Small local gene order differences permitted.
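The block-building rule above can be sketched as a greedy chaining pass. This is a minimal illustration, not the procedure used in the study: it assumes each ortholog pair is given with the gene's ordinal position (rank) on its chromosome, and it interprets "no more than 10 genes apart" as a rank difference of at most 10 in both genomes. The function name `build_synteny_blocks` and the input layout are hypothetical.

```python
def build_synteny_blocks(orthologs, max_gap=10):
    """Greedily chain ortholog pairs into blocks of conserved gene order.

    orthologs: iterable of (human_chrom, human_rank, chimp_chrom, chimp_rank),
    where rank is the gene's ordinal position along its chromosome.
    """
    blocks = []
    for pair in sorted(orthologs):  # walk along the human genome
        h_chr, h_rank, c_chr, c_rank = pair
        if blocks:
            p_h_chr, p_h_rank, p_c_chr, p_c_rank = blocks[-1][-1]
            same_chroms = h_chr == p_h_chr and c_chr == p_c_chr
            # abs() tolerates small local order differences (e.g. inversions)
            close = (abs(h_rank - p_h_rank) <= max_gap
                     and abs(c_rank - p_c_rank) <= max_gap)
            if same_chroms and close:
                blocks[-1].append(pair)
                continue
        blocks.append([pair])
    return blocks

# Toy input: three chained orthologs, then a large gap, then a different chimp chromosome.
example = [
    ("chr1", 1, "chr1", 1),
    ("chr1", 3, "chr1", 2),
    ("chr1", 4, "chr1", 5),
    ("chr1", 40, "chr1", 41),
    ("chr1", 42, "chr5", 7),
]
blocks = build_synteny_blocks(example)
print([len(b) for b in blocks])
```

A candidate gene lying between two members of the same block then has a well-defined expected location in the other genome, which is the region to inspect.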

Expected location definition

[Figure: candidate gene counts: 644, 450, 194, 3]

Novel human protein-coding genes: all are short ORFs, with no introns within the coding sequence.

ORF origins: examine orthologous DNA from chimp and macaque. Identify “disablers”, i.e. sequence differences that obstruct the ORF: single base differences that cause an early stop; frame-shift inducing indels that result in an early stop codon; absence of a start codon.
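The three kinds of disabler listed above can be checked mechanically on a pairwise alignment. The following is a rough sketch under stated assumptions: the alignment is trimmed so that it starts at the human start codon, the ungapped human row is a complete ORF, and the function name `find_disablers` and its labels are hypothetical rather than taken from the study.

```python
import re

STOPS = {"TAA", "TAG", "TGA"}

def find_disablers(human_aln, other_aln):
    """List ORF disablers in other_aln relative to the human ORF in human_aln.

    Both arguments are rows of one pairwise alignment ('-' marks a gap).
    """
    if len(human_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    human_aln, other_aln = human_aln.upper(), other_aln.upper()
    disablers = []

    # 1. Absence of a start codon: bases aligned to the human ATG.
    human_cols = [i for i, base in enumerate(human_aln) if base != "-"]
    if "".join(other_aln[i] for i in human_cols[:3]) != "ATG":
        disablers.append("no_start_codon")

    # 2. Frame-shifting indels: gap runs whose length is not a multiple of 3.
    gap_runs = re.findall(r"-+", human_aln) + re.findall(r"-+", other_aln)
    frameshift = any(len(run) % 3 for run in gap_runs)
    if frameshift:
        disablers.append("frameshift_indel")

    # 3. Early stop: an in-frame stop before the position of the human stop codon.
    other = other_aln.replace("-", "")
    human_codons = len(human_cols) // 3
    codons = [other[i:i + 3] for i in range(0, len(other) - 2, 3)]
    if any(codon in STOPS for codon in codons[:human_codons - 1]):
        disablers.append("early_stop_after_frameshift" if frameshift
                         else "early_stop_substitution")
    return disablers

human = "ATGGCTTGGAAATAA"  # toy ORF: ATG GCT TGG AAA TAA
print(find_disablers(human, "ATGGCTTGAAAATAA"))  # TGG -> TGA
print(find_disablers(human, "ACGGCTTGGAAATAA"))  # ATG -> ACG
print(find_disablers("ATGGCTGAGGCATAA", "ATG--TGAGGCATAA"))  # 2-bp deletion
```

Using macaque as an outgroup in the same way indicates whether the enabling state is derived in human or the disabling state arose independently elsewhere.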

CLLU1: Chronic Lymphocytic Leukemia Upregulated gene 1. Identified in a search for differentially expressed genes in chronic lymphocytic leukemia (Buhl et al 2006). Located in an EST-dense region. Overlaps another gene, CLLU1OS, on the opposite strand.

Human origin or parallel primate inactivation of ancient gene?

CLLU1

C22orf45

DNAH10OS

Are these ORFs actually genes? The presence of an ORF does not guarantee that the gene is coding, i.e., that a protein is produced. PRIDE (PRoteomics IDEntifications) is a public database for proteomics data. PeptideAtlas is a public database of peptides identified by mass spectrometry.

Proteomics support

CLLU1 DNAH10OS C22orf45

Human population polymorphism ORF is present intact in all sequenced individuals (public data) No convincing evidence for a selective sweep from published genome-wide scans of HapMap data.

How might these genes arise? Sequence analysis traced the origin of the ORF, but these must also be expressed. Expression of a new gene: the ENCODE project indicated that much of the genome is transcribed; all three of these genes overlap other genes; CLLU1 is in a permissive expression environment.

De novo genes: Summary. 3 identified cases under strict criteria. Estimate about 18 should exist. All have evidence of transcription and translation. ORF formation allowed by a human-specific mutation in all cases. No “re-use” of coding sequence of previously-existing genes, but perhaps re-use of regulatory sequences.

Gene duplication: consequences. Innovation (neofunctionalisation) and robustness (functional compensation). NB: these are consequences, NOT causes.

Defining essential genes: a gene is considered “essential” if its removal results in a lethal or sterile phenotype. Essential genes: lethal or sterile. Non-essential genes: other phenotypes. Fly: 2540 essential and 5197 non-essential genes. Mouse: 2109 essential and 2969 non-essential genes. [Figure: fly wild type, eyeless and vestigial (http://www.exploratorium.edu; Kolodziej PA et al. Neuron (1995)); mouse wild type and foxn1 (http://www.crj.co.jp; Garacia MU et al. PNAS (2005))]

Evolutionary impact of gene duplications. PE: proportion of essential genes, obtained by counting lethal knockouts among singletons and among duplicates. Is PE singletons >> PE duplicates, or PE singletons = PE duplicates?

Evolutionary impact of gene and genome duplications. PE: proportion of essential genes, obtained by counting lethal knockouts among singletons and among duplicates. PE singletons >> PE duplicates indicates functional compensation; the alternative is PE singletons = PE duplicates.

Functional compensation of duplicate genes Essentiality of duplicated genes Liao BY and Zhang. Trends in Genetics (2007)

All duplicates are not created equal. Whole Genome Duplication (WGD) and Small-Scale Duplication (SSD) differ in extent and frequency. Do they also differ in evolutionary impact? WGD occurred in yeast, plant and animal. SSD is ongoing. [Figure: phylogeny of fly, ascidian, fish, chicken and mouse, with WGD marked]

Evolutionary impact of gene and genome duplications. PE: proportion of essential genes, obtained by counting lethal knockouts among SSD duplicates and among WGD duplicates. Is PE SSD duplicates >>, = or << PE WGD duplicates?

Essentiality of WGD and SSD duplicated genes in yeast Correlation between sequence divergence and the proportion of essential genes for SSD duplicated genes WGD duplicated genes are less essential than SSD duplicated genes in yeast. Guan Y et al. Genetics (2007)

Defining duplicates and singletons. Duplicated genes and singletons: all-against-all blastp search for mouse (fly) (Ensembl 50), E-value threshold 1e-20. WGD duplicated genes in mouse: human WGD duplicated genes mapped through one-to-one orthology (Ensembl 50) to mouse WGD duplicated genes. SSD duplicated genes in mouse: all duplicated genes excluding WGD duplicated genes.
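The classification on this slide can be sketched in a few lines. This is a hedged illustration only: it assumes the blastp results have already been parsed into (query, subject, E-value) tuples, treats any non-self hit at or below the threshold as evidence of duplication for both genes, and takes the WGD set as given. The function name `classify_genes` and the toy gene IDs are hypothetical.

```python
def classify_genes(all_genes, hits, wgd_genes, max_evalue=1e-20):
    """Split genes into WGD duplicates, SSD duplicates and singletons.

    hits: iterable of (query, subject, evalue) from an all-against-all search.
    wgd_genes: genes assigned to WGD by orthology (assumed to be given).
    """
    all_genes = set(all_genes)
    duplicated = set()
    for query, subject, evalue in hits:
        if query != subject and evalue <= max_evalue:  # ignore self-hits
            duplicated.update((query, subject))
    duplicated &= all_genes

    wgd = set(wgd_genes) & all_genes
    ssd = duplicated - wgd                 # all duplicated genes excluding WGD duplicates
    singletons = all_genes - duplicated - wgd
    return wgd, ssd, singletons

# Toy data with made-up gene IDs.
genes = {"g1", "g2", "g3", "g4", "g5"}
hits = [
    ("g1", "g1", 0.0),     # self-hit, ignored
    ("g1", "g2", 1e-50),   # passes the threshold
    ("g3", "g4", 1e-5),    # too weak
    ("g4", "g5", 1e-30),   # passes the threshold
]
wgd, ssd, singletons = classify_genes(genes, hits, wgd_genes={"g4", "g5"})
print(sorted(wgd), sorted(ssd), sorted(singletons))
```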

PE for WGD and SSD genes in mouse. SSD duplicated genes (38.1%) < singletons (42.2%) (P = 0.027). SSD duplicated genes (38.1%) < WGD duplicated genes (45.4%) (P = 3.1 x 10^-6, χ2 test). No difference in essentiality between WGD duplicated genes (45.4%) and singletons (42.2%) (P = 0.10). SSD duplicated genes carry out the expected backup role, but WGD duplicated genes are as essential as singletons in mammalian genomes. [Figure: plot with KA axis; SSD: R = 0.94, P = 0.017]
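Comparisons of PE between two gene classes, such as those above, come down to a test on a 2x2 table of essential and non-essential counts. The sketch below shows one way to compute it with the standard library only. The counts are hypothetical (the slide reports percentages, not counts), and whether the original analysis applied a continuity correction is not stated, so none is used here.

```python
import math

def proportion_essential(essential, total):
    """PE: proportion of knocked-out genes that are essential."""
    return essential / total

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: (essential, non-essential) for two gene classes.
ssd_ess, ssd_non = 380, 620
wgd_ess, wgd_non = 450, 550

print(proportion_essential(ssd_ess, ssd_ess + ssd_non))
print(proportion_essential(wgd_ess, wgd_ess + wgd_non))
stat, p_value = chi2_2x2(ssd_ess, ssd_non, wgd_ess, wgd_non)
print(stat, p_value)
```

The closed form for the p-value holds only for one degree of freedom, which is all a 2x2 table needs.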

PE for developmental genes. Duplicate developmental genes created by WGD were preferentially retained in vertebrate genomes (Blomme T et al. Genome Biol. (2006)). Developmental genes: genes with GO:0007275 (multicellular organismal development) or GO:0030154 (cell differentiation). Non-developmental genes: genes with other GO ids. Developmental genes in mouse: singletons < duplicated genes (P = 0.0086, χ2 test). Developmental genes in fly: singletons ≈ duplicated genes (P = 0.98). Non-developmental genes in mouse and fly: singletons > duplicated genes (mouse: P = 0.00051; fly: P = 2.8 x 10^-8).

Functional compensation of duplicate genes Essentiality of duplicated genes Liao BY and Zhang. Trends in Genetics (2007) Data bias: Developmental genes represent 37% of knockout data but only 11% of genome

Why is the essentiality of WGD genes high? Dosage balance hypothesis: WGD creates a unique opportunity for the duplication of dosage-balanced genes. [Figure: a pathway of Genes A to E, shown four times]

Known categories of dosage-balanced genes: developmental genes, transcription factors and protein complex members. 1. Enrichment of developmental genes. 2. Enrichment of transcription factors: WGD duplicated genes in our dataset are significantly enriched for the functional category ‘transcription regulator activity’. 3. Enrichment of protein complex membership: WGD duplicated genes (21.8%; 388/1781) vs. total dataset (17.9%; 910/5078) (P = 0.00039, χ2 test). WGD duplicated genes are likely to be dosage-balanced genes.

Dosage balance hypothesis. Test: are ohnologs refractory to changes in dosage? SSD: individual gene duplication within the vertebrate lineage. CNV (Copy Number Variation): recent (polymorphic) gene duplication in human populations.

Recent SSD (within the tetrapod lineage). Reconstructed “tetrapod gene families” based on the inferred gene complement just after the fish-tetrapod split. Two categories of family: containing ohnologs; not containing ohnologs. Count the fraction of families that include at least one SSD event.

Ohnologs are less likely to experience SSD. Along the human lineage, 6.7% of ohnolog families have experienced subsequent SSD, compared with 10.1% of other genes duplicated in the same time period (P = 4.8 x 10^-15).

Resistance to SSD predates WGD event. In pre-WGD lineages: ascidian singletons (no lineage-specific SSD) are more likely to be orthologs of human ohnologs (30.1%; 1804/5998) than ascidian duplicates (20.6%; 649/3147; P < 2.2 x 10^-16). Similarly for fly, worm and sea anemone. [Figure: phylogeny of fly, ascidian, fish, chicken and human]

Ohnologs also less likely to experience CNV. PCNV = proportion of genes with copy number variation. Genome average = 29.3% (6136/20907); ohnologs = 22.6% (1648/7294); SSD paralogs = 36.6% (3306/9027). Ohnologs are unlikely to experience CNV, whereas SSD paralogs are likely to also display CNV.

Many ohnologs are dosage-balanced. Retained ohnologs are resistant to duplication (SSD or CNV), even in distantly-related lineages that did not experience WGD. Over 60% of ohnologs (4638/7294) are free of subsequent SSD and CNV. These are dosage-balanced ohnologs (DBOs).
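The DBO definition above is a set difference: ohnologs minus any gene with subsequent SSD or CNV. A minimal sketch follows, assuming the three gene sets are already available as collections of IDs. The function name `dosage_balanced_ohnologs` and the toy IDs are hypothetical; only the 4638/7294 counts come from the slide.

```python
def dosage_balanced_ohnologs(ohnologs, ssd_genes, cnv_genes):
    """Ohnologs that show neither subsequent SSD nor copy number variation."""
    return set(ohnologs) - set(ssd_genes) - set(cnv_genes)

# Toy gene sets with made-up IDs.
ohnologs = {"o1", "o2", "o3", "o4", "o5"}
ssd_genes = {"o2", "x9"}
cnv_genes = {"o5", "x7"}
dbos = dosage_balanced_ohnologs(ohnologs, ssd_genes, cnv_genes)
print(sorted(dbos))

# Fraction reported on the slide: 4638 of 7294 ohnologs.
dbo_percent = round(100 * 4638 / 7294, 1)
print(dbo_percent)
```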

DBOs are associated with disease. The data used to search for CNV was from healthy individuals. Studies have reported a link between CNV and disease. Duplication of a DBO is expected to be deleterious and lead to disease. DBOs identified here are enriched for disease genes in OMIM (P < 2.2 x 10^-16).

Trisomy 21 – Down’s Syndrome. An extreme example of CNV: CNV of an entire chromosome. A 1.5-fold increase in dosage of some chr 21 genes results in Down’s Syndrome. Most commonly observed human trisomy (1/1000 individuals). Other trisomy mutations occur, but are lethal; trisomy 21 has the least severe phenotypic consequences. DBOs are significantly under-represented on chr 21 (obs. 40 vs. exp. 56.1, P = 0.010).

Trisomy 21 candidate genes: 75% (12/16) of reported DS candidates are also DBOs, which is significantly more than expected (2; P = 5.9 x 10^-8).

Conclusions: novel genes. We show the first evidence of de novo formation of unique human genes. Newly formed genes show simple ORFs. Regions/tissues with permissive expression environments may favor this process. SSD duplicated genes have a backup role for their duplicated copies in mouse and in fly. WGD-duplicated genes do not have a backup role and have high essentiality, because of the enrichment of dosage-balanced genes. The evolutionary profile of gene duplication and retention can suggest a role in human disease.

Acknowledgements. David Gonzalez Knowles: Knowles & McLysaght, Genome Research (2009). Takashi Makino: Makino et al., TiG (2009) and Makino & McLysaght, PNAS (2010).

Human-chimp sequence divergence. Total (and nonsynonymous) base substitutions pooled over all three genes: 12 substitutions. Using macaque to orient the changes: 5 (2 non-synonymous) in human and 7 (3 non-synonymous-like) in chimp.