Presentation transcript: Origins and Evolution of Novel Genes
1 Origins and Evolution of Novel Genes. Aoife McLysaght, Trinity College Dublin
2 Novelty. Novel genes: increase complexity; perform novel functions; role in speciation? Consider the difficulty of a single gene acquiring a new function that does not hinder its original function, compared with the evolution of a new function at an entirely new locus.
5 Polyploidy (whole genome duplication); aneuploidy (chromosomal duplication); partial chromosome duplication; gene duplication; partial gene duplication. Q: Where do new genes come from? A: Other genes. Genes don’t appear by chance from random DNA (or at least, they only do so very rarely).
6 Gene Duplication. Create new genes; generate multigene families / multidomain genes. Unequal crossing over gives a gene duplication and a corresponding deletion; the length of the duplication/deletion depends on the extent of misalignment. Unequal crossing over is facilitated by the presence of repeated sequences ... can get more tandem duplication.
7 Exon/domain shuffling. Gene structure; protein domains. Domain complexity increases with organismal complexity (Rubin et al., Science, 2000).
8 Duplicability. Survivorship/maintenance of gene duplicates may depend on: protein function (higher duplicability of metabolic genes in yeast; Marland et al., 2004); network centrality (more highly connected proteins have lower duplicability in yeast but higher duplicability in human); evolutionary rate (higher duplication of slowly evolving genes; Davis and Petrov, 2004); dosage balance (dosage-balanced genes are retained after whole genome duplication (WGD) but are unlikely to experience small-scale duplication (SSD)).
9 Fate of Duplicated Genes: Examples. Neofunctionalisation: GLUD2 in primates has a new role in neurotransmitter flux; thrombin (cleaves fibrinogen during clotting) and trypsin (digestive enzyme) are derived from a complete gene duplication; lactate dehydrogenase can be converted into malate dehydrogenase with a single amino acid replacement (out of a total protein length of 317 amino acids). Subfunctionalisation: the SIR3 and ORC1 gene pair in yeast have divergent functions, but a single ancestral-type protein from another yeast has both functions. Dosage increase: Esterase B in mosquito; increased gene dosage confers greater pesticide resistance. Functional compensation: many duplicated genes shelter the organism from deleterious mutations in the other copy (shown in yeast and worm).
10 Functional compensation of duplicate genes. Essentiality of duplicated genes. Liao BY and Zhang. Trends in Genetics (2007). Duplicate genes usually overlap in function. Nematode: sequence divergence of duplicated genes correlates with their capacity for back-up function. Conant GC and Wagner A. Proc. R. Soc. Lond. B (2004).
11 Polyploidisation. Global increase in genome: the entire genome is duplicated, adding one or more complete chromosome sets. 2 copies: diploid; 3: triploid (sterile); 4: tetraploid; 6: hexaploid.
12 Examples of Paleopolyploids. Yeast; Arabidopsis; wheat; fish; ancestral vertebrate (2R).
13 Loss or retention of genes duplicated by WGD (ohnologs). Most duplicates are subsequently lost. Biased retention of certain classes of genes. Retained duplicates are enriched for: developmental genes; transcription factors; metabolic genes; protein complex membership.
14 Dosage-balance hypothesis. Dosage-balanced genes are not robust to gene loss and gene duplication. [Figure: pathway of Gene A to Gene E, shown twice]
15 Whole genome duplication and dosage-balanced genes. [Figure: pathway of Gene A to Gene E, shown twice] WGD duplicates all genes simultaneously and therefore does not perturb relative dosages. Whereas SSD of dosage-balanced genes is likely to be deleterious, WGD should be neutral. Furthermore, once duplicated by WGD, they are unlikely to be lost.
16 De novo origins. Conversion of 3’ UTR into coding sequence; incorporation of transposable elements into coding sequence.
17 De novo origin of whole protein-coding genes. Origin of an open reading frame (ORF) from ancestrally non-coding sequence: single-base substitutions or small indels that remove a stop codon. Acquisition of expression activity. Considered to be very rare events.
18 New genes in Drosophila. Levine et al. 2006, PNAS: five de novo originated genes found in Drosophila melanogaster. Begun et al. 2007, Genetics: 11 genes that likely appeared in D. yakuba or the D. yakuba / D. erecta ancestor were identified using testis-derived ESTs. Testis-biased expression. Often X-linked. Zhou et al., 2008, Genome Research: 9 genes (some overlap with previous papers); estimate 12% of new genes arose de novo.
19 New genes in Saccharomyces. Cai et al. 2008, Genetics: BSC4 identified as a de novo gene in S. cerevisiae (132 aa). DNA similarity but no ORF in the closely related yeasts S. paradoxus, S. mikatae and S. bayanus. Transcribed in these other yeast lineages. Origin of a protein-coding gene from an RNA gene. Deletion of DUN1 or RPN4 is lethal if BSC4 is also deleted. PeptideAtlas evidence supports translation. Purifying selection. Possibly involved in the DNA repair pathway.
20 De novo origin of mouse-specific gene. Heinen et al., 2009, Current Biology. Non-coding RNA gene; 3 exons, alternatively spliced; specifically expressed in post-meiotic cells of the testis; indel mutations in 5’ regulatory region; possible selective sweep.
22 Human-Chimp Divergence. 99% identity of alignable sequence. High colinearity of gene order. What is the genetic distinction? Regulatory differences? Differential gene duplication and loss? 40-45 Mb of species-specific euchromatic sequence. Unique genes?
23 Differential gene duplication and loss. Demuth et al., 2006; Hahn et al., 2007.
24 Genome Quality Issues. [Figure: genomic location of human (Hs) and chimp (Pt) genes in the EnsEMBL family containing the Centaurin Gamma 2 gene, within and out of synteny blocks. Hahn et al. 2007, Genetics]
25 De novo origins of monkey genes. Toll-Riera et al. (2009) MBE. Examined “primate orphans”: protein-coding; present in human and macaque but absent in older lineages.
26 This study: Have new genes arisen de novo recently in the human lineage?
27 Unique human genes? All-against-all BLASTP search identified 644 human genes with no match in the chimp genome. Candidate novel genes: examine these in great detail.
28 Genome Quality Issues. Several spurious/trivial causes of apparent gene gain: candidate novel gene is spurious (human genome annotation error); sequence gaps (gene is present but unsequenced); chimp genome annotation error (gene is sequenced but unannotated).
29 Strategy. Synteny-based approach: gene order is conserved between close taxa; regions of conserved gene order are likely to be ancestral; the expected location of a gene can be identified and carefully examined. [Figure: human and chimp gene order, with the expected location marked “?”]
30 Synteny Blocks. Blocks with conserved gene order built using unambiguous orthologs: string of orthologs no more than 10 genes apart in either genome. Small local gene order differences permitted.
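The chaining rule on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input format (gene indices along each chromosome), the function name `build_synteny_blocks`, the greedy left-to-right chaining, and the reading of "no more than 10 genes apart" as an index difference of at most 10 are all assumptions. Using an absolute difference in the second genome is one way to permit small local order changes.

```python
from typing import List, Tuple

# (chromosome in genome A, gene index in A, chromosome in genome B, gene index in B)
# Gene indices are hypothetical ordinal positions of genes along a chromosome.
Ortholog = Tuple[str, int, str, int]

def build_synteny_blocks(orthologs: List[Ortholog], max_gap: int = 10) -> List[List[Ortholog]]:
    """Greedily chain orthologs that lie within max_gap genes of each other in both genomes."""
    blocks: List[List[Ortholog]] = []
    for orth in sorted(orthologs):  # walk along genome A
        if blocks:
            prev = blocks[-1][-1]
            same_chroms = prev[0] == orth[0] and prev[2] == orth[2]
            close_in_a = orth[1] - prev[1] <= max_gap
            # abs() tolerates small local inversions or reorderings in genome B
            close_in_b = abs(orth[3] - prev[3]) <= max_gap
            if same_chroms and close_in_a and close_in_b:
                blocks[-1].append(orth)
                continue
        blocks.append([orth])
    return blocks

# Hypothetical ortholog pairs, for illustration only.
pairs = [
    ("chr1", 1, "chr1", 1),
    ("chr1", 3, "chr1", 4),
    ("chr1", 5, "chr1", 2),    # local order difference, still chained
    ("chr1", 30, "chr1", 31),  # too far from the previous ortholog: new block
    ("chr2", 2, "chr7", 9),    # different chromosomes: new block
]
for block in build_synteny_blocks(pairs):
    print(block)
```

Once blocks are built, the expected position of a candidate gene in the other genome is the interval between the orthologs that flank it within its block.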
33 Novel human protein-coding genes. All short ORFs. No introns within coding sequence.
34 ORF origins. Examine orthologous DNA from chimp and macaque. Identify “disablers”, sequence differences that obstruct the ORF: single base differences that cause an early stop; frame-shift inducing indels that result in an early stop codon; absence of a start codon.
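A minimal sketch of how the three kinds of disabler might be flagged, assuming a pairwise alignment of the intact human ORF against the orthologous DNA of another species, with `-` for gaps and the alignment starting at the human start codon. The function name `find_disablers`, the input format and the example sequences are hypothetical; the slides do not describe the actual procedure at this level of detail.

```python
import re
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_disablers(human_aln: str, other_aln: str) -> List[str]:
    """List sequence differences in other_aln that would obstruct the human ORF."""
    human_aln, other_aln = human_aln.upper(), other_aln.upper()
    if len(human_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    disablers = []

    # Columns aligned to the human start codon (first three non-gap human bases).
    start_cols = [i for i, base in enumerate(human_aln) if base != "-"][:3]
    if "".join(other_aln[i] for i in start_cols) != "ATG":
        disablers.append("no start codon")

    # A gap run whose length is not a multiple of three shifts the reading frame.
    for aln in (human_aln, other_aln):
        for run in re.finditer(r"-+", aln):
            if len(run.group()) % 3 != 0:
                disablers.append(f"frameshift indel at column {run.start()}")

    # Read the ungapped orthologous sequence in frame and report the first stop
    # codon that falls before the position of the human stop codon.
    other = other_aln.replace("-", "")
    human_len = len(human_aln.replace("-", ""))
    for pos in range(0, len(other) - 2, 3):
        if other[pos:pos + 3] in STOP_CODONS and pos + 3 < human_len:
            disablers.append(f"early stop codon at base {pos}")
            break
    return disablers

# Hypothetical toy sequences, not real CLLU1 data.
human = "ATGGCTTGGAAATAA"                         # ATG GCT TGG AAA TAA
print(find_disablers(human, "ATGGCTTGAAAATAA"))   # TGG -> TGA
print(find_disablers(human, "ACGGCTTGGAAATAA"))   # ATG -> ACG
```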
35 CLLU1. Chronic Lymphocytic Leukemia Upregulated gene 1 (CLLU1). Identified in a search for differentially expressed genes in chronic lymphocytic leukemia (Buhl et al. 2006). Located in an EST-dense region. Overlapping another gene, CLLU1OS, on the opposite strand.
36 Human origin or parallel primate inactivation of ancient gene?
37 CLLU1. Chronic Lymphocytic Leukemia Upregulated gene 1 (CLLU1). Identified in a search for differentially expressed genes in chronic lymphocytic leukemia (Buhl et al. 2006). Located in an EST-dense region. Overlapping another gene, CLLU1OS, on the opposite strand.
41 Are these ORFs actually genes? The presence of an ORF does not guarantee that the gene is coding, i.e., that a protein is produced. PRIDE (PRoteomics IDEntifications): a public database for proteomics data. PeptideAtlas: a public database of peptides identified by mass spectrometry.
44 Human population polymorphism. ORF is present intact in all sequenced individuals (public data). No convincing evidence for a selective sweep from published genome-wide scans of HapMap data.
45 How might these genes arise? Sequence analysis traced the origin of the ORF, but these must also be expressed. Expression of a new gene: the ENCODE project indicated that much of the genome is transcribed; all three of these genes overlap other genes; CLLU1 is in a permissive expression environment.
46 De novo genes: Summary. 3 identified cases under strict criteria. Estimate about 18 should exist. All have evidence of transcription and translation. ORF formation allowed by human-specific mutation in all cases. No “re-use” of coding sequence of previously existing genes, but perhaps re-use of regulatory sequences.
47 Gene duplication: consequences. Innovation: neofunctionalisation. Robustness: functional compensation. These are consequences, NB, NOT causes.
48 Defining essential genes. A gene is considered “essential” if its removal results in a lethal or sterile phenotype. Essential genes: lethal or sterile. Non-essential genes: other phenotypes. [Figure: fly, wild type vs. eyeless and vestigial (Kolodziej PA et al. Neuron (1995)); mouse, wild type vs. foxn1 (Garacia MU et al. PNAS (2005))] Fly: 2540 essential and 5197 non-essential genes. Mouse: 2109 essential and 2969 non-essential genes.
49 Evolutionary impact of gene duplications. PE: proportion of essential genes. [Figure: count lethal knockouts among singletons and among duplicates, then compare PE singletons with PE duplicates (>>, =)]
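As a worked example of PE, the knockout counts quoted on slide 48 (fly: 2540 essential and 5197 non-essential; mouse: 2109 essential and 2969 non-essential) give the genome-wide proportions below. The helper name `proportion_essential` is illustrative only; the per-class comparison on the slide (singletons versus duplicates) would apply the same ratio to each class separately.

```python
def proportion_essential(essential: int, non_essential: int) -> float:
    """PE = essential genes / all genes with a knockout phenotype."""
    return essential / (essential + non_essential)

# Counts taken from slide 48 of this transcript.
knockout_counts = {"fly": (2540, 5197), "mouse": (2109, 2969)}

for species, (essential, non_essential) in knockout_counts.items():
    pe = proportion_essential(essential, non_essential)
    print(f"{species}: PE = {pe:.1%}")
# fly: PE = 32.8%
# mouse: PE = 41.5%
```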
50 Evolutionary impact of gene and genome duplications. PE: proportion of essential genes. [Figure: count lethal knockouts among singletons and among duplicates, then compare PE singletons with PE duplicates (>>: functional compensation; =)]
51 Functional compensation of duplicate genes. Essentiality of duplicated genes. Liao BY and Zhang. Trends in Genetics (2007).
52 All duplicates are not created equal. Whole Genome Duplication (WGD) and Small-Scale Duplication (SSD) differ in extent and frequency. Also differ in evolutionary impact??? WGD occurred in yeast, plant and animal. SSD is ongoing. [Figure: WGD marked on a tree of fly, ascidian, fish, chicken and mouse]
53 Evolutionary impact of gene and genome duplications. PE: proportion of essential genes. [Figure: count lethal knockouts among SSD and among WGD duplicates, then compare PE SSD duplicates with PE WGD duplicates (>>, =, <<)]
54 Essentiality of WGD and SSD duplicated genes in yeast. Correlation between sequence divergence and the proportion of essential genes for SSD duplicated genes. WGD duplicated genes are less essential than SSD duplicated genes in yeast. Guan Y et al. Genetics (2007).
55 Defining duplicates and singletons. Duplicated genes and singletons: all-against-all blastp search for mouse (fly) (Ensembl 50); E-value threshold: 1e-20. WGD duplicated genes in mouse: human WGD duplicated genes, mapped by one-to-one orthology (Ensembl 50) to mouse WGD duplicated genes. SSD duplicated genes in mouse: all duplicated genes excluding WGD duplicated genes.
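The classification on this slide could be sketched roughly as below, assuming the BLASTP hits have already been parsed into (query, subject, E-value) tuples and the set of WGD duplicates has already been obtained through orthology. The function name `classify_genes`, the tuple format and the gene identifiers are hypothetical, and treating the threshold as inclusive is an assumption.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (query gene, subject gene, E-value)

def classify_genes(all_genes: Iterable[str], hits: Iterable[Hit],
                   wgd_genes: Set[str], max_evalue: float = 1e-20) -> Dict[str, str]:
    """Label each gene as 'WGD', 'SSD' or 'singleton'."""
    duplicated: Set[str] = set()
    for query, subject, evalue in hits:
        # Ignore self-hits; a strong non-self hit marks both genes as duplicates.
        if query != subject and evalue <= max_evalue:
            duplicated.update((query, subject))

    labels = {}
    for gene in all_genes:
        if gene in wgd_genes:
            labels[gene] = "WGD"
        elif gene in duplicated:
            labels[gene] = "SSD"  # duplicated, but not a WGD duplicate
        else:
            labels[gene] = "singleton"
    return labels

# Hypothetical genes and hits, for illustration only.
genes = ["g1", "g2", "g3", "g4", "g5"]
blast_hits = [
    ("g1", "g2", 1e-80),  # ohnolog pair (both listed in wgd)
    ("g3", "g4", 1e-30),  # passes the threshold
    ("g5", "g3", 1e-5),   # too weak to count as a duplicate
    ("g5", "g5", 0.0),    # self-hit, ignored
]
wgd = {"g1", "g2"}
print(classify_genes(genes, blast_hits, wgd))
```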
56 PE for WGD and SSD genes in mouse. [Figure: PE against KA for SSD duplicates; R = 0.94, P = 0.017] SSD duplicated genes (38.1%) < singletons (42.2%) (P = 0.027). SSD duplicated genes (38.1%) < WGD duplicated genes (45.4%) (P = 3.1 x 10^-6, χ2 test). No difference in essentiality between WGD duplicated genes (45.4%) and singletons (42.2%) (P = 0.10). SSD duplicated genes carry out the expected backup role, but WGD duplicated genes are as essential as singletons in mammalian genomes.
57 PE for developmental genes. Duplicate developmental genes created by WGD were preferentially retained in vertebrate genomes. Blomme T et al. Genome Biol. (2006). Developmental genes: genes with GO: (multicellular organismal development) or GO: (cell differentiation). Non-developmental genes: genes with other GO ids. Developmental genes in mouse: singletons < duplicated genes (P = , χ2 test). Developmental genes in fly: singletons ≈ duplicated genes (P = 0.98). Non-developmental genes in mouse and fly: singletons > duplicated genes (Mouse: P = ) (Fly: P = 2.8 x 10^-8).
58 Functional compensation of duplicate genes. Essentiality of duplicated genes. Liao BY and Zhang. Trends in Genetics (2007). Data bias: developmental genes represent 37% of knockout data but only 11% of the genome.
59 Why is the essentiality of WGD genes high? Dosage balance hypothesis. [Figure: pathway of Gene A to Gene E, shown four times] WGD creates a unique opportunity for the duplication of dosage-balanced genes.
60 Known categories of dosage-balanced genes. Developmental genes, transcription factors and protein complex members. 1. Enrichment of developmental genes. WGD duplicated genes in our dataset are significantly enriched for the functional category ‘transcription regulator activity’. 2. Enrichment of transcription factors. WGD duplicated genes (21.8%; 388/1781) vs. total dataset (17.9%; 910/5078). 3. Enrichment of protein complex membership. (P = , χ2 test) WGD duplicated genes are likely to be dosage-balanced genes.
61 Dosage balance hypothesis. Test: Are ohnologs refractory to changes in dosage? SSD: individual gene duplication within the vertebrate lineage. CNV (Copy Number Variation): recent (polymorphic) gene duplication in human populations.
62 Recent SSD (within the tetrapod lineage). Reconstructed “tetrapod gene families” based on the inferred gene complement just after the fish-tetrapod split. Two categories of family: containing ohnologs; not containing ohnologs. Count the fraction of families that include at least one SSD event.
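The counting step on this slide amounts to a grouped proportion. A minimal sketch, assuming each reconstructed family has already been annotated with whether it contains an ohnolog and how many SSD events it experienced; the record format, the function name `ssd_fraction_by_category` and the toy data are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Family = Tuple[bool, int]  # (contains an ohnolog?, number of SSD events)

def ssd_fraction_by_category(families: Iterable[Family]) -> Dict[str, float]:
    """Fraction of families with at least one SSD event, per ohnolog category."""
    totals = {"ohnolog": 0, "non-ohnolog": 0}
    with_ssd = {"ohnolog": 0, "non-ohnolog": 0}
    for has_ohnolog, n_ssd_events in families:
        category = "ohnolog" if has_ohnolog else "non-ohnolog"
        totals[category] += 1
        if n_ssd_events >= 1:
            with_ssd[category] += 1
    # Skip empty categories to avoid dividing by zero.
    return {c: with_ssd[c] / totals[c] for c in totals if totals[c] > 0}

# Hypothetical toy families, not the real tetrapod gene families.
toy_families = [(True, 0), (True, 0), (True, 1), (True, 0),
                (False, 1), (False, 0), (False, 2), (False, 0)]
print(ssd_fraction_by_category(toy_families))
```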
63 Ohnologs are less likely to experience SSD. Along the human lineage: 6.7% of ohnolog families have experienced subsequent SSD; 10.1% of other genes duplicated in the same time period (P = 4.8 x 10^-15).
64 Resistance to SSD predates WGD event. In pre-WGD lineages: ascidian singletons (no lineage-specific SSD) are more likely to be orthologs of human ohnologs (30.1%; 1804/5998) than ascidian duplicates (20.6%; 649/3147; P < 2.2 x 10^-16). Similarly for fly, worm and sea anemone. [Figure: tree of fly, ascidian, fish, chicken and human]
65 Ohnologs also less likely to experience CNV. PCNV = proportion of genes with copy number variation. Genome average = 29.3% (6136/20907). Ohnologs = 22.6% (1648/7294). SSD paralogs = 36.6% (3306/9027). Ohnologs are unlikely to experience CNV, whereas SSD paralogs are likely to also display CNV.
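The proportions on this slide can be recomputed from the quoted counts, and the ohnolog versus SSD-paralog contrast can be checked with a 2x2 test. The slide does not say which test was used for this comparison; the sketch below assumes a Pearson chi-square test without continuity correction and treats the two gene sets as disjoint, both of which are assumptions.

```python
import math
from typing import Tuple

def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square for the table [[a, b], [c, d]]; returns (statistic, P) for 1 df."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# (genes with CNV, total genes), as quoted on the slide.
cnv_counts = {
    "genome": (6136, 20907),
    "ohnologs": (1648, 7294),
    "SSD paralogs": (3306, 9027),
}
for name, (with_cnv, total) in cnv_counts.items():
    print(f"{name}: P_CNV = {with_cnv / total:.1%}")

ohno_cnv, ohno_total = cnv_counts["ohnologs"]
ssd_cnv, ssd_total = cnv_counts["SSD paralogs"]
stat, p_value = chi_square_2x2(ohno_cnv, ohno_total - ohno_cnv,
                               ssd_cnv, ssd_total - ssd_cnv)
print(f"chi-square = {stat:.1f}, P = {p_value:.2g}")
```

Only the Python standard library is used here; a statistics package would normally be preferred for anything beyond a quick check.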
66 Many ohnologs are dosage-balanced. Retained ohnologs are resistant to duplication (SSD or CNV), even in distantly related lineages that did not experience WGD. Over 60% of ohnologs (4638/7294) are free of subsequent SSD and CNV. These are dosage-balanced ohnologs (DBOs).
67 DBOs are associated with disease. Data used to search for CNV was from healthy individuals. Studies have reported a link between CNV and disease. Duplication of a DBO is expected to be deleterious and lead to disease. DBOs identified here are enriched for disease genes in OMIM (P < 2.2 x 10^-16).
68 Trisomy 21 – Down’s Syndrome. Extreme example of CNV: CNV of an entire chromosome. 1.5-fold increase in dosage of some chr 21 genes results in Down’s Syndrome. Most commonly observed human trisomy (1/1000 individuals). Other trisomy mutations occur, but are lethal. Trisomy 21 has the least severe phenotypic consequences. DBOs are significantly under-represented on chr 21 (obs. 40 vs. exp. 56.1, P = 0.010).
69 Trisomy 21 candidate genes. 75% (12/16) of reported DS candidates are also DBOs, which is significantly more than expected (2; P = 5.9 x 10^-8).
70 Conclusions: novel genes. We show the first evidence of de novo formation of unique human genes. Newly formed genes show simple ORFs. Regions/tissues with permissive expression environments may favor this process. SSD duplicated genes have a backup role for their duplicated copies in mouse and in fly. WGD-duplicated genes do not have a backup role and have high essentiality, because of the enrichment of dosage-balanced genes. The evolutionary profile of gene duplication and retention can suggest a role in human disease.
71 Acknowledgements. David Gonzalez Knowles: Knowles & McLysaght, Genome Research (2009). Takashi Makino: Makino et al., TiG (2009) and Makino & McLysaght, PNAS (2010).
72 Human-chimp sequence divergence. Total (and nonsynonymous) base substitutions pooled over all three genes: 12 substitutions. Use macaque to orient the changes: 5 (2) in human, 7 (3) in chimp. In human, 2 non-synonymous. In chimp, 3 non-synonymous-like.
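Orienting changes with an outgroup follows a simple parsimony rule: where human and chimp differ, the lineage whose base disagrees with macaque is taken to carry the substitution. A minimal sketch, assuming three gap-free aligned sequences of equal length; the function name `polarize_substitutions` and the toy sequences are hypothetical.

```python
from typing import Dict

def polarize_substitutions(human: str, chimp: str, macaque: str) -> Dict[str, int]:
    """Assign each human-chimp difference to a lineage using macaque as outgroup."""
    if not (len(human) == len(chimp) == len(macaque)):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"human": 0, "chimp": 0, "unresolved": 0}
    for h, c, m in zip(human.upper(), chimp.upper(), macaque.upper()):
        if h == c:
            continue  # no human-chimp difference at this site
        if m == c:
            counts["human"] += 1       # chimp keeps the ancestral base
        elif m == h:
            counts["chimp"] += 1       # human keeps the ancestral base
        else:
            counts["unresolved"] += 1  # outgroup matches neither lineage
    return counts

# Hypothetical toy alignment, not the real gene sequences.
print(polarize_substitutions("ATGGCATTC", "ATGACATCC", "ATGACATTC"))
# {'human': 1, 'chimp': 1, 'unresolved': 0}
```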