Origins and Evolution of Novel Genes Aoife McLysaght Trinity College Dublin.


1 Origins and Evolution of Novel Genes Aoife McLysaght Trinity College Dublin

2 Novelty

3 [Figure: gene structure, labelled promoter, exon and intron]

4 Bricolage (Long et al., 2003)

5 Duplication
1. Polyploidy – whole genome duplication
2. Aneuploidy – chromosomal duplication
3. Partial chromosome duplication
4. Gene duplication
5. Partial gene duplication

6 Gene Duplication
• Creates new genes
• Generates multigene families / multidomain genes

7 Exon/domain shuffling
Domain complexity increases with organismal complexity (Rubin et al., Science, 2000)
[Figure: gene structure and protein domains]

8 Duplicability
Survival and maintenance of gene duplicates may depend on:
– protein function: higher duplicability of metabolic genes in yeast (Marland et al., 2004)
– network centrality: more highly connected proteins have lower duplicability in yeast but higher duplicability in human
– evolutionary rate: higher duplicability of slowly evolving genes (Davis and Petrov, 2004)
– dosage balance: dosage-balanced genes are retained after whole genome duplication (WGD) but are unlikely to experience small-scale duplication (SSD)

9 Fate of Duplicated Genes: Examples
• Neofunctionalisation
– GLUD2 in primates has a new role in neurotransmitter flux
– Thrombin (cleaves fibrinogen during clotting) and trypsin (digestive enzyme) derive from a complete gene duplication
– Lactate dehydrogenase can be converted into malate dehydrogenase by a single amino acid replacement (in a protein 317 amino acids long)
• Subfunctionalisation
– SIR3 and ORC1 gene pair in yeast: the two copies have divergent functions, but the single ancestral-type protein in another yeast has both functions
• Dosage increase
– Esterase B in mosquito: increased gene dosage confers greater pesticide resistance
• Functional compensation
– In many duplicate pairs, each copy shelters the organism from deleterious mutations in the other (shown in yeast and worm)

10 Essentiality of duplicated genes
• Duplicate genes usually overlap in function (Liao BY and Zhang J, Trends in Genetics, 2007)
• Sequence divergence of duplicated genes correlates with their capacity for backup function (Conant GC and Wagner A, Proc. R. Soc. Lond. B, 2004)
[Figure: functional compensation of duplicate genes in nematode]

11 Polyploidisation
• Genome-wide increase in DNA content: addition of one or more complete chromosome sets
– 2 copies: diploid
– 3: triploid (usually sterile)
– 4: tetraploid
– 6: hexaploid

12 Examples of Paleopolyploids
• Yeast
• Arabidopsis
• Wheat
• Fish
• Ancestral vertebrate (2R)

13 Loss or retention of genes duplicated by WGD (ohnologs)
• Most duplicates are subsequently lost
• Biased retention of certain classes of genes
• Retained duplicates are enriched for:
– Developmental genes
– Transcription factors
– Metabolic genes
– Protein complex membership

14 Dosage-balance hypothesis
Dosage-balanced genes are not robust to gene loss or gene duplication.
[Figure: pathway diagrams of Genes A to E]

15 Whole genome duplication and dosage-balanced genes
WGD duplicates all genes simultaneously and therefore does not perturb relative dosages. Whereas SSD of dosage-balanced genes is likely to be deleterious, WGD of these genes should be neutral. Furthermore, once duplicated by WGD, they are unlikely to be lost.
[Figure: pathway diagrams of Genes A to E]
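The dosage argument on slides 14 and 15 can be made concrete with copy-number arithmetic. This is a toy model, not the authors' method: it assumes a diploid ancestor with two copies of each of five pathway genes (the hypothetical genes A to E from the figure) and tracks each gene's copy number relative to gene A. WGD doubles every gene and leaves all ratios unchanged; SSD of one gene shifts that gene's ratio to 3:2.

```python
from fractions import Fraction

# Hypothetical five-gene pathway (genes A to E, as in the slide figure)
PATHWAY = ["A", "B", "C", "D", "E"]


def relative_dosage(copies):
    """Copy number of each gene relative to gene A (toy stoichiometry model)."""
    ref = copies["A"]
    return {gene: Fraction(n, ref) for gene, n in copies.items()}


def whole_genome_duplication(copies):
    """WGD doubles every gene at once."""
    return {gene: 2 * n for gene, n in copies.items()}


def small_scale_duplication(copies, gene):
    """SSD adds one extra copy of a single gene."""
    new = dict(copies)
    new[gene] += 1
    return new


ancestral = {gene: 2 for gene in PATHWAY}  # diploid: two copies of each gene
before = relative_dosage(ancestral)
after_wgd = relative_dosage(whole_genome_duplication(ancestral))
after_ssd = relative_dosage(small_scale_duplication(ancestral, "C"))

print(after_wgd == before)  # True: relative dosages preserved
print(after_ssd["C"])       # 3/2: gene C now out of balance with its partners
```

The same 3:2 ratio is the 1.5-fold dosage change that slide 68 associates with trisomy 21.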

16 De novo origins
• Conversion of 3’ UTR into coding sequence
• Incorporation of transposable elements into coding sequence

17 De novo origin of whole protein-coding genes
• Origin of an open reading frame (ORF) from ancestrally non-coding sequence
– Single-base substitutions or small indels that remove a stop codon
• Acquisition of expression activity
• Considered to be very rare events
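The ORF-origin step can be illustrated with a minimal forward-strand ORF scan. This is a simplified sketch, not the study's pipeline: it assumes the standard start (ATG) and stop codons, ignores the reverse strand, and uses a made-up 21-bp sequence in which one substitution (TAG to TAC) removes a stop codon and extends the ORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG...stop ORF on the forward strand ("" if none is found)."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # ORF includes its stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


# Made-up example: one substitution (TAG -> TAC) removes a premature stop
ancestral = "ATGAAATAGCCCGGGTTTTAA"
derived = "ATGAAATACCCCGGGTTTTAA"
print(len(longest_orf(ancestral)))  # 9
print(len(longest_orf(derived)))    # 21
```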

18 New genes in Drosophila
• Levine et al. 2006, PNAS
– Five de novo originated genes found in Drosophila melanogaster
• Begun et al. 2007, Genetics
– 11 genes that likely appeared in D. yakuba or the D. yakuba / D. erecta ancestor, identified using testis-derived ESTs
– Testis-biased expression
– Often X-linked
• Zhou et al., 2008, Genome Research
– 9 genes (some overlap with previous papers)
– Estimate that 12% of new genes arose de novo

19 New genes in Saccharomyces
• Cai et al. 2008, Genetics
– BSC4 identified as a de novo gene in S. cerevisiae (132 aa)
– DNA similarity but no ORF in the closely related yeasts S. paradoxus, S. mikatae and S. bayanus
– Transcribed in these other yeast lineages, suggesting the origin of a protein-coding gene from an RNA gene
– Deletion of DUN1 or RPN4 is lethal if BSC4 is also deleted
– PeptideAtlas evidence supports translation
– Purifying selection
– Possibly involved in the DNA repair pathway

20 De novo origin of mouse-specific gene
• Heinen et al., 2009, Current Biology
• Non-coding RNA gene
• 3 exons, alternatively spliced
• Specifically expressed in post-meiotic cells of the testis
• Indel mutations in the 5’ regulatory region
• Possible selective sweep

21 Novel primate genes

22 Human-Chimp Divergence
• 99% identity of alignable sequence
• High colinearity of gene order
What is the genetic basis of the difference?
• Regulatory differences?
• Differential gene duplication and loss?
– 40-45 Mb of species-specific euchromatic sequence
• Unique genes?

23 Differential gene duplication and loss (Demuth et al., 2006; Hahn et al., 2007)

24 Genome Quality Issues
Hahn et al., 2007, Genetics
[Figure: genomic locations of the Ensembl family containing the Centaurin Gamma 2 gene in human (Hs) and chimp (Pt), on chromosomes 2/2b, 7, 10 and 12, within and outside synteny blocks, with the genomic location of the human Chr10 genes marked]

25 De novo origins of monkey genes
• Toll-Riera et al. (2009), MBE
• Examined “primate orphans”
– Protein-coding
– Present in human and macaque but absent in older lineages

26 This study: Have new genes arisen de novo recently in the human lineage?

27 Unique human genes?
• An all-against-all BLASTP search identified 644 human genes with no match in the chimp genome
• Candidate novel genes
– Examine these in great detail
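A minimal sketch of how genes with no match could be pulled from BLAST output. It assumes human proteins were searched against chimp proteins and the results saved in BLAST+ tabular format (`-outfmt 6`, where column 1 is the query ID and column 11 the e-value); the function name, the toy IDs and the 1e-5 cutoff are illustrative placeholders, not the study's settings.

```python
def genes_without_hits(query_ids, blast_tab, max_evalue=1e-5):
    """Return query IDs with no hit at or below max_evalue (placeholder cutoff).

    blast_tab is BLAST+ tabular output (-outfmt 6): column 1 is the query ID
    and column 11 is the e-value.
    """
    hit = set()
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            hit.add(fields[0])
    return sorted(set(query_ids) - hit)


# Hypothetical 12-column rows: HsA has a strong hit, HsB only a weak one, HsC none
rows = [
    ["HsA", "PtA", "99.5", "200", "1", "0", "1", "200", "1", "200", "1e-100", "400"],
    ["HsB", "PtX", "30.0", "50", "30", "2", "1", "50", "5", "54", "0.5", "20"],
]
blast_tab = "\n".join("\t".join(r) for r in rows)
print(genes_without_hits(["HsA", "HsB", "HsC"], blast_tab))  # ['HsB', 'HsC']
```

Genes returned here are only candidates; slide 28 lists the annotation and sequencing artefacts that still have to be ruled out.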

28 Genome Quality Issues
• Several spurious or trivial causes of apparent gene gain:
– The candidate novel gene is spurious (human genome annotation error)
– Sequence gaps: the gene is present in chimp but unsequenced
– Chimp genome annotation error: the gene is sequenced but unannotated

29 Strategy
• Synteny-based approach
– Gene order is conserved between close taxa
– Regions of conserved gene order are likely to be ancestral
• The expected location of a gene can be identified and carefully examined
[Figure: human and chimp gene order]

30 Synteny Blocks
• Blocks with conserved gene order built using unambiguous orthologs:
– String of orthologs no more than 10 genes apart in either genome
– Small local gene order differences permitted
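The chaining rule for synteny blocks can be sketched as a single pass over ortholog pairs sorted by human position. Assumptions, all illustrative: each pair gives the gene's rank along one human and one chimp chromosome, "no more than 10 genes apart" is read as a rank difference of at most 10, and small local order differences are tolerated because only the absolute chimp-side distance is checked.

```python
def synteny_blocks(pairs, max_gap=10):
    """Chain ortholog pairs into blocks of conserved gene order.

    pairs: (human_rank, chimp_rank) tuples for unambiguous one-to-one orthologs,
    where a rank is the gene's index along its chromosome.
    """
    blocks = []
    for h, c in sorted(pairs):
        if blocks:
            last_h, last_c = blocks[-1][-1]
            # Extend the current block if the next ortholog is close in both
            # genomes; abs() on the chimp side tolerates small local swaps
            if h - last_h <= max_gap and abs(c - last_c) <= max_gap:
                blocks[-1].append((h, c))
                continue
        blocks.append([(h, c)])
    return blocks


# Toy orthologs: a local swap at ranks 1-2, then a jump that starts a new block
pairs = [(0, 0), (1, 2), (2, 1), (5, 6), (40, 3), (42, 5)]
for block in synteny_blocks(pairs):
    print(block)
```

Once blocks exist, the expected location of a candidate human gene is the chimp interval between its flanking orthologs in the same block.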

31 Expected location definition

32

33 Novel human protein-coding genes
• All short ORFs
• No introns within coding sequence

34 ORF origins
• Examine orthologous DNA from chimp and macaque
• Identify “disablers”: sequence differences that obstruct the ORF
– Single-base differences that cause an early stop
– Frame-shift inducing indels that result in an early stop codon
– Absence of a start codon
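The three disabler types can be checked on a pairwise alignment of the human ORF with its chimp or macaque ortholog. This is a simplified sketch, not the authors' code: it assumes the alignment rows span exactly the human ORF (ATG to stop), detects frameshifts only from the net indel length (so two compensating indels would be missed), and uses toy sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_disablers(human_aln, other_aln):
    """List ORF disablers in an ortholog relative to an intact human ORF.

    Both arguments are rows of a pairwise alignment ('-' marks gaps) that span
    the human ORF from its ATG to its stop codon.
    """
    other_seq = other_aln.replace("-", "")
    human_seq = human_aln.replace("-", "")
    disablers = []
    if not other_seq.startswith("ATG"):
        disablers.append("no start codon")
    # A net indel length that is not a multiple of 3 shifts the reading frame
    net_indel = len(other_seq) - len(human_seq)
    if net_indel % 3:
        disablers.append(f"frameshift ({net_indel:+d} bp)")
    # Any in-frame stop before the final codon truncates the ORF
    codons = [other_seq[i:i + 3] for i in range(0, len(other_seq) - 2, 3)]
    for k, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            disablers.append(f"premature stop at codon {k + 1}")
            break
    return disablers


human = "ATGAAACCCGGGTAA"
chimp = "ATGTAACCCGGGTAA"    # toy: AAA -> TAA at codon 2
macaque = "ATAAA-CCCGGGTAA"  # toy: ATG -> ATA, plus a 1-bp deletion
print(find_disablers(human, chimp))    # ['premature stop at codon 2']
print(find_disablers(human, macaque))  # ['no start codon', 'frameshift (-1 bp)']
```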

35 CLLU1
Chronic Lymphocytic Leukemia Upregulated gene 1 (CLLU1)
• Identified in a search for differentially expressed genes in chronic lymphocytic leukemia (Buhl et al., 2006)
• Located in an EST-dense region
• Overlaps another gene, CLLU1OS, on the opposite strand
[Figure: CLLU1 locus with the start codon marked]

36 Human origin or parallel primate inactivation of an ancient gene?

37 CLLU1

38 CLLU1

39 C22orf45

40 DNAH10OS

41 Are these ORFs actually genes?
• The presence of an ORF does not guarantee that the gene is coding, i.e., that a protein is produced
• PRIDE
– PRoteomics IDEntifications, a public database for proteomics data
• PeptideAtlas
– Public database of peptides identified by mass spectrometry
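Proteomics support amounts to finding observed peptides in the protein predicted from the ORF. The sketch below translates an ORF with the standard genetic code and reports which (hypothetical) peptides occur in it; a real analysis would also require each peptide to be unique to that protein across the proteome, which this simple substring check does not do.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG x TCAG x TCAG codon order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(orf):
    """Translate an ORF codon by codon, dropping the trailing stop ('*')."""
    protein = "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))
    return protein.rstrip("*")


def supported_peptides(orf, peptides):
    """Return the observed peptides that occur in the ORF's predicted protein."""
    protein = translate(orf.upper())
    return [p for p in peptides if p in protein]


orf = "ATGAAACCCGGGTAA"
print(translate(orf))                           # MKPG
print(supported_peptides(orf, ["KPG", "MKW"]))  # ['KPG']
```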

42 Proteomics support

43 DNAH10OS CLLU1 C22orf45

44 Human population polymorphism
• ORF is present intact in all sequenced individuals (public data)
• No convincing evidence for a selective sweep from published genome-wide scans of HapMap data

45 How might these genes arise?
• Sequence analysis traced the origin of each ORF, but a new gene must also be expressed
• Expression of a new gene
– The ENCODE project indicated that much of the genome is transcribed
– All three of these genes overlap other genes
– CLLU1 is in a permissive expression environment

46 De novo genes: Summary
• 3 cases identified under strict criteria
– Estimate that about 18 should exist
• All have evidence of transcription and translation
• ORF formation was enabled by a human-specific mutation in all cases
• No “re-use” of the coding sequence of previously existing genes, but perhaps re-use of regulatory sequences

47 Gene duplication: consequences
• Innovation: neofunctionalisation
• Robustness: functional compensation

48 Defining essential genes
A gene is considered “essential” if its removal results in a lethal or sterile phenotype.
• Essential genes: lethal or sterile
• Non-essential genes: other phenotypes
[Figure: fly and mouse knockout phenotypes, wild type compared with eyeless and vestigial (fly) and foxn1 (mouse) mutants (Kolodziej PA et al., Neuron, 1995; Garacia MU et al., PNAS, 2005)]
• Fly: 2540 essential and 5197 non-essential genes
• Mouse: 2109 essential and 2969 non-essential genes

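49 Evolutionary impact of gene duplications
$P_E$: proportion of essential genes, i.e. the number of lethal knockouts divided by the number of knockouts, counted separately for singletons and duplicates
$P_E^{\text{singletons}} \gg P_E^{\text{duplicates}}$ or $P_E^{\text{singletons}} = P_E^{\text{duplicates}}$?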

50 Evolutionary impact of gene and genome duplications
$P_E$: proportion of essential genes (lethal knockouts among all knockouts), for singletons and duplicates
Functional compensation: $P_E^{\text{singletons}} \gg P_E^{\text{duplicates}}$ (rather than $P_E^{\text{singletons}} = P_E^{\text{duplicates}}$)
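$P_E$ is a per-class ratio, so it can be computed from knockout records in one grouping pass. The record format (a class label plus a lethal-or-sterile flag) and the toy data are assumptions for illustration.

```python
from collections import defaultdict


def proportion_essential(knockouts):
    """P_E per gene class: lethal-or-sterile knockouts / all knockouts.

    knockouts: iterable of (gene_class, is_essential) pairs, one per gene.
    """
    counts = defaultdict(lambda: [0, 0])  # class -> [essential, total]
    for gene_class, is_essential in knockouts:
        counts[gene_class][0] += int(is_essential)
        counts[gene_class][1] += 1
    return {c: ess / total for c, (ess, total) in counts.items()}


# Hypothetical knockout records
records = [("singleton", True), ("singleton", True), ("singleton", False),
           ("singleton", False), ("duplicate", True), ("duplicate", False),
           ("duplicate", False), ("duplicate", False)]
pe = proportion_essential(records)
print(pe)  # {'singleton': 0.5, 'duplicate': 0.25}
```

Applied to the slide 48 totals, the same ratio gives $P_E$ ≈ 0.328 for fly (2540 of 7737 genes) and ≈ 0.415 for mouse (2109 of 5078 genes).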

51 Essentiality of duplicated genes
Functional compensation of duplicate genes (Liao BY and Zhang J, Trends in Genetics, 2007)

52 All duplicates are not created equal
• Whole Genome Duplication (WGD)
• Small-Scale Duplication (SSD)
• Differ in extent and frequency
• Do they also differ in evolutionary impact?
WGD has occurred in yeast, plant and animal lineages; SSD is ongoing.
[Figure: phylogeny of mouse, fly, ascidian, fish and chicken, with WGD events marked]

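53 Evolutionary impact of gene and genome duplications
$P_E$: proportion of essential genes (lethal knockouts among all knockouts), compared between SSD and WGD duplicates
Possible outcomes: $P_E^{\text{SSD}} \gg P_E^{\text{WGD}}$, $P_E^{\text{SSD}} = P_E^{\text{WGD}}$ or $P_E^{\text{SSD}} \ll P_E^{\text{WGD}}$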

54 Essentiality of WGD and SSD duplicated genes in yeast
• WGD duplicated genes are less essential than SSD duplicated genes in yeast (Guan Y et al., Genetics, 2007)
[Figure: correlation between sequence divergence and the proportion of essential genes for SSD duplicated genes in yeast]

55 Defining duplicates and singletons
• Duplicated genes and singletons: all-against-all BLASTP search for mouse (fly) (Ensembl 50), E-value threshold 1e-20
• WGD duplicated genes in mouse: mouse one-to-one orthologs (Ensembl 50) of human WGD duplicated genes
• SSD duplicated genes in mouse: all duplicated genes excluding WGD duplicated genes
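These definitions translate into set operations. In this sketch the BLASTP hits are assumed to be already parsed into (query, subject, e-value) tuples, self-hits are ignored, WGD genes are supplied from the orthology step, and WGD genes are kept out of the singleton set; that last choice and the gene IDs are assumptions.

```python
def classify_genes(genes, hits, wgd_genes, max_evalue=1e-20):
    """Split genes into singletons, WGD duplicates and SSD duplicates.

    hits: (query, subject, evalue) tuples from an all-against-all BLASTP search.
    wgd_genes: genes identified as WGD duplicates via orthology to human ohnologs.
    """
    genes = set(genes)
    duplicated = set()
    for query, subject, evalue in hits:
        if query != subject and evalue <= max_evalue:  # ignore self-hits
            duplicated.update((query, subject))
    wgd = set(wgd_genes) & genes
    ssd = duplicated - wgd                # all duplicates excluding WGD duplicates
    singletons = genes - duplicated - wgd
    return singletons, wgd, ssd


genes = ["g1", "g2", "g3", "g4", "g5"]
hits = [("g1", "g1", 0.0), ("g1", "g2", 1e-50), ("g3", "g4", 1e-30), ("g4", "g5", 1e-3)]
singletons, wgd, ssd = classify_genes(genes, hits, wgd_genes=["g3", "g4"])
print(sorted(singletons), sorted(wgd), sorted(ssd))  # ['g5'] ['g3', 'g4'] ['g1', 'g2']
```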

56 $P_E$ for WGD and SSD genes in mouse
• SSD duplicated genes (38.1%) < WGD duplicated genes (45.4%) (P = 3.1 x 10^-6, χ² test)
• SSD duplicated genes (38.1%) < singletons (42.2%) (P = 0.027)
• No difference in essentiality between WGD duplicated genes (45.4%) and singletons (42.2%) (P = 0.10)
• Correlation of $P_E$ with $K_A$ for SSD duplicated genes: R = 0.94, P = 0.017
SSD duplicated genes carry out the expected backup role, but WGD duplicated genes are as essential as singletons in mammalian genomes.

57 $P_E$ for developmental genes
• Developmental genes: genes annotated with the GO terms “multicellular organismal development” (GO: …) or “cell differentiation” (GO: …); non-developmental genes: genes with other GO IDs
• Duplicate developmental genes created by WGD were preferentially retained in vertebrate genomes (Blomme T et al., Genome Biol., 2006)
• Developmental genes in mouse: singletons < duplicated genes (P = …, χ² test)
• Developmental genes in fly: singletons ≈ duplicated genes (P = 0.98)
• Non-developmental genes in mouse and fly: singletons > duplicated genes (mouse: P = …; fly: P = 2.8 x …)

58 Essentiality of duplicated genes
Functional compensation of duplicate genes (Liao BY and Zhang J, Trends in Genetics, 2007)
Data bias: developmental genes represent 37% of knockout data but only 11% of the genome

59 Why is the essentiality of WGD genes high?
WGD creates a unique opportunity for the duplication of dosage-balanced genes.
[Figure: dosage balance hypothesis, pathway diagrams of Genes A to E]

60 Known categories of dosage-balanced genes
Dosage-balanced genes include developmental genes, transcription factors and protein complex members.
1. Enrichment of developmental genes
2. Enrichment of transcription factors: WGD duplicated genes in our dataset are significantly enriched for the functional category ‘transcription regulator activity’
3. Enrichment of protein complex membership
WGD duplicated genes (21.8%; 388/1781) vs. total dataset (17.9%; 910/5078) (P = …, χ² test)
WGD duplicated genes are likely to be dosage-balanced genes.

61 Dosage balance hypothesis
Test: Are ohnologs refractory to changes in dosage?
• SSD: individual gene duplication within the vertebrate lineage
• CNV (Copy Number Variation): recent (polymorphic) gene duplication in human populations

62 Recent SSD (within the tetrapod lineage)
• Reconstructed “tetrapod gene families” based on the inferred gene complement just after the fish-tetrapod split
• Two categories of family
– Containing ohnologs
– Not containing ohnologs
• Count the fraction of families that include at least one SSD event
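The family count is a grouped fraction. The input format here (family ID mapped to whether the family contains an ohnolog and its number of inferred SSD events) is a hypothetical simplification of the reconstructed tetrapod families, and the toy families are made up.

```python
def fraction_with_ssd(families):
    """Fraction of families with at least one SSD event, by ohnolog content.

    families: dict mapping family ID -> (contains_ohnolog, n_ssd_events).
    """
    counts = {"ohnolog": [0, 0], "other": [0, 0]}  # category -> [with SSD, total]
    for contains_ohnolog, n_ssd in families.values():
        category = "ohnolog" if contains_ohnolog else "other"
        counts[category][0] += int(n_ssd > 0)
        counts[category][1] += 1
    return {cat: (hit / n if n else float("nan")) for cat, (hit, n) in counts.items()}


# Hypothetical families
families = {"F1": (True, 0), "F2": (True, 2), "F3": (True, 0), "F4": (True, 0),
            "F5": (False, 1), "F6": (False, 0)}
print(fraction_with_ssd(families))  # {'ohnolog': 0.25, 'other': 0.5}
```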

63 Ohnologs are less likely to experience SSD
Along the human lineage:
• 6.7% of ohnolog families have experienced subsequent SSD
• 10.1% of other families were duplicated in the same time period (P = 4.8 x …)

64 Resistance to SSD predates the WGD event
In pre-WGD lineages, ascidian singletons (no lineage-specific SSD) are more likely to be orthologs of human ohnologs (30.1%; 1804/5998) than ascidian duplicates are (20.6%; 649/3147; P < 2.2 x 10^-16). Similarly for fly, worm and sea anemone.
[Figure: phylogeny of fly, ascidian, fish, chicken and human]
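The slide does not name the test behind P < 2.2 x 10^-16; a Pearson chi-square test on the 2x2 table is one standard choice, and this sketch applies it (without continuity correction) to the slide's ascidian counts. The 1-df upper tail is computed as erfc(sqrt(χ²/2)), which avoids a SciPy dependency.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Slide 64 counts: ascidian genes that are / are not orthologs of human ohnologs
singletons_ohno, singletons_total = 1804, 5998
duplicates_ohno, duplicates_total = 649, 3147
chi2, p = chi_square_2x2(singletons_ohno, singletons_total - singletons_ohno,
                         duplicates_ohno, duplicates_total - duplicates_ohno)
print(f"{singletons_ohno / singletons_total:.1%} vs {duplicates_ohno / duplicates_total:.1%}")
# 30.1% vs 20.6%
print(f"chi2 = {chi2:.1f}, P = {p:.1e}")
```

The resulting P value falls below 2.2 x 10^-16, consistent with the slide, though the original analysis may have used a different test.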

65 Ohnologs also less likely to experience CNV
$P_{\text{CNV}}$ = proportion of genes with copy number variation
• Genome average = 29.3% (6136/20907)
• Ohnologs = 22.6% (1648/7294)
• SSD paralogs = 36.6% (3306/9027)
Ohnologs are unlikely to experience CNV, whereas SSD paralogs are likely to also display CNV.

66 Many ohnologs are dosage-balanced
• Over 60% of ohnologs (4638/7294) are free of subsequent SSD and CNV
• These are dosage-balanced ohnologs (DBOs)
Retained ohnologs are resistant to duplication (SSD or CNV), even in distantly related lineages that did not experience WGD.
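The DBO definition is a filter over ohnologs. The sketch uses hypothetical gene IDs; with the slide's counts the same fraction is 4638/7294 ≈ 63.6%.

```python
def dosage_balanced_ohnologs(ohnologs, ssd_genes, cnv_genes):
    """Ohnologs with no subsequent SSD and no CNV (the DBO definition above)."""
    return set(ohnologs) - set(ssd_genes) - set(cnv_genes)


# Hypothetical gene sets
ohnologs = {"g1", "g2", "g3", "g4", "g5"}
dbos = dosage_balanced_ohnologs(ohnologs, ssd_genes={"g2"}, cnv_genes={"g2", "g5"})
print(sorted(dbos), f"{len(dbos) / len(ohnologs):.0%}")  # ['g1', 'g3', 'g4'] 60%
```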

67 DBOs are associated with disease
• The data used to search for CNV were from healthy individuals
• Studies have reported a link between CNV and disease
• Duplication of a DBO is expected to be deleterious and lead to disease
• DBOs identified here are enriched for disease genes in OMIM (P < 2.2 x …)

68 Trisomy 21 – Down’s Syndrome
• Extreme example of CNV
– CNV of an entire chromosome
• A 1.5-fold increase in dosage of some chr 21 genes results in Down’s Syndrome
• Most commonly observed human trisomy
– About 1 in 1000 individuals
• Other autosomal trisomies occur but are usually lethal; trisomy 21 has the least severe phenotypic consequences
• DBOs are significantly under-represented on chr 21 (observed 40 vs. expected 56.1, P = 0.010)

69 Trisomy 21 candidate genes
75% (12/16) of reported DS candidate genes are also DBOs, which is significantly more than expected (2 expected; P = 5.9 x …)

70 Conclusions: novel genes
• We show the first evidence of de novo formation of unique human genes
• Newly formed genes show simple ORFs
• Regions/tissues with permissive expression environments may favor this process
• SSD duplicated genes have a backup role for their duplicated copies in mouse and in fly
• WGD duplicated genes do not have a backup role and have high essentiality, because of the enrichment of dosage-balanced genes
• The evolutionary profile of gene duplication and retention can suggest a role in human disease

71 Acknowledgements
• David Gonzalez Knowles: Knowles & McLysaght, Genome Research (2009)
• Takashi Makino: Makino et al., TiG (2009) and Makino & McLysaght, PNAS (2010)

72 Human-chimp sequence divergence
Total (and nonsynonymous) base substitutions pooled over all three genes: 5 (2) and 7 (3)

