Formation of novel protein-coding genes Level 3 Molecular Evolution and Bioinformatics Jim Provan Patthy Chapter 6.



De novo formation of novel protein-coding genes. Creation of simple structural elements such as α-helices, β-sheets and reverse turns seems to be rather trivial: there are so many alternative ways of forming these structures that they have been “invented” independently several times. Proteins with repetitive structure are the most likely to arise de novo: repetitive oligonucleotide sequences can expand, forming periodic protein structures (collagen-like; leucine-rich repeat, LRR). Such proteins probably arose several times during evolution.

Evolution of serum antifreeze glycoproteins. Fish that live in polar waters have serum antifreeze glycoproteins (AFGPs), which allow them to tolerate temperatures as low as –1.9°C. Arctic and Antarctic fish have evolved very similar AFGPs independently. The AFGP of Antarctic fish, made up of a simple tripeptide repeat, evolved by recruitment of the 5’ and 3’ ends of an ancestral trypsinogen gene (secretory signal and 3’ UTR) and de novo amplification of a 9 bp Thr-Ala-Ala motif. Arctic cod also have a Thr-Ala-Ala tripeptide repeat-based AFGP, but this has no relationship with the trypsinogen gene. Threonines are O-linked to galactosyl-N-acetylgalactosamine, and the periodicity of the repeats matches the periodicity of water molecules. This is convergent evolution of the tripeptide-based AFGP.
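The amplification of a 9 bp motif into a tripeptide repeat can be illustrated with a minimal sketch. The motif sequence `ACAGCAGCA` is a hypothetical example chosen only because it encodes Thr-Ala-Ala under the standard genetic code; it is not the actual AFGP sequence, and the helper names are illustrative.

```python
# Minimal codon table covering only the two amino acids in the motif
# (standard genetic code; one-letter codes T = threonine, A = alanine).
CODON_TABLE = {
    "ACA": "T", "ACC": "T", "ACG": "T", "ACT": "T",  # threonine
    "GCA": "A", "GCC": "A", "GCG": "A", "GCT": "A",  # alanine
}

# Hypothetical 9 bp unit encoding Thr-Ala-Ala (an assumption for illustration).
MOTIF = "ACAGCAGCA"


def amplify(motif, copies):
    """Return a tandem array made of `copies` repeats of the motif."""
    return motif * copies


def translate(dna):
    """Translate an in-frame DNA string codon by codon."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


repeat_array = amplify(MOTIF, 4)
print(translate(repeat_array))  # a periodic Thr-Ala-Ala peptide
```

Because the motif length is a multiple of three, every added copy stays in frame, which is why simple expansion of such a unit yields a periodic protein.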

De novo creation of complex proteins Probability of de novo creation of more complex, globular proteins is inversely proportional to complexity: Those that consist of a single supersecondary structure element (TIM barrel proteins) have higher probability of independent creation TIM barrel structure likely to have evolved several times Easier to remodel replicas of old protein folds than to invent them from scratch: Creation of first folded proteins was probably the rate-limiting step in protein-based life All extant proteins probably arose from a limited number of ancestral folds through divergence

Evolutionary convergence Previous examples highlight convergence to similar primary, secondary or tertiary structure (structural convergence) Unlike structural convergence, functional convergence and mechanistic convergence are relatively common: Several types of proteinases that have similar function (i.e. they cleave proteins) but have different structures and catalytic mechanisms and have evolved independently Example of mechanistic convergence is the serine proteases of the subtilisin and trypsin families: — Similar active sites and catalytic mechanisms but no sequence or conformational homology — Catalytic triad residues (His, Asp, Ser) occur in different order in primary structures Difficult to prove structural convergence

Gene duplications. The evolutionary significance of gene duplication is that it gives rise to a redundant copy of a gene. The duplicated gene may acquire divergent mutations and eventually emerge as a new gene. Gene duplication is the predominant and most important mechanism by which new genes arise. Genes derived by a duplication event are said to be paralogous and are found at different loci within the same genome. These differ from orthologous genes, which arise by speciation events and are found at corresponding loci in different species.

Types of DNA duplications An increase in the number of copies of a DNA segment can be brought about by several types of DNA duplication: Partial, intragenic or internal gene duplication – only an internal segment of a protein-coding gene is duplicated Complete gene duplication, including flanking regions necessary for expression Partial chromosome duplication – several adjacent genes are duplicated Chromosomal duplication (aneuploidy) Genome duplication (polyploidy)

Mechanisms of gene duplication. The major mechanism for short intragenic duplications is disengagement of the DNA polymerase from the strand that is being copied and reattachment at the wrong point (slipped-strand mispairing). The major mechanism for larger duplications is unequal crossing-over, which involves mistaken pairing and recombination between homologous chromosomes. Unequal crossing-over is most likely in already-duplicated regions, where it allows rapid expansion of repeats within genes and expansion of gene families, and may facilitate “homogenisation” of gene sequences and thus slow down divergence (concerted evolution).
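How unequal crossing-over changes copy number can be shown with a toy model. This is a minimal sketch under simplifying assumptions: each chromosome is a list of gene-copy labels, breakpoints fall exactly at copy boundaries, and the function name and labels are illustrative only.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Recombine two homologous tandem arrays cut at misaligned positions.

    chrom_a, chrom_b: lists of gene-copy labels on the two homologues.
    break_a, break_b: index at which each array is cut.
    Returns the two reciprocal recombinant arrays.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Two homologues, each carrying the same tandem pair of copies.
homologue_a = ["copy1", "copy2"]
homologue_b = ["copy1", "copy2"]

# Aligned pairing would cut both arrays at the same index. Here the arrays
# pair askew, so A is cut after two copies and B after one.
gain, loss = unequal_crossover(homologue_a, homologue_b, 2, 1)
print(gain)  # three copies: one product gains a copy
print(loss)  # one copy: the reciprocal product loses a copy
```

The total number of copies is conserved across the two products, so each duplication produced this way is matched by a reciprocal deletion. Once an array contains more copies, there are more ways to pair askew, which is consistent with the point that already-duplicated regions are the most prone to further expansion.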

Unequal crossing-over [Figure: unequal crossing-over between globin genes, with the recombinant products labelled Lepore and Anti-Lepore]

Gene duplications in lysozyme. In ruminants, the lysozyme gene has been duplicated ~10 times and is expressed less in extra-intestinal tissues. In mice, intestinal lysozyme is expressed from the lysP gene, whereas in other tissues it is encoded by the lysM gene. The original gene duplication occurred through unequal crossing-over in Alu-like B2 middle repetitive elements. [Figure: ancestral lys gene and its duplicates lysP and lysM]

Retrosequences Copies of protein-coding genes may be produced by duplicative transposition: DNA is transcribed into RNA, which is reverse-transcribed into a cDNA (retroposition) During re-insertion, small segments of host DNA (4-12bp) are duplicated, forming direct repeats Significant diagnostic features of retrosequences: Lack introns (where parent gene would have introns) Lack upstream promoter elements of parent gene Contain poly(A) stretches at 3’ end Flanked by short, direct repeats Different chromosomal location from original gene
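Two of the diagnostic features listed above, the 3’ poly(A) stretch and the short flanking direct repeats, can be checked directly on sequence. The following is a minimal sketch: the function names, the poly(A) threshold of 10 bases and the example sequences are assumptions made for illustration, while the 4-12 bp repeat range comes from the slide.

```python
import re


def has_poly_a_tail(insert_seq, min_len=10):
    """True if the insert ends in a run of at least min_len adenines.

    min_len=10 is an arbitrary illustrative threshold.
    """
    return re.search(r"A{%d,}$" % min_len, insert_seq.upper()) is not None


def flanking_direct_repeat(upstream, downstream, min_len=4, max_len=12):
    """Return the longest direct repeat flanking an insert, or None.

    The repeat is the sequence (min_len..max_len bp) found both at the end
    of the upstream flank and at the start of the downstream flank.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    for n in range(max_len, min_len - 1, -1):
        if len(upstream) < n or len(downstream) < n:
            continue
        if upstream[-n:] == downstream[:n]:
            return upstream[-n:]
    return None


# Hypothetical sequences, invented for this example.
upstream_flank = "GGCTTAGACCTGA"
insert = "ATGGCCAAGCTG" + "A" * 12
downstream_flank = "ACCTGATTCGG"

print(has_poly_a_tail(insert))
print(flanking_direct_repeat(upstream_flank, downstream_flank))
```

A real screen would also compare the candidate with its parent gene to confirm the absence of introns and promoter elements; those checks need an alignment and are outside this sketch.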

Functionality of retrosequences Depending on whether the copied gene is functional or not, we can distinguish processed genes (retrogenes) and processed pseudogenes (retropseudogenes): Several reasons why functional retrogenes are unlikely: — Process of reverse-transcription is very inaccurate — Lacks necessary regulatory elements — Generally truncated at 5’ end (reverse transcriptase failure) — May be inserted in genomic region unsuitable for expression More likely to form retropseudogenes Some examples of processed functional genes have been found e.g. human phosphoglycerate kinase: X-linked gene has 11 exons and 10 introns Autosomal PGK gene has no introns and a poly(A) tail

Alu elements. Processed pseudogenes of the gene specifying 7SL RNA, a component of the signal recognition particle, which recognises the signal sequences of secreted proteins. About 300 bp long. Around 500,000 copies in the human genome (5-6%). Named after the characteristic AluI restriction site. Derived from the functional 7SL sequence by duplication, two deletions and many mutations. Play a key role in genome plasticity since they facilitate unequal crossing-over, leading to gene duplication and exon shuffling.

Fate of duplicated genes. Determined by the functional consequences of having extra copies of the same gene and increased amounts of protein. Duplications can be advantageous, deleterious or neutral. If an organism is exposed to a toxic environment, there may be an advantage in overproduction of detoxifying enzymes. A disadvantage will result if overproduction of protein upsets regulatory balance. Most duplications are neutral, with their fate determined by selection and drift. A duplicated gene is unlikely to be fixed unless it acquires a novel and useful function: it may specialise in different subfunctions of the ancestral gene, or it may acquire a drastically different function (hepatocyte growth factor vs. plasminogen).

Formation of gene families. Recently duplicated gene families are generally found in close proximity on the same chromosome. Some multigene families contain invariant repeated genes: this is common when large quantities of protein product are required, e.g. histones have to be synthesised at a high rate during a well-defined, short period of cell division. Some members of multigene families serve the same function but differ in tissue specificity, developmental regulation or biochemical properties, e.g. isozymes.

Concerted evolution in multigene families Paralogous members of multigene families are very similar to each other within one species although orthologous members of the same family may differ greatly between even closely related species: Suggests that mechanisms exist which cause gene families to evolve together as a unit (concerted evolution) Process of concerted evolution of multigene families under the effects of random genetic drift is known as molecular drive Gene correction mechanisms may homogenise genes – difficult to trace true evolutionary history of many multigene families

Dating gene duplications. Assuming duplicated genes diverge at a constant rate, we can estimate the date of a gene duplication, $T_D$, that gave rise to two paralogous genes (A and B) if we have sequences of these paralogues from two different species (1 and 2) and we know the time of speciation, $T_S$. If the genes evolved at a constant rate, then the average number of substitutions per site, $(K_A + K_B)/2$, in the two orthologous comparisons (A1 vs. A2, B1 vs. B2) is proportional to $T_S$, and the average number of substitutions per site, $K_{AB}$, in the four paralogous comparisons (A1 vs. B1, A2 vs. B2, A1 vs. B2, A2 vs. B1) is proportional to the time since duplication, $T_D$. Thus, the following equation holds: $$\frac{T_D}{T_S} = \frac{2K_{AB}}{K_A + K_B}$$
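The ratio above can be rearranged to estimate the duplication date from pairwise distances. This is a minimal sketch: the function name and all numerical values are hypothetical, chosen only to show the arithmetic, and the constant-rate assumption stated on the slide is taken for granted.

```python
def duplication_time(k_a, k_b, k_ab, t_s):
    """Estimate the time since duplication, T_D.

    k_a:  substitutions per site between orthologues A1 and A2
    k_b:  substitutions per site between orthologues B1 and B2
    k_ab: mean substitutions per site over the four paralogous comparisons
    t_s:  time since speciation (T_D is returned in the same units)

    Rearranged from T_D / T_S = 2 * K_AB / (K_A + K_B).
    """
    if k_a + k_b <= 0:
        raise ValueError("orthologous distances must sum to a positive value")
    return t_s * 2 * k_ab / (k_a + k_b)


# Hypothetical distances for the four paralogous comparisons
# (A1 vs. B1, A2 vs. B2, A1 vs. B2, A2 vs. B1).
paralogous_distances = [0.35, 0.37, 0.36, 0.36]
k_ab = sum(paralogous_distances) / len(paralogous_distances)

# Hypothetical orthologous distances and a speciation time of 80 mya.
t_d = duplication_time(k_a=0.10, k_b=0.14, k_ab=k_ab, t_s=80)
print(round(t_d))  # estimated duplication date in mya
```

With these invented inputs the paralogues are three times as divergent as the orthologues, so the duplication is placed at three times the speciation date. Gene conversion between paralogues would lower $K_{AB}$ and make the duplication look younger than it is.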

Dating gene duplications (continued). All vertebrates have both myoglobin and haemoglobin. Myoglobin differs from both the α and β subunits of haemoglobin more than they differ from each other: myoglobin diverged ($T_D$ = … mya) before the α and β genes arose ($T_D$ = 500 mya). Mammals, reptiles, birds, amphibians and bony fish all have distinct α and β subunits, whereas the most primitive vertebrates, the Agnatha (jawless fish), contain only one type of haemoglobin subunit. Myoglobin and haemoglobin therefore diverged prior to the separation of agnathans and jawed vertebrates, and the duplication giving rise to the α and β subunits occurred in the ancestor of all jawed vertebrates following its divergence from agnathans.

Evolutionary history and linkage patterns in α- and β-globin clusters. In humans, the gene cluster of the α-globin family (chromosome 16) consists of four functional genes (ζ, α1, α2, θ1) and three unprocessed pseudogenes (ψζ, ψα1, ψα2). The embryonic type ζ is the most divergent (estimated $T_D$ > 300 mya). θ1 is less divergent (estimated $T_D$ ~ 260 mya). Genes α1 and α2 produce an identical polypeptide and have near-identical nucleotide sequences, suggesting recent divergence. The β-globin family (chromosome 11) contains five functional genes and ψβ. Adult types (β and δ) diverged from non-adult types (Gγ, Aγ and ε) around … mya. The ancestor of both γ genes diverged from ε about … mya. The duplication that formed Gγ and Aγ occurred after separation of the human lineage from New World monkeys (35 mya). Divergence of the adult genes (δ and β) occurred about 80 mya.

Intergene distance and time since duplication [Figure: intergene distance (kb) plotted against age of duplication (Myr); labelled ages 20, 40, 100 and 200 Myr; scale bar 10 kb]