Basic concepts in molecular evolution

Gene: a sequence of DNA or RNA that either
- has a potentially functional transcript: protein-coding, tRNA, rRNA, snRNA, miRNA, ... (sometimes referred to as a "productive gene"), or
- serves as a regulatory or structural element: enhancer hub, centromeres, telomeres, ... (sometimes referred to as an "untranscribed gene" in the literature)

Pseudogene: a non-functional DNA element with a high degree of similarity to a functional (typically productive) gene

U.S. Dept of Energy Human Genome Program, http://www.ornl.gov/hgmis

How would I know if a piece of DNA is functional or not?
Homologous genes: genes that share a common evolutionary origin

1. Orthologous genes: descendants of an ancestral gene that was present in the last common ancestor of two or more species, so they result from a speciation event
   e.g. α-globin in mouse & α-globin in human
2. Paralogous genes: arose by gene duplication within a lineage
   e.g. α-globin in mouse & β-globin in mouse

Memory aid: if similar genes are present in the same genome, they must be paralogues.
But that does not necessarily mean that similar genes in different organisms are orthologues,
e.g. α-globin in mouse & β-globin in human are paralogues, because the α-globin and β-globin genes arose by duplication (long ago, in an ancestor of mouse and human).
“Typical” eukaryotic protein-coding gene (Fig. 1.4)
[Figure labels: mRNA, coding sequence]
- Where is the promoter? The 5’ UTR? The 3’ UTR?
- What regions will be present in the mature mRNA?
- Is there an error in this figure?
Fig. 1.4 (annotated)
[Figure labels: promoter, “RNA polymerase”, “splicing machinery”, pre-mRNA (5’ AUG ... UAA 3’), mRNA (5’ AUG ... UAA 3’), regulatory small RNA (antisense)]
Cis-acting element: DNA (or RNA) sequence near a gene that is important for its expression
- e.g. promoter
- e.g. RNA cis-element (5’ splice site)
- e.g. cis-element for RNA stability in the 3’ UTR
Trans-acting factor: protein (or RNA) that binds to a cis-element to control gene expression
“Typical” bacterial gene organization (Fig. 1.6)
How many promoters are in the region shown in this figure? 2
How many proteins are encoded? 3
Operon = cluster of co-transcribed genes
Evolutionary advantages of operon organization?
- efficiency: co-ordination of gene expression
- economy: less space in genome
Typical prokaryotic gene: lacI in E. coli

             10        20        30        40        50        60        70        80        90
     ----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
lacI CGAAGCGGCAUGCAUUUACGUUGACACCAUCGAAUGGCGCAAAACCUUUCGCGGUAUGGCAUGAUAGCGCCCGGAAGAGAGUCAAUUCAG
            100       110       120       130       140       150       160       170       180
lacI GGUGGUGAAUGUGAAACCAGUAACGUUAUACGAUGUCGCAGAGUAUGCCGGUGUCUCUUAUCAGACCGUUUCCCGCGUGGUGAACCAGGC
     ......
           1250      1260      1270      1280      1290      1300      1310      1320      1330
lacI GCAGCUGGCACGACAGGUUUCCCGACUGGAAAGCGGGCAGUGAGCGCAACGCAAUUAAUGUGAGUUAGCUCACUCAUUAGGCACCCCAGG
           1340      1350      1360      1370      1380
     ----|----|----|----|----|----|----|----|----|----|---
lacI CUUUACACUUUAUGCUUCCGGCUCGUAUGUUGUGUGGAAUUGUGAGCGGAUAA

Features marked on the sequence: -35, -10, stop codon of the upstream mhpR, transcription start site, SD (Shine-Dalgarno sequence), start codon, stop codon

ssu rRNA  --GAUCACCUCCUUA 3'
mRNA      --GUGGUGGGA---- 5'

Frequent overlap between stop codon and start codon (of the downstream gene): ---UAAUG---, ---AUGA---
Gene family: an inclusive set of functionally diverged paralogous genes. (Gene duplication is typically followed by subfunctionalization, neofunctionalization or degradation/deletion.) It may include pseudogenes.
Examples:
- Human immunoglobulin genes: IgA, IgD, IgE, IgG, IgM
- Alpha globin gene family
- Beta globin gene family
- Ubiquitin-specific protease gene family
Xuhua Xia
Human HBA on Chr 16
[Figure: gene map of the human HBA cluster]
Human HBB family at Chr 11
[Figure: gene map of the human HBB cluster; labels include Gγ and Aγ]
Protein-coding genes

DNA   5’ …. ATG GGA TTG CCC GCC …. 3’   “coding strand”
DNA   3’ …. TAC CCT AAC GGG CGG …. 5’   “template strand”
mRNA  5’ …. AUG GGA UUG CCC GCC …. 3’

DNA is usually shown as single-stranded, with the coding strand in 5’ to 3’ orientation, so the genetic code table can be used directly.
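The relationship between the two DNA strands and the mRNA can be sketched in a few lines of Python. This is only an illustration using the example sequence from the slide; the function names `template_strand` and `transcribe` are made up for this sketch.

```python
# Watson-Crick complement for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def template_strand(coding_5to3: str) -> str:
    """Return the template strand written 3'->5', i.e. the base-by-base
    complement of the coding strand read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in coding_5to3)

def transcribe(coding_5to3: str) -> str:
    """The mRNA has the same sequence as the coding strand, with U for T."""
    return coding_5to3.replace("T", "U")

coding = "ATGGGATTGCCCGCC"           # coding strand, 5'->3' (from the slide)
print(template_strand(coding))       # template strand, written 3'->5'
print(transcribe(coding))            # mRNA, 5'->3'
```

Because the mRNA matches the coding strand apart from T/U, codons can be looked up directly from the coding strand.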
Transcription and Translation
[Figure: Gene 1, Gene 2 and Gene 3 transcribed by RNA polymerase into a polycistronic mRNA; ribosome translating the mRNA into protein, with charged tRNAs GCC~tRNA(Gly) and UCC~tRNA(Gly)]
Initiation: Met-Gly-...
Elongation: M_n + M → M_(n+1)
Ribonucleotide concentration

Ribonucleotide   Concentration (10^-12 mol per 10^6 cells)
rATP             1890
rCTP             53
rGTP             190
rUTP             130

Measured in exponentially proliferating chick embryo fibroblasts, 2 hrs.
The difference is expected to be more extreme in mitochondria.
NNA would seem to be a more efficient codon than NNC.
Xia, X., 1996. Genetics 144: 1309-1320.
Standard Genetic Code
- Codon families have 1 – 6 members
- Synonymous and nonsynonymous substitutions
- 0-fold, 2-fold, 3-fold, 4-fold degenerate sites (0-fold degenerate = non-degenerate)
5’ …. AUG GGA UUG CCC CAC …. 3’
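A minimal sketch of how site degeneracy can be worked out for the example codons above. It assumes the usual definition: a site is n-fold degenerate when n of the 4 possible nucleotides at that site encode the same amino acid, and 0-fold when every change alters the amino acid. The names `CODE` and `site_degeneracy` are illustrative only.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def site_degeneracy(codon: str) -> list:
    """Return the degeneracy (0, 2, 3 or 4) of each of the 3 codon sites."""
    result = []
    for pos in range(3):
        # Count single-nucleotide changes at this site that keep the amino acid.
        same = sum(
            CODE[codon[:pos] + b + codon[pos + 1:]] == CODE[codon]
            for b in BASES if b != codon[pos]
        )
        result.append(0 if same == 0 else same + 1)
    return result

for codon in ["AUG", "GGA", "UUG", "CCC", "CAC"]:
    print(codon, CODE[codon], site_degeneracy(codon))
```

In this example the third site of GGA and CCC comes out 4-fold degenerate, while UUG (Leu) is 2-fold degenerate at both its first and third sites.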
Standard code
4^3 = 64 possible codons
Codon families have 1 – 6 members
Initiation codon (AUG in the example below)
When translating a nt sequence, always be sure to read it in the 5’ to 3’ direction!
5’ …. AUG GGA UUG CCC CAC …. 3’
N-terminus … Met Gly Leu Pro His … C-terminus
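The translation of the example mRNA can be reproduced with a short Python sketch. It builds the standard code as a lookup table and reads the sequence codon by codon in the 5’ to 3’ direction; the function name `translate` and the one-letter output are choices made for this illustration.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna_5to3: str) -> str:
    """Translate an mRNA read 5'->3' into one-letter amino acids ('*' = stop).
    Trailing nucleotides that do not fill a codon are ignored."""
    usable = len(mrna_5to3) - len(mrna_5to3) % 3
    return "".join(CODE[mrna_5to3[i:i + 3]] for i in range(0, usable, 3))

print(translate("AUGGGAUUGCCCCAC"))  # M G L P H = Met Gly Leu Pro His
```

Reading the same string in the 3’ to 5’ direction gives a different peptide, which is why the direction matters.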
Genetic code is not “universal”
Some mitochondria, a few bacteria and a few protists use a non-standard code (Table 1.4).
Vertebrate mitochondrial code (differences from the “standard” genetic code):
- UGA = Trp (instead of stop codon)
- AUA, AUG = Met
- AGA, AGG = stop codons
Possible implications of different codes in nature?
“Defense” against foreign DNA invading the genome?
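To see what the different code means in practice, the sketch below translates one sequence under both codes. The test sequence is hypothetical, chosen only because it contains the reassigned codons; the vertebrate mitochondrial table is built by applying the differences listed above to the standard table.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Vertebrate mitochondrial code = standard code with four reassignments.
VERT_MITO = dict(STANDARD)
VERT_MITO.update({"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"})

def translate(mrna_5to3: str, code: dict) -> str:
    """Translate codon by codon; '*' marks a stop codon."""
    usable = len(mrna_5to3) - len(mrna_5to3) % 3
    return "".join(code[mrna_5to3[i:i + 3]] for i in range(0, usable, 3))

seq = "AUGAUAUGAAGA"  # hypothetical sequence: AUG AUA UGA AGA
print(translate(seq, STANDARD))   # Met Ile stop Arg
print(translate(seq, VERT_MITO))  # Met Met Trp stop
```

A gene moved between the two systems would therefore be read differently, for example truncated at UGA under the standard code.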
Amino acids: Venn diagram showing properties (Fig. 1.9)
Acidic: Asp = GAU, GAC; Glu = GAA, GAG
Basic: His, Lys, Arg
Why study amino acid properties?
Protein properties often depend on the properties of their amino acids:
- Effect of mutation
- Diagnosis, e.g., protein electrophoresis
Normal polypeptide (Hb-A):      Val-His-Leu-Thr-Pro-Glu-Glu……   (Glu codon: GAG)
Sickle-cell polypeptide (Hb-S): Val-His-Leu-Thr-Pro-Val-Glu……   (Val codon: GUG)
Amino acid substitutions (polarity, molecular volume, ...)
Grantham’s distance: F(V, P, C)
Miyata’s distance: F(V, P)
(V = volume, P = polarity, C = composition)
Amino acid substitutions (Table 4.7):
- Conservative: Ile ↔ Leu
- Radical: Cys ↔ Trp
Amino acid substitution matrices

           10        20        30        40        50        60
   ----|----|----|----|----|----|----|----|----|----|----|----|--
S1 RWFFSTNHKDIGTLYLVFGAWAGMVGTALSLLIRAELSQPGALLGDDQIYNVIVTAHAFVMI
S2 RWLFSTNHKDIGTLYLLFGAWAGVLGTALSLLIRAELGQPGNLLGNDHIYNVIVTAHAFVMI

BLOSUM = BLOcks SUbstitution Matrix: a substitution matrix used for sequence alignment of proteins (to score alignments between evolutionarily divergent protein sequences).
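Before scoring an alignment with a substitution matrix, it helps to list which residue pairs actually differ. The sketch below does this for S1 and S2 above. It only counts identities and mismatches and does not use BLOSUM scores; the function name `differences` is illustrative.

```python
S1 = "RWFFSTNHKDIGTLYLVFGAWAGMVGTALSLLIRAELSQPGALLGDDQIYNVIVTAHAFVMI"
S2 = "RWLFSTNHKDIGTLYLLFGAWAGVLGTALSLLIRAELGQPGNLLGNDHIYNVIVTAHAFVMI"

def differences(a: str, b: str) -> list:
    """Return (1-based position, residue in a, residue in b) for each mismatch
    in two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]

diffs = differences(S1, S2)
identity = 1 - len(diffs) / len(S1)
for pos, x, y in diffs:
    print(f"position {pos}: {x} -> {y}")
print(f"{len(diffs)} differences over {len(S1)} sites, identity = {identity:.1%}")
```

Each listed pair (for example F/L at position 3) is one lookup in a substitution matrix; an alignment score is the sum of those lookups plus the scores for the identical positions.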
BLOSUM62 matrix
- based on observed frequencies of amino acids replacing other amino acids during protein evolution, particularly within conserved regions
- Positive value for chemically similar substitutions: Leu to Ile = 2
- Negative value for dissimilar substitutions: Cys to Trp = -2
- Large value for rare amino acids with no change (diagonal), usually correlated with important function: Cys unchanged = 9
www.doc.ic.ac.uk/
For the 61 sense codons, how many substitution mutations are possible?
4^3 = 64 possible codons; 3 are stop codons (UAA, UAG, UGA)
Change of the 1st position of a codon to each of 3 other nucleotides, and likewise for the 2nd and 3rd positions:
(3 + 3 + 3) × 61 = 549
How many lead to amino acid changes (i.e. non-synonymous substitutions)? (Table 1.5)
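The count of 549 can be checked by brute force, and the same loop can classify each change. The sketch below enumerates every single-nucleotide change of every sense codon under the standard code. It labels changes to a stop codon as "nonsense" and counts them separately, which is a choice made for this sketch and may differ from how Table 1.5 groups them.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# counts[pos] tallies outcomes of changes at codon position pos (0, 1, 2).
counts = {pos: Counter() for pos in range(3)}
for codon, aa in CODE.items():
    if aa == "*":
        continue  # only the 61 sense codons are mutated
    for pos in range(3):
        for b in BASES:
            if b == codon[pos]:
                continue
            new_aa = CODE[codon[:pos] + b + codon[pos + 1:]]
            if new_aa == aa:
                kind = "synonymous"
            elif new_aa == "*":
                kind = "nonsense"
            else:
                kind = "nonsynonymous"
            counts[pos][kind] += 1

total = sum(sum(c.values()) for c in counts.values())
print("total substitutions:", total)
for pos in range(3):
    print(f"position {pos + 1}:", dict(counts[pos]))
```

Under these assumptions no change at the 2nd position is synonymous, and most synonymous changes fall at the 3rd position.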
[Table: numbers of synonymous and nonsynonymous substitutions at the 1st, 2nd and 3rd codon positions]
Note: this table summarizes the outcome of nt substitutions at each site within a codon, not the frequency of change seen in nature.
- 2nd position: a change in the 2nd position of a codon always alters the amino acid encoded
- 3rd position: a change is often “silent” (encoding the same aa)
From Table 1.3, we can see that most types of nt substitutions result in amino acid changes…
… but in nature, nt subs do not all occur with equal frequency, and synonymous subs occur much more frequently than non-synonymous ones. That’s expected because most amino acid changes would be detrimental: syn subs usually >> non-syn subs
Also, some amino acids are more common than others in proteins, e.g. Cys is typically rare (but often has an important function in protein folding).
For protein Y of 200 amino acids, approximately how many non-synonymous sites are expected in its mRNA?
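One way to approach the last question is a rough estimate under loudly stated assumptions, none of which come from the slide: all 61 sense codons are used equally often, the coding region is 200 codons (600 nt, ignoring the stop codon), each site contributes the fraction of its 3 possible changes that are synonymous, and changes to stop codons are counted as non-synonymous. A real gene with biased codon usage would give a somewhat different number.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_sites(codon: str) -> float:
    """Sum over the 3 sites of the fraction of changes that keep the amino acid."""
    total = 0.0
    for pos in range(3):
        same = sum(
            CODE[codon[:pos] + b + codon[pos + 1:]] == CODE[codon]
            for b in BASES if b != codon[pos]
        )
        total += same / 3
    return total

# Assumption: equal usage of the 61 sense codons.
sense = [c for c, aa in CODE.items() if aa != "*"]
mean_syn = sum(synonymous_sites(c) for c in sense) / len(sense)
mean_nonsyn = 3 - mean_syn

n_codons = 200  # protein Y
expected_nonsyn = n_codons * mean_nonsyn
print(f"non-synonymous sites per codon: {mean_nonsyn:.3f}")
print(f"expected non-synonymous sites in {n_codons} codons: {expected_nonsyn:.0f} of {3 * n_codons}")
```

Under these assumptions roughly three quarters of the 600 sites are non-synonymous, so the answer is on the order of 450.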