Genome organization and evolution. What is in a gene? Structural gene UTRs Promoters Upstream regulatory elements Introns Transcriptional units Alternative.

1 Genome organization and evolution

2 What is in a gene?
- Structural gene
- UTRs
- Promoters
- Upstream regulatory elements
- Introns
- Transcriptional units
- Alternative splicing

3 Prokaryotic Genes Come in Operons
[Figure: an operon, with a control region and transcription start site followed by genes A-E, is transcribed into a single polycistronic mRNA (ABCDE).]

4 Eukaryotic Genes
[Figure: control regions upstream of a gene whose exons 1, 2 and 3 are spread over tens of kb, and the mRNA produced from it.]

5 One Gene – Many Products
[Figure: a single transcriptional unit containing exons 1-4 gives rise to two different mRNAs, mRNA 1 and mRNA 2.]
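To make the "one gene, many products" idea concrete, the sketch below enumerates the mRNAs that exon skipping could produce from exons 1-4. It rests on a hypothetical simplification: the first and last exons are always kept, and each internal exon can be included or skipped independently (a "cassette" exon). Real splice-site choice is far more constrained, and the function name is made up for this example.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """List mRNA exon compositions reachable by skipping internal exons.

    Hypothetical simplification: the first and last exons are always
    retained, and every internal exon is an independent cassette exon.
    """
    first, *internal, last = exons  # requires at least two exons
    isoforms = []
    # Keep k internal exons (all of them first, then fewer), in genomic order.
    for k in range(len(internal), -1, -1):
        for kept in combinations(internal, k):
            isoforms.append([first, *kept, last])
    return isoforms


for mrna in exon_skipping_isoforms(["exon 1", "exon 2", "exon 3", "exon 4"]):
    print(" + ".join(mrna))
```

Under this model a gene with n internal cassette exons can yield 2^n different mRNAs, so the four-exon unit on the slide allows four, of which the slide shows two.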

6 Alternative Splicing – How Common?
- Preliminary estimate: 35% of human genes display alternative splicing at the 5′ end (Mironov, Genome Res 1999)
- Human genome: ~60% of genes display alternative splicing (International Human Genome Sequencing Consortium, Nature 2001)

7 What is in a genome?
- Bacterial and archaeal genomes
- Yeast genome
- Animal and plant genomes

8 Bacterial and archaeal genomes
Genomes
- Majority: a single large circular double-stranded DNA molecule, < 5 Mbp long
- May contain additional plasmids
- Contain little non-coding DNA (11% in E. coli)
Genes
- No introns
- Partially organized into operons
- Bacteria: operons contain genes of related function
- Archaea: usually no metabolic relation between the genes in an operon

9 The genome of Escherichia coli
E. coli genome inventory
- 4288 protein-coding genes
- 122 structural RNA genes
- non-coding repeat sequences
- regulatory elements
- transcription/translation guides
- transposases
- prophage remnants
- insertion sequence elements
- patches of unusual composition, likely to be foreign elements introduced by horizontal transfer
Functional class: number of genes (% of protein-coding genes)
Transport and binding proteins: 281 (6.55%)
Putative enzymes: 251 (5.85%)
Energy metabolism: 243 (5.67%)
Cell processes (including adaptation, protection): 188 (4.38%)
Central intermediary metabolism: 188 (4.38%)
Cell structure: 182 (4.24%)
Translation, post-translational protein modification: 182 (4.24%)
Putative transport proteins: 146 (3.4%)
Putative regulatory proteins: 133 (3.1%)
Amino acid biosynthesis and metabolism: 131 (3.06%)
Carbon compound catabolism: 130 (3.03%)
DNA replication, recombination, modification, and repair: 115 (2.68%)
Biosynthesis of cofactors, prosthetic groups, and carriers: 103 (2.4%)
Phage, transposons, plasmids: 87 (2.03%)
Nucleotide biosynthesis and metabolism: 58 (1.35%)
Transcription, RNA synthesis, metabolism, and modification: 55 (1.28%)
Fatty acid and phospholipid metabolism: 48 (1.12%)
Regulatory function: 45 (1.05%)
Putative structural proteins: 42 (0.98%)
Other known genes (gene product or phenotype known): 26 (0.61%)
Putative membrane proteins: 13 (0.3%)
Putative chaperones: 9 (0.21%)
Hypothetical, unclassified, unknown: 1632 (38.06%)
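The percentage column of the E. coli table can be recomputed from the counts. The short sketch below (the dictionary name COUNTS is only for illustration) sums the per-class counts, which come to 4288, and divides each count by that total; rounded to two decimals, the results agree with the percentages on the slide.

```python
# Gene counts per functional class, transcribed from the E. coli slide.
COUNTS = {
    "Transport and binding proteins": 281,
    "Putative enzymes": 251,
    "Energy metabolism": 243,
    "Cell processes (including adaptation, protection)": 188,
    "Central intermediary metabolism": 188,
    "Cell structure": 182,
    "Translation, post-translational protein modification": 182,
    "Putative transport proteins": 146,
    "Putative regulatory proteins": 133,
    "Amino acid biosynthesis and metabolism": 131,
    "Carbon compound catabolism": 130,
    "DNA replication, recombination, modification, and repair": 115,
    "Biosynthesis of cofactors, prosthetic groups, and carriers": 103,
    "Phage, transposons, plasmids": 87,
    "Nucleotide biosynthesis and metabolism": 58,
    "Transcription, RNA synthesis, metabolism, and modification": 55,
    "Fatty acid and phospholipid metabolism": 48,
    "Regulatory function": 45,
    "Putative structural proteins": 42,
    "Other known genes (gene product or phenotype known)": 26,
    "Putative membrane proteins": 13,
    "Putative chaperones": 9,
    "Hypothetical, unclassified, unknown": 1632,
}

total = sum(COUNTS.values())
print(f"Total classified protein-coding genes: {total}")
for category, n in COUNTS.items():
    # Share of all protein-coding genes, rounded to two decimals as on the slide.
    print(f"{category}: {n} ({100 * n / total:.2f}%)")
```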

10 The genome of the archaeon Methanococcus jannaschii
- Thermophilic (85 °C) anaerobe, collected from a hydrothermal vent 2600 m below the sea surface
- Overall metabolic equation: 4 H₂ + CO₂ → CH₄ + 2 H₂O
- One circular chromosome of 1,664,976 bp plus two extrachromosomal elements (58,407 and 16,550 bp)
- Genes involved in transcription, translation and regulation: the majority are more similar to eukaryotic genes
- Metabolic genes: more similar to bacterial genes
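One way to check the overall methanogenesis equation is to count atoms on each side. In the sketch below each molecule is written as an element-count dictionary, a representation chosen just for this example, and the helper sums atoms weighted by their stoichiometric coefficients.

```python
from collections import Counter


def atoms(side):
    """Total atoms on one side of a reaction.

    `side` is a list of (coefficient, {element: count}) pairs.
    """
    total = Counter()
    for coeff, formula in side:
        for element, n in formula.items():
            total[element] += coeff * n
    return total


H2 = {"H": 2}
CO2 = {"C": 1, "O": 2}
CH4 = {"C": 1, "H": 4}
H2O = {"H": 2, "O": 1}

reactants = atoms([(4, H2), (1, CO2)])  # 4 H2 + CO2
products = atoms([(1, CH4), (2, H2O)])  # CH4 + 2 H2O
print(dict(reactants), dict(products), reactants == products)
```

Both sides contain 8 H, 1 C and 2 O, so the equation as written on the slide is balanced.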

11 The genome of the bacterium Mycoplasma genitalium: one of the smallest known genomes
- 580,070 bp
- Differences from free-living bacteria:
  - Has: genes specific for infectious activity
  - Does not have: many metabolic enzymes

12 Inventory of a eukaryotic genome
Moderately repetitive DNA
- Functional
  - dispersed gene families, e.g. actin, globin
  - tandem gene family arrays: rRNA genes (250 copies); tRNA genes (50 sites with 10–100 copies each in human); histone genes in many species
- Without known function
  - short interspersed elements (SINEs), e.g. Alu (some function in gene regulation): 200–300 bp long; hundreds of thousands of copies (300,000 Alu); scattered locations (not in tandem repeats)
  - long interspersed elements (LINEs): 1–5 kb long; 10–10,000 copies per genome
  - pseudogenes
Highly repetitive DNA
- minisatellites: repeats of 14–500 bp segments; 1–5 kb long; many different ones scattered throughout the genome
- microsatellites: repeats of up to 13 bp; ~100s of kb long; ~10⁶ copies/genome; most of the heterochromatin around the centromere
- telomeres: a short repeat unit (typically 6 bp: TTAGGG in the human genome, TTGGGG in Paramecium, TTAGGG in trypanosomes, TTTAGGG in Arabidopsis); 250–1,000 repeats at the ends of each chromosome
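Telomeres and microsatellites are tandem repeats: one short unit repeated head to tail. As an illustration, assuming an uppercase DNA string, the sketch below finds the longest run of adjacent, exact copies of a repeat unit such as TTAGGG. The function name and the toy sequence are hypothetical, and real repeat finders tolerate mismatches, which this exact-match version does not.

```python
def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of adjacent, exact copies of `unit` in `seq`."""
    k = len(unit)
    best = 0
    # A run starting at position p is found when scanning reading frame p % k.
    for frame in range(k):
        run = 0
        for i in range(frame, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


# Toy sequence: a few flanking bases followed by five human-type telomeric repeats.
toy = "ACGT" + "TTAGGG" * 5 + "AC"
print(longest_tandem_run(toy, "TTAGGG"))  # prints 5
```

Scanning every reading frame of the unit is what lets the function find a run regardless of where it starts; the same call with "TTTAGGG" would count Arabidopsis-type repeats.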

13 Distribution of exon length
Source: Campbell & Heyer, Discovering Genomics, Proteomics, and Bioinformatics

14 Distribution of intron length
Source: Campbell & Heyer, Discovering Genomics, Proteomics, and Bioinformatics

15 Functional distribution of proteins
Source: Campbell & Heyer, Discovering Genomics, Proteomics, and Bioinformatics

16 The genome of Saccharomyces cerevisiae
- 12,057,500 bp in 16 chromosomes
- 5885 protein-coding genes
- 3480 known proteins
- ~1000 with some similarity to proteins in other species
- ~800 similar to unknown ORFs in other species
- ~140 rRNA genes
- 40 snRNA genes
- 275 tRNA genes
- Introns rare (only 231 genes) and small
- Few repeat sequences

17 The genome of Caenorhabditis elegans
- 959 somatic cells, 300 neurons
- Genome: 97 Mbp in 6 chromosomes
- 27% covered by exons
- >19,000 genes, avg. 5 exons/gene
Type of domain: number
Seven-transmembrane spanning chemoreceptor: 650
Eukaryotic protein kinase domain: 410
Two domain, C4 type zinc finger: 240
Collagen: 170
Seven-transmembrane spanning receptor (rhodopsin family): 140
C2H2 type zinc finger: 130
C-type lectin: 120
RNA recognition motif: 100
C3HC4 type (RING finger) zinc fingers: 90
Protein tyrosine phosphatase: 90
Ankyrin repeat: 90
WD domain G-beta repeat: 90
Homeobox domain: 80
Neurotransmitter gated ion channel: 80
Cytochrome P450: 80
Conserved C-terminal helicase: 80
Short chain and alcohol dehydrogenases: 80
UDP-glucuronosyl and UDP-glucosyl transferases: 70
EGF-like domain: 70
Immunoglobulin superfamily: 70
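The yeast and worm figures can be turned into rough gene densities. The sketch below divides genome size by gene count using only numbers from these two slides; the constant and function names are illustrative. Because the worm slide gives ">19,000 genes", 19,000 is a lower bound on the gene count, so the worm result is an upper bound on base pairs per gene.

```python
# Numbers from the S. cerevisiae and C. elegans slides.
YEAST_BP = 12_057_500
YEAST_GENES = 5885
YEAST_GENES_WITH_INTRONS = 231

WORM_BP = 97_000_000      # "97 Mbp"
WORM_GENES = 19_000       # slide says ">19 000", so this is a lower bound
WORM_EXON_FRACTION = 0.27  # "27% covered by exons"


def bp_per_gene(genome_bp: float, genes: int) -> float:
    """Average genome length per gene (genic plus intergenic DNA)."""
    return genome_bp / genes


print(f"Yeast: ~{bp_per_gene(YEAST_BP, YEAST_GENES):,.0f} bp per gene")
print(f"Yeast genes with introns: {100 * YEAST_GENES_WITH_INTRONS / YEAST_GENES:.1f}%")
print(f"Worm: at most ~{bp_per_gene(WORM_BP, WORM_GENES):,.0f} bp per gene")
print(f"Worm exonic DNA: ~{WORM_EXON_FRACTION * WORM_BP / 1e6:.1f} Mbp")
```

On these numbers yeast packs a gene into roughly every 2 kb, while the worm genome carries up to about 5 kb of DNA per gene, in line with its intron-rich genes.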

18 The human genome
- Basic facts
- Protein-coding genes
- Repeat sequences
- RNA

19 Some facts about the human genome
- 3.2 × 10⁹ bp
- Genes comprise about 3% of the genome
- Average gene length: ~8,000 bp
- Average of 5–6 exons/gene
- Average exon length: ~200 bp
- Average intron length: ~2,000 bp
- ~8% of genes have a single exon
Extremes
- Factor VIII gene (whose mutations cause hemophilia A): spread over ~186,000 bp (~9 kb of exons and ~177 kb of introns); 26 exons (size range 69 to 3,106 bp); 25 introns (size range 207 to 32,400 bp)
- Dystrophin, the largest human gene known: >30 exons spread over 2.4 million bp. Dystrophin is a large, rod-like cytoskeletal protein found at the inner surface of muscle fibers. It is part of the dystrophin-glycoprotein complex (DGC), which bridges the inner cytoskeleton (F-actin) and the extracellular matrix.
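The numbers on this slide can be cross-checked with simple arithmetic. The sketch below multiplies genome size by the ~3% gene fraction and confirms that the Factor VIII exon and intron totals add up to the quoted ~186 kb span; the constant names are illustrative only.

```python
# Figures from the human-genome slide.
GENOME_BP = 3.2e9
GENE_FRACTION = 0.03   # "genes comprise about 3% of the genome"
F8_EXONS_KB = 9        # Factor VIII: ~9 kb of exons
F8_INTRONS_KB = 177    # ... and ~177 kb of introns

genic_mbp = GENOME_BP * GENE_FRACTION / 1e6
f8_total_kb = F8_EXONS_KB + F8_INTRONS_KB
f8_exon_pct = 100 * F8_EXONS_KB / f8_total_kb

print(f"Genic sequence: ~{genic_mbp:.0f} Mbp")
print(f"Factor VIII span: ~{f8_total_kb} kb, of which ~{f8_exon_pct:.1f}% is exonic")
```

Less than 5% of the Factor VIII locus ends up in the mRNA, which illustrates how introns dominate the length of many human genes.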

20 Protein-coding genes
Functional classification (function: number of genes, %)
Nucleic acid binding: 2207 (14%)
DNA binding: 1656 (10.5%)
DNA repair protein: 45 (0.2%)
DNA replication factor: 7 (0%)
Transcription factor: 986 (6.2%)
RNA binding: 380 (2.4%)
Structural protein of ribosome: 137 (0.8%)
Translation factor: 44 (0.2%)
Transcription factor binding: 6 (0%)
Cell cycle regulator: 75 (0.4%)
Chaperone: 154 (0.9%)
Motor: 85 (0.5%)
Actin binding: 129 (0.8%)
Defense/immunity protein: 603 (3.8%)
Enzyme: 3242 (20.6%)
Peptidase: 457 (2.9%)
Endopeptidase: 403 (2.5%)
Protein kinase: 839 (5.3%)
Protein phosphatase: 295 (1.8%)
Enzyme activator: 3 (0%)
Enzyme inhibitor: 132 (0.8%)
Apoptosis inhibitor: 28 (0.1%)
Signal transduction: 1790 (11.4%)
Receptor: 1318 (8.4%)
Transmembrane receptor: 1202 (7.6%)
G-protein linked receptor: 489 (3.1%)
Olfactory receptor: 71 (0.4%)
Storage protein: 7 (0%)
Cell adhesion: 189 (1.2%)
Structural protein: 714 (4.5%)
Cytoskeletal structural protein: 145 (0.9%)
Transporter: 682 (4.3%)
Ion channel: 269 (1.7%)
Neurotransmitter transporter: 19 (0.1%)
Ligand binding or carrier: 1536 (9.7%)
Electron transfer: 33 (0.2%)
Cytochrome P450: 50 (0.3%)
Tumour suppressor: 5 (0%)
Unclassified: 4813 (30.6%)
Total: 15683 (100%)

21 Protein-coding genes
Structural classification, most common types of proteins (protein: number)
Immunoglobulin and MHC domain: 591
Zinc finger, C2H2 type: 499
Eukaryotic protein kinase: 459
Rhodopsin-like GPCR superfamily: 346
Ser/Thr protein kinase family active site: 285
EGF-like domain: 259
RNA-binding region RNP-1: 214
G-protein beta WD-40 repeats: 196
Src homology 3 (SH3) domain: 194
Pleckstrin homology (PH) domain: 188
EF-hand family: 185
Homeobox domain: 179
Tyrosine kinase catalytic domain: 173
Immunoglobulin V-type: 163
RING finger: 159
Proline rich extension: 156
Fibronectin type III domain: 151
Ankyrin repeat: 135
KRAB box: 133
Immunoglobulin subtype: 128
Cadherin domain: 118
PDZ domain (a.k.a. DHR or GLGF): 117
Leucine-rich repeat: 113
Serine proteases, trypsin family: 108
Ras GTPase superfamily: 103
Src homology 2 (SH2) domain: 100
BTB/POZ domain: 99
TPR repeat: 92
AAA ATPase superfamily: 92
Asp/Asn hydroxylation site: 91

22 Homology of human proteins to other forms of life

23 Evolution of genomes
- Protein evolution
- Adaptive mutations
- Chromosome rearrangements

24 Modes of protein evolution
- De novo creation
- Gene fusion/fission
- Gene duplication
- Rapid sequence change
- Pseudogenisation

25 Regions of human:mouse synteny
[Figure: syntenic regions between human and mouse chromosomes.]

