Eukaryotic Genomes: Fungi Wednesday, October 22, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner


1 Eukaryotic Genomes: Fungi Wednesday, October 22, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

2 Many of the images in this powerpoint presentation are from Bioinformatics and Functional Genomics by J Pevsner (ISBN 0-471-21004-8). Copyright © 2003 by Wiley. These images and materials may not be used without permission from the publisher. Visit http://www.bioinfbook.org Copyright notice

3 We are in the last third of the course: Today: Fungi. Exam #2 is due at the start of class. Next Monday: Functional genomics (Jef Boeke) Next Wednesday: Pathways (Joel Bader) Monday Nov. 3: Eukaryotic genomes Wednesday Nov. 5: Human genome Monday Nov. 10: Human disease Wednesday Nov. 12: Final exam (in class) Announcements

4 Outline of today’s lecture Description and classification of fungi The Saccharomyces cerevisiae genome Duplication of the yeast genome Functional genomics in yeast Comparative genomics of fungi

5 Introduction to fungi: phylogeny Fungi are eukaryotic organisms that can be filamentous (e.g. molds) or unicellular (e.g. the yeast Saccharomyces cerevisiae). Most fungi are aerobic (but S. cerevisiae can grow anaerobically). Fungi have major roles in the ecosystem in degrading organic waste. They have important roles in fermentation, including the manufacture of steroids and penicillin. Several hundred fungal species are known to cause disease in humans.

6 Eukaryotes (Baldauf et al., 2000)

7 Fungi and metazoa are sister groups Fig. 15.1 Page 504 Baldauf et al., 2000

8 Classification of fungi About 70,000 fungal species have been described (as of 1995), but 1.5 million species may exist. Four phyla:
   Ascomycota: yeasts, truffles, lichens
   Basidiomycota: rusts, smuts, mushrooms
   Chytridiomycota: Allomyces
   Zygomycota: feed on decaying vegetation
Box 15-1 Page 505

9 Classification of fungi About 70,000 fungal species have been described (as of 1995), but 1.5 million species may exist. Four phyla:
   Ascomycota: yeasts, truffles, lichens
      Hemiascomycetae: Génolevures project
      Euascomycetae: Neurospora
      Loculoascomycetae
      Laboulbeniomycetae: parasites of insects
   Basidiomycota: rusts, smuts, mushrooms
   Chytridiomycota: Allomyces
   Zygomycota: feed on decaying vegetation
Box 15-1 Page 505

10

11

12 Introduction to Saccharomyces cerevisiae First species domesticated by humans Called baker’s yeast (or brewer’s yeast) Ferments glucose to ethanol and carbon dioxide Model organism for studies of biochemistry, genetics, molecular and cell biology …rapid growth rate …easy to modify genetically …features typical of eukaryotes …relatively simple (unicellular) …relatively small genome Page 505

13 Sequencing the S. cerevisiae genome The genome was sequenced by a highly cooperative consortium in the early 1990s, chromosome by chromosome (the whole genome shotgun approach was not used). This involved 600 researchers in > 100 laboratories. -- Physical map created for all 16 chromosomes (I to XVI) -- Library of 10 kb inserts constructed in phage -- The inserts were assembled into contigs The sequence was released in 1996 and published in 1997 (Goffeau et al., 1996; Mewes et al., 1997) Page 505

14 Features of the S. cerevisiae genome
   Sequenced length:             12,068 kb (= 12,068,000 base pairs)
   Length of repeats:             1,321 kb
   Total length:                 13,389 kb (~13 Mb)
   Open reading frames (ORFs):    6,275
   Questionable ORFs (qORFs):       390
   Hypothetical proteins:         5,885
   Introns in ORFs:                 220
   Introns in UTRs:                  15
   Intact Ty elements:               52
   tRNA genes:                      275
   snRNA genes:                      40
Page 506

15 Features of the S. cerevisiae genome A notable feature of the genome is its high gene density (about one gene every 2 kilobases). Most bacteria have about one gene per kb, but most eukaryotes have a much sparser gene density. Also, only 4% of S. cerevisiae genes are interrupted by introns. By contrast, 40% of Schizosaccharomyces pombe genes have introns. What are the most common protein families and protein domains? You can see the answer at EBI’s website: http://www.ebi.ac.uk/proteome/ Page 506
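The gene density quoted on this slide can be checked against the figures on the previous slide. The short sketch below simply divides the sequenced length (12,068 kb) by the ORF count (6,275); the function and variable names are illustrative, not part of the lecture.

```python
# Figures taken from the "Features of the S. cerevisiae genome" slide.
SEQUENCED_LENGTH_KB = 12_068
ORF_COUNT = 6_275


def kb_per_gene(length_kb: float, gene_count: int) -> float:
    """Average spacing between genes, in kilobases per gene."""
    return length_kb / gene_count


yeast_spacing = kb_per_gene(SEQUENCED_LENGTH_KB, ORF_COUNT)
print(f"S. cerevisiae: about one gene every {yeast_spacing:.2f} kb")
```

The result is a little under 2 kb per gene, consistent with the "about one gene every 2 kilobases" stated above. It counts every annotated ORF, including the questionable ones.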

16 Fig. 15.2 Page 508

17 Page 506

18 Fig. 15.3 Page 509 http://www.ebi.ac.uk/proteome/ The EBI website offers a variety of proteome analysis tools, such as this summary of protein length distribution in S. cerevisiae.

19 ORFs in the S. cerevisiae genome How are ORFs defined? In the initial genome analysis, an ORF was defined as >100 codons (thus specifying a protein of ~11 kilodaltons). 390 ORFs were listed as “questionable”, because they were considered unlikely to be authentic genes. For example, they were short, or exhibited unlikely preferences for codon usage. How many ORFs are there in the yeast genome? There are 40,000 ORFs > 20 amino acids; how many of these are authentic? Page 506-507
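To make the length criterion concrete, here is a minimal sketch of an ORF scan. It is not the procedure used in the original genome analysis: it assumes an ORF runs from the first ATG in a reading frame to the next in-frame stop codon, it scans only the forward strand, and the 110 Da average residue mass is a common rule-of-thumb assumption. All names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
AVG_RESIDUE_MASS_DA = 110  # assumed average mass of one amino acid residue


def find_orfs(seq, longer_than=100):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start and end are 0-based, end-exclusive, and end includes the stop codon.
    An ORF is kept only if it has more than `longer_than` coding codons
    (the ATG is counted, the stop codon is not).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the previous stop in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 > longer_than:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


def approx_mass_kda(n_residues):
    """Rough protein mass in kilodaltons from the residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000


toy = "ATG" + "GCT" * 5 + "TAA"          # a 6-codon toy ORF
print(find_orfs(toy, longer_than=3))     # kept with a low threshold
print(find_orfs(toy))                    # rejected by the >100 codon rule
print(approx_mass_kda(100), "kDa")       # the ~11 kDa quoted on the slide
```

Lowering `longer_than` toward 20 is what inflates the candidate count so sharply, which is why additional evidence (conservation, expression) is needed to decide which short ORFs are authentic.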

20 ORFs in the S. cerevisiae genome Several criteria may be applied to decide if ORFs are authentic protein-coding genes: [1] evidence of conservation in other organisms [2] experimental evidence of gene expression (microarrays, SAGE, functional genomics) The groups of Elizabeth Winzeler and Michael Snyder each recently described hundreds of previously unannotated genes that are transcribed and translated. Page 507

21 ORFs in the S. cerevisiae genome The MIPS Comprehensive Yeast Genome Database lists criteria for assigning ORFs, based on FASTA search scores:
   Category                                          Number of proteins
   Known protein                                     3400
   Strong similarity to known protein                 230
   Similarity or weak similarity to known protein     825
   Similarity to unknown protein                     1007
   No similarity                                      516
   Questionable ORF                                   472
   Total                                             6450
Page 507, 510

22 Exploring a typical S. cerevisiae chromosome We will next familiarize ourselves with the S. cerevisiae genome by exploring a typical chromosome, XII. Page 508

23 Exploring a typical S. cerevisiae chromosome We will next familiarize ourselves with the S. cerevisiae genome by exploring a typical chromosome, XII. This chromosome features 38% GC content very little repetitive DNA few introns six Ty elements (transposable elements) a high ORF density: 534 ORFs > 100aa, and 72% of the chromosome has protein-coding genes Page 508-511

24 Key S. cerevisiae databases Web resources include: NCBI (Entrez → Genome → Eukaryotic genome projects) EBI http://www.ebi.ac.uk/proteome/ SGD: Saccharomyces Genome Database http://genome-www.stanford.edu/Saccharomyces/ MIPS Comprehensive Yeast Genome Database (MIPS = Munich Information Center for Protein Sequences) http://mips.gsf.de/proj/yeast/CYGD/db/ Page 508

25

26 NCBI: Entrez genomes for yeast resources Fig. 15.4 Page 510

27 NCBI: Entrez genomes for yeast resources ~Fig. 15.5 Page 511

28 NCBI: Entrez genomes for yeast resources ~Fig. 15.5 Page 511

29 Fig. 15.6 Page 512 MIPS offers a Comprehensive Yeast Genome Database http://mips.gsf.de/genre/proj/yeast/index.jsp

30 Fig. 15.7 Page 513 http://www.yeastgenome.org/ Saccharomyces Genome Database (SGD)

31 Fig. 15.7 Page 513

32 S. cerevisiae gene nomenclature: YKL159c
   Y = yeast
   K = 11th chromosome
   L = left (or R = right) arm
   159 = 159th ORF
   c = Crick (bottom) strand, or w = Watson (top) strand
Box 15-2 Page 514

33 S. cerevisiae gene nomenclature: YKL159c
   Y = yeast
   K = 11th chromosome
   L = left (or R = right) arm
   159 = 159th ORF
   c = Crick (bottom) strand, or w = Watson (top) strand
   RCN1 = wildtype gene
   Rcn1p = protein
   rcn1 = mutant allele
Box 15-2 Page 514
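The systematic naming scheme is regular enough to decode mechanically. The sketch below is an illustration only: the function name is hypothetical, it assumes chromosome letters A through P map to chromosomes I through XVI (the slide gives K = 11), and it does not handle suffixed names or mitochondrial and plasmid ORFs.

```python
import re

# Assumed mapping: A = chromosome I ... P = chromosome XVI.
CHROMOSOME_LETTERS = "ABCDEFGHIJKLMNOP"

_SYSTEMATIC_NAME = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])$", re.IGNORECASE)


def parse_systematic_name(name):
    """Split a systematic ORF name such as 'YKL159c' into its components."""
    match = _SYSTEMATIC_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognized systematic ORF name: {name!r}")
    letter, arm, number, strand = match.groups()
    return {
        "chromosome": CHROMOSOME_LETTERS.index(letter.upper()) + 1,
        "arm": "left" if arm.upper() == "L" else "right",
        "orf_number": int(number),
        "strand": "Watson" if strand.upper() == "W" else "Crick",
    }


print(parse_systematic_name("YKL159c"))
```

For YKL159c this yields chromosome 11, left arm, ORF number 159, Crick strand, matching the breakdown on the slide.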

34 Duplication of the S. cerevisiae genome Analysis of the S. cerevisiae genome revealed that many regions are duplicated, both intrachromosomally and interchromosomally (within and between chromosomes). These duplicated regions include both genes and nongenic regions. Such duplications reflect a fundamental aspect of genome evolution. What are the mechanisms by which regions of the genome duplicate? Page 511

35 Duplication of the S. cerevisiae genome Mechanisms of gene duplication tandem repeat slippage during recombination Gene conversion Lateral gene transfer Segmental duplication polyploidy e.g. genome tetraploidy Fig. 15.8 Page 514

36 Duplication of the S. cerevisiae genome Fate of duplicated genes Both copies persist One copy is deleted One copy becomes a pseudogene One copy functionally diverges Fig. 15.8 Page 514

37 Duplication of the S. cerevisiae genome In 1970, Susumu Ohno published the book Evolution by Gene Duplication. He hypothesized that vertebrate genomes evolved by two rounds of whole genome duplication. This provided genomes with the “raw materials” (new genes) with which to introduce various innovations. Page 512

38 Duplication of the S. cerevisiae genome Ohno (1970): “Had evolution been entirely dependent upon natural selection, from a bacterium only numerous forms of bacteria would have emerged. The creation of metazoans, vertebrates, and finally mammals from unicellular organisms would have been quite impossible, for such big leaps in evolution required the creation of new gene loci with previously nonexistent function. Only the cistron that became redundant was able to escape from the relentless pressure of natural selection. By escaping, it accumulated formerly forbidden mutations to emerge as a new gene locus.” Page 512

39 Duplication of the S. cerevisiae genome Wolfe and Shields (1997, Nature) provided support for Ohno’s paradigm. They hypothesized that the yeast genome duplicated about 100 million years ago. There was a diploid yeast genome with about 5,000 genes. It doubled to a tetraploid number of 10,000 genes. Then there was massive gene loss and chromosomal rearrangement to yield the present day 6,000 genes. Page 515

40 Fig. 15.9 Page 515 Wolfe and Shields (1997) performed blastp searches and found 55 blocks of duplicated regions. They proposed that the entire S. cerevisiae genome underwent a duplication. Matches with scores >200 are shown; they are arranged in blocks of genes. [Figure axes: distance along chromosome X (kb) and distance along chromosome XI (kb)]

41 Duplication of the S. cerevisiae genome Evidence of genome duplication in yeast -- Systematic BLAST searches show 55 blocks of duplicated sequences. -- There are 376 pairs of homologous genes. You can see the results of chromosomal comparisons on Ken Wolfe’s web site and at the SGD web site. Page 515
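One way to picture how duplicated blocks emerge from pairwise BLAST results is a simple chaining pass over the hits. The sketch below is a toy illustration, not the method Wolfe and Shields used: hits are given as (gene index on chromosome A, gene index on chromosome B, score), the score cutoff of 200 comes from the figure caption, and `max_gap` and `min_hits` are hypothetical parameters chosen for the example.

```python
def find_blocks(hits, min_score=200, max_gap=5, min_hits=3):
    """Greedily chain high-scoring hits into blocks of nearby gene pairs.

    hits: iterable of (index_a, index_b, score), where the indices give the
    gene order along two chromosomes. Hits with score <= min_score are
    dropped. Consecutive kept hits (sorted by index_a) join the same block
    when they lie within max_gap genes of each other on both chromosomes.
    """
    kept = sorted((a, b) for a, b, score in hits if score > min_score)
    blocks, current = [], []
    for a, b in kept:
        if current and a - current[-1][0] <= max_gap and abs(b - current[-1][1]) <= max_gap:
            current.append((a, b))
        else:
            if len(current) >= min_hits:
                blocks.append(current)
            current = [(a, b)]
    if len(current) >= min_hits:
        blocks.append(current)
    return blocks


# Made-up hits: three neighbouring pairs, one weak hit, one isolated strong hit.
example_hits = [(1, 10, 300), (2, 11, 250), (4, 13, 500), (20, 3, 150), (30, 50, 400)]
print(find_blocks(example_hits))
```

Only the three neighbouring pairs form a block here; the weak hit is filtered out and the isolated hit is too short a chain. A real analysis would also need to tolerate interleaved noise hits and to check gene orientation.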

42 Fig. 15.10 Page 516 The SGD website includes a pairwise chromosome similarity viewer.

43 Kenneth Wolfe offers a website that permits analysis of yeast duplications: http://oscar.gen.tcd.ie/~khwolfe/yeast/ Page 516

44

45 As an example, note the SSO1 gene on XVI

46 SSO1 (XVI) & SSO2 (XIII) are part of a block

47 Duplication of the S. cerevisiae genome Two models for the presence of duplication blocks [1] Whole genome duplication (tetraploidy) followed by gene loss and rearrangements [2] Successive, independent duplication events Page 516

48 Duplication of the S. cerevisiae genome Model [1] is favored for several reasons: -- For 50 of 55 duplicated regions, the orientation of the entire block is preserved with respect to the centromere. The orientation is not random. -- For model [2] we would expect 7 triplicated regions. We observe only 0 or 1. -- Gene order is maintained in 14 hemiascomycetes (the Génolevures project) Page 516
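The statement that the orientation "is not random" can be given a rough number. The sketch below assumes, purely for illustration, that under independent duplication events each block would keep or flip its orientation relative to the centromere with probability 0.5, independently of the others. This is a simple binomial model and not necessarily the test applied in the original paper.

```python
from math import comb


def binomial_tail(n, k, p=0.5):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 50 of 55 duplicated blocks preserve their orientation.
p_value = binomial_tail(55, 50)
print(f"P(at least 50 of 55 by chance) = {p_value:.2e}")
```

Under these assumptions the probability is on the order of 10^-10, so chance alone is a poor explanation for the preserved orientation, which is the point the slide makes in favor of model [1].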

49 Duplication of the S. cerevisiae genome The Génolevures project: -- Partial sequencing of 13 hemiascomycetes -- Gene order can be compared in 14 fungi -- 70% of the S. cerevisiae genome maps to sister regions with only minimal overlap -- Proposal that the 16 centromeres form 8 pairs Page 517

50 Duplication of the S. cerevisiae genome The Génolevures project: -- Partial sequencing of 13 hemiascomycetes -- Gene order can be compared in 14 fungi -- 70% of the S. cerevisiae genome maps to sister regions with only minimal overlap -- Proposal that the 16 centromeres form 8 pairs Phylogenetic analyses place the divergence of S. cerevisiae and Kluyveromyces lactis prior to the whole genome duplication (~100 million years ago). Perhaps the genome duplication enabled S. cerevisiae to acquire new properties such as the capacity for anaerobic growth. Page 517

51 Duplication of the S. cerevisiae genome What is the fate of duplicated genes? A duplicated gene (overall in eukaryotes) has a half life of just several million years (Lynch and Conery, 2000). 50% to 92% of duplicated genes are lost (Wagner, 2001) Consider four possible fates of a duplicated gene: Page 517

52 Duplication of the S. cerevisiae genome What is the fate of duplicated genes? A duplicated gene (overall in eukaryotes) has a half life of just several million years (Lynch and Conery, 2000). 50% to 92% of duplicated genes are lost (Wagner, 2001) Consider four possible fates of a duplicated gene: [1] Both copies persist (gene dosage effect) Page 517

53 Duplication of the S. cerevisiae genome What is the fate of duplicated genes? A duplicated gene (overall in eukaryotes) has a half life of just several million years (Lynch and Conery, 2000). 50% to 92% of duplicated genes are lost (Wagner, 2001) Consider four possible fates of a duplicated gene: [1] Both copies persist (gene dosage effect) [2] One copy is deleted (a common fate) Page 517

54 Duplication of the S. cerevisiae genome What is the fate of duplicated genes? A duplicated gene (overall in eukaryotes) has a half life of just several million years (Lynch and Conery, 2000). 50% to 92% of duplicated genes are lost (Wagner, 2001) Consider four possible fates of a duplicated gene: [1] Both copies persist (gene dosage effect) [2] One copy is deleted (a common fate) [3] One copy accumulates mutations and becomes a pseudogene (no functional protein product) Page 517

55 Duplication of the S. cerevisiae genome What is the fate of duplicated genes? A duplicated gene (overall in eukaryotes) has a half life of just several million years (Lynch and Conery, 2000). 50% to 92% of duplicated genes are lost (Wagner, 2001) Consider four possible fates of a duplicated gene: [1] Both copies persist (gene dosage effect) [2] One copy is deleted (a common fate) [3] One copy accumulates mutations and becomes a pseudogene (no functional protein product) [4] One copy (or both) diverges functionally. The organism can perform a novel function. Page 517
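The half-life figure implies a steep decline in surviving duplicates over time. The sketch below assumes simple exponential decay and uses a hypothetical half-life of 4 million years as a stand-in for "several million years"; neither the decay model nor the exact value comes from the lecture.

```python
def fraction_surviving(elapsed_my, half_life_my):
    """Fraction of duplicate genes still present after elapsed_my million years,
    assuming simple exponential loss with the given half-life."""
    return 0.5 ** (elapsed_my / half_life_my)


ASSUMED_HALF_LIFE_MY = 4  # hypothetical value for "several million years"

for elapsed in (4, 8, 20, 100):
    share = fraction_surviving(elapsed, ASSUMED_HALF_LIFE_MY)
    print(f"after {elapsed:>3} My: {share:.2e} of duplicates remain")
```

Under this model almost nothing would remain after 100 million years, so duplicate pairs that have persisted that long are presumably not being lost at the same constant rate, for example because both copies are maintained (fate [1]) or have diverged functionally (fate [4]).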

56 Duplication of the S. cerevisiae genome Why are duplicated genes commonly lost? It might seem highly advantageous to have a second copy of a gene, thus permitting functional divergence. Ohno suggested two reasons: [1] After duplication, a deleterious mutation in one of the two genes might now persist. Without duplication, the individual would have been selected against by such a mutation. [2] The presence of a new paralogous sequence could lead to unequal crossing over of homologous chromosomes during meiosis. Page 518

57 Duplication of the S. cerevisiae genome To consider the fate of duplicated genes, consider the example of genes involved in vesicle transport. Vesicles carry cargo from one destination to another. Proteins on vesicles (e.g. vesicle-associated membrane protein, VAMP; Snc1p in yeast) bind to proteins on target membranes (e.g. syntaxin in mammalian and other eukaryotic systems, or Sso1p in yeast). In S. cerevisiae, genome duplication appears to be responsible for the presence of two syntaxins (SSO1 and SSO2) and two VAMPs (SNC1 and SNC2). Page 518

58 Duplication of the S. cerevisiae genome Sso1pSso2p Snc1pSnc2p Fig. 15.11 Page 518

59 Search for information on SSO1 (or any yeast gene) at the SGD website

60 Fig. 15.12 Page 519 The SGD record for SSO1 provides information on function

61 Duplication of the S. cerevisiae genome The SGD website reveals that the SSO1 gene is nonessential (i.e. the null mutant is viable), but the double knockout of SSO1 and SSO2 is lethal. Thus, these paralogs may offer functional redundancy to the organism. Also, these proteins could participate in distinct (but complementary) intracellular trafficking steps. Page 519

62 Duplication of the S. cerevisiae genome Andreas Wagner (2000) considered two ways an organism can compensate for mutations: via genes with overlapping functions (e.g. paralogs), or via genes with unrelated functions that participate in regulatory networks. He reported that overall, gene duplications did not provide robustness. Instead, interactions among unrelated genes provide robustness against mutations. Page 519

63 Functional genomics in yeast Functional genomics refers to the assignment of function to genes based on genome-wide screens and analyses. Next week, Jef Boeke will describe functional genomics (Monday). Joel Bader will describe proteomics in yeast (Wednesday). Page 520

64 Fig. 15.13 Page 520 We can consider functional genomics in yeast in terms of high throughput approaches at the levels of genes, transcripts, and proteins

65 Functional genomics in yeast (next week)
   Protein level: Two-hybrid screens; Affinity purification and mass spectrometry; Pathways
   RNA level: Microarrays; SAGE; transposon tagging
   Gene level: Genetic footprinting; Transposon insertion (random mutagenesis); Gene deletion (targeted deletion of all ORFs!!!)

66 Today’s final topic: comparative analysis of fungal genomes The fungi offer unprecedented opportunities for comparative genomic analyses -- relatively small genome sizes -- they are eukaryotes -- they exhibit significant differences in biology -- opportunities to apply functional genomics approaches in a comprehensive, genome-wide manner Page 528

67 Fungal and metazoan phylogeny Baldauf et al., 2000 Page 528

68 A variety of fungal genome sequencing projects
   Species                        Size     Chromosomes
   Aspergillus fumigatus          30 Mb    8
   Aspergillus nigrans            29 Mb    8
   Aspergillus parasiticus
   Candida albicans               16 Mb    8
   Cryptococcus neoformans        21 Mb
   Fusarium sporotrichioides
   Magnaporthe grisea             40 Mb    7
   Neurospora crassa              43 Mb    7
   Phanerochaete chrysosporium    30 Mb    10
   Saccharomyces cerevisiae       13 Mb    16
   Schizosaccharomyces pombe      14 Mb    3
   Ustilago maydis                20 Mb

69 An atypical fungus: Encephalitozoon cuniculi Microsporidia are single-celled eukaryotes that lack mitochondria and peroxisomes. Consistent with its parasitic lifestyle, E. cuniculi has a severely reduced genome (only 2.9 Mb, encoding about 2000 proteins). Microsporidia were thought to represent deep-branching protozoans, but recent phylogenetic studies place them as an outgroup to fungi. Page 529

70 Fig. 15.22 Page 529 Encephalitozoon cuniculi as a fungal outgroup

71 Orange bread mold: Neurospora crassa Beadle and Tatum chose N. crassa as a model organism to study gene-protein relationships. The genome sequence was reported: 39 Mb, 7 chromosomes, 10,082 ORFs (Galagan et al., 2003). N. crassa has only 10% repetitive DNA, and incredibly, only 8 pairs of duplicated genes that encode proteins >100 amino acids. This is because Neurospora uses “repeat-induced point mutation” (RIP), a mechanism by which the genome is scanned for duplicated (repeated) sequences. This appears to serve as a genomic defense system, inactivating potentially harmful transposons. Page 530

72 Schizosaccharomyces pombe The S. pombe genome is 13.8 Mb and encodes ~4900 predicted proteins. Some bacterial genomes encode more proteins (e.g. Mesorhizobium loti with 6752, and Streptomyces coelicolor with 7825 genes).
   Chromosome   Size      Genes    Coding
   1            5.6 Mb    2,255    59%
   2            4.4 Mb    1,790    58%
   3            2.5 Mb      884    55%
   Total        12.5 Mb   4,929    58%
See: TIGR www.tigr.org EBI www.sanger.ac.uk/Projects/S_pombe Page 530

73 Schizosaccharomyces pombe
   Chromosome   Size      Genes    Coding
   1            5.6 Mb    2,255    59%
   2            4.4 Mb    1,790    58%
   3            2.5 Mb      884    55%
   Total        12.5 Mb   4,929    58%
See: TIGR www.tigr.org EBI www.sanger.ac.uk/Projects/S_pombe
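The per-chromosome figures in the S. pombe table above can be totalled and turned into a gene spacing for comparison with S. cerevisiae. This is a small arithmetic sketch using only the numbers in the table; the variable names are illustrative.

```python
# (size in Mb, gene count, percent coding) per chromosome, from the table above.
S_POMBE = {1: (5.6, 2255, 59), 2: (4.4, 1790, 58), 3: (2.5, 884, 55)}

total_mb = sum(size for size, _, _ in S_POMBE.values())
total_genes = sum(genes for _, genes, _ in S_POMBE.values())

# Average spacing in kb per gene, per chromosome and overall.
kb_per_gene = {
    chrom: 1000 * size / genes for chrom, (size, genes, _) in S_POMBE.items()
}
overall_kb_per_gene = 1000 * total_mb / total_genes

print(f"total: {total_mb:.1f} Mb, {total_genes} genes")
for chrom, spacing in kb_per_gene.items():
    print(f"chromosome {chrom}: one gene every {spacing:.2f} kb")
print(f"overall: one gene every {overall_kb_per_gene:.2f} kb")
```

The totals reproduce the table's last row (12.5 Mb, 4,929 genes). The overall spacing of roughly 2.5 kb per gene is somewhat sparser than the roughly 1.9 kb per gene obtained for S. cerevisiae from 12,068 kb and 6,275 ORFs.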

74 Schizosaccharomyces pombe S. pombe diverged from S. cerevisiae about 330 to 420 million years ago. Many genes are as divergent between these two fungi as either is from its human counterpart. To see this, try TaxPlot at NCBI. Page 530

75

76 Perspective and pitfalls The budding yeast S. cerevisiae is one of the most significant organisms in biology: -- Its genome was the first eukaryotic genome to be sequenced -- Its biology is simple relative to metazoans -- Through yeast genetics, powerful functional genomics approaches have been applied to study all yeast genes It is important to note that even for yeast, our knowledge of basic biological questions is highly incomplete. We still understand little about how the genotype of an organism leads to its characteristic phenotype. Page 531

