Eukaryotic Genomes: Fungi Monday, November 24, 2011 Genomics 260.605.01 J. Pevsner


Many of the images in this powerpoint presentation are from Bioinformatics and Functional Genomics by J Pevsner, copyright © 2009 by Wiley-Blackwell. These images and materials may not be used without permission. Visit Copyright notice

Schedule
Monday (today): Fungi (Chapter 17)
Wednesday 11/16: Next-generation sequencing (Sarah Wheelan)
Friday 11/18: Protozoans (David Sullivan)
Monday 11/21: Eukaryotic genomes (Chapter 18)
Wednesday 11/23: no class
Friday 11/25: Thanksgiving

Outline of today’s lecture Description and classification of fungi The Saccharomyces cerevisiae genome Sequencing Features of the genome Yeast chromosomes Duplication of the yeast genome Functional genomics in yeast Comparative genomics of fungi

Summary of key points [1] S. cerevisiae has 16 chromosomes and ~6,000 genes [2] Its genome underwent a whole genome duplication followed by massive gene loss [3] Comparative genomics is a powerful approach --to identify genes --to identify regulatory regions --to infer evolutionary history of genome duplications and gene gain or loss [4] SGD (Saccharomyces Genome Database) is the major web resource for yeast

Introduction to fungi: phylogeny Fungi are eukaryotic organisms that can be filamentous (e.g. molds) or unicellular (e.g. the yeast Saccharomyces cerevisiae). Most fungi are aerobic (but S. cerevisiae can grow anaerobically). Fungi have major roles in the ecosystem in degrading organic waste. They have important roles in fermentation, including the manufacture of steroids and penicillin. Several hundred fungal species are known to cause disease in humans. Page 698

Eukaryotes (Baldauf et al., 2000)

Fungi and metazoa are sister groups Fig Page 698 Baldauf et al., 2000

Classification of fungi
About 70,000 fungal species have been described (as of 1995), but 1.5 million species may exist.
Four phyla:
-- Ascomycota: yeasts, truffles, lichens
-- Basidiomycota: rusts, smuts, mushrooms
-- Chytridiomycota: Allomyces
-- Zygomycota: feed on decaying vegetation
Box 17-1 Page 699

Classification of fungi
About 70,000 fungal species have been described (as of 1995), but 1.5 million species may exist.
Four phyla:
-- Ascomycota: yeasts, truffles, lichens
   -- Hemiascomycetae: S. cerevisiae
   -- Euascomycetae: Neurospora
   -- Loculoascomycetae
   -- Laboulbeniomycetae: parasites of insects
-- Basidiomycota: rusts, smuts, mushrooms
-- Chytridiomycota: Allomyces
-- Zygomycota: feed on decaying vegetation
Box 17-1 Page 699

Alternate classification of fungi

Outline of today’s lecture Description and classification of fungi The Saccharomyces cerevisiae genome Sequencing Features of the genome Yeast chromosomes Duplication of the yeast genome Functional genomics in yeast Comparative genomics of fungi

Introduction to Saccharomyces cerevisiae First species domesticated by humans Called baker’s yeast (or brewer’s yeast) Ferments glucose to ethanol and carbon dioxide Model organism for studies of biochemistry, genetics, molecular and cell biology …rapid growth rate …easy to modify genetically …features typical of eukaryotes …relatively simple (unicellular) …relatively small genome Page 700

Sequencing the S. cerevisiae genome
The genome was sequenced by a highly cooperative consortium in the early 1990s, chromosome by chromosome (the whole genome shotgun approach was not used). This involved 600 researchers in > 100 laboratories.
-- A physical map was created for all XVI chromosomes
-- A library of 10 kb inserts was constructed in phage
-- The inserts were assembled into contigs
The sequence was released in 1996 and published in 1997 (Goffeau et al., 1996; Mewes et al., 1997). Page 701

Features of the S. cerevisiae genome
Sequenced length:            12,068 kb (= 12,068,000 base pairs)
Length of repeats:            1,321 kb
Total length:                13,389 kb (~13 Mb)
Open reading frames (ORFs):   6,275 (see updates below)
Questionable ORFs (qORFs):      390
Hypothetical proteins:        5,885
Introns in ORFs:                220
Introns in UTRs:                 15
Intact Ty elements:              52
tRNA genes:                     275
snRNA genes:                     40
Page 702

Features of the S. cerevisiae genome A notable feature of the genome is its high gene density (about one gene every 2 kilobases). Most bacteria have about one gene per kb, but most eukaryotes have a much sparser gene density. Also, only 4% of S. cerevisiae genes are interrupted by introns. By contrast, 40% of genes from the fungus Schizosaccharomyces pombe have introns. What are the most common protein families and protein domains? You can see the answer at EBI’s website: Page 701
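The gene density quoted above can be checked against the figures on the genome summary slide. This is a minimal arithmetic sketch; the variable names are illustrative, and it uses the original ORF count (6,275) rather than any later revision.

```python
# Figures taken from the genome summary slide.
sequenced_kb = 12_068   # sequenced length in kilobases
orf_count = 6_275       # ORFs in the initial annotation

# Average spacing between ORFs, in kb per ORF.
kb_per_orf = sequenced_kb / orf_count
print(f"About one ORF every {kb_per_orf:.2f} kb")
```

The result is close to the "one gene every 2 kilobases" stated on the slide.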

Page 703

Fig Page The EBI website offers a variety of proteome analysis tools, such as this summary of protein length distribution in S. cerevisiae.

ORFs in the S. cerevisiae genome How are ORFs defined? In the initial genome analysis, an ORF was defined as >100 codons (thus specifying a protein of ~11 kilodaltons). 390 ORFs were listed as “questionable”, because they were considered unlikely to be authentic genes. For example, they were short, or exhibited unlikely preferences for codon usage. How many ORFs are there in the yeast genome? There are 40,000 ORFs > 20 amino acids; how many of these are authentic? Page 703
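The ">100 codons" rule can be sketched as a naive six-frame scan. This is only an illustration of the length threshold, not the pipeline used by the sequencing consortium; the function names are hypothetical, and the scan assumes an ORF runs from an ATG to the next in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, min_codons=100):
    """List ORFs (ATG to in-frame stop) with more than min_codons codons.

    Returns (strand, start, end, n_codons) tuples. Coordinates are 0-based,
    half-open, and refer to the strand that was scanned (for "-" that is
    the reverse complement). n_codons excludes the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons > min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None
    return orfs

# Toy sequence: ATG + 100 alanine codons + stop = 101 codons.
toy = "ATG" + "GCT" * 100 + "TAA"
print(find_orfs(toy))
```

Lowering `min_codons` to 20 on a real genome is what produces the very large number of candidate ORFs mentioned above, most of which are presumably spurious.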

ORFs in the S. cerevisiae genome Several criteria may be applied to decide if ORFs are authentic protein-coding genes: [1] evidence of conservation in other organisms [2] experimental evidence of gene expression (microarrays, SAGE, functional genomics) The groups of Elizabeth Winzeler and Michael Snyder each described hundreds of previously unannotated genes that are transcribed and translated. Page 704

ORFs in the S. cerevisiae genome The MIPS Comprehensive Yeast Genome Database lists criteria for assigning ORFs, based on FASTA search scores: Number of proteins Category Known protein Strong similarity to known protein Weak similarity to known protein Similarity to unknown protein No similarity Questionable ORF Total Page 704

Revising the S. cerevisiae gene count through comparative genomics By sequencing three additional yeast species (Saccharomyces paradoxus, S. bayanus, S. mikatae), Kellis et al. (Nature 423:241, 2003) showed that 503 genes should be deleted from the set of yeast genes (leaving 5,726 including 43 newly discovered genes).

Comparing the DNA sequences from several species makes it possible to find regulatory regions — short sequences that turn genes on and off — and eliminate spurious gene predictions. Red boxes highlight areas of sequence similarity between at least two species. Functional sequences — genes and regulatory elements — tend to be conserved across all species. The figure shows how one true regulatory element and one correctly identified gene might emerge from a comparison of four yeast species. Salzberg SL (2003) Nature 423:233
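The idea of finding conserved blocks across species can be sketched with a toy alignment. The sequences below are made up for illustration, and the function name and `min_len` threshold are assumptions; real analyses such as Kellis et al. (2003) use statistical models of conservation rather than exact identity.

```python
def conserved_windows(aligned, min_len=6):
    """Return (start, end) half-open column ranges in which every
    aligned sequence carries the same non-gap base."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    windows, start = [], None
    for col in range(length + 1):
        same = (
            col < length
            and aligned[0][col] != "-"
            and all(s[col] == aligned[0][col] for s in aligned)
        )
        if same and start is None:
            start = col
        elif not same and start is not None:
            if col - start >= min_len:
                windows.append((start, col))
            start = None
    return windows

# Hypothetical aligned intergenic sequences from four species.
alignment = [
    "ATCGGAGCATTACGGA",
    "TTCGGAGCATAACGCA",
    "ACCGGAGCATGTCGGA",
    "AGCGGAGCATCACGTA",
]
for start, end in conserved_windows(alignment):
    print(start, end, alignment[0][start:end])
```

Short runs of identity are ignored by the length threshold, which mimics how isolated matching bases are not taken as evidence of function.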

Kellis et al. (2003) Nature 423:241.

The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. (N50 contig length = 26,260 kb for human reference assembly) A scaffold is a portion of the genome sequence reconstructed from end-sequenced whole-genome shotgun clones. Scaffolds are composed of contigs and gaps.
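The N50 definition above translates directly into a few lines of code. This is a minimal sketch with a hypothetical function name and made-up contig lengths.

```python
def n50(lengths):
    """Return the N50: the length L such that blocks of length >= L
    together cover at least 50% of the total assembled length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no lengths given")

# Made-up contig lengths (kb); the total is 300 kb.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))
```

Here the two longest contigs (80 + 70 = 150 kb) reach half of the 300 kb total, so the N50 is 70 kb.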

Predicted ORFs are shown as arrows pointing in the direction of transcription. Orthologous ORFs are connected by dotted lines and are coloured by the type of correspondence: red for 1-to-1 matches, blue for 1-to-2 matches and white for unmatched ORFs. Sequence gaps are indicated by vertical lines at the ends of contigs, with the estimated size of each gap shown by the length of the hook. See Supplementary Information for 250 such figures tiling the complete S. cerevisiae genome. Kellis et al. (2003) Nature 423:241.

Exploring a typical S. cerevisiae chromosome
We will next familiarize ourselves with the S. cerevisiae genome by exploring a typical chromosome, XII. This chromosome features:
-- 38% GC content
-- very little repetitive DNA
-- few introns
-- six Ty elements (transposable elements)
-- a high ORF density: 534 ORFs > 100 aa, and 72% of the chromosome consists of protein-coding genes
Page 704

Key S. cerevisiae databases
Web resources include:
-- NCBI (Entrez > Genome > Eukaryotic genome projects)
-- EBI
-- SGD: Saccharomyces Genome Database (most important!)
-- MIPS Comprehensive Yeast Genome Database (MIPS = Munich Information Center for Protein Sequences)
Page 705

NCBI: Entrez genomes for yeast resources

Fig Page 704 updated 11/09

NCBI: Entrez genomes for yeast resources

Fig Page Saccharomyces Genome Database (SGD): primary web resource for yeast genomics

Vast set of resources

S. cerevisiae gene nomenclature
YKL159c
Y = yeast
K = 11th chromosome
L = left (or right) arm (relative to centromere)
159 = 159th ORF
c = Crick (bottom) or w (Watson, top) strand
Box 15-3 Page 707

S. cerevisiae gene nomenclature
YKL159c
Y = yeast
K = 11th chromosome
L = left (or right) arm
159 = 159th ORF
c = Crick (bottom) or w (Watson, top) strand
RCN1 = wildtype gene
Rcn1p = protein
rcn1 = mutant allele
Box 15-3 Page 707
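The systematic naming scheme is regular enough to parse mechanically. The sketch below is illustrative only: the function name is hypothetical, it assumes a three-digit ORF number, and it does not handle names with extra suffixes.

```python
import re

# Y + chromosome letter (A-P = I-XVI) + arm (L/R) + ORF number + strand (C/W)
SYSTEMATIC_NAME = re.compile(r"^Y([A-P])([LR])(\d{3})([CW])$", re.IGNORECASE)

def parse_systematic_name(name):
    """Split a systematic ORF name such as YKL159c into its parts."""
    match = SYSTEMATIC_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognized systematic name: {name!r}")
    chrom_letter, arm, number, strand = match.groups()
    return {
        "chromosome": ord(chrom_letter.upper()) - ord("A") + 1,
        "arm": "left" if arm.upper() == "L" else "right",
        "orf_number": int(number),
        "strand": "Crick" if strand.upper() == "C" else "Watson",
    }

print(parse_systematic_name("YKL159c"))
```

For YKL159c this gives chromosome 11, left arm, ORF 159, Crick strand, matching the breakdown on the slide.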

Outline of today’s lecture Description and classification of fungi The Saccharomyces cerevisiae genome Sequencing Features of the genome Yeast chromosomes Duplication of the yeast genome Functional genomics in yeast Comparative genomics of fungi

Duplication of the S. cerevisiae genome Analysis of the S. cerevisiae genome revealed that many regions are duplicated, both intrachromosomally and interchromosomally (within and between chromosomes). These duplicated regions include both genes and nongenic regions. Such duplications reflect a fundamental aspect of genome evolution. What are the mechanisms by which regions of the genome duplicate? Page 708

Duplication of the S. cerevisiae genome
Mechanisms of gene duplication:
-- Tandem repeat (slippage during recombination)
-- Gene conversion
-- Lateral gene transfer
-- Segmental duplication
-- Polyploidy (e.g. genome tetraploidy)
Fig Page 708

Duplication of the S. cerevisiae genome
Fate of duplicated genes:
-- Both copies persist
-- One copy is deleted
-- One copy becomes a pseudogene
-- One copy functionally diverges
Fig Page 708

Duplication of the S. cerevisiae genome What is the fate of duplicated genes? (see YGOB, below) A duplicated gene (overall in eukaryotes) has a half life of just several million years (Lynch and Conery, 2000). 50% to 92% of duplicated genes are lost (Wagner, 2001) Consider four possible fates of a duplicated gene: Page 711

Duplication of the S. cerevisiae genome
What is the fate of duplicated genes? A duplicated gene (overall in eukaryotes) has a half-life of just several million years (Lynch and Conery, 2000). 50% to 92% of duplicated genes are lost (Wagner, 2001).
Consider four possible fates of a duplicated gene:
[1] Both copies persist (gene dosage effect)
[2] One copy is deleted (a common fate)
[3] One copy accumulates mutations and becomes a pseudogene (no functional protein product)
[4] One copy (or both) diverges functionally. The organism can perform a novel function.
Page 711
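To get a feel for what a half-life of "several million years" implies, one can treat the loss of duplicates as simple exponential decay. This is a rough sketch under stated assumptions: the 4-million-year half-life is an illustrative value, not a figure from the slides, and a constant loss rate is a simplification.

```python
from math import log2

def surviving_fraction(elapsed_my, half_life_my):
    """Fraction of duplicates still present after elapsed_my million years,
    assuming a constant rate of loss."""
    return 0.5 ** (elapsed_my / half_life_my)

def time_to_lose(fraction_lost, half_life_my):
    """Million years needed for the given fraction of duplicates to be lost."""
    return half_life_my * log2(1 / (1 - fraction_lost))

HALF_LIFE_MY = 4  # assumed value, for illustration only

print(surviving_fraction(8, HALF_LIFE_MY))
print(round(time_to_lose(0.92, HALF_LIFE_MY), 1))
```

Under this simple model, duplicates that are still present long after a duplication event are the exception, which fits the picture of fates [2] and [3] being common.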

Duplication of the S. cerevisiae genome In 1970, Susumu Ohno published the book Evolution by Gene Duplication. He hypothesized that vertebrate genomes evolved by two rounds of whole genome duplication. This provided genomes with the “raw materials” (new genes) with which to introduce various innovations. Page 709

Duplication of the S. cerevisiae genome Ohno (1970): “Had evolution been entirely dependent upon natural selection, from a bacterium only numerous forms of bacteria would have emerged. The creation of metazoans, vertebrates, and finally mammals from unicellular organisms would have been quite impossible, for such big leaps in evolution required the creation of new gene loci with previously nonexistent function. Only the cistron that became redundant was able to escape from the relentless pressure of natural selection. By escaping, it accumulated formerly forbidden mutations to emerge as a new gene locus.” Page 709

Duplication of the S. cerevisiae genome Wolfe and Shields (1997, Nature) provided support for Ohno’s paradigm. They hypothesized that the yeast genome duplicated about 100 million years ago. Originally there was a diploid yeast genome with about 5,000 genes (on 8 chromosomes). It doubled to a tetraploid number of 10,000 genes (on 16 chromosomes). Then there was massive gene loss and chromosomal rearrangement to yield the present day 6,000 genes. Page 709

Fig Page 710 [Figure: dot plot of matches; axes: distance along chromosome X (kb) vs. distance along chromosome XI (kb).] Wolfe and Shields (1997) performed blastp and found 55 blocks of duplicated regions. They proposed that the entire S. cerevisiae genome underwent a duplication. Matches with scores >200 are shown. These are arranged in blocks of genes.

Duplication of the S. cerevisiae genome Evidence of genome duplication in yeast -- Systematic BLAST searches show 55 blocks of duplicated sequences. -- There are 376 pairs of homologous genes. You can see the results of chromosomal comparisons on Ken Wolfe’s web site and at the SGD web site. Page 710

Duplication of the S. cerevisiae genome Two models for the presence of duplication blocks [1] Whole genome duplication (tetraploidy) followed by gene loss and rearrangements [2] Successive, independent duplication events Page 711

Duplication of the S. cerevisiae genome
Model [1] is favored for several reasons:
-- For 50 of 55 duplicated regions, the orientation of the entire block is preserved with respect to the centromere. The orientation is not random.
-- For model [2] we would expect 7 triplicated regions. We observe only 0 or 1.
-- Gene order is maintained in 14 hemiascomycetes (the Génolevures project)
Page 711
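The claim that the block orientation "is not random" can be made concrete with a binomial tail probability. This is a back-of-the-envelope sketch, assuming a null model in which each block independently keeps its orientation with probability 0.5; it is not the test used by Wolfe and Shields.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 50 of 55 duplicated blocks preserve orientation relative to the centromere.
p_value = binomial_tail(55, 50)
print(f"P(at least 50 of 55 by chance) = {p_value:.2e}")
```

The probability is on the order of 1e-10, so independent duplications with random orientation are a very poor explanation for the observed pattern.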

Duplication of the S. cerevisiae genome Why are duplicated genes commonly lost? It might seem highly advantageous to have a second copy of a gene, thus permitting functional divergence. Ohno suggested two reasons: [1] After duplication, a deleterious mutation in one of the two genes might now persist. Without duplication, the individual would have been selected against by such a mutation. [2] The presence of a new paralogous sequence could lead to unequal crossing over of homologous chromosomes during meiosis. Page 711

Duplication of the S. cerevisiae genome To consider the fate of duplicated genes, consider the example of genes involved in vesicle transport. Vesicles carry cargo from one destination to another. Proteins on vesicles (e.g. vesicle-associated membrane protein, VAMP; Snc1p in yeast) bind to proteins on target membranes (e.g. syntaxin in mammalian and other eukaryotic systems, or Sso1p in yeast). In S. cerevisiae, genome duplication appears to be responsible for the presence of two syntaxins (SSO1 and SSO2) and two VAMPs (SNC1 and SNC2). Page 711

Duplication of the S. cerevisiae genome [Figure labels: Sso1p, Sso2p; Snc1p, Snc2p] Fig Page 469

Search for information on SSO1 (or any yeast gene) at the SGD website

The SGD record for SSO1 provides information on function

Duplication of the S. cerevisiae genome The SGD website reveals that the SSO1 gene is nonessential (i.e. the null mutant is viable), but the double knockout of SSO1 and SSO2 is lethal. Thus, these paralogs may offer functional redundancy to the organism. Also, these proteins could participate in distinct (but complementary) intracellular trafficking steps. Page 711

Comparative analyses of hemiascomycetes: Whole genome duplication You can explore duplicated genome regions using the Yeast Gene Order Browser (YGOB) at:

Kenneth Wolfe offers a website that permits analysis of yeast duplications:

Fig Page 713 Yeast Gene Order Browser

Fig Page 714 Yeast Gene Order Browser: patterns of gene loss after WGD

Duplication of the S. cerevisiae genome The Génolevures project: -- Sequencing of 13 hemiascomycetes -- Gene order can be compared in 14 fungi -- 70% of the S. cerevisiae genome maps to sister regions with only minimal overlap -- Proposal that the 16 centromeres form 8 pairs Page 712

Duplication of the S. cerevisiae genome The Génolevures project: -- Sequencing of 13 hemiascomycetes -- Gene order can be compared in 14 fungi -- 70% of the S. cerevisiae genome maps to sister regions with only minimal overlap -- Proposal that the 16 centromeres form 8 pairs Phylogenetic analyses place the divergence of S. cerevisiae and Kluyveromyces lactis prior to the whole genome duplication (~100 million years ago). Perhaps the genome duplication enabled S. cerevisiae to acquire new properties such as the capacity for anaerobic growth. Page 712

It had long been suspected that the genome of the yeast Saccharomyces cerevisiae arose through the duplication of the genome of an ancestral yeast. Three new papers confirm this suspicion. The confirmation involved sequencing the genomes of yeasts such as Kluyveromyces waltii, Ashbya gossypii, K. lactis and Candida glabrata (which served as reference species) and comparing their gene sequences, and their order and orientation, with S. cerevisiae. The genes in grey are in the same order and orientation in the reference species and S. cerevisiae. Some of these genes, those in yellow, have relatives that are found in two different chromosomal locations (copies 1 and 2) in the S. cerevisiae genome. Some, however, have relatives in the same order and orientation only on copy 1 (red), or only on copy 2 (blue). The findings indicate that genome duplication occurred in the lineage that led to S. cerevisiae; some genes were then maintained, while many others were lost or diversified. A. Goffeau (2004). Nature 430, 25-26

Detecting whole-genome duplications. André Goffeau (2004). Evolutionary genomics: Seeing double. Nature 430, 25-26

[See figure on next slide.] Model of WGD followed by massive gene loss predicts gene interleaving in sister regions. a, After divergence from K. waltii, the Saccharomyces lineage underwent a genome duplication event, creating two copies of every gene and chromosome. b, The vast majority of duplicated genes underwent mutation and gene loss. c, Sister segments retained different subsets of the original gene set, keeping two copies for only a small minority of duplicated genes, which were retained for functional purposes. d, Within S. cerevisiae, the only evidence comes from the conserved order of duplicated genes (numbered 3 and 13) across different chromosomal segments; the intervening genes are unrelated. e, Comparison with K. waltii reveals the duplicated nature of the S. cerevisiae genome, interleaving genes from sister segments on the basis of the ancestral gene order. M. Kellis, B.W. Birren and E.S. Lander (2004) Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428,
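The interleaving prediction can be illustrated with a toy example. The gene numbers below are invented (only genes 3 and 13 being retained in two copies follows the caption), and the helper names are hypothetical; this is a sketch of the logic, not the method of Kellis et al. (2004).

```python
def is_subsequence(sub, full):
    """True if the items of sub appear in full in the same relative order."""
    remaining = iter(full)
    return all(item in remaining for item in sub)

# Ancestral gene order, as preserved in an unduplicated reference species.
ancestral = list(range(1, 17))

# After WGD and gene loss, each sister segment keeps a different subset.
sister_1 = [1, 3, 4, 6, 9, 10, 13, 16]
sister_2 = [2, 3, 5, 7, 8, 11, 12, 13, 14, 15]

# Genes retained in two copies are the only direct evidence within one genome.
kept_twice = [g for g in ancestral if g in sister_1 and g in sister_2]
print("retained on both sisters:", kept_twice)

# Both sisters follow the ancestral order, and together they cover it.
print(is_subsequence(sister_1, ancestral), is_subsequence(sister_2, ancestral))
print(sorted(set(sister_1) | set(sister_2)) == ancestral)
```

Without the reference order, only the two shared genes link the sister segments; with it, every gene in both segments maps onto a single ancestral region.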

common ancestor Saccharomyces lineage Kluyveromyces lineage Kellis M et al. (2004) Nature 428,

Branches show number of substitutions per thousand amino acids. Evolutionary trees are rooted at the divergence of three species. a, Average protein divergence of all 457 gene pairs that arose by WGD. The faster-evolving paralogue is arbitrarily designated as copy 2 for each pair. Kellis M et al. (2004) Nature 428,

Branches show number of substitutions per thousand amino acids. Evolutionary trees are rooted at the divergence of three species. b, Example of a gene showing accelerated protein divergence (1 of 72 cases). Ancestral and derived gene function can be inferred by comparison to K. waltii. In this case, the origin-of-replication recognition complex protein Orc1 is inferred to be ancestral and the silencing protein Sir3 is inferred to be derived. Kellis M et al. (2004) Nature 428,

c, Example of duplicated gene pairs that have undergone recent gene conversion (1 of 60 cases). Comparison with S. bayanus shows that recent gene conversion events occurred both in S. cerevisiae and in S. bayanus lineages. Dotted lines connect orthologous genes in S. cerevisiae and in S. bayanus. Kellis M et al. (2004) Nature 428,

After Kellis M et al. (2004) Nature 428, d, Phylogeny and relative time of WGD. Estimated tree lengths are as reported. Fig Page 712

Comparative analyses of hemiascomycetes: Identification of functional elements Kellis et al. (2003) compared S. paradoxus, S. mikatae, and S. bayanus to S. cerevisiae (divergence dates: 5 to 20 MYA). There were clear orthologous matches, except at the telomeres. For the Gal4 transcription factor and other functional elements, comparative analyses have helped delineate regulatory regions. Page 714

Fig Page 715 Yeast Gal4 transcription factor binding site: note the conserved regions between the genes GAL10 and GAL1

Outline of today’s lecture Description and classification of fungi The Saccharomyces cerevisiae genome Sequencing Features of the genome Yeast chromosomes Duplication of the yeast genome Functional genomics in yeast Genetic footprinting Exogenous transposons Molecular barcodes Comparative genomics of fungi

Functional genomics in yeast Functional genomics refers to the assignment of function to genes based on genome-wide screens and analyses.

We can consider functional genomics in yeast in terms of high throughput approaches at the levels of genes, transcripts, and proteins

Functional genomics in yeast
Protein level
-- Two-hybrid screens
-- Affinity purification and mass spectrometry
-- Pathways
RNA level
-- Microarrays
-- SAGE
-- Transposon tagging
Gene level
-- Genetic footprinting
-- Transposon insertion: random mutagenesis
-- Gene deletion: targeted deletion of all ORFs!!!

Outline of today’s lecture Description and classification of fungi The Saccharomyces cerevisiae genome Sequencing Features of the genome Yeast chromosomes Duplication of the yeast genome Functional genomics in yeast Comparative genomics of fungi

Today’s final topic: comparative analysis of fungal genomes The fungi offer unprecedented opportunities for comparative genomic analyses -- relatively small genome sizes -- they are eukaryotes -- they exhibit significant differences in biology -- opportunities to apply functional genomics approaches in a comprehensive, genome-wide manner Page 715

Fungal and metazoan phylogeny Baldauf et al., 2000 Page 698

Fungal genome projects
There are >250 fungal genome projects (17 complete, 127 in assembly, 127 in progress).
-- Most of these are ascomycetes (193)
-- 60 basidiomycetes
-- 18 “other”, including:
   Antonospora locustae: 2.9 Mb
   Batrachochytrium dendrobatidis: 20 Mb, 20 chromosomes
   Encephalitozoon cuniculi: 2.5 Mb, 11 chromosomes
   Rhizopus oryzae: 40 Mb
updated Nov. 2010

Fungal genome projects
Ascomycetes include:
-- Ajellomyces capsulatus (four strains)
-- Aspergillus (7 species including Aspergillus nidulans)
-- Botryotinia fuckeliana (2 strains)
-- Candida (3 strains of C. albicans; total of 6 species)
-- Coccidioides immitis (15 Coccidioides genomes)
-- Kluyveromyces lactis (4 Kluyveromyces genomes)
-- Neurospora crassa
-- Pichia (5 genomes)
-- Saccharomyces (14 genomes including S. cerevisiae)
-- Schizosaccharomyces pombe (3 Schizosaccharomyces genomes)
-- Yarrowia lipolytica
updated 11/09

Fungal genome projects Basidiomycetes; size in megabases (Mb), # chromosomes () Coprinopsis cinerea okayama 37.5 Mb (13) Cryptococcus neoformans 20 Mb (14) Cryptococcus neoformans 18 Mb (14) Cryptococcus neoformans var. grubii H99 20 Mb (14) Cryptococcus neoformans var. neoformans B-3501A Mb (14) Cryptococcus neoformans var. neoformans JEC (14) Lentinula edodes L-54 8 Mb Phakopsora meibomiae Phakopsora pachyrhizi 50 Mb Phanerochaete chrysosporium RP Mb (10) Ustilago maydis Mb (23) updated 11/09

Fungal Genomes Central at NCBI

Fungal pathogen: Aspergillus nidulans --Of 185 Aspergillus species, 20 are human pathogens --A. nidulans has a sexual life cycle (in contrast to A. fumigatus and A. oryzae [sake, miso, soy]). --A. nidulans has animal-like peroxisomal enzymes Page 715

Use TaxPlot to identify evolving Aspergillus proteins

Fungal pathogen: Candida albicans
-- Diploid sexually reproducing fungus
-- Causes opportunistic infections in humans
-- Genome: 14.8 Mb with 8 chromosome pairs. Seven of these are constant, and the 8th varies from 3 to 4 Mb.
-- No known haploid state; the heterozygous diploid state was sequenced.
-- Over 7600 open reading frames
-- CUG is translated as serine (rather than leucine)
Page 718
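The CUG reassignment matters in practice when translating C. albicans ORFs. The sketch below builds the standard codon table and overrides one codon; the table-building idiom, function name, and toy sequence are illustrative. The override is intended to correspond to what NCBI lists as the alternative yeast nuclear code, stated here as background rather than taken from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# C. albicans reads CUG (CTG in DNA) as serine instead of leucine.
CANDIDA = dict(STANDARD, CTG="S")

def translate(dna, table):
    """Translate a coding DNA sequence up to the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = table[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

toy_orf = "ATGCTGAAATAA"  # made-up sequence: ATG CTG AAA TAA
print(translate(toy_orf, STANDARD))
print(translate(toy_orf, CANDIDA))
```

Using the standard table on C. albicans genes would silently place a leucine at every CUG position, so the choice of table should be made explicitly.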

An atypical fungus: Encephalitozoon cuniculi Microsporidia are single-celled eukaryotes that lack mitochondria and peroxisomes. Consistent with their roles as parasites, the E. cuniculi genome is severely reduced in size (2000 proteins, only 2.9 Mb). They were thought to represent deep-branching protozoans, but recent phylogenetic studies place them as an outgroup to fungi. Page 719

Fig Page 720 Encephalitozoon cuniculi as a fungal outgroup

Orange bread mold: Neurospora crassa Beadle and Tatum chose N. crassa as a model organism to study gene-protein relationships. The genome sequence was reported: 39 Mb, 7 chromosomes, 10,082 ORFs (Galagan et al., 2003). N. crassa has only 10% repetitive DNA, and incredibly, only 8 pairs of duplicated genes that encode proteins >100 amino acids. This is because Neurospora uses “repeat-induced point mutation” (RIP), a mechanism by which the genome is scanned for duplicated (repeated) sequences. This appears to serve as a genomic defense system, inactivating potentially harmful transposons. Page 719

Schizosaccharomyces pombe
The S. pombe genome is 13.8 Mb and encodes ~4900 predicted proteins. Some bacterial genomes encode more proteins (e.g. Mesorhizobium loti with 6752, and Streptomyces coelicolor with 7825 genes).
Chromosome   Size      Genes   Coding
1            5.6 Mb    2,255   59%
2            4.4 Mb    1,790   58%
3            2.5 Mb      884   55%
Total        12.5 Mb   4,929   58%
See: EBI Page 721

Schizosaccharomyces pombe
Chromosome   Size      Genes   Coding
1            5.6 Mb    2,255   59%
2            4.4 Mb    1,790   58%
3            2.5 Mb      884   55%
Total        12.5 Mb   4,929   58%
See: EBI

Schizosaccharomyces pombe S. pombe diverged from S. cerevisiae about 330 to 420 million years ago. Many genes are as divergent between these two fungi as either is from its human counterpart. To see this, try TaxPlot at NCBI. Page 721

Perspective and pitfalls
The budding yeast S. cerevisiae is one of the most significant organisms in biology:
-- Its genome was the first eukaryotic genome to be sequenced
-- Its biology is simple relative to metazoans
-- Through yeast genetics, powerful functional genomics approaches have been applied to study all yeast genes
It is important to note that even for yeast, our knowledge of basic biological questions is highly incomplete. We still understand little about how the genotype of an organism leads to its characteristic phenotype. Page 721

Summary of key points [1] S. cerevisiae has 16 chromosomes and ~6,000 genes [2] Its genome underwent a whole genome duplication followed by massive gene loss [3] Comparative genomics is a powerful approach --to identify genes --to identify regulatory regions --to infer evolutionary history of genome duplications and gene gain or loss [4] SGD (Saccharomyces Genome Database) is the major web resource for yeast