Genomics of Microbial Eukaryotes Igor Grigoriev, Fungal Genomics Program Head US DOE Joint Genome Institute, Walnut Creek, CA.

Presentation transcript:

Genomics of Microbial Eukaryotes Igor Grigoriev, Fungal Genomics Program Head US DOE Joint Genome Institute, Walnut Creek, CA

2 Large and Complex Eukaryotes

3 Outline
• Eukaryotic Genome Annotation
• Fungal Genomics Program
• MycoCosm

4 Started with Human Genome Project

5 genome.jgi.doe.gov IMG MycoCosm 150+ annotated eukaryotic genomes

6 Annotation Pipeline
Input: genomic assembly and ESTs
• Annotation pipeline: gene predictions, protein annotations, reference data mapping, repeat masking, manual curation (optional)
• Validations
• Analysis: gene families, gene expression, phylogenomics, proteomics, protein targeting, etc.

7 Eukaryotic Gene Prediction
• Ab initio methods use knowledge of known genes' structures to predict start, stop, and splice sites in CDS only (Fgenesh, GeneMark). They are trained on known genes.
• Protein-based methods build CDS exons around known protein alignments (Fgenesh+, GeneWise). Evidence: GenBank protein.
• Transcript-based methods map or assemble transcripts on the genome, including UTRs (EST_map, CombEST). Evidence: EST contig.
Gene model: Promoter, 5'UTR, ATG, exons and introns (GT ... AG splice sites), TGA, 3'UTR, PolyA
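
The signals named on this slide (ATG start, stop codon, GT/AG intron boundaries) can be illustrated with a naive scan. This is only a sketch of signal detection, not how Fgenesh or GeneMark work: real ab initio predictors score these signals with statistical models trained on known genes. The function names and length thresholds are hypothetical.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) of ATG...stop reading frames on the forward strand.

    Coordinates are 0-based and half-open; the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def candidate_introns(seq, min_len=4, max_len=2000):
    """Return (start, end) spans that begin with GT and end with AG.

    Every GT/AG pair within the length limits is reported, so most
    candidates are false positives until a scoring model filters them.
    """
    seq = seq.upper()
    donors = [m.start() for m in re.finditer("(?=GT)", seq)]
    acceptors = [m.start() + 2 for m in re.finditer("(?=AG)", seq)]
    return [(d, a) for d in donors for a in acceptors
            if min_len <= a - d <= max_len]
```

The number of spurious GT...AG pairs grows quickly with sequence length, which is one reason the slide pairs ab initio prediction with protein and transcript evidence.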

8 More Gene Prediction
• ESTEXT: use ESTs/cDNAs to extend, correct or predict gene models (predicted model + ESTs gives an extended model with 5'UTR and 3'UTR)
• FGENESH2: detect orthologs with poor alignments and refine with synteny-based methods (Genome A vs Genome B)
• Representative set: a non-redundant gene set is built from "the best" models at each locus (FGENESH, GENEWISE, external models) according to homology and ESTs, followed by manual curation
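
Choosing "the best" model per locus can be sketched as follows. The slide does not give the actual JGI scoring rule, so everything here is an assumption made for illustration: the `GeneModel` fields, the equal weighting of homology and EST support, and the definition of a locus as a run of overlapping models (strand is ignored).

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    name: str
    start: int          # 0-based, half-open coordinates
    end: int
    homology: float     # hypothetical score in [0, 1]
    est_support: float  # hypothetical score in [0, 1]


def score(model, w_homology=0.5, w_est=0.5):
    """Combine the two evidence types with assumed (not JGI) weights."""
    return w_homology * model.homology + w_est * model.est_support


def pick_representatives(models):
    """Group overlapping models into loci and keep the top scorer of each."""
    loci = []
    for m in sorted(models, key=lambda m: (m.start, m.end)):
        if loci and m.start < loci[-1]["end"]:
            # Overlaps the current locus: add it and extend the locus span.
            loci[-1]["models"].append(m)
            loci[-1]["end"] = max(loci[-1]["end"], m.end)
        else:
            loci.append({"end": m.end, "models": [m]})
    return [max(locus["models"], key=score) for locus in loci]
```

For example, two overlapping predictions from different tools collapse to whichever has the better combined evidence, while an isolated prediction is kept as is.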

9 Combine Gene Predictors for Better Quality (Heterobasidion annosum v1.0)

                           Eugene   Genemark   Fgenesh   JGI Pipe
Number of gene models      11,547      9,609     8,409     12,270
EST coverage per gene       77.7%      68.2%     80.8%      79.1%
Supported splice sites     41,581     40,808    45,498     47,671
Model coverage                64%        60%       68%        69%

Also compared (values not preserved in this transcript): models with partial EST support; models with full-length EST support; models with homology support; models with strong homology support (80+% identity, 80+% coverage); models with homology and EST support.
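
The slide does not define "EST coverage per gene". One plausible reading, assumed here, is the fraction of a model's exon bases that are overlapped by at least one EST alignment. A minimal sketch with hypothetical function names and 0-based, half-open intervals:

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def est_coverage(exons, est_blocks):
    """Fraction of exon bases covered by at least one EST alignment block."""
    exons = merge_intervals(exons)
    ests = merge_intervals(est_blocks)  # merged, so no base is counted twice
    total = sum(end - start for start, end in exons)
    covered = 0
    for xs, xe in exons:
        for es, ee in ests:
            covered += max(0, min(xe, ee) - max(xs, es))
    return covered / total if total else 0.0
```

Averaging this value over all models of a gene set would give a single per-predictor figure comparable in spirit to the percentages on the slide.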

10 Re-annotation Using Comparative Genomics (Asaf Salamov)

                                  MAKER   JGI pipeline   Re-annot
# of predicted gene models        9,940         12,290     12,802
With Swissprot hits               6,521          7,356      7,900
With non-repeat PFAM domains      5,365          6,010      6,353
With EST support                  9,252         10,796     11,105
With >90% EST support             7,729          9,178      9,444
# of unique PFAM domains          2,207          2,245      2,322
# EST-supported splice sites     99,627        102,200    104,246

EST coverage per gene: 93.0%, 93.3% (only two values are preserved in this transcript)
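
The counts on this slide are absolute, and the three gene sets differ in size, so it can help to express each row as a share of the predicted models. A small sketch of that arithmetic using three of the count rows from the slide; the dictionary layout and the function name are illustrative only.

```python
# Counts copied from the slide (MAKER, JGI pipeline, re-annotation).
COUNTS = {
    "MAKER":        {"models": 9940,  "swissprot": 6521, "pfam": 5365, "est": 9252},
    "JGI pipeline": {"models": 12290, "swissprot": 7356, "pfam": 6010, "est": 10796},
    "Re-annot":     {"models": 12802, "swissprot": 7900, "pfam": 6353, "est": 11105},
}


def pct_of_models(row, key):
    """Percentage of a gene set's models that have the given kind of support."""
    return round(100 * row[key] / row["models"], 1)


if __name__ == "__main__":
    for name, row in COUNTS.items():
        shares = {k: pct_of_models(row, k) for k in ("swissprot", "pfam", "est")}
        print(name, shares)
```

Reading the shares next to the raw counts separates two effects: how many supported models a pipeline finds in total, and how large a part of its output is supported.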

11 Protein Annotation
For each predicted protein:
• Higher-order assignments: Gene Ontology terms; EC numbers --> KEGG pathways
• Gene families, with and without other species
• Possible orthologs (in nr, SwissProt, KEGG, KOG)
• Possible paralogs (Blastp+MCL)
• Domains (InterPro, tmhmm)
• Signal peptide (signalP)
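
The "Blastp+MCL" step groups proteins into candidate paralog families by running Markov clustering on a graph of BLAST hits. Below is a toy, pure-Python version of the MCL iteration (expansion by matrix squaring, then inflation by element-wise powering and column normalisation). It is a sketch for small inputs only; in practice the dedicated `mcl` program would be used. The edge weights, the self-loop weight of 1.0, and the parameter defaults are assumptions.

```python
def mcl(edges, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster an undirected weighted graph given as (a, b, weight) edges."""
    nodes = sorted({n for a, b, _ in edges for n in (a, b)})
    idx = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for a, b, w in edges:
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = float(w)
    for i in range(n):
        m[i][i] = 1.0  # self-loops; assumes weights are on a similar scale

    def normalize(mat):
        """Scale every column so that it sums to 1."""
        for j in range(n):
            col_sum = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= col_sum
        return mat

    m = normalize(m)
    for _ in range(iterations):
        # Expansion: one matrix squaring spreads flow along longer paths.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strengthen strong flows and weaken weak ones.
        m = normalize([[v ** inflation for v in row] for row in m])

    # Each row that still carries flow lists the members of one cluster.
    clusters = {frozenset(nodes[j] for j in range(n) if m[i][j] > eps)
                for i in range(n)}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)
```

A usage example would pass one edge per reciprocal Blastp hit, for instance `mcl([("p1", "p2", 1.0), ("p3", "p4", 1.0)])`, and treat each returned list as one candidate family. A higher `inflation` value tends to produce smaller, tighter clusters.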

12 Validation with Transcriptomics
Transformation of EST sequencing (Sanger, 454, Illumina): from ESTs and models in the old Sanger days to EST profiles and processing RNA-Seq with CombEST

13 Validation with Proteomics Wright et al, BMC Genomics (2009)

14 Gene Cluster Analysis Comparative analysis

15 Genome Portal Framework

16 Many Genes of Eco-responsive Daphnia pulex
• First crustacean, aquatic animal sequenced, new model organism
• 30,940 predicted D. pulex genes in ~200 Mb genome
• 85% supported by 1+ lines of evidence
Colbourne et al, Science, 2011

17 Half of Daphnia Genes: No Homologs, Expressed Under Environmental Stress
With Evgeny Zdobnov's group (Univ. Genève)
* Of 716 highly conserved single-copy orthologs, Daphnia is missing only two
Colbourne et al, 2011

18 Outline
• Eukaryotic Genome Annotation
• Fungal Genomics Program
• MycoCosm

19 Fungal Genomics for Energy & Environment
Bio-refinery stages: Grow, Degrade, Ferment
Research areas: plant symbionts and pathogens; lignocellulose degradation; sugar fermentation
GOAL: Scale up sequencing and analysis of fungal diversity for DOE science and applications

20 GOLD (October 2011) 758 fungal projects

21 Genomic Encyclopedia of Fungi
• Chapter 1: Plant health (symbiosis, plant pathogenicity, biocontrol)
• Chapter 2: Biorefinery (lignocellulose degradation, sugar fermentation, industrial organisms)
• Chapter 3: Diversity (phylogenetics, ecology)

22 Genome-Centric View Comparative View fungal genomes visitors/month

23 Comparative Genome Analysis

24 Strategy: 1000 Fungal Genomes Goal: Sequencing 1000 fungal genomes from across the Fungal Tree of Life will provide references for research on plant-microbe interactions and environmental metagenomics.

25 Strategy: Fungal Systems
Range: model fungi, simple systems, complex environments
Examples: S. commune; T. terrestris; lichen (alga + fungus); ECM (plant + fungus); forest soil metagenomes

26 Model Mushroom Development (Ohm et al, 2010)
SEQUENCE, FUNCTION, MODEL: S. commune genome; gene knock-outs compared with WT; modeling regulatory cascades

27 Summary
• Eukaryotic annotation recipe: combine gene predictors, experimental data, and community expertise
• Fungal genomics: we aim to scale up sequencing and comparative analysis of fungi relevant for energy and environment (jgi.doe.gov/fungi)

28 Enjoy Algae as well!

29 Acknowledgements: JGI Staff, Our Users

30 Outline
• Eukaryotic Genome Annotation
• Fungal Genomics Program
• MycoCosm