Genomics of Microbial Eukaryotes. Igor Grigoriev, Fungal Genomics Program Head, US DOE Joint Genome Institute, Walnut Creek, CA.

1 Genomics of Microbial Eukaryotes. Igor Grigoriev, Fungal Genomics Program Head, US DOE Joint Genome Institute, Walnut Creek, CA

2 Large and Complex Eukaryotes

3 Outline: Eukaryotic Genome Annotation; Fungal Genomics Program; MycoCosm

4 Started with Human Genome Project

5 genome.jgi.doe.gov IMG MycoCosm 150+ annotated eukaryotic genomes

6 Annotation Pipeline
  Input: genomic assembly and ESTs
  Annotation pipeline: gene predictions, protein annotations, reference data mapping, repeat masking, manual curation (optional)
  Analysis: gene families, gene expression, phylogenomics, proteomics, protein targeting, etc.
  Validations

7 Eukaryotic Gene Prediction
  Gene model: Promoter, 5’UTR, ATG, exons and introns (GT ... AG splice sites), TGA, 3’UTR, PolyA
  Ab initio methods use knowledge of known genes’ structures to predict start, stop, and splice sites in CDS only (Fgenesh, GeneMark). Train on known genes, then predict the model.
  Protein-based methods build CDS exons around known protein alignments (Fgenesh+, GeneWise). Evidence: GenBank protein.
  Transcript-based methods map or assemble transcripts on the genome, including UTRs (EST_map, Combest). Evidence: EST contig.
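The signals named on this slide (ATG start, in-frame stop, GT ... AG intron boundaries) can be illustrated with a toy scanner. This is only a sketch of the signals themselves: Fgenesh and GeneMark rely on trained statistical models, which this code does not attempt to reproduce. The function names `find_orfs` and `candidate_introns` and their length parameters are hypothetical, chosen for illustration.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) half-open spans of forward-strand ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon, inclusive.
    min_codons is an illustrative threshold, not a value from the slides.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def candidate_introns(seq, min_len=4, max_len=1000):
    """Return (start, end) half-open spans that begin with GT and end with AG."""
    seq = seq.upper()
    donors = [m.start() for m in re.finditer("GT", seq)]
    acceptors = [m.start() + 2 for m in re.finditer("AG", seq)]
    return [(d, a) for d in donors for a in acceptors
            if min_len <= a - d <= max_len]


if __name__ == "__main__":
    print(find_orfs("CCATGAAATGACC"))
    print(candidate_introns("AAGTCCCCAGTT"))
```

A real predictor scores each candidate signal against a model trained on known genes; here every GT/AG pair within the length bounds is reported, so the output is a superset of plausible introns.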

8 More Gene Prediction
  ESTEXT: use ESTs/cDNAs to extend, correct or predict gene models (predicted model + ESTs -> extended model with 5’UTR and 3’UTR).
  FGENESH2: detect orthologs with poor alignments and refine with synteny-based methods (Genome A, Genome B).
  Representative set: a non-redundant gene set is built from “the best” models from each locus (FGENESH, GENEWISE, external models) according to homology and ESTs, followed by manual curation.
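A minimal sketch of the "best model per locus" idea: overlapping models on the same scaffold are grouped into a locus and the highest-scoring one is kept. The `GeneModel` fields, the 0..1 support values, and the equal-weight score are assumptions made for illustration; the slides do not state how the JGI pipeline actually ranks models.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    name: str
    scaffold: str
    start: int          # half-open coordinates: [start, end)
    end: int
    homology: float     # hypothetical 0..1 homology support
    est_support: float  # hypothetical 0..1 EST support


def score(model):
    """Assumed scoring: equal weight for homology and EST support."""
    return model.homology + model.est_support


def pick_best_per_locus(models):
    """Group overlapping models per scaffold and keep the top-scoring one."""
    best = []
    locus = []
    locus_end = None
    for m in sorted(models, key=lambda m: (m.scaffold, m.start, m.end)):
        same_locus = (locus
                      and m.scaffold == locus[0].scaffold
                      and m.start < locus_end)
        if same_locus:
            locus.append(m)
            locus_end = max(locus_end, m.end)
        else:
            if locus:
                best.append(max(locus, key=score))
            locus = [m]
            locus_end = m.end
    if locus:
        best.append(max(locus, key=score))
    return best


if __name__ == "__main__":
    demo = [
        GeneModel("A", "scaffold_1", 0, 100, 0.5, 0.2),
        GeneModel("B", "scaffold_1", 50, 150, 0.9, 0.5),
        GeneModel("C", "scaffold_1", 200, 300, 0.1, 0.1),
        GeneModel("D", "scaffold_2", 10, 90, 0.3, 0.3),
    ]
    print([m.name for m in pick_best_per_locus(demo)])
```

Strand is ignored here for brevity; a fuller version would keep models on opposite strands in separate loci.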

9 Combine Gene Predictors for Better Quality (Heterobasidion annosum v1.0)
                                                            Eugene  Genemark   Fgenesh  JGI Pipe
  Number of gene models                                     11,547     9,609     8,409    12,270
  Models with partial EST support                            5,544     3,829     4,567     5,248
    with full length EST support                             2,538     1,182     2,896     3,073
    EST coverage per gene                                    77.7%     68.2%     80.8%     79.1%
    supported splice sites                                  41,581    40,808    45,498    47,671
  Models with homology support                               6,758     6,043     5,750     7,214
    with strong homology support (80+% ide, 80+% cov.)         112       109       174       187
    model coverage                                             64%       60%       68%       69%
  Models with homology and EST support                       2,894     2,172     2,720     2,953
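The table reports absolute counts. As a small sketch, the same numbers can also be viewed as fractions of each predictor's model set. The counts are copied from the slide; the dictionary keys and function names are made up for illustration.

```python
# Counts copied from the Heterobasidion annosum v1.0 comparison on the slide.
COUNTS = {
    "Eugene":   {"models": 11547, "full_length_est": 2538, "homology": 6758},
    "Genemark": {"models": 9609,  "full_length_est": 1182, "homology": 6043},
    "Fgenesh":  {"models": 8409,  "full_length_est": 2896, "homology": 5750},
    "JGI Pipe": {"models": 12270, "full_length_est": 3073, "homology": 7214},
}


def support_fractions(counts):
    """Divide each support count by that predictor's total number of models."""
    return {
        name: {key: value / row["models"]
               for key, value in row.items() if key != "models"}
        for name, row in counts.items()
    }


def most_supported(counts, key):
    """Name of the predictor with the largest absolute count for `key`."""
    return max(counts, key=lambda name: counts[name][key])


if __name__ == "__main__":
    for name, fracs in support_fractions(COUNTS).items():
        print(f"{name:9s} full-length EST {fracs['full_length_est']:.1%}  "
              f"homology {fracs['homology']:.1%}")
    print("most homology-supported models:", most_supported(COUNTS, "homology"))
```

Absolute counts answer "how many supported genes were recovered", while fractions answer "how much of the prediction set is supported"; the two views can rank predictors differently.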

10 Re-annotation Using Comparative Genomics
                                           MAKER  JGI pipeline      Re-annot
  # of predicted gene models               9,940        12,290        12,802
    with Swissprot hits                    6,521         7,356         7,900
    with non-repeat PFAM domains           5,365         6,010         6,353
    with EST support                       9,252        10,796        11,105
    with >90% EST support                  7,729         9,178         9,444
  # of unique PFAM domains                 2,207         2,245         2,322
  # EST-supported splice sites            99,627       102,200       104,246
  EST coverage per gene: 93.0%, 93.3% (third value not given)
  Asaf Salamov

11 Protein Annotation (for each predicted protein)
  Signal peptide (signalP)
  Domain (InterPro, tmhmm)
  Possible paralog (Blastp+MCL)
  Possible orthologs (in nr, SwissProt, KEGG, KOG)
  Gene families, with and without other species
  Higher order assignments: Gene Ontology terms; EC numbers --> KEGG pathways

12 Validation with Transcriptomics
  Transformation of EST sequencing: Sanger, 454, Illumina (figure values: 5531 34)
  Old Sanger days: ESTs, models, EST profile
  Processing RNA-Seq with CombEST

13 Validation with Proteomics (Wright et al., BMC Genomics, 2009)

14 Gene Cluster Analysis: comparative analysis

15 Genome Portal Framework

16 Many Genes of Eco-responsive Daphnia pulex
  First crustacean, aquatic animal sequenced; new model organism
  30,940 predicted D. pulex genes in ~200 Mb genome
  85% supported by 1+ lines of evidence
  Colbourne et al., Science, 2011

17 Half of Daphnia Genes: No Homologs, Expressed Under Environmental Stress
  With Evgeny Zdobnov’s group (Univ. Genève)
  * Of 716 highly conserved single-copy orthologs, Daphnia is missing only two
  Colbourne et al., 2011

18 Outline: Eukaryotic Genome Annotation; Fungal Genomics Program; MycoCosm

19 Fungal Genomics for Energy & Environment
  Grow, Degrade, Ferment (bio-refinery): plant symbionts and pathogens; lignocellulose degradation; sugar fermentation
  GOAL: Scale up sequencing and analysis of fungal diversity for DOE science and applications

20 GOLD (October 2011): 758 fungal projects

21 Genomic Encyclopedia of Fungi
  Chapter 1: Plant health (symbiosis, plant pathogenicity, biocontrol)
  Chapter 2: Biorefinery (lignocellulose degradation, sugar fermentation, industrial organisms)
  Chapter 3: Diversity (phylogenetics, ecology)

22 Genome-Centric View and Comparative View. http://jgi.doe.gov/fungi; 100+ fungal genomes; 5000+ visitors/month

23 Comparative Genome Analysis

24 Strategy: 1000 Fungal Genomes. Goal: Sequencing 1000 fungal genomes from across the Fungal Tree of Life will provide references for research on plant-microbe interactions and environmental metagenomics.

25 Strategy: Fungal Systems
  From simple systems to complex environments
  Model fungi; S. commune; T. terrestris
  Lichen: alga + fungus
  ECM: plant + fungus
  Forest soil metagenomes

26 Model Mushroom Development (Ohm et al., 2010)
  SEQUENCE, FUNCTION, MODEL
  WT S. commune; gene knock-outs; modeling regulatory cascades

27 Summary
  Eukaryotic annotation recipe: combine gene predictors, experimental data, and community expertise
  Fungal genomics: we aim to scale up sequencing and comparative analysis of fungi relevant for energy & environment (jgi.doe.gov/fungi)

28 Enjoy Algae as well! http://genome.jgi.doe.gov/Algae

29 Acknowledgements: JGI Staff; Our Users

30 Outline: Eukaryotic Genome Annotation; Fungal Genomics Program; MycoCosm

