Operated by Los Alamos National Security, LLC for NNSA Bioscience Discovering virulence genes present in novel strains and metagenomes Chris Stubben IC.

Slide 1. Discovering virulence genes present in novel strains and metagenomes
Chris Stubben, IC postdoc, B-7


Slide 3. Overview
– Review current functional classification systems
– Discuss the Virulence Factor Ontology
– Identify virulence genes in novel strains and metagenomes

Slide 4. Functional classification systems
– EC numbers for enzymes (1956)
– Swiss-Prot keywords (1986)
– E. coli gene functions, M. Riley (1993)
– TIGR role categories (1995)
– Gene Ontology (1998)
Figure: gene function

Slide 5. What functions are related to virulence?
Some systems have only a few relevant terms:
– Swiss-Prot keywords: virulence, toxin, antibiotic resistance
– TIGR roles: pathogenesis, toxin production and resistance
Gene Ontology (GO) also has pathogenesis and resistance to antibiotics, plus many more.
Figure: GO terms related to the enzymatic activity of toxins

Slide 6. Gene Ontology (GO)
– 25,688 terms in three structured controlled vocabularies (ontologies):
  – biological processes
  – 2,186 cellular components
  – 8,404 molecular functions
– Standard for eukaryotic gene annotation
– Increasingly used for prokaryotes:
  – TIGR (2002)
  – Plant pathogens, by PAMGO at VBI (2005)
  – Human pathogens, at 8 BRCs (2006)

Slide 7. Bioinformatics Resource Centers (BRCs)
– NIAID-funded, $100 million effort to create eight bioinformatics centers for human pathogens
– Goal is to provide easy access to genomic data from multiple strains, like the eukaryotic model organism databases

Slide 8. Example: toxin annotation in GO
Step 1. Assign GO terms, possibly:
– activation of Rho GTPase activity
– N-terminal peptidyl-glutamine deamination
– actin cytoskeleton reorganization
– stress fiber formation

Slide 9. Step 2. Add references and evidence codes
Diagram: evidence linking a virulence protein to its function, grouped as experimental, sequence similarity, genomic context, and computational:
– Knockout mutants (IMP)
– Overexpression phenotypes (IDA)
– Genetic interactions (IGI)
– Microarrays (IEP or RCA)
– BLAST alignments (ISA)
– Orthologous proteins (ISO)
– Hidden Markov models of protein families or domains (ISM)
– Phylogenetic profiles, conserved neighborhoods, gene fusion, shared regulatory sites, etc. (IGC)
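
One way to use the evidence codes on this slide is to tag each annotation with the kind of support behind it. The sketch below is a minimal Python illustration, and its grouping is our reading of the diagram, not an official classification. It places RCA under "computational" and ISA/ISO/ISM under "sequence similarity". The GO Consortium itself files ISA, ISO, ISM, IGC and RCA together as computational analysis codes.

```python
# Assumed grouping of the evidence codes named on the slide into the four
# diagram categories; this mirrors the slide, not an official GO grouping.
EVIDENCE_CATEGORY = {
    "IMP": "experimental",         # inferred from mutant phenotype
    "IDA": "experimental",         # inferred from direct assay
    "IGI": "experimental",         # inferred from genetic interaction
    "IEP": "experimental",         # inferred from expression pattern
    "RCA": "computational",        # reviewed computational analysis
    "ISA": "sequence similarity",  # inferred from sequence alignment
    "ISO": "sequence similarity",  # inferred from sequence orthology
    "ISM": "sequence similarity",  # inferred from sequence model
    "IGC": "genomic context",      # inferred from genomic context
}


def evidence_category(code: str) -> str:
    """Return the diagram category for a GO evidence code, or 'other' if unlisted."""
    return EVIDENCE_CATEGORY.get(code.strip().upper(), "other")
```

Codes the slide does not mention, such as IEA, fall through to "other", so unexpected codes are never silently counted as experimental.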

Slide 10. Example: toxin searches in GO
If a gene is annotated to ‘adenylate cyclase activity’, how do you know it’s a toxin? It may also be annotated to ‘cell killing’ or a related term, but is that enough? An alternative is to define virulence factors and toxins (both outside the scope of GO) in a new ontology.
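
A search like this has to follow the GO graph. A gene annotated to a more specific child of ‘cell killing’ should still be returned by a ‘cell killing’ query. The minimal sketch below assumes a toy in-memory ontology fragment (child-to-parent links) and uses hypothetical genes and child terms. It combines the two queries the slide mentions, which is the kind of ad hoc rule a dedicated virulence ontology would make unnecessary.

```python
from collections import deque

# Toy ontology fragment: child term -> parent terms. "cell killing" and
# "adenylate cyclase activity" come from the slide; every term labelled
# "hypothetical" is a placeholder, not a real GO term.
PARENTS = {
    "hypothetical killing subtype A": ["cell killing"],
    "hypothetical killing subtype B": ["hypothetical killing subtype A"],
    "adenylate cyclase activity": ["hypothetical catalytic parent"],
}

# Hypothetical gene -> annotated terms.
ANNOTATIONS = {
    "gene1": {"adenylate cyclase activity", "hypothetical killing subtype B"},
    "gene2": {"adenylate cyclase activity"},
    "gene3": {"cell killing"},
}


def ancestors(term):
    """Return `term` plus every term reachable by following parent links (BFS)."""
    seen = {term}
    queue = deque([term])
    while queue:
        for parent in PARENTS.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def genes_annotated_to(term):
    """Genes annotated to `term` directly or via any descendant term."""
    return sorted(g for g, terms in ANNOTATIONS.items()
                  if any(term in ancestors(t) for t in terms))


def candidate_toxins():
    """Genes annotated both to adenylate cyclase activity and to cell killing."""
    return sorted(set(genes_annotated_to("adenylate cyclase activity"))
                  & set(genes_annotated_to("cell killing")))
```

Here gene1 is reached through a descendant of ‘cell killing’ rather than a direct annotation. A flat string match would have missed it.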

Slide 11. Why we need a Virulence Factor Ontology
– Lots of effort to characterize pathogenic processes and systems (e.g., BRCs)
– Many different definitions of pathogen, virulence, and virulence factor
– Not clear which GO terms may be related to toxins and virulence (BRCs have already assigned 750,000 GO terms to 300,000 genes)

Slide 12. Virulence Factor Ontology working group
Goal: combine existing toxin and virulence terms from various groups into a single ontology:
– TVFac and antibiotic resistance (AR) terms at LANL
– Gemina virulence factor and AR terms at U. of Maryland
– PAMGO terms in GO
Participants:
– MITRE: Lynette Hirschman, Marc Colosimo, and others
– LANL: Chris Stubben, Murray Wolinsky, and Jian Song
– U. of Maryland IGS: Lynn Schriml and Michelle Gwinn

Slide 13. Virulence Factor Ontology (VFO)
Three new ontologies; one is very simple and points to additional terms in GO or to new ontologies.
Virulence factor (definition needed!):
– toxin-associated processes
– antibiotic resistance
– adhesion
– entry into host
– acquisition of nutrients from host
– avoidance of host defenses
– growth within host
– modification of host morphology
– dissemination from host
Figure: new simplified GO trees (slims)
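
A slim like the one sketched on this slide is typically used by rolling detailed annotations up to a few top-level categories. The Python sketch below assumes a toy ontology fragment whose fine-grained terms are hypothetical. The slim uses some of the category names listed above. A term maps to every slim category that is itself or one of its ancestors. This is one common mapping rule, not necessarily the one the VFO working group adopted.

```python
# Hypothetical slim built from some of the virulence categories on this slide.
SLIM = {
    "toxin-associated processes",
    "antibiotic resistance",
    "adhesion",
    "entry into host",
    "avoidance of host defenses",
}

# Toy ontology fragment (child term -> parent terms). The fine-grained
# terms are hypothetical placeholders, not real GO or VFO terms.
PARENTS = {
    "hypothetical pilus attachment term": ["adhesion"],
    "hypothetical beta-lactamase term": ["antibiotic resistance"],
    "hypothetical capsule term": ["avoidance of host defenses"],
    "hypothetical invasin term": ["entry into host", "adhesion"],
}


def map_to_slim(term):
    """Return every slim category that is `term` itself or one of its ancestors."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in SLIM:
            found.add(current)
        stack.extend(PARENTS.get(current, []))
    return sorted(found)


def slim_counts(annotations):
    """Count genes per slim category from {gene: set of annotated terms}."""
    counts = {}
    for terms in annotations.values():
        # Each gene counts at most once per category, however many terms it has.
        for category in {c for t in terms for c in map_to_slim(t)}:
            counts[category] = counts.get(category, 0) + 1
    return counts
```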

Slide 14. Virulence genes in novel strains
Emerging, engineered, and novel strains will most likely be sequenced quickly using next-generation sequencing technologies and then compared to near-neighbor strains using sequence similarity (BLAST) or models (HMMs such as Pfam, TIGRFAMs, FIGfams, EnteroFams, etc.).
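
As an illustration of the BLAST route, the sketch below reads BLAST+ tabular output (`-outfmt 6`, default 12 columns) and keeps each query's best hit. It then copies the reference gene's GO terms to the novel gene. The e-value and identity cutoffs and all sequence IDs are hypothetical. A pipeline of this kind would record such transfers with a sequence-similarity evidence code rather than an experimental one.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-10, min_pident=40.0):
    """Return {query: (subject, bitscore)} for each query's top-scoring passing hit.

    The cutoffs are illustrative assumptions, not recommendations.
    """
    best = {}
    rows = csv.reader(io.StringIO(tabular_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in rows:
        if not row:
            continue  # skip blank lines
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) > max_evalue or float(hit["pident"]) < min_pident:
            continue
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (hit["sseqid"], score)
    return best


def transfer_annotations(best, reference_terms):
    """Copy the GO terms of each query's best reference hit (similarity-based inference)."""
    return {q: set(reference_terms.get(s, set())) for q, (s, _score) in best.items()}


# Hypothetical BLAST output for two genes from a novel strain.
SAMPLE = (
    "novel_1\tref_A\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-50\t500\n"
    "novel_1\tref_B\t60.0\t290\t116\t2\t5\t295\t3\t292\t1e-30\t300\n"
    "novel_2\tref_C\t30.0\t100\t70\t1\t1\t100\t1\t100\t1e-3\t40\n"
)
```

With these cutoffs, `novel_2` gets no annotation because its only hit is too weak. The example shows why novel genes often end up with no transferred function at all.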

Slide 15. Compare novel strains to what?
Very few manual annotations are available for prokaryotes, especially in public databases like NCBI and UniProt.
“Curated information from the literature serves as the gold-standard data set for comparative analyses” (Nature, Sep 10, 2008)
Table 1. Percentage of genes in UniProt with functional assignments to Gene Ontology terms based on experimental evidence in the primary literature.
Use BRCs!

Slide 16. BRC annotations
Genome annotations should have references and evidence codes signifying whether annotations were produced experimentally or computationally.
Figure: % of Y. pestis CO92 genes with manual annotations
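
Given evidence codes, a percentage like the one in the figure is straightforward to compute. The sketch below assumes annotations arrive as (gene, GO term, evidence code) tuples. It uses the GO experimental codes (EXP, IDA, IPI, IMP, IGI, IEP). The definition of "manual" as "any code other than IEA" is our assumption, and the slide may have used a different one. Gene and term names are hypothetical.

```python
# GO experimental evidence codes.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def is_experimental(code):
    return code in EXPERIMENTAL_CODES


def is_manual(code):
    # Assumption: everything except IEA (inferred from electronic annotation)
    # is treated as a manually reviewed annotation.
    return code != "IEA"


def percent_genes(records, all_genes, keep):
    """Percent of `all_genes` with at least one (gene, term, code) record passing `keep`."""
    genes = set(all_genes)
    if not genes:
        return 0.0
    supported = {gene for gene, _term, code in records if keep(code) and gene in genes}
    return 100.0 * len(supported) / len(genes)


# Hypothetical annotation records for a four-gene genome.
GENES = ["g1", "g2", "g3", "g4"]
RECORDS = [("g1", "t1", "IDA"), ("g1", "t2", "IEA"), ("g2", "t3", "ISS"), ("g3", "t4", "IEA")]
```

On this toy data, 25% of genes have experimental support and 50% have a manual annotation. Genes with no records at all, like g4, still count in the denominator.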

Slide 17. Y. pestis CO92 annotations at ERIC
Tables 1 and 2. Sequence features and coding sequence annotations for Y. pestis CO92 at ERIC

Slide 18. Yersinia antibiotic resistance genes
Tables 1 and 2. Antibiotic resistance genes found using the Swiss-Prot keyword search ‘antibiotic resistance’ in UniProt and the GO term search ‘response to antibiotic’ in ERIC. Only one gene in common!
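
Once both result lists use the same gene identifiers, comparing the two searches reduces to set operations. Cross-referencing UniProt accessions to ERIC locus tags is assumed to have been done already, and the gene IDs below are hypothetical. The Jaccard index is one simple way to put a number on how little two annotation sources agree.

```python
def compare_gene_sets(uniprot_hits, eric_hits):
    """Summarize the overlap between two gene lists found by different searches."""
    a, b = set(uniprot_hits), set(eric_hits)
    union = a | b
    return {
        "both": sorted(a & b),
        "uniprot_only": sorted(a - b),
        "eric_only": sorted(b - a),
        # Jaccard index: shared genes / all genes found by either search.
        "jaccard": len(a & b) / len(union) if union else 0.0,
    }


# Hypothetical results from the two searches, already on shared gene IDs.
UNIPROT = ["gene1", "gene2", "gene3"]
ERIC = ["gene3", "gene4", "gene5", "gene6"]
```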

Slide 19. Vibrio toxins in GO, UniProt, and NMPDR

Slide 20. Virulence genes in metagenomes
A recent comparison of virulence genes in chicken, cow, mouse, and human gut metagenomes (metavirulomes) was based on SEED subsystem categories at NMPDR. An alternative is to use GO term mappings to protein family and domain databases like Pfam.
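
The GO Consortium distributes Pfam-to-GO mappings as a flat "pfam2go" file. The parser below assumes lines shaped like `Pfam:PF00001 name > GO:term name ; GO:0000000`, with `!` marking header comments. Check this against the actual file before relying on it. The accessions, term names and GO IDs in the sample are hypothetical placeholders.

```python
from collections import defaultdict


def parse_pfam2go(lines):
    """Parse pfam2go-style lines into {Pfam accession: {(GO id, GO term name), ...}}.

    Assumed line shape: "Pfam:PF00001 name > GO:term name ; GO:0000000".
    """
    mapping = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and "!" header comments
        left, right = line.split(" > ", 1)
        accession = left.split()[0].removeprefix("Pfam:")
        term_part, go_id = right.rsplit(" ; ", 1)
        mapping[accession].add((go_id.strip(), term_part.removeprefix("GO:").strip()))
    return dict(mapping)


# Hypothetical lines in the assumed format.
SAMPLE = [
    "!hypothetical header line",
    "Pfam:PF99991 ToxA_hypo > GO:hypothetical killing term ; GO:9999991",
    "Pfam:PF99991 ToxA_hypo > GO:hypothetical toxin activity ; GO:9999992",
    "Pfam:PF99992 Adh_hypo > GO:hypothetical adhesion term ; GO:9999993",
]
```

`str.removeprefix` needs Python 3.9 or later.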

Slide 21. IMG/metagenomes from JGI
Select metagenomes and save.

Slide 22. Create abundance profiles
Compare metagenomes using Pfam, COG, or TIGRFAMs abundance profiles.
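
An abundance profile is essentially a normalized count of protein-family hits per metagenome. The sketch below assumes that each metagenome is given as a list of family accessions, one per annotated gene, and normalizes by the total number of annotated genes. IMG may normalize differently, and the family IDs are hypothetical. Aligning all profiles on the union of families, with missing families set to zero, makes them directly comparable.

```python
from collections import Counter


def abundance_profile(family_hits):
    """Relative abundance of each protein family among annotated genes in one metagenome."""
    counts = Counter(family_hits)
    total = sum(counts.values())
    return {fam: n / total for fam, n in counts.items()} if total else {}


def profile_table(metagenomes):
    """Align profiles from several metagenomes on the union of families (missing = 0.0)."""
    profiles = {name: abundance_profile(hits) for name, hits in metagenomes.items()}
    families = sorted(set().union(*profiles.values()))
    table = {name: [p.get(f, 0.0) for f in families] for name, p in profiles.items()}
    return families, table


# Hypothetical family assignments for two small metagenomes.
METAGENOMES = {
    "air": ["PF_A", "PF_A", "PF_B", "PF_C"],
    "soil": ["PF_B", "PF_B", "PF_C", "PF_C"],
}
```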

Slide 23. Find virulence genes
Use GO term mappings to the Pfam database to find virulence genes.
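
This step can be sketched as a filter. Keep the Pfam families whose mapped GO terms intersect a chosen set of virulence-related terms, then sum their counts in a metagenome. The sketch assumes that the mapping is already a dict of Pfam accession to GO term names and that the virulence term set is picked by hand. All names are hypothetical. Any family without a virulence-related GO mapping is invisible to this approach.

```python
def virulence_families(pfam_to_go, virulence_terms):
    """Pfam accessions mapped to at least one GO term in `virulence_terms`."""
    return sorted(acc for acc, terms in pfam_to_go.items() if terms & virulence_terms)


def virulence_counts(family_counts, pfam_to_go, virulence_terms):
    """Gene counts for virulence-associated families in one metagenome, plus their total."""
    families = virulence_families(pfam_to_go, virulence_terms)
    hits = {f: family_counts.get(f, 0) for f in families}
    return hits, sum(hits.values())


# Hypothetical mapping, term selection and per-family gene counts.
PFAM_TO_GO = {
    "PF_A": {"hypothetical killing term"},
    "PF_B": {"hypothetical adhesion term"},
    "PF_C": {"hypothetical metabolic term"},
}
VIRULENCE_TERMS = {"hypothetical killing term", "hypothetical adhesion term"}
COUNTS = {"PF_A": 3, "PF_C": 10}
```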

Slide 24. Need better mappings to virulence genes
Current GO term mappings miss most virulence-associated genes.
Tables 1 and 2. Pfams and TIGRFAMs overrepresented in air compared to soil
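
The slide does not say how overrepresentation was assessed. One standard choice is a one-sided Fisher's exact test on a 2×2 table: family vs. other genes, air vs. soil. The sketch below computes that p-value directly from the hypergeometric distribution with `math.comb`. When many families are tested at once, the p-values would also need a multiple-testing correction such as Benjamini–Hochberg.

```python
from math import comb


def overrepresentation_p(a, b, c, d):
    """One-sided Fisher's exact test p-value that a family is overrepresented in sample 1.

    a: family genes in sample 1 (e.g. air), b: other genes in sample 1,
    c: family genes in sample 2 (e.g. soil), d: other genes in sample 2.
    Sums hypergeometric probabilities of tables at least as extreme as observed.
    """
    n1 = a + b        # genes in sample 1
    k = a + c         # family genes in both samples
    total = a + b + c + d
    denom = comb(total, n1)
    tail = sum(comb(k, x) * comb(total - k, n1 - x) for x in range(a, min(k, n1) + 1))
    return tail / denom
```

For example, `overrepresentation_p(30, 970, 5, 995)` tests a family seen 30 times among 1,000 air genes against 5 times among 1,000 soil genes. These counts are hypothetical.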