Presentation transcript:

Importing Community annotations into VectorBase

Aims
Provide the VectorBase community with tools for improving genome annotation. The tools must have low entry requirements, be scalable and be (relatively) simple to use.

Genome annotation
First-pass genome annotation is almost always based on “automatic” computational approaches:
- ab initio
- Similarity based
  - Transcript (ESTs, RNAseq)
  - Protein (nr protein database)

Genome annotation - building a pipeline
[Pipeline diagram; stages as labelled on the slide]
- Genome assembly
- Map Repeats
- Genefinding
- Protein-coding genes
- Map Transcripts
- Map Peptides
- nc-RNAs
- Functional annotation
- Submission to archival databases (Release)

Current VectorBase annotation pipeline
- MAKER based automatic annotation includes:
  - SNAP training and ab initio
  - RNAseq based transcript similarity prediction
  - Taxonomically constrained peptide similarity prediction
  - 2 rounds of prediction refinement & final round includes all peptide similarity
- Community annotation phase
  - Capture gene structure changes
  - Metadata associated with locus (symbol, description, citation)
- Submission to INSDC, propagation to UniProt
- Presentation through VectorBase
[Diagram labels: Start; 1.0 set (automatic); 1.1 set (published)]

Processing submissions
4 phases:
- Capture
- Moderation
- Storage
- Integration

Capture: Community annotation decision tree

Community annotation decision tree

Tool of choice: WebApollo
- Web-based
- Eliminates main drawback of deprecated CAP system - GFF3 format validation
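To illustrate what GFF3 format validation involves, here is a minimal sketch in Python. It is not the check used by the CAP system or by WebApollo; it only tests a few rules from the public GFF3 specification (nine tab-separated columns, start <= end, legal strand and phase values, tag=value attributes). The function names and the example source value "VectorBase" are assumptions made for illustration.

```python
import re

VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2", "."}
INTEGER = re.compile(r"[0-9]+")


def validate_gff3_line(line):
    """Return a list of problems found in one GFF3 feature line (empty if none)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    problems = []
    if not (INTEGER.fullmatch(start) and INTEGER.fullmatch(end)):
        problems.append("start and end must be positive integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("start must be >= 1 and <= end")
    if score != ".":
        try:
            float(score)
        except ValueError:
            problems.append("score must be a number or '.'")
    if strand not in VALID_STRANDS:
        problems.append(f"invalid strand {strand!r}")
    if phase not in VALID_PHASES:
        problems.append(f"invalid phase {phase!r}")
    if ftype == "CDS" and phase == ".":
        problems.append("CDS features require a phase of 0, 1 or 2")
    if attrs != ".":
        for pair in attrs.split(";"):
            if pair and "=" not in pair:
                problems.append(f"attribute {pair!r} is not in tag=value form")
    return problems


def validate_gff3(text):
    """Return (line_number, problem) tuples for a whole GFF3 document."""
    errors = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("##gff-version 3"):
        errors.append((1, "first line must be the '##gff-version 3' directive"))
    for lineno, line in enumerate(lines, start=1):
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        for problem in validate_gff3_line(line):
            errors.append((lineno, problem))
    return errors
```

Calling validate_gff3 on a submitted file's text returns an empty list when none of these checks fail. A web-based editor avoids this step because contributors never hand-write the file.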

WebApollo example

Community annotation decision tree

Tool of choice: Web forms

Moderation & Storage
- Gene metadata captured through forms to spreadsheets
- Batch submissions use similar spreadsheet format
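As a rough sketch of how a moderation step could screen such a spreadsheet before storage, the Python below reads a tab-separated export and separates usable rows from rows that need attention. The column names (gene_id, symbol, description, citation) and the acceptance rules are assumptions for illustration; the slides do not specify the actual spreadsheet layout.

```python
import csv
import io

# Hypothetical column layout; the real VectorBase spreadsheet may differ.
REQUIRED_COLUMNS = ("gene_id", "symbol", "description", "citation")


def read_metadata(tsv_text):
    """Split rows of a tab-separated metadata sheet into accepted and rejected."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    accepted, rejected = [], []
    for row in reader:
        rownum = reader.line_num  # line in the sheet the row was read from
        record = {c: (row[c] or "").strip() for c in REQUIRED_COLUMNS}
        if not record["gene_id"]:
            rejected.append((rownum, "missing gene_id"))
        elif not any(record[c] for c in REQUIRED_COLUMNS[1:]):
            rejected.append((rownum, "no symbol, description or citation supplied"))
        else:
            accepted.append(record)
    return accepted, rejected
```

Because batch submissions use a similar format, the same function could in principle screen both form-derived and batch spreadsheets, with rejected rows returned to the submitter or a moderator.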

Integration: Dataflow for ‘patch’ build
[Dataflow diagram; labels: Users, CAP, GFF3, WebApollo, Google Form, Google Fusion Table, Metadata, TXT, Patch, Reference core, Updated geneset, Stable IDs, Updated core IDs, Reports, Xrefs, Release Xrefs, Release core, Commit]
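The general idea of a patch build, applying community changes on top of a reference gene set while keeping stable IDs and producing a report, can be sketched as follows. This is a simplified illustration, not the VectorBase implementation: the data structures, the apply_patch function and the "NEW" identifier format are all hypothetical.

```python
def apply_patch(reference, patch, next_id=1):
    """Apply community gene models to a copy of the reference gene set.

    reference: dict mapping stable ID -> gene model (any value)
    patch:     list of (stable ID or None, gene model) pairs; None marks
               a model with no existing stable ID
    Returns the updated gene set and a report of what happened.
    """
    updated = dict(reference)  # leave the reference set untouched
    report = {"replaced": [], "added": [], "rejected": []}
    for stable_id, model in patch:
        if stable_id is None:
            # New locus: assign a placeholder ID (hypothetical format).
            new_id = f"NEW{next_id:06d}"
            next_id += 1
            updated[new_id] = model
            report["added"].append(new_id)
        elif stable_id in reference:
            # Existing locus: the stable ID is kept, the model is replaced.
            updated[stable_id] = model
            report["replaced"].append(stable_id)
        else:
            # Unknown ID: flag for a moderator rather than guessing.
            report["rejected"].append(stable_id)
    return updated, report
```

Working on a copy means the reference set stays available for comparison, and the report gives moderators a record of each change before anything is committed.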

Presentation of community annotation

Usage (as of )
- 31 WebApollo instances (Organisms)
- 3,407 gene models
- Gene metadata (protein-coding loci):
  - 4,987 gene symbols
  - 512 gene synonyms
  - 57,878 gene descriptions
  - 910 loci citations from 208 publications

Supplementing annotations
- Community jamborees
  - ‘Standard’ improvement (e.g. sandfly, snail communities)
  - Glossina community (e.g. March 2015, Kenya)
- VectorBase
  - Default Xref run includes symbol/description assignment via UniProt
  - Projection of gene descriptions via orthology from key marker species (e.g. An. gambiae). Due to be deployed for June (VB ) release.
  - Supplemental data from genome papers (e.g. 16 Anopheles spp, Musca)

Deprecated CAP system example