1 Importing Community annotations into VectorBase

2 Aims: Provide the VectorBase community with tools for improving genome annotation. The tools must have low entry requirements, be scalable and be (relatively) simple to use.

3 Genome annotation: First-pass genome annotation is almost always based on “automatic” computational approaches: ab initio prediction, and similarity-based prediction using transcripts (ESTs, RNAseq) and proteins (nr protein database).

4 Genome annotation: building a pipeline. Stages: genome assembly; map repeats; genefinding (protein-coding genes); map transcripts; map peptides; nc-RNAs; functional annotation; submission to archival databases (release).
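One way to think about the stages on this slide is as a dependency graph, where each stage runs only after the stages whose output it consumes. The sketch below uses Python's standard `graphlib` module to derive a valid run order. The edges between stages are assumptions made for illustration. The slide names the stages but does not state how they depend on each other.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies (stage -> stages it needs first). The slide lists
# these stages but does not define the edges; they are assumed here.
PIPELINE = {
    "map_repeats": {"genome_assembly"},
    "map_transcripts": {"genome_assembly"},
    "map_peptides": {"genome_assembly"},
    "genefinding": {"map_repeats", "map_transcripts", "map_peptides"},
    "protein_coding_genes": {"genefinding"},
    "nc_rnas": {"map_repeats"},
    "functional_annotation": {"protein_coding_genes", "nc_rnas"},
    "submission": {"functional_annotation"},
}


def run_order(graph):
    """Return one valid execution order in which every stage follows its inputs."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, stage in enumerate(run_order(PIPELINE), start=1):
        print(step, stage)
```

More than one order can be valid. Independent stages such as transcript and peptide mapping may run in either order, or in parallel.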

5 Current VectorBase annotation pipeline: Start with MAKER-based automatic annotation, which includes SNAP training and ab initio prediction, RNAseq-based transcript similarity prediction, and taxonomically constrained peptide similarity prediction. Two rounds of prediction refinement follow, and the final round includes all peptide similarity evidence. Community annotation phase: capture gene structure changes and the metadata associated with each locus (symbol, description, citation). Then submit to INSDC, with propagation to UniProt, and present through VectorBase. Gene set versions: 1.0 set (automatic) and 1.1 set (published).

6 Processing submissions: four phases: capture, moderation, storage and integration.
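The four phases can be modelled as a simple linear state machine, with moderation acting as the gate where a submission may be stopped. The following is a minimal sketch under that assumption. The `Submission` record and its fields are hypothetical, since the slide names only the phases themselves.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    CAPTURE = 1
    MODERATION = 2
    STORAGE = 3
    INTEGRATION = 4


@dataclass
class Submission:
    # Hypothetical record; only the four phase names come from the slide.
    submitter: str
    payload: str
    phase: Phase = Phase.CAPTURE
    rejected: bool = False
    history: list = field(default_factory=list)


def advance(sub: Submission, approved: bool = True) -> Submission:
    """Move a submission one phase forward.

    `approved` only matters while the submission is in MODERATION: if it is
    False, the submission is marked rejected and stays in that phase.
    """
    if sub.rejected or sub.phase is Phase.INTEGRATION:
        raise ValueError("submission cannot advance further")
    if sub.phase is Phase.MODERATION and not approved:
        sub.rejected = True
        sub.history.append("rejected at MODERATION")
        return sub
    nxt = Phase(sub.phase.value + 1)
    sub.history.append(f"{sub.phase.name} -> {nxt.name}")
    sub.phase = nxt
    return sub
```

Recording a history of transitions makes it straightforward to report where each submission currently sits, and why it stopped if it was rejected.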

7 Capture: Community annotation decision tree

8 Community annotation decision tree

9 Tool of choice: WebApollo. It is web-based and eliminates the main drawback of the deprecated CAP system: GFF3 format validation.
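With a GFF3 upload route such as CAP, every submitted file has to be checked for format errors before it can be used. The sketch below shows the kind of per-line check involved. It covers only a few basic GFF3 column rules: nine tab-separated columns, a valid coordinate range, and allowed strand and phase values. It also requires a phase on CDS features and expects `key=value` attributes. It is not a complete GFF3 validator and is not the check CAP actually performed.

```python
VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2", "."}


def validate_gff3_line(line: str) -> list:
    """Return the problems found in one GFF3 feature line (empty list if none)."""
    if not line.strip() or line.startswith("#"):
        return []  # blank lines, comments and directives are not features
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    seqid, _source, ftype, start, end, score, strand, phase, attrs = cols
    errors = []
    if not seqid:
        errors.append("empty seqid")
    if not (start.isdigit() and end.isdigit()):
        errors.append("start/end must be positive integers")
    elif not 1 <= int(start) <= int(end):
        errors.append("start must be >= 1 and <= end")
    if score != ".":
        try:
            float(score)
        except ValueError:
            errors.append(f"invalid score {score!r}")
    if strand not in VALID_STRANDS:
        errors.append(f"invalid strand {strand!r}")
    if phase not in VALID_PHASES:
        errors.append(f"invalid phase {phase!r}")
    elif ftype == "CDS" and phase == ".":
        errors.append("CDS features require a phase of 0, 1 or 2")
    if attrs != ".":
        # Attributes are key=value pairs separated by semicolons.
        for pair in attrs.strip(";").split(";"):
            if "=" not in pair:
                errors.append(f"malformed attribute {pair!r}")
    return errors
```

A web editor that writes the file itself, as WebApollo does, removes the need for users to get these details right by hand.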

10 WebApollo example

11 Community annotation decision tree

12

13 Tool of choice: Web forms

14 Moderation & Storage: Gene metadata captured through the web forms is stored in spreadsheets. Batch submissions use a similar spreadsheet format.
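Because form entries and batch submissions share a spreadsheet format, one parser can serve both routes and flag rows for a moderator. The sketch below assumes a tab-separated sheet with hypothetical columns `gene_id`, `symbol`, `description` and `pubmed_id`. The slide does not give the real column layout, and the moderation rules shown are assumptions as well.

```python
import csv
import io

# Hypothetical column names; the slide says only that form-captured metadata
# and batch submissions use a similar spreadsheet format.
COLUMNS = ("gene_id", "symbol", "description", "pubmed_id")


def load_metadata(text, delimiter="\t"):
    """Split a metadata sheet into clean records and rows needing moderator review."""
    records, problems, seen = [], [], set()
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        line = reader.line_num  # physical line in the sheet (header is line 1)
        # Keep only the expected columns; missing cells become "".
        clean = {k: (row.get(k) or "").strip() for k in COLUMNS}
        if not clean["gene_id"]:
            problems.append((line, "missing gene_id"))
        elif clean["gene_id"] in seen:
            problems.append((line, f"duplicate gene_id {clean['gene_id']}"))
        elif not (clean["symbol"] or clean["description"]):
            problems.append((line, "no symbol or description"))
        else:
            seen.add(clean["gene_id"])
            records.append(clean)
    return records, problems
```

Returning problems with their line numbers lets a moderator fix the sheet and resubmit, rather than having the whole batch silently rejected.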

15 Integration: dataflow for the ‘patch’ build. [Diagram components: CAP (GFF3), WebApollo, users, reference core, updated gene set (TXT), patch, stable IDs, reports, updated core IDs, release core, Google Form, Google Fusion Table, metadata, Xrefs, release Xrefs, commit.]
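A central concern in a patch build is stable-ID bookkeeping. Edited models keep their existing IDs, new models receive freshly minted IDs, and every change is reported. The sketch below illustrates only that bookkeeping, under simplified assumptions: a gene model is just its exon coordinates, and the `HYPG` ID prefix and the `build_patch` function are hypothetical. It is not the VectorBase implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    # Simplified, hypothetical model: one sequence plus exon coordinates.
    seqid: str
    exons: tuple  # ((start, end), ...)


def build_patch(reference, updates, deletions, next_id, prefix="HYPG"):
    """Apply community edits to a reference gene set without mutating it.

    reference: {stable_id: GeneModel}
    updates:   [(stable_id or None, GeneModel)]; None marks a new gene
    deletions: stable IDs to retire
    next_id:   first unused integer for minting new stable IDs
    Returns (patched gene set, report of new/modified/deleted IDs).
    """
    patched = dict(reference)
    report = {"new": [], "modified": [], "deleted": []}
    for gid, model in updates:
        if gid is None:
            gid = f"{prefix}{next_id:06d}"
            next_id += 1
            report["new"].append(gid)
        elif gid not in patched:
            raise KeyError(f"unknown stable ID {gid}")
        elif patched[gid] != model:
            report["modified"].append(gid)  # same ID, new structure
        patched[gid] = model
    for gid in deletions:
        if patched.pop(gid, None) is not None:
            report["deleted"].append(gid)
    return patched, report
```

Working on a copy of the reference set keeps the original core untouched until the patch has been reviewed, which matches the idea of a separate commit step.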

16 Presentation of community annotation

17 Usage (as of 2015-03-30): 31 WebApollo instances (organisms); 3,407 gene models. Gene metadata (protein-coding loci): 4,987 gene symbols; 512 gene synonyms; 57,878 gene descriptions; 910 loci citations from 208 publications.

18 Supplementing annotations: Community jamborees: ‘standard’ improvement (e.g. sandfly and snail communities) and the Glossina community (e.g. March 2015, Kenya). The default VectorBase Xref run includes symbol/description assignment via UniProt. Gene descriptions are projected via orthology from key marker species (e.g. An. gambiae); this is due to be deployed in the June 2015 (VB-2015-06) release. Supplemental data come from genome papers (e.g. 16 Anopheles spp., Musca).
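Projecting descriptions via orthology means copying a description from a well-annotated marker species to its ortholog in another species. The minimal sketch below makes several assumptions that are not stated on the slide. It uses only one-to-one orthologs. It never overwrites an existing description, such as one supplied by the community. It also appends a provenance note in a made-up format.

```python
from collections import Counter


def project_descriptions(target_genes, orthologs, source_descriptions, source_label):
    """Fill empty descriptions in a target species from one-to-one orthologs.

    target_genes:        {gene_id: description or None}
    orthologs:           [(target_gene_id, source_gene_id)]
    source_descriptions: {source_gene_id: description}
    """
    t_count = Counter(t for t, _ in orthologs)
    s_count = Counter(s for _, s in orthologs)
    projected = dict(target_genes)
    for t, s in orthologs:
        if t_count[t] != 1 or s_count[s] != 1:
            continue  # skip one-to-many and many-to-many relationships
        if t not in projected or projected[t]:
            continue  # unknown gene, or already described
        desc = source_descriptions.get(s)
        if desc:
            # Hypothetical provenance format, so projected text stays traceable.
            projected[t] = f"{desc} [projected from {source_label} {s}]"
    return projected
```

Restricting projection to one-to-one orthologs is a conservative choice. When a family has expanded in one species, it avoids copying a single description onto several paralogs.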

19

20 Deprecated CAP system example