VectorBase annotation metrics
Daniel Lawson, VectorBase-EBI, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton
VectorBase BRC4 2006

1  VectorBase annotation metrics
Daniel Lawson, VectorBase-EBI, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK

2  Topics
– Annotation metrics
  – Numbers (gene numbers & xrefs)
  – Data types (availability & integration)
– Annotation SOPs
  – Genome specific
  – Gene specific
  – Gene build profile & prediction confidence

3

| Section | Metric | AaegL1.1 | AgamP3.3 | Yeast | Worm | Fly | Human |
|---|---|---|---|---|---|---|---|
| Gene | Gene count | 16,691 | 13,765 | 7,098 | 21,105 | 14,752 | 31,206 |
| | Protein-coding | 15,419 (92.4 %) | 13,277 (96.5 %) | 6,680 | 20,060 | 14,086 | 23,245 |
| | Other | 1,272 (7.6 %) | 488 (3.5 %) | 418 | 1,045 | 666 | 7,961 |
| Transcript | Transcript count | 18,061 | 14,127 | - | - | - | - |
| | Protein-coding | 16,789 (93.0 %) | 13,639 (96.5 %) | - | - | - | - |
| | Other | 1,272 (7.0 %) | 488 (3.5 %) | - | - | - | - |
| Manual effort | Manually reviewed | 0 (0.0 %) | 261 (1.9 %) | 6,680 | 20,060 | 14,086 | 6,995 |
| | Community input | 0 (0.0 %) | 667 (4.9 %) | 4,684 | 7,228 | 9,945 | 16,887 |
| Orthologs | Combined | 11,487 (74.5 %) | 9,782 (73.7 %) | - | - | - | - |
| | A. aegypti | n/a | 8,907 (67.1 %) | 2,202 | 4,416 | 7,991 | 6,590 |
| | A. gambiae | 9,923 (54.9 %) | n/a | 2,228 | 4,444 | 7,702 | 6,612 |
| | C. elegans | 4,923 (29.5 %) | 4,442 (33.4 %) | 2,185 | n/a | 4,598 | 6,121 |
| | D. melanogaster | 9,078 (50.3 %) | 7,649 (57.6 %) | 2,228 | 4,543 | n/a | 6,654 |
| | H. sapiens | 5,510 (33.0 %) | 5,046 (38.0 %) | 2,326 | 4,473 | 5,109 | n/a |
| | S. cerevisiae | 2,520 (15.1 %) | 2,350 (17.7 %) | n/a | 2,349 | 2,470 | 3,265 |
| Functional annotation | GO terms | 9,335 (51.7 %) | 7,601 (55.7 %) | 4,176 | 11,334 | 10,226 | 17,000 |
| | EC numbers | 2,950 (16.3 %) | 2,230 (16.4 %) | 4,103 * | 5,240 * | 4,009 * | 13,245 * |
| | InterPro | 11,536 (74.8 %) | 9,869 (72.4 %) | 4,611 | 14,730 | 10,475 | 18,199 |
| Expression evidence | Combined | 12,350 (80.0 %) | 7,557 (55.4 %) | - | - | - | - |
| | cDNA/EST | 9,270 (60.1 %) | 7,557 (55.4 %) | - | - | - | - |
| | Microarray | 9,143 (59.2 %)† | 0 (0.0 %)‡ | - | - | - | - |
| | MPSS | 3,984 (25.8 %)† | n/a | - | - | - | - |
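Each percentage in this table is count / denominator × 100, but the slide does not say which denominator (genes, protein-coding genes or transcripts) each row uses. The minimal Python sketch below recomputes cells with a hypothetical helper, `fmt_metric`, written only for this check. Recomputed this way, the combined AaegL1.1 ortholog figure, 11,487 (74.5 %), is consistent with the 15,419 protein-coding genes, while the AaegL1.1 A. gambiae ortholog figure, 9,923 (54.9 %), is consistent with the 18,061 transcripts instead.

```python
def fmt_metric(count: int, denominator: int) -> str:
    """Format a count as 'count (pct %)', the style used in the table above."""
    pct = 100 * count / denominator
    return f"{count:,} ({pct:.1f} %)"


# Protein-coding share of all AaegL1.1 genes: 15,419 of 16,691.
print(fmt_metric(15419, 16691))  # 15,419 (92.4 %)

# The same ortholog count gives different percentages depending on the denominator.
print(fmt_metric(9923, 18061))  # 9,923 (54.9 %)  (AaegL1.1 transcripts)
print(fmt_metric(9923, 15419))  # 9,923 (64.4 %)  (AaegL1.1 protein-coding genes)
```

Fixing one denominator per genome before comparing rows avoids this ambiguity.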

4  Considerations
– All metrics should be calculated with a similar methodology from the same data set
– Metrics were calculated from Ensembl using BioMart & raw SQL queries
– GO terms can be calculated in many ways (e.g. InterPro2GO, projection from Drosophila orthologs)
– VectorBase has no capability to assign EC numbers automatically
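One of the GO methods named above, projection from Drosophila orthologs, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the VectorBase procedure: the function `project_go_terms` and all gene identifiers are hypothetical, and the rule of projecting only through one-to-one ortholog pairs is this sketch's own choice. A different rule (for example, also using one-to-many pairs, or adding InterPro2GO terms) would change the gene count, which is why the method must be fixed before comparing genomes.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def project_go_terms(
    orthologs: List[Tuple[str, str]],
    fly_go: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Copy GO terms from Drosophila genes to mosquito orthologs.

    Assumption of this sketch: only one-to-one ortholog pairs are used.
    """
    mosq_to_fly: Dict[str, Set[str]] = defaultdict(set)
    fly_to_mosq: Dict[str, Set[str]] = defaultdict(set)
    for mosq, fly in orthologs:
        mosq_to_fly[mosq].add(fly)
        fly_to_mosq[fly].add(mosq)

    projected: Dict[str, Set[str]] = {}
    for mosq, flies in mosq_to_fly.items():
        if len(flies) != 1:
            continue  # mosquito gene has several fly orthologs: skip
        (fly,) = flies
        if len(fly_to_mosq[fly]) != 1:
            continue  # fly gene has several mosquito orthologs: skip
        terms = fly_go.get(fly)
        if terms:
            projected[mosq] = set(terms)
    return projected


# Hypothetical identifiers, for illustration only.
pairs = [("mosq1", "fly1"), ("mosq2", "fly2"), ("mosq3", "fly2"), ("mosq4", "fly4")]
go = {"fly1": {"GO:0005515"}, "fly2": {"GO:0003677"}}
result = project_go_terms(pairs, go)
print(result)  # {'mosq1': {'GO:0005515'}}
print(len(result))  # mosquito genes gaining at least one GO term
```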

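5

| Data type | AaegL1.1: available | AaegL1.1: integration | AgamP3.3: available | AgamP3.3: integration |
|---|---|---|---|---|
| Sequence | Yes | Download, search, visualization | Yes | Download, search, visualization |
| Polymorphisms | No | n/a | Yes | Search, visualization |
| Genetic maps | Yes | Not integrated | Yes | Visualization |
| Syntenic alignment | Yes | Visualization | Yes | Visualization |
| cDNAs & ESTs | Yes | Download, search, visualization | Yes | Download, search, visualization |
| SAGE tags | No | n/a | No | n/a |
| Microarrays | Yes | Visualization | Yes | Visualization |
| MPSS | Yes | Not integrated | No | n/a |
| Proteomics | No | n/a | No | n/a |
| Structures | No | n/a | No | n/a |
| Interactome data | No | n/a | No | n/a |
| Pathways | No | n/a | No | n/a |
| Orthology profiles | Yes | Visualization | Yes | Visualization |
| Essentiality data | No | n/a | No | n/a |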

6  VectorBase gene prediction pipeline (SOP)
[Diagram. Boxes: Blessed predictions; Community submissions; Manual annotations; Species-specific predictions; Similarity predictions; Transcript-based predictions; Ab initio gene predictions; Protein family HMMs; ncRNA predictions; Canonical gene set. SOP labels: VB:SOP001, VB:SOP002 & SOP003, VB:SOP004, VB:SOP005, VB:SOP007, VB:SOP008, VB:SOP009, VB:SOP010.]
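The pipeline combines several prediction sources into one canonical gene set, but the slide does not state how conflicts between sources are resolved. The Python sketch below shows one simple way to do it: keep the single most trusted prediction at each locus. The preference order in `PRIORITY`, the `Prediction` record and the `locus` labels are all hypothetical; the SOP descriptions come from the SOP assignment slide, and in a real build loci would be derived from genomic overlap rather than given as labels.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical preference order, most trusted first (an assumption,
# not a documented VectorBase rule).
PRIORITY = [
    "VB:SOP007",  # manual annotation
    "VB:SOP001",  # confirmed
    "VB:SOP002",  # protein-based with transcript support
    "VB:SOP003",  # protein-based
    "VB:SOP004",  # transcript-based
    "VB:SOP005",  # supported ab initio
    "VB:SOP006",  # ab initio
]
RANK = {sop: rank for rank, sop in enumerate(PRIORITY)}


@dataclass
class Prediction:
    gene_id: str
    locus: str  # hypothetical locus label
    sop: str


def pick_canonical(predictions: List[Prediction]) -> Dict[str, Prediction]:
    """Keep the single highest-ranked prediction at each locus."""
    best: Dict[str, Prediction] = {}
    for pred in predictions:
        current = best.get(pred.locus)
        if current is None or RANK[pred.sop] < RANK[current.sop]:
            best[pred.locus] = pred
    return best


preds = [
    Prediction("g1", "locusA", "VB:SOP006"),
    Prediction("g2", "locusA", "VB:SOP003"),
    Prediction("g3", "locusB", "VB:SOP004"),
    Prediction("g4", "locusB", "VB:SOP007"),
]
chosen = pick_canonical(preds)
print({locus: p.gene_id for locus, p in chosen.items()})  # {'locusA': 'g2', 'locusB': 'g4'}
```

A strict priority rule discards lower-ranked models entirely; merging structures from two sets, as on the "Merging gene sets" slide, is a richer alternative.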

7  Assignment of SOPs to VectorBase genes: AgamP3.3

| SOP | Description | No. genes |
|---|---|---|
| VB:SOP001 | Confirmed | 674 |
| VB:SOP002 | Protein-based with transcript support | 3,765 |
| VB:SOP003 | Protein-based | 4,830 |
| VB:SOP004 | Transcript-based | 2,857 |
| VB:SOP005 | Supported ab initio | 585 |
| VB:SOP006 | Ab initio | 0 |
| VB:SOP007 | Manual annotation | 928 |
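The SOP counts above can be turned into a confidence profile for the gene build with a short Python tally. The counts are copied from the table; `sop_breakdown` is a hypothetical helper. The seven counts sum to 13,639, the same figure as the AgamP3.3 protein-coding transcript count on slide 3.

```python
# Gene counts per SOP for AgamP3.3, copied from the table above.
SOP_COUNTS = {
    "VB:SOP001": 674,   # confirmed
    "VB:SOP002": 3765,  # protein-based with transcript support
    "VB:SOP003": 4830,  # protein-based
    "VB:SOP004": 2857,  # transcript-based
    "VB:SOP005": 585,   # supported ab initio
    "VB:SOP006": 0,     # ab initio
    "VB:SOP007": 928,   # manual annotation
}


def sop_breakdown(counts):
    """Return the total count and each SOP's share of it, in percent."""
    total = sum(counts.values())
    shares = {sop: round(100 * n / total, 1) for sop, n in counts.items()}
    return total, shares


total, shares = sop_breakdown(SOP_COUNTS)
print(total)  # 13639
for sop, pct in shares.items():
    print(f"{sop}: {pct} %")
```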

8  Display of metrics & SOPs
– Metrics
  – VectorBase wiki
  – Species page containing the three tables, available from the VectorBase species homepage
  – Expansion of documents relating to genomic resources (citations, links to primary data where possible)
  – Single collated table for the BRC as a separate download
– SOPs
  – VectorBase wiki
  – ‘Documents’ section of the main site

9  (no text)

10  Manual annotation progress

| Species | Gene set | Protein-coding genes | VectorBase manual | Community submission |
|---|---|---|---|---|
| Anopheles gambiae | AgamP3.3 | 13,277 | 261 (2.0 %) | 667 (5.0 %) |
| | Current | | 2,474 (18.6 %) | 667* (5.0 %) |
| Aedes aegypti | AaegL1.1 | 15,419 | 0 (0.0 %) | |
| | Current | | 0 (0.0 %) | 341 (2.2 %) |

11  Merging gene sets
– Reduce to a single prediction per locus
– Compare exon/intron structures between gene set #1 and gene set #2; outcomes:
  – Identical structures
  – Compatible structures
  – Different structures
  – Merge/split structures
  – Complex
  – No map
– Add isoform predictions based on EST/peptide data
– Result: canonical gene set
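The structure comparison step can be illustrated with a small Python classifier for a pair of gene models, one from each gene set. The category definitions here are assumptions of this sketch, not taken from the slide: "identical" means the same exons, "compatible" means the same intron chain with only the outer ends of the first and last exons differing, "different" means overlapping models with different intron chains, and "no map" means the models share no exonic overlap. The merge/split and complex cases need the whole locus (one model overlapping several in the other set) and are not covered. All function names are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive coordinates on one strand


def intron_chain(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Introns implied by the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def exons_overlap(a: List[Exon], b: List[Exon]) -> bool:
    """True if any exon of model a overlaps any exon of model b."""
    return any(s1 <= e2 and s2 <= e1 for s1, e1 in a for s2, e2 in b)


def compare_structures(a: List[Exon], b: List[Exon]) -> str:
    """Classify two gene models from different gene sets at one locus."""
    if not exons_overlap(a, b):
        return "no map"
    if sorted(a) == sorted(b):
        return "identical"
    if intron_chain(a) == intron_chain(b):
        # Same splice sites; only the outer ends of the first/last exon differ.
        return "compatible"
    return "different"


model1 = [(100, 200), (300, 400), (500, 600)]
print(compare_structures(model1, [(120, 200), (300, 400), (500, 650)]))  # compatible
print(compare_structures(model1, [(100, 200), (350, 400), (500, 600)]))  # different
print(compare_structures(model1, [(1000, 1100)]))  # no map
```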
