

Genetic Literature Curation at FlyBase-Cambridge Steven Marygold ABC meeting, December 2007 A Database of Drosophila Genes & Genomes

Talk Outline
1. Group Structure
2. The FlyBase bibliography
3. Prioritizing curation
4. Curation practice
5. Curation support
6. Future directions

Group structure

FlyBase
- FB-Indiana: website, fly stocks, image curation
- FB-Harvard: database, genome annotation, expression curation
- FB-Cambridge: bibliography, gene and phenotype curation, ontologies

FB-Cambridge
- Principal Investigators: Michael Ashburner, Nick Brown
- Group Manager: Steven Marygold
- Literature Curators: 3.25 FTEs
- GO Curator: 1 FTE
- Reactome Curator: 1 FTE
- Developer: 1 FTE
- FB Ontology Editor: 0.25 FTE

Bibliography
- Search for string ‘Drosophil*’ in title, abstract or keywords
- Semi-automated search of publication databases
  – Medline, BIOSIS, ZooRec
- Manual searches of journal issues
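
As a rough illustration of the ‘Drosophil*’ wildcard search, the sketch below tests the three searched fields of a publication record. The record layout (a dict with "title", "abstract" and "keywords") and the case-insensitive match are assumptions made for this example; the slides do not describe how the actual search is implemented.

```python
import re

# 'Drosophil*' as a regular expression: any word beginning with "Drosophil".
# Case-insensitive matching is an assumption, not something the slides state.
PATTERN = re.compile(r"\bDrosophil\w*", re.IGNORECASE)


def mentions_drosophila(record):
    """Return True if the title, abstract or any keyword matches the pattern."""
    fields = [record.get("title", ""), record.get("abstract", "")]
    fields.extend(record.get("keywords", []))
    return any(PATTERN.search(text) for text in fields)


# Hypothetical records, invented for illustration only.
records = [
    {"title": "Wing patterning in Drosophila melanogaster", "abstract": "", "keywords": []},
    {"title": "Eye development", "abstract": "", "keywords": ["drosophilid"]},
    {"title": "Mouse limb development", "abstract": "Mus musculus study", "keywords": []},
]

hits = [r["title"] for r in records if mentions_drosophila(r)]
```

Here `hits` keeps the first two records: one matches in the title and one in a keyword.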

Curation prioritization
Types of publication curated:
– Primary research papers
– Supplemental information
– Errata
– Personal communications to FlyBase
– Conference abstracts
– Reviews
– Books/Book chapters
– Miscellaneous others

Curation prioritization
1. Prioritization of selected journals:
   - Set of (~50) journals publishing on Drosophila biology
   - Chronological, issue-by-issue curation
2. Prioritization of selected papers:
   - Flagged by ‘skim curation’
   - Flagged by stock center
   - Genes prioritized by GO project
   - Alerted to by research community
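
One way the two prioritization routes could be combined into a single work queue is sketched below. The ordering rule (flagged papers first, then papers from the selected journals, each group in chronological order) is an assumption made for illustration, as are the `Paper` class, the flag names and the journal set; the slides list the two routes but do not say how they are ranked against each other.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical stand-ins for the set of ~50 selected journals.
PRIORITY_JOURNALS = {"Genetics", "Development"}


@dataclass
class Paper:
    title: str
    journal: str
    published: date
    # Hypothetical flag names, e.g. "skim", "stock_center", "go", "community".
    flags: set = field(default_factory=set)


def curation_order(papers):
    """Sort flagged papers first, then selected-journal papers, oldest first."""
    def key(paper):
        flagged = 0 if paper.flags else 1
        selected_journal = 0 if paper.journal in PRIORITY_JOURNALS else 1
        return (flagged, selected_journal, paper.published)
    return sorted(papers, key=key)


papers = [
    Paper("A", "Genetics", date(2007, 1, 15)),
    Paper("B", "Other Journal", date(2007, 6, 1), {"stock_center"}),
    Paper("C", "Genetics", date(2006, 12, 1)),
    Paper("D", "Other Journal", date(2005, 3, 1)),
]
queue = [p.title for p in curation_order(papers)]
```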

Curation practice
- Access PDF
- Identify/select relevant paper
- Read abstract; skim-read intro
- Highlight curatable material within Results, Methods, Figures & legends, Tables
- Curate material into individual ‘proformae’ to form a ‘curation record’
- Error-checking:
  – spelling
  – consistency
  – validity
- Completed records submitted for loading into Chado database

Curation practice
Curated data classes (proforma types):
– Publication
– Gene
– Allele
– Aberration
– Transgenic constructs
– Transgenic insertions
– Natural transposons

Curation practice
Gene-level curated data:
– valid FlyBase gene symbol/name
– gene symbol/name used in paper
– action: gene rename or merge
– action: creation or deletion of gene
– etymology of gene name
– Sequence Ontology (SO) terms
– cytological map position
– relationship to cDNA/genomic clone
– Gene Ontology (GO) terms
– y/n flags to indicate paper has expression or annotation information

Curation practice
Allele-level curated data:
– valid FlyBase allele symbol/name
– allele symbol/name used in paper
– action: allele rename or merge
– action: creation or deletion of allele
– allele class
– mutagen
– nucleotide/amino acid changes
– phenotype: class, anatomy, free text
– genetic interaction: class, anatomy, free text
– complementation data
– associated transgenic construct/insertion
– associated tag

Curation practice

! GENE PROFORMA                Version 50: 05 Oct 2007
!
! G1a. Gene symbol to use in database :ey
! G1b. Gene symbol used in reference :ey
! G24a. GO -- Cellular component | evidence [CV] :
! G24b. GO -- Molecular function | evidence [CV] :calcium channel activity ; GO: | IDA
! G24c. GO -- Biological process | evidence [CV] :eye-antennal disc development ; GO: | IMP

! ALLELE PROFORMA              Version 39: 6 July 2007
!
! GA1a. Allele symbol to use in database :ey[46]
! GA1b. Allele symbol used in paper :ey[461]
! GA56. Phenotypic | dominance class [bipartite CV] :visible | recessive
! GA17. Phenotype [CV, body part(s) where manifest] :eye anterior vertical bristle
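
The proforma fields shown on this slide follow a regular pattern: a leading `!`, a field code such as `G1a` or `GA56`, a descriptive label, and a value after the first colon. A minimal parsing sketch under that assumption is given below. The regular expression and the `parse_proforma` function are hypothetical and written only for this example; they are not the parser FlyBase uses.

```python
import re

# Assumed line shape: "! <CODE>. <label> :<value>", where the code is
# letters + digits + an optional lowercase letter (G1a, G24b, GA56, ...).
FIELD_LINE = re.compile(
    r"^!\s*(?P<code>[A-Z]+\d+[a-z]?)\.\s*(?P<label>.*?)\s*:(?P<value>.*)$"
)


def parse_proforma(text):
    """Map each field code to its (possibly empty) value; skip other lines."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            fields[match.group("code")] = match.group("value").strip()
    return fields


# Lines copied from the slide; header and bare "!" lines are ignored.
sample = """\
! GENE PROFORMA Version 50: 05 Oct 2007
!
! G1a. Gene symbol to use in database :ey
! G1b. Gene symbol used in reference :ey
! G24a. GO -- Cellular component | evidence [CV] :
! GA1a. Allele symbol to use in database :ey[46]
! GA1b. Allele symbol used in paper :ey[461]
! GA56. Phenotypic | dominance class [bipartite CV] :visible | recessive
"""

fields = parse_proforma(sample)
```

Splitting on the first colon after the label keeps values that themselves contain a colon (such as a GO identifier) intact.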

Curation support
- Curation support files
  – Text files of data from latest DB instance
- Ontology files
  – GO, SO, FB-anatomy, FB-phenotypes etc.
- PeeVeS
  – Proforma Validation Software
- Other custom scripts
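
To give a feel for what proforma validation against support files might involve, the sketch below runs three checks on already-parsed fields: the gene symbol is known, the allele symbol starts with the gene symbol, and each GO line ends in a recognised evidence code. This is not PeeVeS itself, whose checks are not described in the slides. The `validate` function, the input dict and the choice of checks are assumptions for illustration; the evidence codes listed are a subset of the standard GO codes.

```python
# A subset of the standard GO evidence codes.
GO_EVIDENCE_CODES = {"IDA", "IMP", "IGI", "IPI", "IEP", "ISS", "TAS", "NAS", "IC", "ND"}


def validate(fields, known_gene_symbols):
    """Return a list of error messages for one parsed curation record."""
    errors = []

    symbol = fields.get("G1a", "")
    if symbol and symbol not in known_gene_symbols:
        errors.append(f"G1a: unknown gene symbol '{symbol}'")

    # Assumed convention: an allele symbol is the gene symbol followed by "[...]".
    allele = fields.get("GA1a", "")
    if allele and symbol and not allele.startswith(symbol + "["):
        errors.append(f"GA1a: allele '{allele}' does not belong to gene '{symbol}'")

    # GO lines are assumed to end with "| <evidence code>".
    for code in ("G24a", "G24b", "G24c"):
        value = fields.get(code, "")
        if value:
            evidence = value.rsplit("|", 1)[-1].strip()
            if evidence not in GO_EVIDENCE_CODES:
                errors.append(f"{code}: unknown evidence code '{evidence}'")

    return errors


# In practice the known symbols would come from the curation support files;
# here the set is a hypothetical one-element stand-in.
known = {"ey"}

good = {
    "G1a": "ey",
    "GA1a": "ey[46]",
    # GO ID left blank, as it appears on the slide.
    "G24b": "calcium channel activity ; GO: | IDA",
}
bad = {
    "G1a": "eyy",
    "GA1a": "ey[46]",
    "G24c": "eye-antennal disc development ; GO: | XYZ",
}

good_errors = validate(good, known)
bad_errors = validate(bad, known)
```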

Future directions
- More paper-by-paper prioritization
- ‘Skim curation’
  – Manual curation
  – Automated curation?
  – User-submitted data
- Use of text-mining aids for ‘deep curation’
- Review breadth and depth of curation
- Enhanced curation interface

Acknowledgements
FB-Cambridge:
- Michael Ashburner (co-PI)
- Nick Brown (co-PI)
- Steven Marygold (Manager)
- Gillian Millburn (Literature curator)
- David Osumi-Sutherland (Ontology Editor and Literature curator)
- Ruth Seal (Literature curator)
- Peter McQuilton (Literature curator)
- Paul Leyland (Developer)
- Susan Tweedie (GO curator)
- Mark Williams (Reactome curator)
- Rachel Drysdale (former FB-Cambridge co-PI)

Genetics Dept., University of Cambridge, UK
The FlyBase Consortium
NHGRI at the NIH