GOA: Looking after GO annotations
Emily Dimmer, Gene Ontology Annotation (GOA) Database
EBI is an Outstation of the European Molecular Biology Laboratory.


1 GOA: Looking after GO annotations
Emily Dimmer, Gene Ontology Annotation (GOA) Database, European Bioinformatics Institute, Cambridge, UK

2 EMBRACE Workshop, 7-9th November 2007
[Figure: Reactome; E. coli hub]

3 Gene Ontology Annotation (GOA) Database
- Member of the GO Consortium since 2001
- Largest open-source contributor of annotations to GO
- Provides annotation for more than 139,000 species
- GOA's priority is to annotate the human proteome
- GOA is responsible for human, chicken and bovine annotations in the GO Consortium

4 GOA Group
EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
[Figure: GOA office]

5 GOA Group
- Emily Dimmer (GOA coordinator)
- Evelyn Camon (senior GOA curator)
- Daniel Barrell (GOA file releases & database)
- Rachael Huntley (GOA curator)
- David Binns (QuickGO, protein2go tools)
The group works with the help of UniProt curators at the EBI, UniProt controlled vocabularies, the HAMAP group, the InterPro group, IntAct curators, the IPI group, Ensembl and other EBI groups... and of course the GO editors and the other GO Consortium annotation groups.

6 How does GOA annotate to the GO?
- Electronic annotation
- Manual annotation
Both methods have their advantages, and they can be easily distinguished by the evidence code used.
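Here is a minimal sketch of that last point. It assumes annotation records are reduced to three hypothetical fields (gene_product, go_id, evidence). A set of annotations can then be split into electronic and manual groups by checking the evidence code alone, treating IEA as electronic and every other code as manual:

```python
from typing import Iterable, NamedTuple


class Annotation(NamedTuple):
    # Hypothetical minimal record: only the fields this sketch needs.
    gene_product: str  # e.g. a UniProt accession such as "Q9ARH1"
    go_id: str         # GO term ID
    evidence: str      # GO evidence code, e.g. "IDA" or "IEA"


def split_by_method(
    annotations: Iterable[Annotation],
) -> tuple[list[Annotation], list[Annotation]]:
    """Return (electronic, manual) lists, judged only by the evidence code."""
    electronic: list[Annotation] = []
    manual: list[Annotation] = []
    for ann in annotations:
        # IEA ("Inferred from Electronic Annotation") marks electronic annotations;
        # every other evidence code is treated as manual in this sketch.
        if ann.evidence == "IEA":
            electronic.append(ann)
        else:
            manual.append(ann)
    return electronic, manual
```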

7 Status of GOA annotation
Annotations are provided for over 140,000 taxa, and a total of 415,576 PubMed references are included as evidence.
Manual annotations are integrated from external model organism and multi-species databases: AgBase, DictyBase, Ensembl, FlyBase, GDB, GeneDB (S. pombe), Gramene, HGNC, MGI, Reactome, RGD, Roslin, SGD, TAIR, TIGR, WormBase, ZFIN, the IntAct protein-protein interaction database, LIFEdb and the Proteome Inc. dataset.
October 2007 stats:
Evidence source        | Annotations | Proteins | UniProt coverage
Electronic annotations | 22,774,674  | 3,362,…  | …%
Manual annotations     | 450,489     | 86,…     | …%

8 Core information needed for a GO annotation
1. Gene or gene product identifier, e.g. Q9ARH1
2. GO term ID, e.g. GO:… (protein serine/threonine kinase activity)
3. Reference ID, e.g. PubMed ID: … or GO_REF: …
4. Evidence code, e.g. IDA
...and also, in some cases:
- Qualifiers, which modify the interpretation of an annotation: NOT, contributes_to, colocalizes_with
- 'With' column information, which provides further information on the method (evidence code)
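One way to picture these requirements is a record type that accepts the optional qualifier and 'With' fields but refuses to exist without the four core pieces of information. This is a sketch only. The class and field names are hypothetical, not a GOA schema. The GO ID check assumes the usual "GO:" plus seven digits form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GoAnnotation:
    """Sketch of one GO annotation; names are illustrative, not a GOA schema."""
    gene_product_id: str              # 1. gene or gene product identifier, e.g. "Q9ARH1"
    go_id: str                        # 2. GO term ID ("GO:" + 7 digits)
    reference: str                    # 3. e.g. a PubMed ID or a GO_REF identifier
    evidence_code: str                # 4. e.g. "IDA"
    qualifier: Optional[str] = None   # optional: NOT, contributes_to, colocalizes_with
    with_from: Optional[str] = None   # optional 'With' column information

    def __post_init__(self) -> None:
        # Reject records missing any of the four core pieces of information.
        for name in ("gene_product_id", "go_id", "reference", "evidence_code"):
            if not getattr(self, name):
                raise ValueError(f"missing required field: {name}")
        # Assumed GO ID shape: "GO:" followed by exactly seven digits.
        digits = self.go_id[3:]
        if not (self.go_id.startswith("GO:") and len(digits) == 7 and digits.isdigit()):
            raise ValueError(f"malformed GO ID: {self.go_id}")
```

For example, `GoAnnotation("Q9ARH1", "GO:0000000", "PMID:0000000", "IDA")` builds a record. The GO ID and PubMed ID there are placeholders, not the values from the slide.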

9 Electronic annotation
A number of different techniques are used by different GO Consortium annotation groups. All resulting annotations must be high-quality and must provide an explanation of the method (GO_REF).
1. Mapping of external concepts to GO terms
2. Automatic transfer of annotations to orthologs

10 Electronic annotation: GO mappings
- Fatty acid biosynthesis (Swiss-Prot keyword) → GO: fatty acid biosynthesis (GO: )
- EC: (EC number) → GO: acetyl-CoA carboxylase activity (GO: )
- IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) → GO: acetyl-CoA carboxylase activity (GO: )
- MF_00527: Putative 3-methyladenine DNA glycosylase (HAMAP) → GO: DNA repair (GO: )
Camon et al. BMC Bioinformatics. 2005; 6 Suppl 1:S17
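The mapping approach can be sketched as a dictionary lookup. A protein's cross-references are looked up in per-source mapping tables, and each hit becomes an IEA annotation that carries a reference to the method used. This makes several assumptions. The tables hold only the examples above and use GO term names instead of GO IDs. The source labels and the GO_REF strings are hypothetical placeholders. The EC example is left out because its number is not given.

```python
# Hypothetical mapping tables: external concept -> GO term name.
# Real mapping files pair external identifiers with GO IDs; term names are
# used here only for readability.
MAPPINGS = {
    "SwissProt_keyword": {"Fatty acid biosynthesis": "fatty acid biosynthesis"},
    "InterPro": {"IPR000438": "acetyl-CoA carboxylase activity"},
    "HAMAP": {"MF_00527": "DNA repair"},
}

# Placeholder method references: each mapping source would cite its own GO_REF.
METHOD_REF = {
    "SwissProt_keyword": "GO_REF:<keyword mapping>",
    "InterPro": "GO_REF:<InterPro mapping>",
    "HAMAP": "GO_REF:<HAMAP mapping>",
}


def map_to_go(protein_id: str, xrefs: list[tuple[str, str]]) -> list[tuple[str, str, str, str]]:
    """Turn a protein's (source, concept) cross-references into IEA annotations.

    Each result is (protein_id, GO term name, evidence code, method reference).
    Concepts without a mapping are skipped.
    """
    annotations = []
    for source, concept in xrefs:
        term = MAPPINGS.get(source, {}).get(concept)
        if term is not None:
            annotations.append((protein_id, term, "IEA", METHOD_REF[source]))
    return annotations
```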



13 Automatic transfer of annotations to orthologs
- Homologies between different species are calculated by Ensembl Compara.
- GO terms are projected from MANUAL annotations only (IDA, IEP, IGI, IMP, IPI).
- Only one-to-one and apparent one-to-one orthologies are used.
[Figure: species involved in the projections: Human, Mouse, Rat, Macaque, Chimpanzee, Guinea Pig, Dog, Chicken, Zebrafish, Tetraodon, Fugu, Xenopus, Drosophila, Anopheles, Aedes aegypti]
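The two projection rules on this slide can be sketched as a filter. First, keep only annotations with one of the listed manual evidence codes. Then copy them across one-to-one orthology pairs as IEA annotations. The orthology-type strings are assumed to follow Ensembl Compara-style naming and should be treated as illustrative. Recording the source gene in the 'With' field is also a design assumption of this sketch.

```python
from typing import Iterable, NamedTuple

# Manual evidence codes eligible for projection, as listed on the slide.
PROJECTABLE = {"IDA", "IEP", "IGI", "IMP", "IPI"}
# Orthology labels assumed to follow Ensembl Compara-style naming (illustrative).
ONE_TO_ONE = {"ortholog_one2one", "apparent_ortholog_one2one"}


class Ann(NamedTuple):
    gene: str
    go_id: str
    evidence: str
    with_from: str = ""  # for projected annotations: the source gene


def project(annotations: Iterable[Ann], orthologs: Iterable[tuple[str, str, str]]) -> list[Ann]:
    """Copy manual annotations across (source, target, orthology_type) pairs as IEA."""
    manual_by_gene: dict[str, list[Ann]] = {}
    for ann in annotations:
        if ann.evidence in PROJECTABLE:  # never project electronic annotations
            manual_by_gene.setdefault(ann.gene, []).append(ann)
    projected = set()
    for source, target, kind in orthologs:
        if kind not in ONE_TO_ONE:
            continue  # skip one-to-many and many-to-many relationships
        for ann in manual_by_gene.get(source, []):
            projected.add(Ann(target, ann.go_id, "IEA", source))
    return sorted(projected)
```

Restricting projection to experimental codes stops an electronic annotation from being re-projected as if it were evidence in its own right.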

14 Manual annotation
High-quality, specific annotations are made using:
- peer-reviewed papers
- a range of evidence codes to categorize the types of evidence found in a paper
Manual annotation is very time-consuming and requires trained biologists.

15 Finding annotations
"In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein… these kinases have been implicated in early stages of wound response…"
Key phrases: wound response; serine/threonine kinase activity; integral membrane protein
Resulting annotations for B. napus PERK1 protein (Q9ARH1), PubMed ID: …
- FUNCTION: protein serine/threonine kinase activity (GO:…)
- COMPONENT: integral to plasma membrane (GO:…)
- PROCESS: response to wounding (GO:…)

16 Evidence codes
- IEA: Inferred from Electronic Annotation
- IDA: Inferred from Direct Assay
- IMP: Inferred from Mutant Phenotype
- IPI: Inferred from Protein Interaction
- IEP: Inferred from Expression Pattern
- IGI: Inferred from Genetic Interaction
- ISS*: Inferred from Sequence or Structural Similarity
- IGC: Inferred from Genomic Context
- RCA: Reviewed Computational Analysis
- TAS: Traceable Author Statement
- NAS: Non-traceable Author Statement
- IC: Inferred from Curator Judgement
- ND: No Data available
IDA examples: enzyme assays, in vitro reconstitution, immunofluorescence, cell fractionation.
TAS: the original experiments referred to are referenced in the literature source.
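When checking an annotation set, it helps to keep the codes above as a lookup table and to reject anything outside it before summarizing. The sketch below copies the slide's 2007 list verbatim. The current GO evidence-code list differs, so the table itself is an assumption to revisit. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable

# Evidence codes exactly as listed on the slide (2007).
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "IPI": "Inferred from Protein Interaction",
    "IEP": "Inferred from Expression Pattern",
    "IGI": "Inferred from Genetic Interaction",
    "ISS": "Inferred from Sequence or Structural Similarity",
    "IGC": "Inferred from Genomic Context",
    "RCA": "Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred from Curator Judgement",
    "ND": "No Data available",
}


def evidence_summary(codes: Iterable[str]) -> Counter:
    """Count evidence codes, rejecting any code not in the table above."""
    codes = list(codes)
    unknown = sorted(set(codes) - set(EVIDENCE_CODES))
    if unknown:
        raise ValueError(f"unknown evidence codes: {unknown}")
    return Counter(codes)
```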


18 The 'Qualifier' column
The Qualifier column is used to modify the interpretation of an annotation. Allowable values are:
- NOT
- colocalizes_with
- contributes_to

19 The 'NOT' qualifier
'NOT' is used to make an explicit note that the gene product is not associated with the GO term. This is particularly important when associating a GO term with a gene product should be avoided (but might otherwise be made, especially by an automated method). NOT is also used to document conflicting claims in the literature, and it can be used with all three GO ontologies.
e.g. This protein does not have 'kinase activity' because it has been found to have a disrupted or missing 'ATP binding' domain.
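For anyone consuming annotations, the practical consequence is that a NOT row must never be counted as an association. The sketch below makes two assumptions. Rows are reduced to hypothetical (gene_product, go_id, qualifier) triples. The qualifier field may combine values with a pipe, as in "NOT|contributes_to".

```python
from typing import Iterable


def positive_associations(rows: Iterable[tuple[str, str, str]]) -> set[tuple[str, str]]:
    """Collect (gene_product, go_id) pairs, excluding any row qualified with NOT.

    qualifier may be "" or pipe-separated, e.g. "NOT|contributes_to".
    """
    pairs: set[tuple[str, str]] = set()
    for gene, go_id, qualifier in rows:
        if "NOT" in qualifier.split("|"):
            continue  # explicit negative statement: not an association
        pairs.add((gene, go_id))
    return pairs
```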

20 The 'colocalizes_with' qualifier
Only used with the GO Cellular Component ontology.
Gene products that are transiently or peripherally associated with an organelle or complex may be annotated to the relevant cellular component term using the 'colocalizes_with' qualifier.

21 The 'contributes_to' qualifier
Only used with the GO Molecular Function ontology.
It is used where an individual gene product that is part of a complex can be annotated to terms that describe the action (function or process) of the whole complex, i.e. annotating 'to the potential of the complex'. This distinguishes an individual subunit from complex functions.
All gene products annotated using 'contributes_to' must also be annotated to a cellular component term representing the complex that possesses the activity.
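The qualifier rules from the last few slides can be expressed as a small consistency check. NOT is allowed with any aspect. colocalizes_with is allowed only with Component, and contributes_to only with Function. A contributes_to gene product also needs a component annotation. This is a sketch under simplifying assumptions. Rows are hypothetical (gene_product, aspect, qualifier) triples with aspect "F", "P" or "C". Any positive component annotation is accepted, because checking that the term really is the complex would require the ontology itself.

```python
# Which aspects each qualifier may be used with, per the slides.
ALLOWED_ASPECTS = {
    "NOT": {"F", "P", "C"},      # any of the three ontologies
    "colocalizes_with": {"C"},   # cellular component only
    "contributes_to": {"F"},     # molecular function only
}


def qualifier_problems(rows: list[tuple[str, str, str]]) -> list[str]:
    """Check (gene_product, aspect, qualifier) rows against the qualifier rules.

    qualifier may be "" or pipe-separated, e.g. "NOT|colocalizes_with".
    """
    # Gene products with at least one positive (non-NOT) component annotation.
    has_component = {
        gene for gene, aspect, qualifier in rows
        if aspect == "C" and "NOT" not in qualifier.split("|")
    }
    problems = []
    for gene, aspect, qualifier in rows:
        for q in filter(None, qualifier.split("|")):
            if q not in ALLOWED_ASPECTS:
                problems.append(f"{gene}: unknown qualifier {q!r}")
            elif aspect not in ALLOWED_ASPECTS[q]:
                problems.append(f"{gene}: {q} not allowed with aspect {aspect}")
            elif q == "contributes_to" and gene not in has_component:
                # Simplification: any component term satisfies the rule here.
                problems.append(f"{gene}: contributes_to without a component annotation")
    return problems
```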


23 Where does GOA data go?

24 QuickGO browser: Human Insulin Receptor (P06213)…

25 GO data in Ensembl

26 GOA data in Entrez Gene


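28 Gene association files
Tab-delimited files with the following columns (* = optional field):
DB, DB_Object_ID, DB_Object_Symbol, Qualifier*, GO_id, DB:Ref, Evidence, With*, Aspect, DB_Object_Name*, DB_Object_Synonym*, DB_Object_Type, Taxon, Date, Assigned_By
Example rows:
DB      | DB_Object_ID | DB_Object_Symbol | Qualifier | GO_id | DB:Ref   | Evidence | With           | Aspect | DB_Object_Name             | DB_Object_Synonym | DB_Object_Type | Taxon   | Assigned_By
UniProt | Q9H2K8       | TAOK3_HUMAN      |           | GO:…  | PMID:…   | IDA      |                | F      | Serine/threonine-protein…  | IPI…              | protein        | taxon:… | HGNC
UniProt | O00110       | O00110_HUMAN     |           | GO:…  | GO_REF:… | IEA      | InterPro:IPR…  | F      |                            |                   | protein        | taxon:… | UniProt
UniProt | P09884       | DPOLA_HUMAN      | NOT       | GO:…  | PMID:…   | IMP      |                | P      | DNA polymerase alpha…      | IPI…              | protein        | taxon:… | UniProt
UniProt | P09936       | UCHL1_HUMAN      |           | GO:…  | PMID:…   | IPI      | UniProt:P46527 | F      | UCHL1: Ubiquitin carboxyl… | IPI…              | protein        | taxon:… | IntAct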
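Because these are plain tab-delimited files, Python's standard csv module is enough to read them. The sketch below makes several assumptions. Header and comment lines begin with "!". Rows use the 15 columns in the order listed above. Any extra trailing columns, as in later file-format versions, are ignored. The function name is hypothetical.

```python
import csv
from typing import Iterator, TextIO

# Column order as listed on the slide (15-column layout).
GAF_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_id",
    "DB:Ref", "Evidence", "With", "Aspect", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "Taxon", "Date", "Assigned_By",
]


def read_gaf(handle: TextIO) -> Iterator[dict[str, str]]:
    """Yield one dict per annotation line, skipping blank and '!' comment lines."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields or fields[0].startswith("!"):
            continue
        if len(fields) < len(GAF_COLUMNS):
            raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}")
        # zip() stops at the shorter sequence, so extra trailing columns are dropped.
        yield dict(zip(GAF_COLUMNS, fields))
```

If a downloaded file is gzip-compressed, it can be opened in text mode with `gzip.open(path, "rt")` and the handle passed to `read_gaf`. Optional fields simply come back as empty strings.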


30 GOA files are available from: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/

31 Output from the GOA database
- Non-redundant: based on IPI (International Protein Index), e.g. Cow
- Redundant: 625 proteome sets
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/


33 … annotations are also displayed in:
- All GO Consortium model organism databases, which integrate and exchange GO annotation data to ensure a comprehensive set of annotations for their organism or area of interest.
- Array products and data analysis tools: Affymetrix, Spotfire, Almac

34 … and numerous third-party tools

35 What's new on the GO annotation front?

36 Reference Genomes
Comprehensive annotation of a set of conserved pathway and disease-related proteins in human and their orthologs in 11 other selected genomes:
Arabidopsis thaliana, Caenorhabditis elegans, Danio rerio (zebrafish), Dictyostelium discoideum, Drosophila melanogaster, Escherichia coli, Homo sapiens, Saccharomyces cerevisiae, Mus musculus, Schizosaccharomyces pombe, Gallus gallus, Rattus norvegicus
This empowers the comparative methods used in first-pass annotation of other proteomes.

37 GOA annotation focuses
- Cardiovascular GO annotation: a grant from the British Heart Foundation supports a collaboration with HGNC curators to provide full Gene Ontology annotation for genes associated with cardiovascular processes. wiki:
- Immune GO annotation: there is interest in actively annotating immune-relevant genes with GO. GOA, UCL and MGI are collaborating to improve annotation of immunologically important genes (WT grant pending). wiki:

38 Electronic annotation developments
New mappings:
- Swiss-Prot Subcellular Location to GO (just released)
- Swiss-Prot UniPathway
Expansion of existing methods:
- Ensembl Compara species expansion

39 Acknowledgements
The Gene Ontology Consortium and 1.5 members of GOA are currently supported by a P41 grant from the National Human Genome Research Institute (NHGRI) [grant HG002273]. GOA is also supported by core EMBL funding and a BBSRC Tools and Resources grant.
Rolf Apweiler, Head of the EBI protein sequence database group
Emily Dimmer, Evelyn Camon, Rachael Huntley, Daniel Barrell, David Binns
Contact the GOA team:
GOA web page: