CACAO Training Jim Hu and Suzi Aleksander Fall 2015.

Outline
Part 1: Gene Ontology and functional annotation
  –How known functions are used to reveal new knowledge
  –Gene Ontology
  –What is an annotation?
  –CACAO
Part 2: Making annotations and challenges

LEVERAGING WHAT WE KNOW ABOUT FUNCTION

Leveraging What We Know About Function Genomics has helped reveal the complexity of biological systems We want to apply what we already know about gene function to new problems/systems We have a massive amount of published knowledge –Almost useless by itself!!

Making our knowledge reusable We need –Knowledge that computers can analyze –Common vocabulary across different organisms –Disambiguation of synonyms –Connection of related ideas that are more or less specific Examples: –polygon – quadrilateral – rectangle – square –Enzyme – kinase – protein kinase – protein tyrosine kinase

GENE ONTOLOGY

Gene Ontology (GO) terms ID numbers Definitions Relationships –Directed Acyclic Graph GO terms provide a way to describe functions, now we have to associate them with genes! –AKA GO annotation 3 aspects
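The idea that terms are connected by "more or less specific" relationships in a directed acyclic graph can be sketched in a few lines of Python. This is only a toy illustration: the term names are taken from the examples on the earlier slide, "rhombus" is added here purely to show that a term can have more than one parent, and none of these are real GO terms or IDs.

```python
# Toy ontology: each term maps to its direct parents ("is_a" links).
# Names come from the slide examples; "rhombus" is an added illustration
# so that "square" has two parents, which a tree could not represent.
IS_A = {
    "polygon": [],
    "quadrilateral": ["polygon"],
    "rectangle": ["quadrilateral"],
    "rhombus": ["quadrilateral"],
    "square": ["rectangle", "rhombus"],
    "enzyme": [],
    "kinase": ["enzyme"],
    "protein kinase": ["kinase"],
    "protein tyrosine kinase": ["protein kinase"],
}


def ancestors(term, is_a=IS_A):
    """Return every more general term reachable from `term`."""
    found = set()
    stack = list(is_a[term])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a[parent])
    return found


def is_more_specific(child, parent, is_a=IS_A):
    """True if `child` is a more specific kind of `parent`."""
    return parent in ancestors(child, is_a)
```

Because every ancestor is collected, something described with a specific term (protein tyrosine kinase) is also found by a search for a general one (kinase, enzyme).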

Molecular Function activities = what a protein can do by itself GO: hexokinase activity From PMID: , rndsystems.com GO: Kinase activity

Biological Process a commonly recognized series of events –Including, but not just biochemical pathways GO: cell division From ridge.icu.ac.jp, GO: transcription, DNA dependent

Cellular Component where a gene product acts –Subcellular location –Multicomponent complex From visualphotos.com, epmm.group.shef.ac.uk, wikimedia commons GO: mitochondrion GO: peptidoglycan-based cell wall GO: ribosome

GO ANNOTATION

What is Annotation?

Levels of Annotation for Genomes Metadata –What is this genome? Features –Where are things in the sequence? Products –What do we know about the features? Systems –What do the products do? individually? Working together? L. Stein (2001) Nature Reviews Genetics 2:493

Functional Annotation w/GO Annotation: information added to another source GO Annotation: –An assertion about the normal function of a gene product –a database entry in a specific format that associates a GO term with a gene product made based on evidence in a peer-reviewed paper Specific format makes the annotations readable by both computers and humans GO annotations capture the chain of evidence for how functions were inferred from experiments
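To make "a database entry in a specific format" concrete, here is a minimal sketch of an annotation as a structured record. The field names and the tab-separated layout are assumptions chosen for illustration. Real GO annotation files have more columns and use GO IDs rather than term names. The accession and PMID values below are placeholders, not real identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoAnnotation:
    """One assertion linking a gene product to a GO term (illustrative fields)."""
    gene_product: str   # e.g. a UniProt accession
    go_term: str        # term name here; a real record would carry the GO ID
    aspect: str         # "F" function, "P" process, or "C" component
    evidence_code: str  # how the authors showed it, e.g. "IDA"
    reference: str      # the peer-reviewed paper, as "PMID:<number>"

    def as_row(self):
        """Render the record as one tab-separated line."""
        return "\t".join([
            self.gene_product,
            self.go_term,
            self.aspect,
            self.evidence_code,
            self.reference,
        ])


# Placeholder values only: not a real protein, paper, or annotation.
example = GoAnnotation(
    gene_product="PLACEHOLDER_ACCESSION",
    go_term="hexokinase activity",
    aspect="F",
    evidence_code="IDA",
    reference="PMID:PLACEHOLDER",
)
```

A fixed set of fields is what lets a program filter or count annotations while a person can still read each line.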

Where Do Annotations Come From? Literature Datasets Biocurators (rate limiting) Database

>24 million peer-reviewed articles in PubMed Many millions of proteins recorded in UniProt Databases Need Help!

CACAO

What is CACAO? Community Assessment of Community Annotation with Ontologies (CACAO) –Annotation of gene function –Competition Within a class Between teams at different schools –Real annotation As of Sept 2015, CACAO students have contributed 3,801 annotations to 2,616 protein records (Source: QuickGO)

How Does CACAO Work?
Working in teams on the GONUTS website
Multiple innings
  –Annotation week: you make annotations on the website to get points
  –Challenge week: you challenge annotations made by other teams to steal their points
  –Open week (sometimes): you can challenge AND annotate in the same week
You can make as many annotations as you want.
  –You pick the topic
  –You have to convince us that they are correct. The default is that they are wrong!!

What To Annotate You can start with a paper –Find the proteins discussed, potential GO terms You can start with a GO term –Modify PubMed search with keywords or organism You can start with a protein –Find papers about the protein Don’t get stuck on what you started with –Your first paper may not have experiments about function (Reviews) –Reading about your initial protein may lead you to better information about other proteins

Functional Annotation w/GO Annotation: a note that is made while reading any form of text GO Annotation: –An assertion about the normal function of a gene product –a database entry in a specific format that associates a GO term with a gene product made based on evidence in a peer-reviewed paper

Starting with a paper Need a scientific paper with experimental data –Use PubMed: Or use an alias like –No review articles, no books, no textbooks, no Wikipedia articles, no class notes… –BUT it is GOOD to start with those! –DON’T start with the first paper you see from a random PubMed search

Starting with a paper Need a scientific paper with experimental data –PubMed review? –We refer to the paper through the PMID number Not the full citation

PubMed Record

Getting the Full Text The abstract is not enough for an annotation But, may be enough to reject a paper!!!

Getting the Full Text Some papers are open access –PubMed Central –Journal sites Others are pay only –Don’t pay real $$! Your library may have subscriptions Pick a different paper Contact the author and ask for a PDF –Send us a copy

Alternative Path: Start w/Full Text

Beware! Good science ≠ good for annotation

Finding Proteins Search UniProt for something interesting Look in UniProt for the protein(s) in the paper you are reading. No matter what, you will need to find the protein’s accession on UniProt. Use that accession to make a page for that protein on GONUTS. Add your GO annotations to the protein’s page on GONUTS.

UniProt If you have a paper, look for an accession –UniProt accession –NCBI Gene ID If you don’t have an accession, search by name/keyword
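When scanning a paper for an accession, it helps to know what one looks like. The sketch below checks whether a string has the shape of a UniProt accession. The pattern is reproduced from memory of the format described in the UniProt help pages, so treat it as an assumption and confirm the accession by looking it up on UniProt. The function name is hypothetical.

```python
import re

# Assumed UniProt accession shape: 6 or 10 characters, letters and digits
# in fixed positions (e.g. P12345 or A0A022YWF9).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_uniprot_accession(text):
    """True if `text` has the shape of a UniProt accession.

    This checks the format only; it does not prove the entry exists
    or that it is the right protein, species, or strain.
    """
    return UNIPROT_ACCESSION.fullmatch(text.strip().upper()) is not None
```

A gene name such as "recA" fails the check, which is a cue to search UniProt by name or keyword instead.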

UniProt Search Results Multiple entries –Find the right one –Icons Gold = Swiss-Prot = reviewed Plain = TrEMBL = automated

UniProt Records Lots of information to help you –Summary of existing GO annotations Link to QuickGO for complete set of existing annotations –Information about the protein

Make Sure You Have the Right Protein Right species/strain Not a fragment Sometimes UniProt has multiple entries for the same protein –Gold star = Swiss-Prot = reviewed –Blank star = TrEMBL = computational entry Sometimes the protein you want is not in UniProt –May want to find another paper/protein Ask for help –OK to contact the UniProt help desk –Check your reasoning with us!

Create a Protein Page in GONUTS

Entering/Editing Annotations

Finding GO Terms GONUTS, QuickGO, AmiGO2

Strategies Search for a keyword and browse the ontology for the right term –In GONUTS only search “Category” namespace if you get too many hits –Look at the parents, children, and relatives –Use Google, Wikipedia etc. to find alternative search terms Look at terms suggested by others for your protein –Computational with the IEA evidence code –Curators with TAS or IC Look at terms used for homologous proteins in model organisms

Evidence Codes for CACAO
Evidence codes describe the type of work or analysis done by the authors
  –IDA: Inferred from Direct Assay
  –IMP: Inferred from Mutant Phenotype
     NOT just for mutations! Includes inferred from inhibition in vivo by drugs, RNAi, etc.
  –IGI: Inferred from Genetic Interaction
  –ISO: Inferred from Sequence Orthology
  –ISA: Inferred from Sequence Alignment
  –ISM: Inferred from Sequence Model
  –IGC: Inferred from Genomic Context
Expert biocurators get to use others, but we restrict them for CACAO.
If it’s not one of these 7, your annotation is incorrect!!!
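The "one of these 7" rule is easy to express as a lookup. The sketch below lists the seven codes named on this slide and rejects anything else. The function name is hypothetical, and this is not how GONUTS itself validates input.

```python
# The seven evidence codes this training allows for CACAO annotations.
CACAO_EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "ISO": "Inferred from Sequence Orthology",
    "ISA": "Inferred from Sequence Alignment",
    "ISM": "Inferred from Sequence Model",
    "IGC": "Inferred from Genomic Context",
}


def describe_evidence_code(code):
    """Return the full name of an allowed code, or raise ValueError."""
    key = code.strip().upper()
    if key not in CACAO_EVIDENCE_CODES:
        allowed = ", ".join(sorted(CACAO_EVIDENCE_CODES))
        raise ValueError(f"{key!r} is not allowed for CACAO; use one of: {allowed}")
    return CACAO_EVIDENCE_CODES[key]
```

Codes such as IEA or TAS, which you may see on existing annotations, are rejected here because they are outside the CACAO set.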

Decision Tree to Choose Evidence

Evidence Pull-Down Menu

Some Evidence Types Require More Information: With/from
Evidence from sequence comparison
  –With the protein accession for the protein you are comparing to
     That protein must have experimental annotation to the same GO term
Evidence from computational analysis
  –With the reference for the analysis tool
Evidence from genetic interaction
  –With the other gene(s) your protein is interacting with
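The with/from requirement can be sketched as a simple check before submitting. The grouping below is an assumption, not something the slide states code by code: ISO and ISA are treated as sequence comparison, ISM and IGC as computational analysis, and IGI as genetic interaction. Check the decision tree and the GO evidence code guidelines before relying on it. The function name is hypothetical.

```python
# Assumed mapping from evidence code to what the with/from field should hold.
WITH_FROM_NEEDED = {
    "ISO": "accession of the protein you are comparing to",
    "ISA": "accession of the protein you are comparing to",
    "ISM": "reference for the analysis tool",
    "IGC": "reference for the analysis tool",
    "IGI": "the other gene(s) your protein is interacting with",
}


def missing_with_from(evidence_code, with_from):
    """Return a reminder if with/from is required but empty, else None."""
    code = evidence_code.strip().upper()
    needed = WITH_FROM_NEEDED.get(code)
    if needed is None or with_from.strip():
        return None
    return f"{code} needs a with/from entry: {needed}"
```

Under this mapping IDA and IMP pass with an empty with/from field, while an ISO annotation without a comparison protein is flagged.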

Evidence Codes for CACAO Picking the right evidence code is important Use the evidence code decision tree –http://gowiki.tamu.edu/wiki/images/3/32/CACAO_decisiontree.pdf Use the evidence code guidelines at the GO consortium website: – Discuss!

Note Required for CACAO

How Does CACAO Work? Getting help is not cheating! –Talk to your teammates –Ask us questions –Talk to other professors –Contact authors of papers

Example Papers Powerpoints and pdfs for different examples can be found at: O_training O_training –Component annotation based on immunofluorescence in B. subtilis –Function annotation based on a study of E. coli Topoisomerase IV –More coming…

Possible homework Find annotations made by professional biocurators –Read the paper –Discuss what you would put in the note Examine past CACAO annotations –Discuss whether the annotations make sense Pick some papers that you might annotate –In class discussion of how you made the choice and why you think it works for GO annotation