CACAO Biocurator Training CACAO Fall 2011. CACAO Syllabus What is CACAO & why is it important? Training Examples.

Presentation transcript:

CACAO Biocurator Training CACAO Fall 2011

CACAO Syllabus What is CACAO & why is it important? Training Examples

Mutualistic Relationship
We want you to get experience with:
1. CRITICALLY reading scientific papers
2. Bioinformatics resources
3. Collaborating with other biocurators
4. Synthesizing functional annotations
We want to get high quality functional annotations to contribute back to the GO Consortium and other biological databases.

What is an annotation? Hint: try looking for a definition on Wikipedia.

What is a functional annotation? Process of attaching information from the scientific literature to proteins

Growing need for functional annotations Advances in DNA sequencing mean lots of new genomes & metagenomes

Classic MODel (diagram): Literature, Datasets, Curators (rate limiting), Database

Classic MODel is Expensive YIKES!

Growing need for high quality functional annotations
High quality annotations allow us to infer the function of genes, which in turn lets us understand the capabilities of genomes and the patterns of gene expression.

Two problems meet
How can we get more curators with finite budgets?
How can we incorporate more critical analysis into undergraduate education?

What does a functional annotation have to do with this course? Process of attaching information from the scientific literature to proteins CACAO will teach you to become a biocurator –you will be adding functional annotations to the biological database GONUTS (

CACAO
Community Assessment - How well can
Community - you (with our coaching)
Annotation with - assign gene functions
Ontologies - using GO?

Can students become biocurators? YES!
Semesters: Spring 2010, Fall 2010, Spring 2011
Institutions: TAMU, UCL, TAMU, Miami (Ohio), N. Texas, Penn State, Mich. State
Rounds: 1 round, 4 rounds, 5 rounds
Annotations* / Submitted: 118/153, 496/753, 726/
GO annotations in 2 & 1/2 semesters!

Functional annotation with Gene Ontology
Controlled vocabulary with
– Term identifiers: GO:
– Name: cell cycle checkpoint
– Definitions: "A point in the eukaryotic cell cycle where progress through the cycle can be halted until conditions are suitable for the cell to proceed to the next stage." [GOC:mah, ISBN: ]
– Relationships: is_a GO: ! regulation of progression through cell cycle
Terms arranged in a Directed Acyclic Graph (DAG)
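As a rough illustration of what "terms arranged in a DAG" means, the sketch below stores each term with an identifier, a name and its is_a parents, then walks upward to collect ancestors. The identifiers (GO:A, GO:B, ...) are made-up placeholders rather than real GO IDs, two of the term names are invented, and the `Term` class and `ancestors` function are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Term:
    """One ontology term: identifier, name, and is_a parents."""
    term_id: str
    name: str
    is_a: List[str] = field(default_factory=list)


def ancestors(term_id: str, terms: Dict[str, Term]) -> Set[str]:
    """Collect every term reachable by following is_a links upward."""
    found: Set[str] = set()
    stack = list(terms[term_id].is_a)
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(terms[parent].is_a)
    return found


# Placeholder identifiers, NOT real GO IDs. "toy root" and "toy second parent"
# are invented names. A term may have more than one parent, which is what
# makes the structure a graph rather than a tree.
TERMS = {
    "GO:A": Term("GO:A", "toy root"),
    "GO:B": Term("GO:B", "toy second parent", ["GO:A"]),
    "GO:C": Term("GO:C", "regulation of progression through cell cycle", ["GO:A"]),
    "GO:D": Term("GO:D", "cell cycle checkpoint", ["GO:B", "GO:C"]),
}

print(sorted(ancestors("GO:D", TERMS)))
```

Because annotations to a child term also imply its ancestors, picking the most specific term that the evidence supports carries the most information.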

Why use Ontologies?
Standardization
Facilitates comparison across systems
Facilitates computer-based reasoning systems
– Good for data mining!
Leading functional annotation ontology = Gene Ontology (GO)

What is GO? Who is the GO Consortium (GOC)?
GO = ~30,000 terms for gene product attributes
1. Molecular Function (enzyme activity)
2. Biological Process (pathways)
3. Cellular Component (parts of the cell)
GO Consortium - set of biological databases that are involved in developing GO and contributing GO annotations

Cellular Component where a gene product acts

Molecular Function activities or “jobs” of a gene product glucose-6-phosphate isomerase activity figure from GO consortium presentations

Biological Process a commonly recognized series of events cell division Figure from Nature Reviews Microbiology 6, (January 2008)

Where can we find GO terms? GONUTS

Search for GO terms on GONUTS

Which subontology (MF, BP or CC) would the following terms fit in?
GO: DNA ligase activity
GO: Nitrogen compound transport
GO: Pseudohyphal growth
GO: Acetate transmembrane transporter activity
GO: Genetic imprinting
GO: Vacuole
GO: Plastid small ribosomal subunit

What do we know so far?
1. You will be making functional (GO) annotations using GO terms.
2. You can search for GO terms on GONUTS.
Questions?

Where are we adding GO annotations? GONUTS

Why are we using GONUTS?
– Students can add functional annotations to proteins.
– It has all the GO terms in it, too.
– Some of the GO terms have usage notes.
– It works a lot like Wikipedia, so it’s familiar.
– It has the ability to keep track of each student’s and team’s annotations.
– We run it.

REQUIRED parts of a GO annotation GO ** I will cover this again!!

Parts of a GO annotation (cont) Evidence code

Parts of a GO annotation (cont) Reference Notes (about evidence)

What do we know so far?
1. You will be making functional (GO) annotations using GO terms.
2. You can search for GO terms on GONUTS.
3. You will be adding your GO annotations to GONUTS.
4. There are 4 required parts to a GO annotation.
5. You have to base your annotation on an experiment published in a scientific paper.
Questions?

Next week
Review of GO & GO annotations
More biocurator training
– lots of examples
– lots of practice
BICH 485 & 689 students - please stick around to talk about these courses!

Plan for training
1. Synthesizing GO annotations
2. Refinements
3. Judging & Assessment
4. Individual & Team tracking

Part 1: Synthesizing GO annotations

What can you annotate?
Proteins.
– Any protein with a record in UniProt (Universal Protein Resource -
How can you find proteins to annotate?
– Think of ways to identify a protein or paper to annotate

Choosing a protein to annotate
1. randomly
2. topics of interest (e.g. efflux pump proteins, biofilms, marine biology)
3. papers you have come across while doing other stuff
4. methods you know or want to learn
5. phenotypes and mutants you are interested in
6. by author
7. by pathway or regulon
8. suggested by another
   - high ratio of IEA:manual annotations in GONUTS
   - mentioned in another class
9. current paper mentions another gene product
10. review papers (e.g. Annual Reviews are excellent sources)
11. UniProt, GONUTS, WikiPathways, PubMed searches
12. protein annotated by other teams
13. ask a coach

Search for GO terms on GONUTS

Practice
1. What is the GO term for GO: ?
2. What is the GO identifier for mitosis?
3. How many results (ballpark) do you get when you search for cell division using the Go, Search or G buttons?
4. How many child terms are there for plasma membrane? How many grandchildren?
5. What term is the parent of GO:006825?

Finding a scientific paper on a certain protein
Has to be a scientific paper with experimental data in it.
– Anything else is a valid reason to challenge!
PubMed, PubMed Central, Google Scholar…
No review articles, books, textbooks, Wikipedia articles, class notes…
You will need the PMID number.

Practice - searching PubMed
1. How many papers do you get when you search for “coli”?
2. How many of those papers are reviews?
3. What is the title of the oldest paper when you search for “coli AND RNA polymerase”?
4. How many results are there when you search for “GTPase activity and Gene Ontology”?
5. What is the PMID of the paper when you search for “Hu JC AND coli AND lysR AND 2010”?
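Searches like these can also be scripted. The following is a minimal sketch, assuming NCBI's E-utilities `esearch` endpoint with its `db`, `term` and `retmax` parameters. The helper name `pubmed_search_url` is hypothetical, and the function only builds the request URL; it does not contact PubMed.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service (assumed unchanged).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL for a PubMed query string (hypothetical helper)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


# One of the practice queries from the slide.
print(pubmed_search_url("coli AND RNA polymerase"))
```

The practice questions are still best answered through the PubMed web page, where the result count, filters and PMIDs are visible at a glance.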

Why do we annotate on GONUTS?
UniProt (Universal Protein Resource) will not let us annotate protein records on their site. They are a professionally-curated & closed database.
GONUTS will.
GONUTS pulls the info from the UniProt record when it makes a page for you to edit.

UniProt - UniProt is not community edited, so we can’t add annotations directly to their database.
Making a protein page on GONUTS requires a UniProt accession.

Practice - Searching UniProt
Find the UniProt accessions for:
a) Mouse Lsr protein
b) Diphtheria toxin from Corynebacterium
c) mutS from E. coli K-12

How do you make a new gene page in GONUTS?
Use a UniProt accession to make a page on GONUTS that you can add your own annotations to.
GoPageMaker will:
- Check if the page exists in GONUTS & take you there if it does.
- Make a page & pull all of the annotations from UniProt into a table that you can edit.
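The two GoPageMaker behaviours described above amount to a get-or-create lookup. The sketch below is not GONUTS code: the `pages` dictionary, the `fake_fetch` stub, the `make_or_get_page` function and the accession `X00000` are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List

Annotation = Dict[str, str]


def make_or_get_page(
    accession: str,
    pages: Dict[str, List[Annotation]],
    fetch_annotations: Callable[[str], List[Annotation]],
) -> List[Annotation]:
    """Return the existing page, or create one seeded with fetched annotations."""
    if accession in pages:
        # The page already exists, so go straight to it.
        return pages[accession]
    # No page yet: create it and copy the annotations into an editable table.
    pages[accession] = list(fetch_annotations(accession))
    return pages[accession]


def fake_fetch(accession: str) -> List[Annotation]:
    """Stand-in for pulling a UniProt record; returns made-up rows."""
    return [{"go_id": "GO:placeholder", "evidence": "IEA", "source": accession}]


pages: Dict[str, List[Annotation]] = {}
first = make_or_get_page("X00000", pages, fake_fetch)   # page is created
second = make_or_get_page("X00000", pages, fake_fetch)  # existing page is reused
```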

Practice
1. How many annotations are on the page for the p53 protein from humans?
2. How many different evidence codes are there on the page for the Bub1a protein from mice?
3. Give one of the paper identifiers for an annotation for the LpxK protein from E. coli.

What do we know so far?
1. You will be making functional (GO) annotations using GO terms.
2. You can search for GO terms on GONUTS.
3. You will be adding your GO annotations to GONUTS.
4. There are 4 required parts to a GO annotation.
5. You have to base your annotation on an experiment published in a scientific paper.
6. You can annotate any protein with a record in UniProt.
7. You have to make a page in GONUTS for your protein using the UniProt accession.
Questions?

What are evidence codes?
Describe the type of work or analysis done by the authors
5 general categories of evidence codes:
1. Experimental
2. Computational
3. Author Statement
4. Curator Assigned
5. Automatically assigned by GO

What are the evidence codes?
CACAO biocurators may only use certain experimental and computational evidence codes.

Experimental Evidence Codes
IDA: Inferred from Direct Assay
IMP: Inferred from Mutant Phenotype
IGI: Inferred from Genetic Interaction
IEP: Inferred from Expression Pattern
IPI: Inferred from Physical Interaction
EXP: Inferred from Experiment

Computational Evidence Codes
ISS: Inferred from Sequence or Structural Similarity
ISO: Inferred from Sequence Orthology
ISA: Inferred from Sequence Alignment
ISM: Inferred from Sequence Model
IGC: Inferred from Genomic Context
IBA: Inferred from Biological Aspect of Ancestor
IBD: Inferred from Biological Aspect of Descendant
IKR: Inferred from Key Residues
IRD: Inferred from Rapid Divergence
RCA: Inferred from Reviewed Computational Analysis

Summary of Evidence Codes for CACAO
IDA: Inferred from Direct Assay
IMP: Inferred from Mutant Phenotype
IGI: Inferred from Genetic Interaction
IEP: Inferred from Expression Pattern
ISO: Inferred from Sequence Orthology
ISA: Inferred from Sequence Alignment
ISM: Inferred from Sequence Model
IGC: Inferred from Genomic Context
If it’s not one of these 8, your annotation is incorrect!!!
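The "one of these 8" rule is easy to express as a lookup. A minimal sketch follows; the names `CACAO_EVIDENCE_CODES` and `is_allowed_evidence_code` are assumptions for illustration, not part of GONUTS.

```python
# The eight evidence codes the slide lists as acceptable for CACAO.
CACAO_EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "ISO": "Inferred from Sequence Orthology",
    "ISA": "Inferred from Sequence Alignment",
    "ISM": "Inferred from Sequence Model",
    "IGC": "Inferred from Genomic Context",
}


def is_allowed_evidence_code(code: str) -> bool:
    """True only if the code is one of the eight CACAO codes."""
    return code.strip().upper() in CACAO_EVIDENCE_CODES


for code in ("IDA", "EXP", "IPI"):
    print(code, is_allowed_evidence_code(code))
```

Note that EXP and IPI are real experimental evidence codes but are not in the CACAO list, so the check rejects them.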

Required parts (for every annotation)
GO term: GO:
Reference: PMID:1111
Evidence code: IDA: Inferred from direct assay
Notes: Figure 2a
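A quick way to internalize the four required parts is to write them down as a record and check each field. The sketch below is an illustration only: the `Annotation` class and `problems` function are hypothetical, `GO:0000000` is a placeholder identifier, and the format checks assume that GO identifiers are "GO:" followed by seven digits and that references are given as "PMID:" followed by digits.

```python
import re
from dataclasses import dataclass
from typing import List

# Evidence codes permitted in CACAO, per the summary slide.
ALLOWED_CODES = {"IDA", "IMP", "IGI", "IEP", "ISO", "ISA", "ISM", "IGC"}


@dataclass
class Annotation:
    go_id: str          # e.g. "GO:0000000" (placeholder)
    reference: str      # e.g. "PMID:1111"
    evidence_code: str  # e.g. "IDA"
    notes: str          # where in the paper the evidence is, e.g. "Figure 2a"


def problems(a: Annotation) -> List[str]:
    """List what is missing or malformed; an empty list means complete."""
    issues = []
    if not re.fullmatch(r"GO:\d{7}", a.go_id):
        issues.append("GO ID is missing or not 'GO:' plus 7 digits")
    if not re.fullmatch(r"PMID:\d+", a.reference):
        issues.append("reference is missing or not a PMID")
    if a.evidence_code not in ALLOWED_CODES:
        issues.append("evidence code is not one of the 8 CACAO codes")
    if not a.notes.strip():
        issues.append("notes do not say where the evidence is")
    return issues


complete = Annotation("GO:0000000", "PMID:1111", "IDA", "Figure 2a")
incomplete = Annotation("GO:0000000", "", "EXP", "")
print(problems(complete))
print(problems(incomplete))
```

A check like this only catches missing or malformed fields. Whether the GO term fits the experiment, and at the right level of specificity, still takes a careful reading of the paper.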

What you might also have to fill in

Practice - Identify the problem annotation(s) & why
GO ID | Reference | Evidence Code | Notes
1. GO: | PMID: | IDA: Inferred from Direct Assay | Table
2. GO: | PMID: | IMP: Inferred from Mutant Phenotype | Table
3. GO: | PMID: | IDA: Inferred from Direct Assay |
4. GO: | PMID: | IDA: Inferred from Direct Assay | Table
5. GO: | PMID: | IDA: Inferred from Direct Assay | Table
6. GO: | PMID: | IGI: Inferred from Genetic Interaction | Table
7. GO: | | IDA: Inferred from Direct Assay | Table
8. GO: | PMID: | EXP: Inferred from Experiment | Table
What is the UniProt accession of the protein described/annotated?

How is CACAO scored?
Points for a complete annotation
– GO term (right level of specificity)
– Reference (paper)
– Evidence code
– Identify where in the paper the evidence is
Refinements are used to steal points for incorrect &/or incomplete annotations
– Identify a problem
– Suggest correct alternative
Refinements can be entered by any team (including the original team)

How can you get the annotations required by Rubric #2?
1. Synthesize complete & correct annotations.
2. Correctly refine (challenge & correct) someone else’s annotation.
3. If your annotation gets challenged, offer the best correction.

Summary
You will be searching literature for experimental evidence for a protein’s function (MF), processes (BP) and location (CC).

Where do annotations show up?

Refinements & Challenges

What can you challenge?

Scoreboard

Schedule

Spring Results by organism