Getting Started: a user’s guide to the GO TAMU GO Workshop 17 May 2010.


Introduction to GO
1. Annotation
2. Bio-ontologies
3. The Gene Ontology (GO)
   - a GO annotation example
   - GO evidence codes
   - literature biocuration & computational analysis
   - ND vs no GO
   - sources of GO
4. Using the GO

Genomic Annotation
• Genome annotation is the process of attaching biological information to genomic sequences. It consists of two main steps:
  1. identifying functional elements in the genome: “structural annotation”
  2. attaching biological information to these elements: “functional annotation”
• Biologists often use the term “annotation” when they are referring only to structural annotation.

[Figure] Structural annotation: CHICK_OLF6, DNA annotation and protein annotation (TRAF 1, 2 and 3; TRAF 1 and 2). Data from the Ensembl genome browser.

[Figure] Functional annotation: catenin

Structural & Functional Annotation
Structural Annotation:
• Open reading frames (ORFs) predicted during genome assembly
• predicted ORFs require experimental confirmation
• the Sequence Ontology (SO) provides a structured controlled vocabulary for sequence annotation
Functional Annotation:
• annotation of gene products = Gene Ontology (GO) annotation
• initially, predicted ORFs have no functional literature and GO annotation relies on computational methods (rapid)
• functional literature exists for many genes/proteins prior to genome sequencing
• GO annotation does not rely on a completed genome sequence!

1. Provides structural annotation for agriculturally important genomes
2. Provides functional annotation (GO)
3. Provides tools for functional modeling
4. Provides bioinformatics & modeling support for the research community
Avian Gene Nomenclature

1. Bio-ontologies

Bio-ontologies
• Bio-ontologies are used to capture biological information in a way that can be read by both humans and computers.
  - necessary for high-throughput “omics” datasets
  - allows data sharing across databases
• Objects in an ontology (eg. genes, cell types, tissue types, stages of development) are well defined.
• The ontology shows how the objects relate to each other.

Bio-ontologies: http://www.obofoundry.org/

Ontologies: each term has
• a digital identifier (for computers)
• a description (for humans)
• relationships to other terms

2. The Gene Ontology

What is the Gene Ontology?
“a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”
• assigns functions to gene products at different levels, depending on how much is known about a gene product
• is used for a diverse range of species
• is structured to be queried at different levels, eg:
  - find all the chicken gene products in the genome that are involved in signal transduction
  - zoom in on all the receptor tyrosine kinases
• each human-readable GO function has a digital tag to allow computational analysis of large datasets
COMPUTATIONALLY AMENABLE ENCYCLOPEDIA OF GENE FUNCTIONS AND THEIR RELATIONSHIPS
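Querying "at different levels" can be sketched with a small parent-link graph. This is only a toy illustration: the term names, the hierarchy and the gene names below are made up for the example and are not real GO data.

```python
# Toy hierarchy: child term -> list of parent terms.
# Illustrative only; this is NOT the real GO graph.
PARENTS = {
    "receptor tyrosine kinase signaling": ["cell surface receptor signaling"],
    "cell surface receptor signaling": ["signal transduction"],
    "signal transduction": ["biological process"],
    "lipid biosynthesis": ["biological process"],
}

# Hypothetical gene products, each annotated to its most specific known term.
ANNOTATIONS = {
    "geneA": ["receptor tyrosine kinase signaling"],
    "geneB": ["cell surface receptor signaling"],
    "geneC": ["lipid biosynthesis"],
}


def ancestors(term):
    """Return the term itself plus every term reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def genes_under(query_term):
    """Gene products annotated to query_term or to any of its descendants."""
    return sorted(
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(query_term in ancestors(term) for term in terms)
    )


# Broad query, then "zoom in" on a more specific term.
print(genes_under("signal transduction"))
print(genes_under("receptor tyrosine kinase signaling"))
```

A gene annotated to a specific term is also found by queries on the broader terms above it, which is what makes the broad and the zoomed-in query both work.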

GO annotation example: NDUFAB1 (UniProt P52505)
Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa

Biological Process (BP or P)
  GO:0006633  fatty acid biosynthetic process                        TAS
  GO:0006120  mitochondrial electron transport, NADH to ubiquinone   TAS
  GO:0008610  lipid biosynthetic process                             IEA

Cellular Component (CC or C)
  GO:0005759  mitochondrial matrix                                   IDA
  GO:0005747  mitochondrial respiratory chain complex I              IDA
  GO:0005739  mitochondrion                                          IEA

Molecular Function (MF or F)
  GO:0005504  fatty acid binding                                     IDA
  GO:0008137  NADH dehydrogenase (ubiquinone) activity               TAS
  GO:0016491  oxidoreductase activity                                TAS
  GO:0000036  acyl carrier activity                                  IEA

GO annotation example: NDUFAB1 (UniProt P52505)
Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa
Each annotation consists of:
• aspect or ontology
• GO:ID (unique)
• GO term name
• GO evidence code
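The four parts of an annotation can be held in a simple record. The sketch below uses the NDUFAB1 annotations from the example; the class name, field names and helper function are assumptions made for illustration, not part of any GO tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    """One GO annotation: aspect, GO ID, term name, evidence code."""
    aspect: str      # "P", "C" or "F"
    go_id: str       # unique identifier, e.g. "GO:0005739"
    term_name: str   # human-readable name
    evidence: str    # GO evidence code, e.g. "IDA"


# NDUFAB1 (UniProt P52505) annotations as listed in the example.
NDUFAB1 = [
    GOAnnotation("P", "GO:0006633", "fatty acid biosynthetic process", "TAS"),
    GOAnnotation("P", "GO:0006120", "mitochondrial electron transport, NADH to ubiquinone", "TAS"),
    GOAnnotation("P", "GO:0008610", "lipid biosynthetic process", "IEA"),
    GOAnnotation("C", "GO:0005759", "mitochondrial matrix", "IDA"),
    GOAnnotation("C", "GO:0005747", "mitochondrial respiratory chain complex I", "IDA"),
    GOAnnotation("C", "GO:0005739", "mitochondrion", "IEA"),
    GOAnnotation("F", "GO:0005504", "fatty acid binding", "IDA"),
    GOAnnotation("F", "GO:0008137", "NADH dehydrogenase (ubiquinone) activity", "TAS"),
    GOAnnotation("F", "GO:0016491", "oxidoreductase activity", "TAS"),
    GOAnnotation("F", "GO:0000036", "acyl carrier activity", "IEA"),
]


def by_aspect(annotations, aspect):
    """Return only the annotations that belong to one aspect (P, C or F)."""
    return [a for a in annotations if a.aspect == aspect]


for annotation in by_aspect(NDUFAB1, "C"):
    print(annotation.go_id, annotation.term_name, annotation.evidence)
```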

GO EVIDENCE CODES

Direct Evidence Codes
  IDA - inferred from direct assay
  IEP - inferred from expression pattern
  IGI - inferred from genetic interaction
  IMP - inferred from mutant phenotype
  IPI - inferred from physical interaction

Indirect Evidence Codes
  inferred from literature
    IGC - inferred from genomic context
    TAS - traceable author statement
    NAS - non-traceable author statement
    IC - inferred by curator
  inferred by sequence analysis
    RCA - inferred from reviewed computational analysis
    IS* - inferred from sequence*
      ISS - inferred from sequence or structural similarity
      ISA - inferred from sequence alignment
      ISO - inferred from sequence orthology
      ISM - inferred from sequence model
    IEA - inferred from electronic annotation

Other
  NR - not recorded (historical)
  ND - no biological data available

Guide to GO Evidence Codes: http://www.geneontology.org/GO.evidence.shtml
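The grouping of evidence codes can be written as a lookup table, which makes it easy to see how much of a protein's GO comes from each kind of evidence. A minimal sketch, following the grouping on this slide; the group labels and function name are assumptions for illustration.

```python
from collections import Counter

# Evidence codes grouped as on the slide above.
EVIDENCE_GROUPS = {
    "direct": ["IDA", "IEP", "IGI", "IMP", "IPI"],
    "literature": ["IGC", "TAS", "NAS", "IC"],
    "sequence analysis": ["RCA", "ISS", "ISA", "ISO", "ISM", "IEA"],
    "other": ["NR", "ND"],
}

# Reverse lookup: evidence code -> group label.
CODE_TO_GROUP = {
    code: group for group, codes in EVIDENCE_GROUPS.items() for code in codes
}


def group_counts(codes):
    """Count annotations per evidence group; unknown codes are kept visible."""
    return Counter(CODE_TO_GROUP.get(code, "unrecognized") for code in codes)


# Evidence codes of the ten NDUFAB1 annotations from the earlier example.
ndufab1_codes = ["TAS", "TAS", "IEA", "IDA", "IDA", "IEA", "IDA", "TAS", "TAS", "IEA"]

counts = group_counts(ndufab1_codes)
print(dict(counts))

# Dropping IEA is a common filter when only curated annotations are wanted.
non_iea = [code for code in ndufab1_codes if code != "IEA"]
print(len(non_iea), "annotations remain after removing IEA")
```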

GO Mapping Example: NDUFAB1
GO evidence codes (same list as above), with one annotation added:
• Biocuration of literature: detailed function, “depth”, slower (manual)

Biocuration of Literature: detailed gene function
Find a paper about the protein (example: P05147, PMID: 2976880).

• Read the paper to get experimental evidence of function.
• Use the most specific term possible.
• Example: the experiment assayed kinase activity, so use the IDA evidence code.

GO Mapping Example: NDUFAB1
GO evidence codes (same list as above), with two annotations added:
• Biocuration of literature: detailed function, “depth”, slower (manual)
• Sequence analysis: rapid (computational), “breadth” of coverage, less detailed

Unknown Function vs No GO
• ND (no data): biocurators have tried to add GO but there is no functional data available
  - Previously: “process_unknown”, “function_unknown”, “component_unknown”
  - Now: “biological process”, “molecular function”, “cellular component”
• No annotations (including no “ND”): biocurators have not annotated
  - this is important for your dataset: what % has GO?
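The "what % has GO?" question can be answered with a short tally that keeps ND separate from no annotation at all. A minimal sketch: the accessions and their evidence codes below are hypothetical, and the three category names are assumptions for illustration.

```python
def go_coverage(dataset):
    """Tally gene products by GO status.

    dataset maps an accession to the list of evidence codes on its annotations.
    An empty list means no annotation at all (not even ND).
    """
    summary = {"with_go": 0, "nd_only": 0, "no_go": 0}
    for codes in dataset.values():
        if not codes:
            summary["no_go"] += 1        # biocurators have not annotated
        elif all(code == "ND" for code in codes):
            summary["nd_only"] += 1      # annotated, but no functional data
        else:
            summary["with_go"] += 1      # at least one informative annotation
    total = len(dataset)
    percent = {
        key: (100.0 * value / total if total else 0.0)
        for key, value in summary.items()
    }
    return summary, percent


# Hypothetical dataset: accession -> evidence codes of its annotations.
example = {
    "prot1": ["IDA", "IEA"],
    "prot2": ["ND"],
    "prot3": [],
    "prot4": ["TAS"],
}

summary, percent = go_coverage(example)
print(summary)
print(percent)
```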

Sources of GO
1. Primary sources of GO: from the GO Consortium (GOC) & GOC members
   - most up to date
   - most comprehensive
2. Secondary sources: other resources that use GO provided by GOC members
   - public databases (eg. NCBI, UniProtKB)
   - genome browsers (eg. Ensembl)
   - array vendors (eg. Affymetrix)
   - GO expression analysis tools

Sources of GO annotation
• Different tools and databases display the GO annotations differently.
• Since GO terms are continually changing and GO annotations are continually added, you need to know when the GO annotations were last updated.

Secondary Sources of GO annotation
• EXAMPLES:
  - public databases (eg. NCBI, UniProtKB)
  - genome browsers (eg. Ensembl)
  - array vendors (eg. Affymetrix)
• CONSIDERATIONS:
  - What is the original source?
  - When was it last updated?
  - Are evidence codes displayed?

For more information about GO
• GO Evidence Codes: http://www.geneontology.org/GO.evidence.shtml
• gene association file information: http://www.geneontology.org/GO.format.annotation.shtml
• tools that use the GO: http://www.geneontology.org/GO.tools.shtml
• GO Consortium wiki: http://wiki.geneontology.org/index.php/Main_Page
All websites are listed on the AgBase workshop website.
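A gene association file is tab-delimited, with comment lines starting with "!". The sketch below assumes the 17-column GAF 2.0 layout; the column names are shortened by hand, and the sample row mixes one NDUFAB1 annotation from the earlier example with placeholder values (reference, date, assigned-by) that are made up for illustration.

```python
# Column names for a GAF 2.0 row (17 tab-separated fields), shortened by hand.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def parse_gaf(lines):
    """Turn gene association file lines into a list of dicts, skipping comments."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # blank line or header/comment line
        fields = line.split("\t")
        # Pad short rows (older files have 15 columns) so every name gets a value.
        fields += [""] * (len(GAF_COLUMNS) - len(fields))
        records.append(dict(zip(GAF_COLUMNS, fields)))
    return records


# Sample input. Reference, date and assigned_by are PLACEHOLDERS, not real data.
sample_row = "\t".join([
    "UniProtKB", "P52505", "NDUFAB1", "", "GO:0005759",
    "REF:placeholder", "IDA", "", "C",
    "NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa",
    "", "protein", "taxon:9913", "20100517", "placeholder_group", "", "",
])
sample_lines = ["!gaf-version: 2.0\n", sample_row + "\n"]

records = parse_gaf(sample_lines)
for record in records:
    print(record["db_object_id"], record["go_id"], record["evidence_code"], record["aspect"])
```

Reading the evidence code and date columns directly from the file is one way to check the considerations listed for secondary sources: where the annotation came from and when it was last updated.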

3. Using the GO

http://www.geneontology.org/

However…
• many of these tools do not support non-model organisms
• the tools have different computing requirements
• it may be difficult to determine how up-to-date the GO annotations are
Need to evaluate tools for your system.

Some useful expression analysis tools:
• Database for Annotation, Visualization and Integrated Discovery (DAVID)
  http://david.abcc.ncifcrf.gov/
• agriGO: GO Analysis Toolkit and Database for Agricultural Community
  http://bioinfo.cau.edu.cn/agriGO/
  - used to be EasyGO
  - chicken, cow, pig, mouse, cereals, dicots
  - includes Plant Ontology (PO) analysis
• Onto-Express
  http://vortex.cs.wayne.edu/projects.htm#Onto-Express
  - can provide your own gene association file
• Funcassociate 2.0: The Gene Set Functionator
  http://llama.med.harvard.edu/funcassociate/
  - can provide your own gene association file

Evaluating GO tools
Some criteria for evaluating GO tools:
1. Does it include my species of interest (or do I have to “humanize” my list)?
2. What does it require to set up (computer usage/online)?
3. What was the source for the GO (primary or secondary) and when was it last updated?
4. Does it report the GO evidence codes (and is IEA included)?
5. Does it report which of my gene products have no GO?
6. Does it report both over- and under-represented GO groups, and how does it evaluate this?
7. Does it allow me to add my own GO annotations?
8. Does it represent my results in a way that facilitates discovery?
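For criterion 6, one common way to evaluate over- or under-representation is a hypergeometric test. The sketch below is a generic illustration under that assumption; individual tools may use a different statistic or correction, and the counts in the example are hypothetical.

```python
from math import comb


def prob_exactly(N, K, n, i):
    """P(exactly i annotated genes in the list) under the hypergeometric model.

    N: gene products in the background, K: background genes with the GO term,
    n: gene products in the list of interest.
    """
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


def over_representation_p(N, K, n, k):
    """P(X >= k): chance of seeing k or more annotated genes in the list."""
    return sum(prob_exactly(N, K, n, i) for i in range(k, min(n, K) + 1))


def under_representation_p(N, K, n, k):
    """P(X <= k): chance of seeing k or fewer annotated genes in the list."""
    return sum(prob_exactly(N, K, n, i) for i in range(0, min(k, n, K) + 1))


# Hypothetical counts: 1000 background genes, 50 carry the term,
# 100 genes in the list of interest, 15 of those carry the term.
p_over = over_representation_p(1000, 50, 100, 15)
p_under = under_representation_p(1000, 50, 100, 15)
print("over-representation p-value:", p_over)
print("under-representation p-value:", p_under)
```

Because many GO terms are tested at once, a real analysis would also need a multiple-testing correction, which is another point to check when comparing tools.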

Functional Modeling Considerations
• Should I add my own GO?
  - use GOProfiler to see how much GO is available for your species
  - use GORetriever to find existing GO for your dataset
  - Does the analysis tool allow me to add my own GO?
• Should I do GO analysis and pathway analysis and network analysis?
  - different functional modeling methods show different aspects of your data (complementary)
  - is this type of data available for your species (or a close ortholog)?
• What tools should I use?
  - which tools have data for your species of interest?
  - what type of accessions are accepted?
  - availability (commercial and freely available)

Overview of functional modeling strategy
[Workflow diagram] Elements shown:
• Inputs: protein/gene identifiers; microarray IDs
• Tools: ArrayIDer, GORetriever, GOanna, GOSlimViewer
• Outputs: GO annotations; genes/proteins with no GO annotations
• Analysis steps: GO enrichment analysis; pathways and network analysis
• Other resources named: Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape, DAVID, EasyGO, Onto-Express, Onto-Express-to-go (OE2GO)
Yellow boxes represent AgBase tools; green/purple boxes are non-AgBase resources.

All workshop materials are available at AgBase.
