Functional Annotation and Functional Enrichment

Annotation
Structural Annotation: defining the boundaries of features of interest (coding regions, regulatory elements, functional RNAs, etc.)
– Ab initio: computationally predicted
– Comparative: based on similarity to other genes or genomes
– Experimental: transcript sequencing
Functional Annotation: attaching meaning to the features (names, product, activity, biological role, etc.)
– Sequence homology
– Structural similarity or structural features
– Experimental data: gene or protein expression patterns

Functional Annotation
Manual
– Slow
– Costly
– Inconsistent quality
– Inconsistent coverage across genome
– Rich content
– Error correction
Automated
– Fast
– Cheap?
– Consistent quality
– Complete coverage across genome
– Improving in content
– Updateable

How many ways are there to say the same thing?
Quick survey of GenBank lacI product annotations in 48 bacteria*:
– Lactose operon repressor (20)
– DNA-binding transcriptional repressor (14)
– transcriptional regulator LacI family (5)
– lac operon repressor (2)
– transcriptional repressor of the lac operon (2)
– lac repressor (1)
– LacI (1)
– putative transcriptional regulator (1)
– transcriptional repressor of lactose catabolism (1)
– transcriptional repressor of lactose catabolism (GalR/LacI family) (1)
* Excluding differences in capitalization
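A survey like the one above can be tallied with a few lines of Python. This is a minimal sketch, assuming the product strings have already been extracted from the GenBank records; the short `products` list and the `tally_products` helper are illustrative names, not part of the original survey.

```python
from collections import Counter

# Illustrative sample of product strings (not the full 48-genome survey).
products = [
    "Lactose operon repressor",
    "lactose operon repressor",
    "DNA-binding transcriptional repressor",
    "lac repressor",
    "LacI",
]


def tally_products(names):
    """Count distinct product names, ignoring capitalization and extra spaces."""
    return Counter(" ".join(name.split()).casefold() for name in names)


counts = tally_products(products)
for name, n in counts.most_common():
    print(f"{n}\t{name}")
```

Case folding only merges spellings that differ in capitalization; it does nothing for genuinely different wordings such as "lac repressor" and "Lactose operon repressor", which is the problem a controlled vocabulary addresses.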

The Gene Ontology (GO)
Goal = consistent annotation of gene products within and between organisms
The Gene Ontology Consortium began as a collaboration among model organism databases (FlyBase, SGD, MGD). It now includes a larger number of members and interest groups.
Ontology = a formal representation of concepts and the relationships among them

Gene Ontology

The 3 GO Ontologies
– Molecular Function (8,360 terms)
– Biological Process (14,898 terms)
– Cellular Component (2,110 terms)
GO Term = an entry in an ontology, composed of a unique identifier (e.g. GO:0000001), a definition and "synonyms"
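One way to picture a GO term is as a small record. The sketch below is an assumption about how such a record could be modeled, not the Consortium's own data model: the `GOTerm` class and its field names are hypothetical, and the definition text in the example is a placeholder rather than the official wording. It does rely on the fact that GO identifiers consist of "GO:" followed by seven digits.

```python
import re
from dataclasses import dataclass

# GO identifiers are "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")

# The three ontologies named on the slide.
NAMESPACES = {"molecular_function", "biological_process", "cellular_component"}


@dataclass(frozen=True)
class GOTerm:
    """Hypothetical record for one entry in an ontology."""
    go_id: str
    name: str
    namespace: str
    definition: str = ""
    synonyms: tuple = ()

    def __post_init__(self):
        # Reject malformed identifiers and unknown ontologies early.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"malformed GO identifier: {self.go_id!r}")
        if self.namespace not in NAMESPACES:
            raise ValueError(f"unknown ontology: {self.namespace!r}")


term = GOTerm(
    go_id="GO:0003677",
    name="DNA binding",
    namespace="molecular_function",
    definition="(placeholder text, not the official GO definition)",
)
print(term.go_id, term.name)
```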

CC A cellular component is just that, a component of a cell, but with the proviso that it is part of some larger object; this may be an anatomical structure (e.g. rough endoplasmic reticulum or nucleus) or a gene product group (e.g. ribosome, proteasome or a protein dimer).

BP A biological process is a series of events accomplished by one or more ordered assemblies of molecular functions. Examples of broad biological process terms are cellular physiological process or signal transduction. Examples of more specific terms are pyrimidine metabolic process or alpha-glucoside transport. It can be difficult to distinguish between a biological process and a molecular function, but the general rule is that a process must have more than one distinct step. A biological process is not equivalent to a pathway; at present, GO does not try to represent the dynamics or dependencies that would be required to fully describe a pathway.

MF Molecular function describes activities, such as catalytic or binding activities, that occur at the molecular level. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and do not specify where or when, or in what context, the action takes place. Molecular functions generally correspond to activities that can be performed by individual gene products, but some activities are performed by assembled complexes of gene products. Examples of broad functional terms are catalytic activity, transporter activity, or binding; examples of narrower functional terms are adenylate cyclase activity or Toll receptor binding. It is easy to confuse a gene product name with its molecular function, and for that reason many GO molecular functions are appended with the word "activity".

Annotation File Format

Evidence Codes
Experimental Evidence Codes
– EXP: Inferred from Experiment
– IDA: Inferred from Direct Assay
– IPI: Inferred from Physical Interaction
– IMP: Inferred from Mutant Phenotype
– IGI: Inferred from Genetic Interaction
– IEP: Inferred from Expression Pattern
Computational Analysis Evidence Codes
– ISS: Inferred from Sequence or Structural Similarity
– ISO: Inferred from Sequence Orthology
– ISA: Inferred from Sequence Alignment
– ISM: Inferred from Sequence Model
– IGC: Inferred from Genomic Context
– RCA: Inferred from Reviewed Computational Analysis

Evidence Codes
Author Statement Evidence Codes
– TAS: Traceable Author Statement
– NAS: Non-traceable Author Statement
Curator Statement Evidence Codes
– IC: Inferred by Curator
– ND: No biological Data available
Automatically-assigned Evidence Codes
– IEA: Inferred from Electronic Annotation
Obsolete Evidence Codes
– NR: Not Recorded
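The evidence code groups listed on these two slides can be used to filter annotations, for example to keep only experimentally supported ones. The following is a minimal sketch: the group keys, the helper names and the three example annotations (gene, GO identifier, evidence code) are hypothetical and only illustrate the idea.

```python
# Evidence codes grouped as on the slides.
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "author_statement": {"TAS", "NAS"},
    "curator_statement": {"IC", "ND"},
    "automatic": {"IEA"},
    "obsolete": {"NR"},
}


def evidence_group(code):
    """Return the group name for an evidence code, or raise if it is unknown."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    raise KeyError(f"unknown evidence code: {code!r}")


def filter_annotations(annotations, allowed_groups):
    """Keep annotations whose evidence code belongs to one of the allowed groups."""
    return [a for a in annotations if evidence_group(a[2]) in allowed_groups]


# Hypothetical annotations: (gene, GO identifier, evidence code).
annotations = [
    ("lacI", "GO:0003677", "IDA"),
    ("geneB", "GO:0003677", "IEA"),
    ("geneC", "GO:0006355", "ISS"),
]

experimental_only = filter_annotations(annotations, {"experimental"})
print(experimental_only)
```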

What is the source of automated annotations?
Integrated automated annotation systems combine a variety of analysis types:
– Comparison to databases of protein and/or domain families with defined functions (COGs, NCBI CDD, PFAM, ProSite, etc.)
– Structural characteristic predictions
– Sequence characteristic predictions

InterPro: www interface

InterPro

InterPro release 16.0 contains entries of the following types:
– Active sites: 34
– Binding sites: 22
– Domains: 4676
– Families: 10060
– PTMs: 18
– Repeats: 235
Member databases (table columns: Database, All Signatures, Integrated): PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, TIGRFAMs, Gene3D, SUPERFAMILY

Sample InterPro Family

InterPro is one source of IEAs

On a genome scale
– Assign all genes to InterPro families
– Obtain GO terms (IEA evidence) linked to the InterPro term
– Use these to find patterns in large gene lists
  – Experimental (genes upregulated in array exp)
  – Comparative (genes with/without orthologs)
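The first two steps above can be sketched as two dictionary lookups. This is an illustration under stated assumptions: the gene names, the InterPro placeholders (`IPR_A`, `IPR_B`) and the InterPro-to-GO links are all hypothetical; in practice they would come from a domain search over the genome and an InterPro-to-GO mapping file.

```python
from collections import defaultdict

# Hypothetical assignment of genes to InterPro entries.
gene_to_interpro = {
    "geneA": ["IPR_A", "IPR_B"],
    "geneB": ["IPR_B"],
    "geneC": [],  # no InterPro match, so no electronic annotation
}

# Hypothetical links from InterPro entries to GO terms.
interpro_to_go = {
    "IPR_A": ["GO:0003677"],
    "IPR_B": ["GO:0006355"],
}


def annotate_genes(gene_to_interpro, interpro_to_go):
    """Transfer GO terms to genes via their InterPro entries, tagged as IEA."""
    annotations = {}
    for gene, entries in gene_to_interpro.items():
        go_ids = {go for ipr in entries for go in interpro_to_go.get(ipr, [])}
        annotations[gene] = [(go_id, "IEA") for go_id in sorted(go_ids)]
    return annotations


def genes_per_term(annotations):
    """Invert the mapping: GO term -> set of genes carrying it."""
    index = defaultdict(set)
    for gene, terms in annotations.items():
        for go_id, _evidence in terms:
            index[go_id].add(gene)
    return dict(index)


annotations = annotate_genes(gene_to_interpro, interpro_to_go)
term_index = genes_per_term(annotations)
print(annotations)
print(term_index)
```

The inverted index (term to genes) is the form that enrichment analysis needs, since it gives the number of genes per category in both the background and the gene list of interest.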

Enrichment
Find categories (InterPro, GO) that are over-represented in a subset of genes relative to the background (genome?) as a whole.
Example: 40% of the genes that distinguish between two strains of E. coli are mobile elements. Is this more than I would expect by random chance if 10% of the genome as a whole consists of mobile elements?

Hypergeometric Distribution
Describes the number of successes in a sequence of n draws from a finite population without replacement
– Black and white balls in an urn
– Genes with an ortholog and genes without an ortholog
– Genes differentially expressed, genes unchanged
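The mobile element question can be answered with the upper tail of the hypergeometric distribution: the probability of drawing at least k category members in n draws from a population of N genes, K of which belong to the category. The sketch below uses only the standard library. The slides give percentages but no gene counts, so the numbers used here (4000 genes in the genome, 400 mobile elements, 50 distinguishing genes, 20 of them mobile elements) are assumptions chosen to match the 10% and 40% figures.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from a population
    of N items, K of which are 'successes'."""
    total = comb(N, n)
    favorable = sum(
        comb(K, i) * comb(N - K, n - i)  # comb() returns 0 for impossible splits
        for i in range(k, min(K, n) + 1)
    )
    return favorable / total


# Assumed counts (not from the slides): 10% of 4000 genes are mobile elements,
# and 40% of the 50 strain-distinguishing genes are mobile elements.
genome_size = 4000
mobile_in_genome = 400
subset_size = 50
mobile_in_subset = 20

p_value = hypergeom_upper_tail(
    mobile_in_subset, genome_size, mobile_in_genome, subset_size
)
print(f"P(X >= {mobile_in_subset}) = {p_value:.3g}")
```

A small p-value means that seeing this many mobile elements in the subset is unlikely under random sampling from the genome. When many categories are tested at once, the p-values also need a multiple-testing correction.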

Comparison of 68 enrichment analysis tools available in 2008