Judith A. Blake, David P. Hill, Barry Smith BioOntologies SIG: Vienna July 20, 2007 Gene Ontology Annotations: What they mean and where they come from
GO Consortium Project Goals 1. We will maintain comprehensive, logically rigorous and biologically accurate ontologies. *2. We will comprehensively annotate reference genomes in as complete detail as possible. *3. We will support annotation across all organisms. 4. We will provide our annotations and tools to the research community.
GO terms are used for functional annotations: Brain development [GO: ] (141 genes, 207 annotations)
GO Stats: GO Annotations (April 24, 2007)
Total experimental GO annotations: 388,633
Total proteins with manual annotations: 80,402
Contributing groups (including MGI): 19
Total PubMed references: 346,002
Total number of predicted annotations: 17,029,553
Total number of taxa: 129,318
Total number of distinct proteins: 2,971,374
Annotations are assertions: There is evidence that this gene product can be best classified using this term. The source of the evidence and other information is included. There is agreement on the meaning of the term.
Annotations are assertions: Annotations provide the connection between genomic information and the GO. Experiments provide the data that enable us to annotate gene products with terms from the ontologies. Annotations for App: amyloid beta (A4) precursor protein
We use evidence codes to describe the basis of the annotation
Direct Experiment:
IDA: Inferred from direct assay
IPI: Inferred from physical interaction
IMP: Inferred from mutant phenotype
IGI: Inferred from genetic interaction
IEP: Inferred from expression pattern
No Direct Experiment:
ISS: Inferred from sequence or structural similarity
TAS: Traceable author statement
NAS: Non-traceable author statement
IC: Inferred by curator
RCA: Reviewed computational analysis
IEA: Inferred from electronic annotation
ND: No data available
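The evidence codes listed above can be sketched as a small lookup table. This is only an illustration, not part of any GO tool: the names `EVIDENCE_CODES`, `EXPERIMENTAL` and `is_direct_experiment` are hypothetical, and the split into experimental and non-experimental codes is an assumption based on the slide's "Direct Experiment" and "No Direct Experiment" labels.

```python
# Evidence codes as listed on the slide (code -> meaning).
EVIDENCE_CODES = {
    "IDA": "Inferred from direct assay",
    "IPI": "Inferred from physical interaction",
    "IMP": "Inferred from mutant phenotype",
    "IGI": "Inferred from genetic interaction",
    "IEP": "Inferred from expression pattern",
    "ISS": "Inferred from sequence or structural similarity",
    "TAS": "Traceable author statement",
    "NAS": "Non-traceable author statement",
    "IC": "Inferred by curator",
    "RCA": "Reviewed computational analysis",
    "IEA": "Inferred from electronic annotation",
    "ND": "No data available",
}

# Assumption: the five codes grouped under "Direct Experiment" on the slide.
EXPERIMENTAL = frozenset({"IDA", "IPI", "IMP", "IGI", "IEP"})


def is_direct_experiment(code: str) -> bool:
    """Return True if the evidence code rests on a direct experiment."""
    if code not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {code!r}")
    return code in EXPERIMENTAL
```

A filter such as `is_direct_experiment("IMP")` would let a user keep only annotations that rest on a reported experiment and set aside, for example, electronically inferred (IEA) ones.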
Examples of how we connect instances with knowledge representation in the GO: What follows are examples of annotation of the biomedical literature using GO types, gene product types and evidence codes.
Example #1: Molecular Function using IDA. Figure from Zhang M, Chen W, Smith SM, Napoli JL. Molecular characterization of a mouse short chain dehydrogenase/reductase active with all-trans-retinol in intact cells, mRDH1. J Biol Chem Nov 23;276(47):
The Observation and The Annotation [Figure not reproduced; surviving labels: NAD+, NADH, H+]
What are the instances in this experiment? Gene product instances: molecules of retinol dehydrogenase. Molecular function instances: instances of execution of the molecular function revealed by the assay; instances of molecular function associated with instances of retinol dehydrogenase. These instances are the potential of a molecule of retinol dehydrogenase to execute the function retinol dehydrogenase activity.
What knowledge are we trying to capture? We are interested in understanding how gene products contribute to the biology of an organism.
How do wet-bench biologists learn about gene products? They do experiments! Experiments are designed to study the properties of gene product instances. Experimental biologists take on The Burden of Proof.
How do we represent the accumulated knowledge? We (GO curators) make annotations! Annotations connect what wet-bench biologists see in the lab with how we represent our current understanding of biological reality.
So, where are the instances? The instances are in the lab. We use what people report about instances, but we never actually deal with them directly.
What do we mean by gene product? Gene Product Type: stands proxy for the gene; genes are what we have in MODs; types = what instances have in common. Gene Product Instance: a molecule of a gene product; it can be physically isolated; it takes up space.
What do we mean by annotations? An annotation: (1) asserts that instances of molecules of a type of gene product have the propensity to act as designated by the terms in an ontology such as the GO; (2) is created on the basis of observations of the instances of such types in experiments and of the inferences drawn from such observations. Note: comprehensive experimental details are embedded in biomedical publications and in specialized databases.
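The definition of an annotation given here can be sketched as a small record type. This is a minimal sketch under stated assumptions: the class `Annotation`, its field names and the `describe` method are hypothetical and do not reproduce any GO Consortium file format. The example values are taken from Example #1 (the IDA annotation based on Zhang et al.).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """An assertion linking a gene product type to an ontology term."""

    gene_product: str   # gene product type, e.g. a gene symbol from a MOD
    go_term: str        # term name from one of the GO ontologies
    evidence_code: str  # basis of the assertion, e.g. "IDA" or "IMP"
    reference: str      # where the underlying observation was reported

    def describe(self) -> str:
        """Spell out the assertion that the annotation makes."""
        return (
            f"Instances of {self.gene_product} have the propensity for "
            f"'{self.go_term}' ({self.evidence_code}; {self.reference})"
        )


# Hypothetical record built from Example #1.
example = Annotation(
    gene_product="mRDH1",
    go_term="retinol dehydrogenase activity",
    evidence_code="IDA",
    reference="Zhang et al., J Biol Chem",
)
print(example.describe())
```

The record stores only the type-level assertion and a pointer to its source. The instances themselves, and the full experimental detail, stay in the cited publication.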
Example #2: Molecular Function using IMP. Figure from Schulz S, Lopez MJ, Kuhn M, Garbers DL. Disruption of the guanylyl cyclase-C gene leads to a paradoxical phenotype of viable but heat-stable enterotoxin-resistant mice. J Clin Invest Sep 15;100(6):
The Observation and The Annotation [Figure not reproduced; surviving labels: two X marks and the evidence code IMP]
What are the instances in this experiment? Gene product instances: molecules of GUCY2C protein; the lack of functional molecules of GUCY2C in mutants. Molecular function instances: the execution of the molecular function, measured by the accumulation of cGMP; the potential of a molecule of GUCY2C to execute the molecular function, revealed by the correlation between a lack of molecules and a lack of executions of molecular function.
The Curator Perspective: Annotation Process 1. Identification of relevant experimental data - Biomedical literature as primary source - Annotations inferred from experiments performed in other organisms or inferred from sequence or structure
The Curator Perspective: Annotation Process 1. Identification of relevant experimental data 2. Identification of the appropriate ontology annotation term - Experimental assay influences the limit of resolution/granularity of term assignment available to use - Differences in expertise among curators should result in close, but not necessarily exact, GO term annotations
The Curator Perspective: Annotation Process 1. Identification of relevant experimental data 2. Identification of the appropriate ontology annotation term 3. Employment of annotation quality control processes to - Ensure correct formal structure - Evaluate annotation consistency - Harvest emerging knowledge to refine and extend the GO
Example #3: Biological Process using IMP. Washington Smoak I; Byrd NA; Abu-Issa R; Goddeeris MM; Anderson R; Morris J; Yamamura K; Klingensmith J; Meyers EN. Sonic hedgehog is required for cardiac outflow tract and neural crest cell development. Dev Biol 2005 Jul 15;283(2):
The Observation and The Annotation [Figure not reproduced; surviving labels: the evidence code IMP and an X mark]
What are the instances in this experiment? Gene product instances: molecules of the Shh gene product; non-functional molecules of the Shh gene product. Biological process instances: the development of a mouse heart. Molecular function instances: the execution of a molecular function by a molecule of the Shh gene product.
So, when a biological process occurs, it is the result of molecules of a gene product(s) executing their molecular function(s)
How do wet-bench biologists learn about gene products? They do experiments! Experiments are designed to study the properties of gene product instances. Experimental biologists take on The Burden of Proof. They make conclusions about gene product types based on the accumulated experimental data!
If experiments show: all instances of a gene product studied have the potential to execute the function tyrosine kinase; instances of the same gene product are involved in the biological process limb development; all instances of the same gene product are found in instances of the cytoplasm
A wet-bench biologist would conclude: The gene product of this gene is a tyrosine kinase that functions in the cytoplasm and the tyrosine kinase functioning is used in limb development
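The same kind of conclusion can be sketched computationally by grouping a gene product's annotations by ontology. The sketch below is illustrative only: the function `summarize`, the aspect labels and the placeholder name "gene X" are hypothetical, and the three (aspect, term) pairs simply restate the tyrosine kinase example above.

```python
from collections import defaultdict

# Hypothetical annotations for one gene product, as (aspect, term) pairs.
annotations = [
    ("molecular_function", "tyrosine kinase"),
    ("biological_process", "limb development"),
    ("cellular_component", "cytoplasm"),
]


def summarize(gene_product, annotations):
    """Group (aspect, term) pairs by aspect and build a one-line summary."""
    by_aspect = defaultdict(list)
    for aspect, term in annotations:
        by_aspect[aspect].append(term)

    # Fixed order so the summary reads: function, location, process.
    labels = [
        ("molecular_function", "has function"),
        ("cellular_component", "is found in"),
        ("biological_process", "is involved in"),
    ]
    parts = [
        f"{label} {', '.join(by_aspect[aspect])}"
        for aspect, label in labels
        if aspect in by_aspect
    ]
    return f"{gene_product} " + "; ".join(parts)


print(summarize("gene X", annotations))
```

Grouping by aspect keeps the three kinds of statement (function, location, process) separate. The sketch does not capture the biologist's further inference that the kinase activity is what is used in limb development; that link is not contained in three independent annotations.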
If we comprehensively annotate genes, can we make the same conclusions? Analysis of gene product annotations leads to new hypotheses for wet-bench biologists to test. This is the basis of biological discovery!
Development of GO depends on the intersection of curation with ontology refinements. New results may stand in conflict with the current version of the ontology. The process of annotation brings new experimental results into perspective with existing scientific knowledge captured in the ontology. One of the strengths of the GO development paradigm is that it is primarily a task of biologist-curators who are experts in understanding the experimental systems.
[Diagram not reproduced; surviving labels: Experimental literature; Hypothesis generation; Informatics resources; Data mining and prediction using ontologies; Experiments and data analysis using GO, etc.; Improved annotations in MODs, UniProt; Refine bio-ontologies]
Summary: Gene product annotation is an integral aspect of the work of the GO Consortium. Annotations reflect conclusions from experiments as interpreted by the biologist and reviewed by peers. The structure of the GO depends upon accumulated knowledge from many experiments, resulting in a representation of current thought about biological reality. As experimental data change our view of reality, the ontology must change as well.