Gene Ontology (GO) Project http://www.geneontology.org/ Jane Lomax Hi, my name is Jennifer Clark and I work at the Gene Ontology Consortium editorial office at the European Bioinformatics Institute in Cambridge in the UK. Today I’m going to give you an introduction to the work of the Gene Ontology Consortium.
There is a lot of biological research output
You’re interested in which genes control mesoderm development…
You get 6752 results! How will you ever find what you want?
How will you spot the patterns? Microarray data shows changed expression of thousands of genes. [Figure: expression over time, attacked vs. control, with gene groups labelled: puparial adhesion, molting cycle, hemocyanin, defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes, amino acid catabolism, lipid metabolism, peptidase activity, protein catabolism.] Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.
Scientists work hard
There are lots of papers to read http://www.teamtechnology.co.uk/f-scientist.jpg
More papers…
more and more and more…
more and more and more! Help!
The Gene Ontology provides a way to capture and represent all this biological knowledge in a computable form
The Gene Ontology
GO browser
Search on ‘mesoderm development’
Definition of mesoderm development Gene products involved in mesoderm development
GO can be used to help analyse microarray data
Microarray process: Treat samples, Collect mRNA, Label, Hybridize, Scan, Normalize, Select differentially regulated genes, Understand the biological phenomena involved. So this is a typical process that you’d use to collect data from your microarray. Just like that - makes it look very easy! So for a certain set of conditions, you have a set of differentially regulated genes, and what you really want to know about is the underlying biological processes involved.
Traditional analysis gene by gene basis requires literature searching time-consuming So this gene by gene approach has the major disadvantage that you have to delve into the literature yourself, which is obviously very time consuming.
Traditional analysis Gene 1 Apoptosis Cell-cell signaling Protein phosphorylation Mitosis … Gene 2 Growth control Oncogenesis Gene 3 Growth control Mitosis Oncogenesis Protein phosphorylation … Gene 4 Nervous system Pregnancy Gene 100 Positive ctrl. of cell prolif Glucose transport Typically, this is the way the analysis would have been done. Taking your differentially regulated genes, you’d analyse them one by one - researching what is known about each gene, and what processes it is involved in.
Using GO annotations GO:0006915 : apoptosis But by using GO annotations, this work has already been done for you - someone has already sat down and associated a particular gene with a particular process…
Grouping by process Mitosis Gene 2 Gene 5 Gene 45 Gene 7 Gene 35 … Glucose transport Gene 7 Gene 3 Gene 6 … Apoptosis Gene 1 Gene 53 Positive ctrl. of cell prolif. Gene 7 Gene 3 Gene 12 … Growth Gene 5 Gene 2 Gene 6 … So you have the ability to group your differentially regulated genes by process…
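The grouping step can be sketched in a few lines of Python. This is only a minimal illustration, assuming the annotations are available as (gene, process) pairs. The pairs echo the slide's example, and the `group_by_process` helper is a hypothetical name, not part of any GO tool.

```python
from collections import defaultdict

# Hypothetical (gene, process) annotation pairs, echoing the slide's example.
annotations = [
    ("Gene 2", "mitosis"),
    ("Gene 5", "mitosis"),
    ("Gene 7", "mitosis"),
    ("Gene 7", "glucose transport"),
    ("Gene 3", "glucose transport"),
    ("Gene 1", "apoptosis"),
    ("Gene 53", "apoptosis"),
]


def group_by_process(pairs, selected_genes):
    """Group the selected (e.g. differentially regulated) genes by process."""
    groups = defaultdict(list)
    for gene, process in pairs:
        if gene in selected_genes:
            groups[process].append(gene)
    return dict(groups)


# Assumed outcome of a microarray experiment: three genes changed expression.
differentially_regulated = {"Gene 1", "Gene 2", "Gene 7"}
groups = group_by_process(annotations, differentially_regulated)
for process, genes in sorted(groups.items()):
    print(process, genes)
```

Note that Gene 7 appears under two processes, since a gene product can be annotated to more than one term.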
GO for microarray analysis Annotations give ‘function’ label to genes Ask meaningful questions of microarray data e.g. genes involved in the same process, same/different expression patterns? GO and its annotations are useful in microarrays mainly for comparing gene expression patterns to gene function, allowing more meaningful interpretation of microarray data.
How does the Gene Ontology work?
GO structure GO isn’t just a flat list of biological terms; terms are related within a hierarchy Two arrangements for DNA replication
GO structure gene A
GO structure This means genes can be grouped according to user-defined levels Allows broad overview of gene set or genome
How does GO work? What information might we want to capture about a gene product? What does the gene product do? Where and when does it act? Why does it perform these activities?
GO structure GO terms divided into three parts: cellular component molecular function biological process
Cellular Component where a gene product acts
Cellular Component
Cellular Component
Cellular Component Enzyme complexes in the component ontology refer to places, not activities.
Molecular Function activities or “jobs” of a gene product glucose-6-phosphate isomerase activity
Molecular Function insulin binding insulin receptor activity
Molecular Function drug transporter activity
Molecular Function A gene product may have several functions; a function term refers to a reaction or activity, not a gene product Sets of functions make up a biological process
Biological Process a commonly recognized series of events cell division
Biological Process transcription
Biological Process regulation of gluconeogenesis
Biological Process limb development
Biological Process courtship behavior
Ontology Structure Terms are linked by two relationships is-a part-of
Ontology Structure [Diagram: the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships]
Ontology Structure Ontologies are structured as a hierarchical directed acyclic graph (DAG) Terms can have more than one parent and zero, one or more children
Ontology Structure [Diagram: the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane] Directed Acyclic Graph (DAG) - multiple parentage allowed
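A small Python sketch may help show what multiple parentage means in practice. The edges below are an assumed reading of the slide's diagram, not an extract from the ontology itself, and the `ancestors` helper is a hypothetical name.

```python
# Each term maps to a list of (relationship, parent) edges.
# ASSUMPTION: these edges are one plausible reading of the slide's diagram.
PARENTS = {
    "cell": [],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "mitochondrial membrane": [("is_a", "membrane")],
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent edges upward."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents[current]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# 'chloroplast membrane' has two parents, so it is reached from both
# 'membrane' and 'chloroplast'; a strict tree could not express this.
print(sorted(ancestors("chloroplast membrane")))
```

Because the graph is acyclic, the upward walk always terminates at the root terms.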
Anatomy of a GO term
id: GO:0006094 (unique GO ID)
name: gluconeogenesis (term name)
namespace: process (ontology)
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html] (definition)
exact_synonym: glucose biosynthesis (synonym)
xref_analog: MetaCyc:GLUCONEO-PWY (database ref)
is_a: GO:0006006 (parentage)
is_a: GO:0006092 (parentage)
17800 terms in three ontologies
94% of terms defined
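Because a term is written as simple "tag: value" lines, it is easy to read with a program. The sketch below is a minimal illustration, not a full parser for the ontology file format. It uses the tags shown on the slide (the definition line is omitted for brevity), and `parse_stanza` is a hypothetical helper name.

```python
# The term from the slide, one "tag: value" pair per line.
STANZA = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
"""


def parse_stanza(text):
    """Parse 'tag: value' lines into a dict mapping tag -> list of values."""
    term = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first ": " only, so values such as "GO:0006094"
        # keep their own colons.
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value)
    return term


term = parse_stanza(STANZA)
print(term["id"][0], term["name"][0], term["is_a"])
```

Values are kept in lists because a tag such as is_a can occur more than once, which is how a term records several parents.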
GO can also be useful for resolving language conflicts amongst scientific communities
? In biology… Tactition Taction Tactile sense In developing the ontologies we are solving a number of problems for biologists. Currently in biology there are many ambiguities in language. Groups of researchers may use the same words to mean different things, or they may use several different words to refer to the same thing. This causes problems for scientists trying to access research carried out by groups outside their immediate field. It also makes it very difficult to process biological information using a computer. For example three groups of biologists studying different model organisms may all be studying the perception of touch. Scientists in different groups might talk about this single process as ‘tactition’, ‘tactile sense’ or ‘taction’. This differing use of language means that they will have more trouble finding and reading each other’s papers. It will also be harder for them to use a computer to find and interpret biological data on this subject since the computer has no way to know that these words mean the same thing.
Tactition Taction Tactile sense perception of touch ; GO:0050975 The GO provides a solution to this problem since we take biological concepts like the perception of touch and we make them a single GO item in the ontology. We add all the relevant synonyms, and give a unique numerical identifier to the concept.
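The idea of one concept with one identifier and several synonyms can be sketched as a simple lookup table. This is a minimal illustration using the term from the slide; the `TERMS` layout and the `build_index` and `lookup` helpers are hypothetical, not how the GO database is actually stored.

```python
# One concept, one identifier, several synonyms (taken from the slide).
TERMS = {
    "GO:0050975": {
        "name": "perception of touch",
        "synonyms": ["tactition", "taction", "tactile sense"],
    },
}


def build_index(terms):
    """Map every name and synonym (lower-cased) to its GO ID."""
    index = {}
    for go_id, term in terms.items():
        for label in [term["name"], *term["synonyms"]]:
            index[label.lower()] = go_id
    return index


INDEX = build_index(TERMS)


def lookup(label, index=INDEX):
    """Return the GO ID for a name or synonym, or None if it is unknown."""
    return index.get(label.strip().lower())


# Whichever word a research group uses, the same identifier comes back.
print(lookup("Tactition"), lookup("tactile sense"))
```

A computer can then treat papers or data tagged with any of the three words as being about the same process.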
Bud initiation? The GO also provides a solution to the opposite problem in which several groups of scientists use the same words to refer to different things. For example the phrase ‘bud initiation’ could refer to the initiation of a tooth bud, a yeast reproductive bud, or a bud on a tree. However, these three types of bud are initiated in quite different ways, and scientists would like to be able to distinguish between them.
= tooth bud initiation = cellular bud initiation = flower bud initiation To solve this problem the GO differentiates between differing concepts by adding a ‘sensu’ ending. So according to this example, we would have ‘bud initiation sensu Metazoa’ to mean the kind of bud initiation that gives rise to a tooth in mammals. We would have ‘bud initiation sensu Saccharomyces’ for the initiation of a reproductive bud in yeast, and we would have ‘bud initiation sensu Viridiplantae’ for the initiation of a tree bud. This means that gene products involved in bud initiation can be categorised along with only those other gene products involved in the same kind of bud initiation.
Categorization of gene products using GO is called annotation. So how does that happen?
P05147 Take a gene or protein
P05147 Find papers about it PMID: 2976880
PMID: 2976880 Find the GO term describing its function, process or location of action. GO:0047519
P05147 PMID: 2976880 What evidence do they show? IDA GO:0047519
Record these: P05147 GO:0047519 IDA PMID:2976880
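The four recorded items can be thought of as one small record per annotation. The sketch below is a simplified illustration, assuming a tab-separated row of just these four fields; it is not the actual file format submitted to the GO Consortium, and the `Annotation` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # accession of the gene or protein, e.g. P05147
    go_id: str         # GO term describing function, process or location
    evidence: str      # evidence code, e.g. IDA (inferred from direct assay)
    reference: str     # the paper supporting the annotation

    def as_row(self):
        """Return the record as one tab-separated line (simplified layout)."""
        return "\t".join(
            [self.gene_product, self.go_id, self.evidence, self.reference]
        )


# The example from the slides.
record = Annotation("P05147", "GO:0047519", "IDA", "PMID:2976880")
print(record.as_row())
```

Keeping the evidence code and the reference with every annotation lets a user trace each statement back to the experiment that supports it.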
Submit to the GO Consortium
Annotation appears in GO database
Many species groups annotate We see the research of one function across all species Clark et al., 2005
Adding terms to the GO
Developing GO GO under constant development International group of developers central editorial office at EBI - 4 members Developed in consultation with domain experts Term suggestions handled through online tracking system
2006 Consortium Meeting, St. Croix, U.S. Virgin Islands, March 30 - April 3, 2006
Contributors dictyBase FlyBase GeneDB Gramene Reactome WormBase The GO Editorial Office Berkeley Bioinformatics and Ontology Project (BBOP) Gene Ontology Annotation @ EBI (GOA) Mouse Genome Database (MGD) and Gene Expression Database (GXD) Rat Genome Database (RGD) Saccharomyces Genome Database (SGD) The Arabidopsis Information Resource (TAIR) The Institute for Genomic Research (TIGR) Zebrafish Information Network (ZFIN) Finally, this is the current list of groups in the consortium. The Editorial office where the ontologies are developed is in Cambridge in the UK, and the rest of these groups contribute annotations. We are keen to include more groups in the annotation process so that more species will be manually annotated to the GO. Harold Drabkin is now going to talk about the process of annotation and about how you can contribute manual annotations of your own gene products.