Using the Gene Ontology (GO) for analysis of expression data. Jane Lomax, EMBL-EBI, 25th June 2007.


1 Using the Gene Ontology (GO) for analysis of expression data Jane Lomax EMBL-EBI
25th June Jane Lomax

2 What is the Gene Ontology?
A set of standard biological phrases (terms) which are applied to genes/proteins:
protein kinase
apoptosis
membrane

3 What is the Gene Ontology?
Genes are linked, or associated, with GO terms by trained curators at genome databases.
These links are known as ‘gene associations’ or GO annotations.
Some GO annotations are created automatically.
These GO phrases, or terms, are linked to genes by expert curators at genome databases. I will talk about this in more detail later.

4 GO annotations
Diagram: genome and protein databases send gene -> GO term associations to the GO database.
The individual genome and protein databases submit their genes and proteins, annotated to GO terms, to a central GO database.

5 What is the Gene Ontology?
Allows biologists to make queries across large numbers of genes without researching each one individually.

6 Eisen, Michael B. et al. (1998) Proc. Natl. Acad. Sci. USA 95, Copyright ©1998 by the National Academy of Sciences

7 GO structure
GO isn’t just a flat list of biological terms.
Terms are related within a hierarchy.
Two arrangements for DNA replication

8 GO structure
A gene (A) that is associated with the term ‘DNA replication’ is automatically annotated to all of that term’s parent terms.
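The propagation rule on this slide can be sketched in a few lines of Python. This is only an illustration: 'DNA replication' and gene A come from the slide, while the parent term names and the helper functions `ancestors` and `propagate` are hypothetical.

```python
# Hypothetical fragment of the hierarchy: child term -> list of parent terms.
# Only 'DNA replication' comes from the slide; the parent names are illustrative.
PARENTS = {
    "DNA replication": ["DNA metabolism"],
    "DNA metabolism": ["nucleic acid metabolism"],
    "nucleic acid metabolism": ["biological process"],
    "biological process": [],
}


def ancestors(term, parents):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(direct_annotations, parents):
    """Expand each gene's direct terms to include all ancestor terms."""
    expanded = {}
    for gene, terms in direct_annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        expanded[gene] = full
    return expanded


annotations = propagate({"gene A": {"DNA replication"}}, PARENTS)
```

After propagation, gene A carries its direct term plus every term above it, which is what makes grouping at a chosen level possible.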

9 GO structure
This means genes can be grouped according to user-defined levels.
This allows a broad overview of a gene set or genome.

10 How does GO work?
GO is species independent.
Some terms, especially lower-level, detailed terms, may be specific to a certain group, e.g. photosynthesis.
But when collapsed up to the higher levels, terms are not dependent on species.

11 How does GO work?
What information might we want to capture about a gene product?
What does the gene product do?
Where and when does it act?
Why does it perform these activities?

12 GO structure
GO terms are divided into three parts:
cellular component
molecular function
biological process

13 Cellular Component
where a gene product acts

14 Cellular Component

15 Cellular Component

16 Cellular Component
Enzyme complexes in the component ontology refer to places, not activities.

17 Molecular Function
activities or “jobs” of a gene product
glucose-6-phosphate isomerase activity

18 Molecular Function
insulin binding
insulin receptor activity

19 Molecular Function
drug transporter activity

20 Molecular Function
A gene product may have several functions.
Sets of functions make up a biological process.

21 Biological Process
a commonly recognized series of events
cell division

22 Biological Process
transcription

23 Biological Process
regulation of gluconeogenesis

24 Biological Process
limb development

25 Biological Process
courtship behavior

26 Ontology Structure
Terms are linked by two relationships:
is-a
part-of

27 Ontology Structure
Diagram: the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships.

28 Ontology Structure
Ontologies are structured as a hierarchical directed acyclic graph (DAG).
Terms can have more than one parent and zero, one or more children.

29 Ontology Structure
Directed Acyclic Graph (DAG): multiple parentage allowed.
Diagram: cell, membrane, chloroplast, mitochondrial membrane, chloroplast membrane.
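A minimal sketch of the same structure in Python, assuming the five terms from the diagram. The term names come from the slide, but which relationship joins which pair is an assumption made for illustration, and the helpers `parents_of` and `is_acyclic` are hypothetical names.

```python
# Edges as (child, relationship, parent). The term names come from the slide;
# which relationship joins which pair is an assumption made for illustration.
EDGES = [
    ("membrane", "part_of", "cell"),
    ("chloroplast", "part_of", "cell"),
    ("mitochondrial membrane", "is_a", "membrane"),
    ("chloroplast membrane", "is_a", "membrane"),
    ("chloroplast membrane", "part_of", "chloroplast"),
]


def parents_of(term, edges):
    """Return (relationship, parent) pairs for a term, in edge order."""
    return [(rel, parent) for child, rel, parent in edges if child == term]


def is_acyclic(edges):
    """Return True if following child -> parent links never revisits a term."""
    graph = {}
    for child, _rel, parent in edges:
        graph.setdefault(child, []).append(parent)
        graph.setdefault(parent, [])
    state = {}  # term -> "visiting" or "done"

    def visit(term):
        if state.get(term) == "done":
            return True
        if state.get(term) == "visiting":
            return False  # reached a term already on the current path: a cycle
        state[term] = "visiting"
        ok = all(visit(parent) for parent in graph[term])
        state[term] = "done"
        return ok

    return all(visit(term) for term in graph)


chloroplast_membrane_parents = parents_of("chloroplast membrane", EDGES)
```

Here 'chloroplast membrane' has two parents reached through different relationships, which a simple tree could not represent.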

30 Anatomy of a GO term
id: GO:0006094 (unique GO ID)
name: gluconeogenesis (term name)
namespace: process (ontology)
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [ (definition)
exact_synonym: glucose biosynthesis (synonym)
xref_analog: MetaCyc:GLUCONEO-PWY (database ref)
is_a: GO: (parentage)
is_a: GO:
17800 terms in three ontologies
94% of terms defined
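A term record like this one is a set of 'tag: value' lines, so it can be read with very little code. The sketch below is an illustration rather than a full parser for the ontology file format: `parse_stanza` is a hypothetical helper, and the is_a lines are left out because their IDs are not given above.

```python
# The fields shown on the slide, minus the is_a lines (their IDs are missing).
STANZA = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
"""


def parse_stanza(text):
    """Parse 'tag: value' lines into a dict of tag -> list of values."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ': ' only, so values such as 'GO:0006094' stay whole.
        tag, _, value = line.partition(": ")
        fields.setdefault(tag, []).append(value)
    return fields


term = parse_stanza(STANZA)
```

Values are kept as lists because some tags, such as is_a, can appear more than once in a single term.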

31 GO terms
Where do GO terms come from?
GO terms are added by editors at EBI and annotating databases.
New terms are usually only added when they are asked for by annotators.
GO editors work with experts to make major ontology developments:
metabolism
pathogenesis
cell cycle

32 GO stats
over 23,000 GO terms:
13593 biological_process
1980 cellular_component
7700 molecular_function

33 GO annotations
Where do the links between genes and GO terms come from?

34 GO annotations
Contributing databases:
Berkeley Drosophila Genome Project (BDGP)
dictyBase (Dictyostelium discoideum)
FlyBase (Drosophila melanogaster)
GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
Gramene (grains, including rice, Oryza)
Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
Rat Genome Database (RGD) (Rattus norvegicus)
Reactome
Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
The Institute for Genomic Research (TIGR): databases on several bacterial species
WormBase (Caenorhabditis elegans)
Zebrafish Information Network (ZFIN) (Danio rerio)

35 Species coverage
All major eukaryotic model organism species
Human via GOA group at UniProt
Several bacterial and parasite species through TIGR and GeneDB at Sanger
Many more in the pipeline

36 Annotation coverage
Add graph of % genome coverage from paper

37 Anatomy of a GO annotation
Three key parts:
gene name/id
GO term(s)
evidence for association
The evidence is what links the GO term with the gene. There are various types of evidence: it can be an experiment, an author statement in a paper, a BLAST search or an algorithmic match. I’ll go into more detail about these later.

38 Example annotation
Breast cancer type 1 susceptibility protein gene in humans
Open AmiGO

39 Types of GO annotation:
Electronic Annotation
Manual Annotation
So there are two main types of GO annotation: those made electronically, and those made by a curator.

40 Manual annotation
Created by scientific curators
High quality
Small number
Manual annotations are made by curators at model organism databases. They are consequently of high quality but, because of the time it takes to make them, there are far fewer of them than automatic annotations. The number is increasing all the time, though. Some databases, such as those for Saccharomyces and Drosophila, have complete manual annotation of the whole genome, whereas others have only a little.

41 Manual annotation
In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…
This is an example of how a curator might approach a paper to find GO terms.

42 Manual annotation
The GO browser AmiGO, which you’ll be using in the tutorial later, displays all of the manual GO annotations.

43 Electronic Annotation
Annotation derived without human validation:
mappings files, e.g. interpro2go, ec2go
BLAST search ‘hits’
Lower ‘quality’ than manual codes
So electronic annotation is where a human hasn’t looked at an annotation; it’s been done entirely automatically. This can be from a mappings file, e.g. interpro2go or spkw2go, from non-validated sequence similarity, or from a combination of different methods. These electronic methods produce very large numbers of annotations but, because they are not individually validated by a curator, they can be thought of as having a lower quality than curator-approved annotations.

44 Mappings files
Source entries:
Fatty acid biosynthesis (Swiss-Prot Keyword)
EC: (EC number)
IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry)
Mapped GO terms:
GO:Fatty acid biosynthesis (GO: )
GO:acetyl-CoA carboxylase activity (GO: )
GO:acetyl-CoA carboxylase activity
This is an example of different mappings files used to create electronic annotations.
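As a rough sketch of how a mappings file turns existing database entries into electronic annotations, consider the Python below. The pairing of entries with GO terms is read off the slide and should be treated as an assumption; the `SP_KW:` prefix, the protein name, the unmapped entry `IPR999999` and the function name are all hypothetical. GO term names stand in for GO IDs, which are not given here.

```python
# External entry -> GO term names, in the spirit of interpro2go or spkw2go.
# The pairing is an assumption read off the slide; term names stand in for
# GO IDs, and the "SP_KW:" prefix is a made-up way to label keywords.
MAPPING = {
    "IPR000438": ["acetyl-CoA carboxylase activity"],
    "SP_KW:Fatty acid biosynthesis": ["fatty acid biosynthesis"],
}


def electronic_annotations(protein_features, mapping):
    """Yield (protein, GO term, evidence code) for every mapped feature."""
    for protein, features in protein_features.items():
        for feature in features:
            for go_term in mapping.get(feature, []):
                # IEA marks the annotation as electronic, with no curator review.
                yield (protein, go_term, "IEA")


# Hypothetical protein carrying two mapped entries and one unmapped entry.
features = {
    "protein X": ["IPR000438", "SP_KW:Fatty acid biosynthesis", "IPR999999"],
}
result = list(electronic_annotations(features, MAPPING))
```

Entries with no mapping simply produce no annotation, which is why coverage depends on how complete the mappings files are.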

45 Evidence types
ISS: Inferred from Sequence/structural Similarity
IDA: Inferred from Direct Assay
IPI: Inferred from Physical Interaction
IMP: Inferred from Mutant Phenotype
IGI: Inferred from Genetic Interaction
IEP: Inferred from Expression Pattern
TAS: Traceable Author Statement
NAS: Non-traceable Author Statement
IC: Inferred by Curator
ND: No Data available
IEA: Inferred from Electronic Annotation
These are the evidence types broken down. It looks very complicated and there’s no need to worry about this too much, but basically the top box contains the manual evidence codes, mostly experimental techniques, and IEA is electronic annotation.
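One practical use of the evidence code is filtering, for example keeping only curator-made annotations. A minimal Python sketch follows, treating every code except IEA as manual as described above. The three example annotations and the function name `manual_only` are hypothetical.

```python
# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}

# Everything except IEA is treated as a manual code.
MANUAL_CODES = set(EVIDENCE_CODES) - {"IEA"}


def manual_only(annotations):
    """Keep (gene, GO term, evidence code) records made by a curator."""
    return [record for record in annotations if record[2] in MANUAL_CODES]


# Hypothetical annotations, for illustration only.
annotations = [
    ("gene 1", "apoptosis", "IDA"),
    ("gene 2", "mitosis", "IEA"),
    ("gene 3", "mitosis", "TAS"),
]
curated = manual_only(annotations)
```

Whether to drop electronic annotations is a trade-off: the result is more reliable, but far fewer genes keep any annotation at all.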

46 GO tools
GO resources are freely available to anyone to use without restriction.
This includes the ontologies, gene associations and tools developed by GO.
Other groups have used GO to create tools for many purposes:

47 GO tools
Affymetrix also provide a Gene Ontology Mining Tool as part of their NetAffx™ Analysis Center, which returns GO terms for probe sets.

48 GO tools
Many tools exist that use GO to find common biological functions from a list of genes:
insert slides from sorin’s talk

49 GO tools
Most of these tools work in a similar way:
input a gene list and a subset of ‘interesting’ genes
tool shows which GO categories have most interesting genes associated with them, i.e. which categories are ‘enriched’ for interesting genes
tool provides a statistical measure to determine whether enrichment is significant

50 Microarray process
Treat samples
Collect mRNA
Label
Hybridize
Scan
Normalize
Select differentially regulated genes
Understand the biological phenomena involved
So this is a typical process that you’d use to collect data from your microarray. Just like that: it makes it look very easy! So for a certain set of conditions, you have a set of differentially regulated genes, and what you really want to know about is the underlying biological processes involved.

51 Traditional analysis
Gene 1: Apoptosis, Cell-cell signaling, Protein phosphorylation, Mitosis
Gene 2: Growth control, Oncogenesis
Gene 3: Growth control, Mitosis, Oncogenesis, Protein phosphorylation
Gene 4: Nervous system, Pregnancy
Gene 100: Positive ctrl. of cell prolif., Glucose transport
Typically, this is the way the analysis would have been done. Taking your differentially regulated genes, you’d analyse them one by one, researching what is known about each gene and what processes it is involved in.

52 Traditional analysis
gene by gene basis
requires literature searching
time-consuming
So this gene by gene approach has the major disadvantage that you have to delve into the literature yourself, which is obviously very time-consuming.

53 Using GO annotations
But by using GO annotations, this work has already been done for you!
GO: : apoptosis
Someone has already sat down and associated a particular gene with a particular process…

54 Grouping by process
Mitosis: Gene 2, Gene 5, Gene 45, Gene 7, Gene 35, …
Glucose transport: Gene 7, Gene 3, Gene 6
Apoptosis: Gene 1, Gene 53
Positive ctrl. of cell prolif.: Gene 7, Gene 3, Gene 12
Growth: Gene 5, Gene 2, Gene 6
So you have the ability to group your differentially regulated genes by process…
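Grouping by process amounts to inverting the gene -> terms annotations into term -> genes. A short Python sketch, using a small made-up set of annotations in the style of the slide; the function name `group_by_process` and the specific pairings are hypothetical.

```python
from collections import defaultdict

# Made-up annotations in the style of the slide: gene -> processes.
gene_to_processes = {
    "Gene 1": ["Apoptosis"],
    "Gene 2": ["Mitosis", "Growth"],
    "Gene 3": ["Glucose transport"],
    "Gene 5": ["Mitosis", "Growth"],
    "Gene 7": ["Mitosis", "Glucose transport"],
}


def group_by_process(gene_to_terms):
    """Invert gene -> terms into term -> sorted list of genes."""
    groups = defaultdict(list)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            groups[term].append(gene)
    return {term: sorted(genes) for term, genes in groups.items()}


by_process = group_by_process(gene_to_processes)
```

A gene annotated to several processes appears in each of those groups, so the group sizes can add up to more than the number of genes.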

55 GO for microarray analysis
Annotations give a ‘function’ label to genes.
Ask meaningful questions of microarray data, e.g. do genes involved in the same process have the same or different expression patterns?
GO and its annotations are useful in microarrays mainly for comparing gene expression patterns to gene function, allowing more meaningful interpretation of microarray data.

56 Using GO in practice
statistical measure: how likely it is that your differentially regulated genes fall into that category by chance
microarray experiment: 1000 genes, 100 genes differentially regulated
mitosis: 80/100
apoptosis: 40/100
p. ctrl. cell prol.: 30/100
glucose transp.: 20/100
The better tools include a statistical measure of how likely it is that your differentially regulated genes fall into a category by chance. So why is that necessary? Imagine you do a microarray with 1000 genes, and you find that 100 are differentially regulated. These are the GO processes that those differentially regulated genes fall into. It looks like mitosis is overrepresented…

57 Using GO in practice
However, when you look at the distribution of all genes on the microarray:
Table (values not shown): Process | Genes on array | # genes expected in random genes | occurred, with rows for mitosis, apoptosis, p. ctrl. cell prol. and glucose transp.
You can see that 80% of the genes on the array were involved in mitosis, so the number upregulated is what you’d expect by chance. The category positive regulation of cell proliferation actually contains more differentially regulated genes than you would expect by chance.
Need a statistical test, e.g. Chi-squared, to see if this overrepresentation or enrichment of a certain class is statistically significant.
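The reasoning on these two slides can be worked through in Python. This is a sketch under stated assumptions, not how any particular enrichment tool does it: the mitosis figures (80 of 100 selected genes, 80% of a 1000-gene array) come from the slides, while the number of cell proliferation genes on the array (100) is a made-up value. The statistic is a plain Pearson chi-squared on a 2x2 table with no continuity correction, and the function names are hypothetical.

```python
def expected_count(category_on_array, array_size, selected_size):
    """Genes expected in a category if the selected genes were a random draw."""
    return selected_size * category_on_array / array_size


def chi_squared_2x2(selected_in, selected_size, category_on_array, array_size):
    """Pearson chi-squared statistic for selected/not selected vs in/out of category."""
    a = selected_in                          # selected, in category
    b = selected_size - selected_in          # selected, not in category
    c = category_on_array - selected_in      # not selected, in category
    d = (array_size - selected_size) - c     # not selected, not in category
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator


# Critical value of chi-squared with 1 degree of freedom at the 5% level.
CRITICAL_5_PERCENT = 3.841

# Mitosis: 80 of the 100 selected genes, 800 of the 1000 genes on the array.
mitosis_expected = expected_count(800, 1000, 100)
mitosis_chi2 = chi_squared_2x2(80, 100, 800, 1000)

# Cell proliferation: 30 of the 100 selected genes (from the slide);
# 100 of the 1000 genes on the array is a hypothetical figure.
prolif_expected = expected_count(100, 1000, 100)
prolif_chi2 = chi_squared_2x2(30, 100, 100, 1000)
```

With these numbers mitosis has exactly the count expected by chance, so its statistic is zero, while the cell proliferation statistic is well above the 5% critical value. That matches the slide's point that the largest category is not necessarily the enriched one.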

58 Enrichment tools
GO is developing its own enrichment tool as part of the GO browser AmiGO.
Currently in testing phase, should be released next month.

59 Onto-Express walkthrough

