Presentation is loading. Please wait.

Presentation is loading. Please wait.

EBI is an Outstation of the European Molecular Biology Laboratory. Introduction to the Gene Ontology and GO annotation resources Rachael Huntley UniProtKB-GOA.

Similar presentations


Presentation on theme: "EBI is an Outstation of the European Molecular Biology Laboratory. Introduction to the Gene Ontology and GO annotation resources Rachael Huntley UniProtKB-GOA."— Presentation transcript:

1 EBI is an Outstation of the European Molecular Biology Laboratory. Introduction to the Gene Ontology and GO annotation resources Rachael Huntley UniProtKB-GOA Curator EBI

2 2 Talk Overview Intro to GO and GO terms Annotating to GO Practical use of GO GO analysis tools Precautions Accessing GO annotations

3 3 Reasons for the Gene Ontology www.geneontology.org Inconsistency in naming of biological concepts Large datasets need to be interpreted quickly Increasing amounts of biological data available

4 4 Search on ‘DNA repair’... get over 60,000 results Increasing amounts of biological data available Expansion of sequence information

5 5 Reasons for the Gene Ontology Inconsistency in naming of biological concepts Large datasets need to be interpreted quickly Increasing amounts of biological data available

6 6 Need to organise the data, analyse it, share it in a timely manner to benefit other researchers Inconsistencies in analyses make cross- species or cross-database comparison difficult Large datasets need to be interpreted quickly

7 7 Reasons for the Gene Ontology Inconsistency in naming of biological concepts Large datasets need to be interpreted quickly Increasing amounts of biological data available

8 8 English is not a very precise language Same name for different concepts Different names for the same concept Inconsistency in naming of biological concepts ? An example … Tactition Tactile sense Taction Sensory perception of touch ; GO:0050975

9 9 A way to capture biological knowledge in a written and computable form The Gene Ontology A set of concepts and their relationships to each other arranged as a hierarchy www.ebi.ac.uk/QuickGO Less specific concepts More specific concepts

10 10 The Concepts in GO 1. Molecular Function 2. Biological Process 3. Cellular Component An elemental activity or task or job protein kinase activity insulin receptor activity A commonly recognised series of events cell division Where a gene product is located mitochondrion mitochondrial matrix mitochondrial inner membrane

11 11 Anatomy of a GO term Unique identifier Term name Definition Synonyms Cross-references

12 12 Ontology structure Directed acyclic graph Terms can have more than one parent Terms are linked by relationships is_a part_of regulates +ve regulates -ve regulates www.ebi.ac.uk/QuickGO

13 13 http://amigo.geneontology.org/cgi-bin/amigo/go.cgi … there are more browsers available on the GO Consortium Tools page: http://www.geneontology.org/GO.tools.shtml Searching for GO terms The latest OBO format Gene Ontology file can be downloaded from: http://www.geneontology.org/ontology/gene_ontology.obo http://www.ebi.ac.uk/QuickGO

14 14 http://www.ebi.ac.uk/QuickGO The EBI's QuickGO browser Search GO terms or proteins

15 15 GO Annotation

16 16 Reactome http://www.geneontology.org

17 17 Aims of the GO project Compile the ontologies - currently over 34,000 terms - constantly increasing and improving Annotate gene products using ontology terms - around 30 groups provide annotations Provide a public resource of data and tools - regular releases of annotations - tools for browsing/querying annotations and editing the ontology

18 18 Gene Ontology Annotation (UniProtKB-GOA) database at the EBI Largest open-source contributor of annotations to GO Member of the GO Consortium since 2001 Provides annotation for more than 360,000 species UniProtKB-GOA’s priority is to annotate the human proteome UniProtKB-GOA is responsible for human, chicken and bovine annotations in the GO Consortium

19 19 A GO annotation is … …a statement that a gene product; 1. has a particular molecular function or is involved in a particular biological process or is located within a certain cellular component 2. as determined by a particular method 3. as described in a particular reference P00505 AccessionNameGO IDGO term nameReferenceEvidence code IDAPMID:2731362aspartate transaminase activityGO:0004069GOT2

20 20 Electronic Manual GOA makes annotations using two methods; Quick way of producing large numbers of annotations Annotations use less-specific GO terms Time-consuming process producing lower numbers of annotations Annotations are very detailed and accurate

21 21 Electronic Manual Evidence Codes Inferred from Direct Assay (IDA) – e.g. enzyme assay Inferred from Physical Interaction (IPI) – e.g. yeast 2-hybrid Inferred from Electronic Annotation (IEA) – e.g. UniProtKB keyword mapping Some examples… See the full list on the GO Consortium website http://www.geneontology.org/GO.evidence.shtml

22 22 1. Mapping of external concepts to GO terms e.g. InterPro2GO, Swiss-Prot Keyword2GO, Enzyme Commission2GO Annotations are high-quality and have an explanation of the method (GO_REF) Electronic annotation by UniProtKB-GOA Macaque Mouse Dog e.g. Human Cow Guinea Pig Chimpanzee Rat Chicken Ensembl compara 2. Automatic transfer of annotations to orthologs

23 23 Mapping of concepts from UniProtKB files GO:0004715 ; non-membrane spanning protein tyrosine kinase activity GO:0005856: cytoskeleton GO:0007155 ; cell adhesion

24 24 1. Mapping of external concepts to GO terms e.g. InterPro2GO, Swiss-Prot Keyword2GO, Enzyme Commission2GO Annotations are high-quality and have an explanation of the method (GO_REF) Electronic annotation by UniProtKB-GOA Macaque Mouse Dog Cow Guinea Pig Chimpanzee Rat Chicken Ensembl compara 2. Automatic transfer of annotations to orthologs...and more e.g. Human

25 25 Manual annotation by UniProtKB-GOA High–quality, specific annotations made using: Peer-reviewed papers A range of evidence codes to categorise the types of evidence found in a paper http://www.ebi.ac.uk/GOA

26 26 Finding annotations in a paper In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity, In addition, the location of a PERK1-GTP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response… Process: response to wounding GO:0009611 wound response serine/threonine kinase activity, Function: protein serine/threonine kinase activity GO:0004674 integral membrane protein Component: integral to plasma membrane GO:0005887 …for B. napus PERK1 protein (Q9ARH1) PubMed ID: 12374299

27 27 Additional information Qualifiers Modify the interpretation of an annotation NOT (protein is not associated with the GO term) colocalizes_with (protein associates with complex but is not a bona fide member) contributes_to (describes action of a complex of proteins) 'With' column Can include further information on the method being referenced e.g. the protein accession of an interacting protein

28 28 GO:0005739 Mitochondrion Human Protein Atlas UniProtKB-GOA integrates manual annotations from external groups LifeDB (subcellular locations) Reactome (pathways) also;

29 29 * Includes manual annotations integrated from external model organism and multi-species databases Status of UniProtKB-GOA Annotation 0.9% 66% 159,630 910,536 Manual annotations* 11,261,382 98,037,844Electronic annotations Proteins Annotations Evidence Source UniProt Coverage Aug 2011 Statistics

30 30 How to access and use GO annotation data

31 31 Where can you find annotations? UniProtKB Ensembl Entrez gene

32 32 GO Consortium website UniProtKB-GOA website Gene Association Files 17 column files containing all information for each annotation

33 33 GO browsers

34 34 http://www.ebi.ac.uk/QuickGO The EBI's QuickGO browser Search GO terms or proteins Find sets of GO annotations

35 35 Access gene product functional information Analyse high-throughput genomic or proteomic datasets Validation of experimental techniques Get a broad overview of a proteome Obtain functional information for novel gene products How scientists use the GO Some examples…

36 36 attacked time control Puparial adhesion Molting cycle Hemocyanin Defense response Immune response Response to stimulus Toll regulated genes JAK-STAT regulated genes Immune response Toll regulated genes Amino acid catabolism Lipid metobolism Peptidase activity Protein catabolism Immune response Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI. MicroArray data analysis Analysis of high-throughput genomic datasets

37 37 (Orrù et al., Molecular and Cellular Proteomics 2007) Characterisation of proteins interacting with ribosomal protein S19 Analysis of high-throughput proteomic datasets

38 38 Validation of experimental techniques (Cao et al., Journal of Proteome Research 2006) Rat liver plasma membrane isolation

39 39 Obtain functional information for novel gene products MPYVSQSQHIDRVRGAIEGRLPAPGNSSRLVSSWQRSYEQYRLDPGSVIGPRVLTS SELR DVQGKEEAFLRASGQCLARLHDMIRMADYCVMLTDAHGVTIDYRIDRDRRGD FKHAGLYI GSCWSEREEGTCGIASVLTDLAPITVHKTDHFRAAFTTLTCSASPIFAPTG ELIGVLDAS AVQSPDNRDSQRLVFQLVRQSAALIEDGYFLNQTAQHWMIFGHASRN FVEAQPEVLIAFD ECGNIAASNRKAQECIAGLNGPRHVDEIFDTSAVHLHDVARTDTI MPLRLRATGAVLYAR IRAPLKRVSRSACAVSPSHSGQGTHDAHNDTNLDAISRFLHS RDSRIARNAEVALRIAGK HLPILILGETGVGKEVFAQALHASGARRAKPFVAVNCGAIP DSLIESELFGYAPGAFTGA RSRGARGKIAQAHGGTLFLDEIGDMPLNLQTRLLRVLA EGEVLPLGGDAPVRVDIDVICA THRDLARMVEEGTFREDLYYRLSGATLHMPPLRER ADILDVVHAVFDEEAQSAGHVLTLD GRLAERLARFSWPGNIRQLRNVLRYACAVCDS TRVELRHVSPDVAALLAPDEAALRPALA LENDERARIVDALTRHHWRPNAAAEALGM www.ebi.ac.uk/InterProScan InterProScan

40 40 Annotating novel sequences Can use BLAST queries to find similar sequences with GO annotation which can be transferred to the new sequence Two tools currently available; AmiGO BLAST (from GO Consortium) http://amigo.geneontology.org/cgi-bin/amigo/blast.cgi – searches the GO Consortium database BLAST2GO (from Babelomics) http://www.blast2go.org/ – searches the NCBI database

41 41 AmiGO BLAST amigo.geneontology.org/cgi-bin/amigo/blast.cgi Exportin-T from Pongo abelii (Sumatran orangutan)

42 42 Analysis of large datasets using GO

43 43 Using the GO to provide a functional overview for a large dataset Many GO analysis tools use GO slims to give a broad overview of the dataset GO slims are cut-down versions of the GO and contain a subset of the terms in the whole GO GO slims usually contain less-specialised GO terms

44 44 Slimming the GO using the ‘true path rule’ Many gene products are associated with a large number of descriptive, leaf GO nodes:

45 45 Slimming the GO using the ‘true path rule’ …however annotations can be mapped up to a smaller set of parent GO terms:

46 46 GO slims or you can make your own using; Custom slims are available for download; http://www.geneontology.org/GO.slims.shtml AmiGO's GO slimmer QuickGO http://www.ebi.ac.uk/QuickGO http://amigo.geneontology.org/cgi-bin/amigo/slimmer

47 47 www.ebi.ac.uk/QuickGO Map-up annotations with GO slims The EBI's QuickGO browser Search GO terms or proteins Find sets of GO annotations

48 48 GO analysis tools

49 49 Numerous Third Party Tools www.geneontology.org/GO.tools.shtml

50 50 Term enrichment Most popular type of GO analysis Determines which GO terms are more often associated with a specified list of genes/proteins compared with a control list or rest of genome Many tools available to do this analysis User must decide which is best for their analysis

51 51 Precautions when using GO annotations for analysis Recommended that ‘NOT’ annotations are removed before analysis - only ~5000 out of ~100 million annotations are ‘NOT’ - can confuse the analysis The Gene Ontology is always changing and GO annotations are continually being created - always use a current version of both - if publishing your analyses please report the versions/dates you used http://www.geneontology.org/GO.cite.shtml

52 52 Precautions when using GO annotations for analysis Unannotated is not unknown - where there is no evidence in the literature for a process, function or location the gene product is annotated to the appropriate ontology’s root node with an ‘ND’ evidence code (no biological data), thereby distinguishing between unannotated and unknown Pay attention to under-represented GO terms - a strong under-representation of a pathway may mean that normal functioning of that pathway is necessary for the given condition

53 53 The UniProtKB-GOA group Curators: Software developer: Team leaders: Emily Dimmer Rachael Huntley Rolf Apweiler Email: goa@ebi.ac.uk http://www.ebi.ac.uk/GOA Claire O’Donovan Tony Sawford Yasmin Alam-Faruque

54 54 Acknowledgements GO Consortium Members of; UniProtKB InterPro IntAct HAMAP Funding National Human Genome Research Institute (NHGRI) Kidney Research UK EMBL


Download ppt "EBI is an Outstation of the European Molecular Biology Laboratory. Introduction to the Gene Ontology and GO annotation resources Rachael Huntley UniProtKB-GOA."

Similar presentations


Ads by Google