The Gene Ontology and its insertion into UMLS, by Jane Lomax


1 The Gene Ontology and its insertion into UMLS. Jane Lomax

2 The Gene Ontology. Set of three structured vocabularies. Provide functional annotation of gene products. Dynamic. Cross-references to external databases.

6 The vocabularies. Molecular function (elemental activity or task): nuclease, DNA binding, microtubule motor. Biological process (broad objective or goal): mitosis, signal transduction, metabolism. Cellular component (location or complex): nucleus, ribosome.

7 GO structure. Directed acyclic graph (DAG). Allows multiple parentage.
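
A DAG with multiple parentage can be sketched in Python as a mapping from each term to its list of parents. This is a minimal illustration, not GO's actual storage format: the term names and edges are made up for the example, and `PARENTS`, `ancestors`, and `is_acyclic` are hypothetical names.

```python
from collections import deque

# child -> list of parents (illustrative terms, not copied from a GO release).
# "hexose biosynthesis" has two parents, which a strict tree could not express.
PARENTS = {
    "biological process": [],
    "metabolism": ["biological process"],
    "biosynthesis": ["metabolism"],
    "hexose metabolism": ["metabolism"],
    "hexose biosynthesis": ["hexose metabolism", "biosynthesis"],
}


def ancestors(term, parents=PARENTS):
    """Return the set of all terms reachable by following parent links."""
    seen = set()
    queue = deque(parents[term])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(parents[node])
    return seen


def is_acyclic(parents=PARENTS):
    """A graph is acyclic if no term is its own ancestor."""
    return all(term not in ancestors(term, parents) for term in parents)
```

Calling `ancestors("hexose biosynthesis")` collects terms along both parent branches, and `is_acyclic()` checks the "acyclic" half of the DAG property.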

8 True-path rule. Every path from a node back to the root must be biologically accurate.
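
Whether a path is biologically accurate is a curator's judgment that code cannot make, but the paths to be checked can be listed mechanically. The sketch below enumerates every path from a term to the root; the graph `EXAMPLE` and the function name `paths_to_root` are assumptions made for illustration.

```python
def paths_to_root(term, parents):
    """Return every path from `term` up to a root, as lists of term names."""
    if not parents[term]:  # a root has no parents
        return [[term]]
    paths = []
    for parent in parents[term]:
        for rest in paths_to_root(parent, parents):
            paths.append([term] + rest)
    return paths


# Made-up graph: "d" has two parents, so two paths must be reviewed.
EXAMPLE = {"root": [], "b": ["root"], "c": ["root"], "d": ["b", "c"]}

for path in paths_to_root("d", EXAMPLE):
    print(" -> ".join(path))
```

Each printed path is one statement a reviewer would need to confirm holds from the term all the way to the root.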

9 Relationship types. is_a: subclass (a is a type of b). part_of: physical part of (cellular component) or sub-process of (biological process).
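
The two relationship types can be modelled as labels on edges. The following is a small sketch under assumptions: the `Edge` tuple, the `parents_of` helper, and the example edges are illustrative and not taken from a GO release.

```python
from collections import namedtuple

ALLOWED_RELATIONS = {"is_a", "part_of"}

# One edge per (child, relation, parent) statement.
Edge = namedtuple("Edge", ["child", "relation", "parent"])

# Illustrative edges only.
EDGES = [
    Edge("nucleolus", "part_of", "nucleus"),
    Edge("nucleus", "is_a", "organelle"),
]


def parents_of(term, edges, relation=None):
    """Parents of `term`, optionally restricted to one relationship type."""
    return [
        e.parent
        for e in edges
        if e.child == term and (relation is None or e.relation == relation)
    ]
```

Keeping the relationship type on the edge lets a query follow only is_a links, only part_of links, or both.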

10 What makes up a GO term? Term name; go_id; definition and definition dbxref; GO synonym; general dbxref; comment.
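
The fields listed on this slide map naturally onto a record type. Below is a minimal sketch as a Python dataclass; the class and attribute names are assumptions, the example values are placeholders, and the only format rule enforced is the familiar `GO:` prefix followed by seven digits.

```python
import re
from dataclasses import dataclass, field
from typing import List

GO_ID_PATTERN = re.compile(r"GO:\d{7}")


@dataclass
class GOTerm:
    name: str
    go_id: str
    definition: str
    definition_dbxrefs: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    dbxrefs: List[str] = field(default_factory=list)  # general cross-references
    comment: str = ""

    def __post_init__(self):
        # Reject identifiers that do not look like GO:0000000.
        if not GO_ID_PATTERN.fullmatch(self.go_id):
            raise ValueError(f"not a GO identifier: {self.go_id!r}")


# Placeholder record; the id and definition are not real GO content.
example = GOTerm(
    name="example term",
    go_id="GO:0000000",
    definition="Placeholder definition text.",
)
```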

11 GO cross-links. Cross-references within GO: EC, RESID, MetaCyc. Mappings: SWISS-PROT keywords. Links in other databases: InterPro; UMLS/MeSH (in progress).

12 Why insert GO into UMLS? A rich, widely used source for expanding UMLS. Can be used to improve areas of MeSH. Potential for ‘non-fuzzy’ text mining using GO terms: MeSH terms are manually assigned to papers.

13 Unified Medical Language System (UMLS). Research project maintained by the National Library of Medicine (NLM). Aims to allow computers to ‘understand’ biomedical meaning and to improve retrieval and integration of computer-readable information. Has three ‘Knowledge sources’: UMLS Metathesaurus, SPECIALIST lexicon, semantic network.

14 Knowledge sources. UMLS Metathesaurus: links multiple source vocabularies into unified concepts; includes MeSH (Medical Subject Headings); GO to become a source vocabulary. SPECIALIST lexicon: provides biomedical/English lexical info. Semantic network: for categorizing concepts.

15 Inserting GO into UMLS. Inversion: converting GO to the correct format for UMLS insertion. Insertion: inserting GO using matching algorithms. Editing: all concepts containing a GO term reviewed by hand.
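
The slides do not say which matching algorithms were used. As a stand-in, the sketch below matches GO term names to existing concepts by exact comparison after simple normalization (lowercasing, treating hyphens as spaces, collapsing whitespace). The function names, the concept ids `X1` and `X2`, and the normalization rules are all assumptions for illustration.

```python
def normalize(name):
    """Lowercase, treat hyphens as spaces, and collapse runs of whitespace."""
    return " ".join(name.lower().replace("-", " ").split())


def match_terms(go_terms, concepts):
    """Split GO term names into those matching an existing concept and the rest.

    go_terms: list of GO term names.
    concepts: dict mapping a concept id to the set of names already in it.
    Returns (matched, unmatched): a dict of term -> concept id, and a list.
    """
    index = {
        normalize(name): concept_id
        for concept_id, names in concepts.items()
        for name in names
    }
    matched, unmatched = {}, []
    for term in go_terms:
        concept_id = index.get(normalize(term))
        if concept_id is None:
            unmatched.append(term)  # would become a GO-only concept
        else:
            matched[term] = concept_id  # joins a concept from another source
    return matched, unmatched


# Hypothetical existing concepts from other source vocabularies.
concepts = {"X1": {"mitosis"}, "X2": {"DNA-binding"}}
matched, unmatched = match_terms(
    ["DNA binding", "Mitosis", "microtubule motor"], concepts
)
```

In this toy run the first two names join existing concepts and the third does not; in the workflow on the slide, every resulting concept would still go to a human editor for review.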

16 Statistics. Approximately 23% of GO terms ‘match’ something in another source vocabulary: 23.03% of GO terms are in concepts with other sources; 76.97% of GO terms are in concepts where they are the only source.

17 Statistics. % of GO terms in concepts with other sources, by GO vocabulary: biological process 4.6%; molecular function 27.8%; cellular component 45.2%.

18 Statistics. % of GO terms in concepts with other sources, by source: CSP2002 (Computer Retrieval of Information on Scientific Projects Thesaurus) 7.34%; MSH2003_2002_08_14 (Medical Subject Headings) 19.74%; SNMI98 (Systematized Nomenclature of Human and Veterinary Medicine) 11.05%. Diagram labels: GO, CRISP, MeSH, SNOMED.

19 Figure (labels only): concept name; concept id; GO atoms; MeSH atoms; EC number; contexts; relationships to other concepts; definition.

20 Challenges with insertion. GO synonyms: as GO has evolved, not all are now synonymous. GO enzymes: GO separates enzyme function from enzyme ‘complexes’; most vocabularies don’t. Semantic types: what semantic types now apply to concepts with GO atoms?

21 Future of insertion. Hoped that GO can be released with UMLS early next year, dependent on ironing out problems. Maintenance of insertion: GO changes continually, with large differences between UMLS releases.

22 www.geneontology.org
FlyBase & Berkeley Drosophila Genome Project; Saccharomyces Genome Database; PomBase (Sanger Institute); Rat Genome Database; Genome Knowledge Base (CSHL); The Institute for Genomic Research; Compugen, Inc; The Arabidopsis Information Resource; WormBase; DictyBase; Mouse Genome Informatics; Swiss-Prot/TrEMBL/InterPro; Pathogen Sequencing Unit (Sanger Institute).
National Library of Medicine: Alexa McCray, Stuart Nelson, Bill Hole.
Oak Ridge Institute for Science and Education; National Library of Medicine; U.S. Department of Energy.
The Gene Ontology Consortium is supported by an R01 grant from the National Human Genome Research Institute (NHGRI) [grant HG02273]. SGD is supported by a P41, National Resources, grant from the NHGRI [grant HG01315]; MGD by a P41 from the NHGRI [grant HG00330]; GXD by the National Institute of Child Health and Human Development [grant HD33745]; FlyBase by a P41 from the NHGRI [grant HG00739] and by the Medical Research Council, London. TAIR is supported by the National Science Foundation [grant DBI-9978564]. WormBase is supported by a P41, National Resources, grant from the NHGRI [grant HG02223]; RGD is supported by an R01 grant from the NHLBI [grant HL64541]; DictyBase is supported by an R01 grant from the NIGMS [grant GM064426].

