The Gene Ontology and its insertion into UMLS Jane Lomax.

Presentation transcript:

The Gene Ontology and its insertion into UMLS Jane Lomax

The Gene Ontology
- Set of three structured vocabularies
- Provide functional annotation of gene products
- Dynamic
- Cross-references to external databases

The vocabularies
- Molecular function: elemental activity or task (nuclease, DNA binding, microtubule motor)
- Biological process: broad objective or goal (mitosis, signal transduction, metabolism)
- Cellular component: location or complex (nucleus, ribosome)

GO structure
- Directed acyclic graph (DAG)
- Allows multiple parentage
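A minimal sketch of what "DAG with multiple parentage" means in practice. The term names and edges below are a simplified illustration, not copied from any GO release; `PARENTS`, `is_acyclic` and `ancestors` are hypothetical names introduced here.

```python
from graphlib import CycleError, TopologicalSorter

# child -> set of direct parents (simplified, illustrative fragment)
PARENTS = {
    "molecular_function": set(),
    "catalytic activity": {"molecular_function"},
    "ATPase activity": {"catalytic activity"},
    "helicase activity": {"catalytic activity"},
    "DNA-dependent ATPase activity": {"ATPase activity"},
    # Two parents: this is what a tree cannot express but a DAG can.
    "DNA helicase activity": {"helicase activity", "DNA-dependent ATPase activity"},
}


def is_acyclic(parents):
    """Return True if the child -> parents mapping contains no cycle."""
    try:
        tuple(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def ancestors(term, parents):
    """Collect every term reachable by following parent links upwards."""
    seen = set()
    stack = list(parents[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


if __name__ == "__main__":
    print(is_acyclic(PARENTS))
    print(sorted(ancestors("DNA helicase activity", PARENTS)))
```

A term with two parents reaches the root by more than one path, which is why ancestor collection needs a `seen` set rather than a simple walk up a single chain.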

True-path rule
- Every path from a node back to the root must be biologically accurate

Relationship types
- is_a: subclass; a is a type of b
- part_of: physical part of (component), or sub-process of (process)
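One consequence of the true-path rule is that an annotation to a term implies annotation to everything above it, whether the link is is_a or part_of. The sketch below illustrates that propagation under stated assumptions: the edges are a simplified cellular component fragment, and `geneX`, `EDGES`, `implied_terms` and `propagate` are hypothetical names.

```python
from collections import defaultdict

# (child, relationship, parent) triples; simplified and illustrative only
EDGES = [
    ("nucleolus", "part_of", "nucleus"),
    ("nucleus", "is_a", "organelle"),
    ("nucleus", "part_of", "cell"),
    ("organelle", "is_a", "cellular_component"),
    ("cell", "is_a", "cellular_component"),
]


def build_parent_index(edges):
    """Map each child term to its list of (relationship, parent) pairs."""
    index = defaultdict(list)
    for child, relationship, parent in edges:
        index[child].append((relationship, parent))
    return index


def implied_terms(term, index):
    """Return the term plus every term reachable via is_a or part_of."""
    implied = {term}
    stack = [term]
    while stack:
        for _relationship, parent in index.get(stack.pop(), []):
            if parent not in implied:
                implied.add(parent)
                stack.append(parent)
    return implied


def propagate(annotations, index):
    """Expand direct annotations (gene -> terms) to include all ancestors."""
    return {
        gene: set().union(*(implied_terms(term, index) for term in terms))
        for gene, terms in annotations.items()
    }


if __name__ == "__main__":
    index = build_parent_index(EDGES)
    print(propagate({"geneX": {"nucleolus"}}, index))
```

If any single path to the root were biologically wrong, this propagation would attach a false statement to every gene product annotated below it, which is the practical reason the rule is enforced.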

What makes up a GO term?
- term name
- go_id
- definition and definition dbxref
- GO synonym
- general dbxref
- comment
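The fields listed above map naturally onto a small record type. This is a sketch only: the class and field names are assumptions made for illustration, not a GO Consortium schema, and the definition text and dbxref in the example are placeholders.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# GO identifiers have the form "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass
class GOTerm:
    name: str
    go_id: str
    definition: str
    definition_dbxrefs: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    general_dbxrefs: List[str] = field(default_factory=list)
    comment: Optional[str] = None

    def __post_init__(self):
        # Reject malformed identifiers early.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"not a valid GO id: {self.go_id!r}")


if __name__ == "__main__":
    nucleus = GOTerm(
        name="nucleus",
        go_id="GO:0005634",
        definition="(placeholder definition text)",
        definition_dbxrefs=["EXAMPLE:placeholder"],  # hypothetical dbxref
        synonyms=["cell nucleus"],
    )
    print(nucleus)
```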

GO cross-links
- Cross-references within GO: EC, RESID, MetaCyc
- Mappings: SWISS-PROT keywords
- Links in other databases: InterPro, UMLS/MeSH (in progress)

Why insert GO into UMLS?
- A rich, widely used source for expanding UMLS
- Can be used to improve areas of MeSH
- Potential for ‘non-fuzzy’ text mining using GO terms
  - MeSH terms are manually assigned to papers

Unified Medical Language System (UMLS)
- Research project maintained by the National Library of Medicine (NLM)
- Aims to:
  - allow computers to ‘understand’ biomedical meaning
  - improve retrieval and integration of computer-readable info
- Has three ‘Knowledge sources’:
  - UMLS Metathesaurus
  - SPECIALIST lexicon
  - semantic network

Knowledge sources
- UMLS Metathesaurus
  - links multiple source vocabularies into unified concepts
  - includes MeSH (Medical Subject Headings)
  - GO to become a source vocabulary
- SPECIALIST lexicon: provides biomedical/English lexical info
- semantic network: for categorizing concepts

Inserting GO into UMLS
- inversion: converting GO to the correct format for UMLS insertion
- insertion: inserting GO using matching algorithms
- editing: all concepts containing a GO term reviewed by hand
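The slides do not say which matching algorithms were used. As a rough illustration of the idea only, the sketch below matches terms after a simple lexical normalization (lowercase, punctuation removed, words sorted). This is an assumption, not the NLM procedure, and the sample terms and source labels are made-up example data.

```python
import re


def normalize(term):
    """Lowercase, drop punctuation, and sort words so word order is ignored."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(words))


def match_terms(go_terms, other_terms):
    """Split GO terms into those matching another vocabulary and those that do not.

    other_terms is an iterable of (source, term) pairs.
    """
    index = {}
    for source, term in other_terms:
        index.setdefault(normalize(term), []).append((source, term))

    matched, unmatched = {}, []
    for term in go_terms:
        hits = index.get(normalize(term))
        if hits:
            matched[term] = hits
        else:
            unmatched.append(term)
    return matched, unmatched


if __name__ == "__main__":
    go_terms = ["DNA binding", "signal transduction", "microtubule motor"]
    # Hypothetical entries standing in for other source vocabularies.
    other_terms = [("SourceA", "Signal Transduction"), ("SourceB", "binding, DNA")]
    matched, unmatched = match_terms(go_terms, other_terms)
    print(matched)
    print(unmatched)
```

Matched terms would join an existing concept; unmatched terms would become concepts where GO is the only source. Either way, the slide notes that every concept containing a GO term was then reviewed by hand.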

Statistics
- Approximately 23% of GO terms ‘match’ something in another source vocabulary
- 23.03%: GO terms in concepts with other sources
- 76.97%: GO terms in concepts where they are the only source

Statistics
% of GO terms in concepts with other sources, by GO vocabulary:
- biological process: 4.6%
- molecular function: 27.8%
- cellular component: 45.2%
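Percentages of this kind can be derived by counting, for each GO term, whether its concept also contains an atom from another source. The sketch below shows the arithmetic on toy data; the input layout and the function name are assumptions, and the toy numbers are not the figures reported above.

```python
from collections import Counter


def share_with_other_sources(terms):
    """Return (overall %, {vocabulary: %}) of GO terms sharing a concept.

    terms is a non-empty iterable of (vocabulary, sources_in_concept) pairs,
    where sources_in_concept is the set of source names in the term's concept.
    """
    total = Counter()
    shared = Counter()
    for vocabulary, sources in terms:
        total[vocabulary] += 1
        if sources - {"GO"}:  # at least one non-GO source in the concept
            shared[vocabulary] += 1
    by_vocabulary = {v: 100.0 * shared[v] / total[v] for v in total}
    overall = 100.0 * sum(shared.values()) / sum(total.values())
    return overall, by_vocabulary


if __name__ == "__main__":
    # Toy data only; source names other than "GO" are placeholders.
    toy = [
        ("cellular component", {"GO", "OTHER1"}),
        ("cellular component", {"GO"}),
        ("biological process", {"GO"}),
        ("molecular function", {"GO", "OTHER2"}),
    ]
    print(share_with_other_sources(toy))
```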

Statistics
% of GO terms in concepts with other sources, by source:
- CSP2002 (Computer Retrieval of Information on Scientific Projects Thesaurus): 7.34%
- MSH2003_2002_08_14 (Medical Subject Headings): [value missing]
- SNMI98 (Systematized Nomenclature of Human and Veterinary Medicine): [value missing]
[Figure labels: GO, CRISP, MeSH, SNOMED]

[Figure labels: concept name, concept id, GO atoms, MeSH atoms, EC number, contexts, relationships to other concepts, definition]

Challenges with insertion
- GO synonyms: as GO has evolved, not all synonyms are now truly synonymous
- GO enzymes: GO separates enzyme function from enzyme ‘complexes’; most vocabularies don’t
- Semantic types: what semantic types now apply to concepts with GO atoms?

Future of insertion
- It is hoped that GO can be released with UMLS early next year, dependent on ironing out problems
- Maintenance of insertion: GO is changing continually, so there will be large differences between UMLS releases

- FlyBase & Berkeley Drosophila Genome Project
- Saccharomyces Genome Database
- PomBase (Sanger Institute)
- Rat Genome Database
- Genome Knowledge Base (CSHL)
- The Institute for Genomic Research
- Compugen, Inc
- The Arabidopsis Information Resource
- WormBase
- DictyBase
- Mouse Genome Informatics
- Swiss-Prot/TrEMBL/InterPro
- Pathogen Sequencing Unit (Sanger Institute)

National Library of Medicine: Alexa McCray, Stuart Nelson, Bill Hole

Oak Ridge Institute for Science and Education; National Library of Medicine; U.S. Department of Energy

The Gene Ontology Consortium is supported by an R01 grant from the National Human Genome Research Institute (NHGRI) [grant HG02273]. SGD is supported by a P41, National Resources, grant from the NHGRI [grant HG01315]; MGD by a P41 from the NHGRI [grant HG00330]; GXD by the National Institute of Child Health and Human Development [grant HD33745]; FlyBase by a P41 from the NHGRI [grant HG00739] and by the Medical Research Council, London. TAIR is supported by the National Science Foundation [grant DBI ]. WormBase is supported by a P41, National Resources, grant from the NHGRI [grant HG02223]; RGD is supported by an R01 grant from the NHLBI [grant HL64541]; DictyBase is supported by an R01 grant from the NIGMS [grant GM064426].