Biomedical Informatics The Lexical Grid Project: LexGrid Christopher G. Chute, MD DrPH Professor and Chair, Biomedical Informatics Mayo Clinic College.

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

Biomedical Informatics Reference Ontologies in Biomedicine Christopher G. Chute, MD DrPH Professor and Chair, Biomedical Informatics Mayo Clinic College.
Consistent and standardized common model to support large-scale vocabulary use and adoption Robust, scalable, and common API to reduce variation in clinical.
Open Health Tools Distributed Terminology System Presentation Jack Bowie SVP Sales and Marketing Apelon, Inc. 1.
Amy Sheide Clinical Informaticist 3M Health Information Systems USA Achieving Data Standardization in Health Information Exchange and Quality Measurement.
LexGrid for cBIO Division of Biomedical Informatics Mayo Clinic Rochester, MN.
© Copyright 2008, Mayo Clinic College of Medicine Mayo Clinic Open Health Tools Application for Membership OHT Board Meeting, Birmingham, UK July 1, 2008.
LexWiki Framework & Use Cases SMW for Distributed Terminology Development Guoqian Jiang, PhD, Robert Freimuth, PhD, Haorld Solbrig Mayo Clinic NCI caBIG.
CaBIG™ Terminology Services Path to Grid Enablement Thomas Johnson 1, Scott Bauer 1, Kevin Peterson 1, Christopher Chute 1, Johnita Beasley 2, Frank Hartel.
The Role of the UMLS in Vocabulary Control CENDI Conference “Controlled Vocabulary and the Internet” Stuart J. Nelson, MD.
Who am I Gianluca Correndo PhD student (end of PhD) Work in the group of medical informatics (Paolo Terenziani) PhD thesis on contextualization techniques.
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
Overview of Biomedical Informatics Rakesh Nagarajan.
Biomedical Informatics Some Observations on Clinical Data Representation in EHRs Christopher G. Chute, MD DrPH, Mayo Clinic Chair, ICD11 Revision, World.
EleMAP: An Online Tool for Harmonizing Data Elements using Standardized Metadata Registries and Biomedical Vocabularies Jyotishman Pathak, PhD 1 Janey.
Fungal Semantic Web Stephen Scott, Scott Henninger, Leen-Kiat Soh (CSE) Etsuko Moriyama, Ken Nickerson, Audrey Atkin (Biological Sciences) Steve Harris.
Use of Ontologies in the Life Sciences: BioPax Graciela Gonzalez, PhD (some slides adapted from presentations available at
Mayo LexWiki: A Prototype of Collaborative Platform for Terminology/Ontology Content Development Guoqian Jiang, Ph.D. Division of Biomedical Informatics,
Developing an Ontology-based Metadata Management System for Heterogeneous Clinical Databases By Quddus Chong Winter 2002.
Introduction to UDDI From: OASIS, Introduction to UDDI: Important Features and Functional Concepts.
Development Principles PHIN advances the use of standard vocabularies by working with Standards Development Organizations to ensure that public health.
Value Domain and Pick List Support in LexEVS 5.1 Sridhar Dwarkanath Mayo Clinic CaBIG Architecture/VCD Joint Workspace F2F.
Unified Medical Language System® (UMLS®) NLM Presentation Theater MLA 2007 National Library of Medicine National Institutes of Health U.S. Dept. of Health.
OpenMDR: Generating Semantically Annotated Grid Services Rakesh Dhaval Shannon Hastings.
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
OpenMDR: Alternative Methods for Generating Semantically Annotated Grid Services Rakesh Dhaval Shannon Hastings.
LexEVS 6.0 Overview Scott Bauer Mayo Clinic Rochester, Minnesota February 2011.
LexEVS 101 Craig Stancl Rick Kiefer February, 2010.
A Case Study of ICD-11 Anatomy Value Set Extraction from SNOMED CT Guoqian Jiang, PhD ©2011 MFMER | slide-1 Division of Biomedical Statistics & Informatics,
WHOWHO ICD 11 Revision Update NCBO Forum April 29, 2008 Christopher G Chute, MD DrPH ICD-11 Revision Steering Group Chair World Health Organization Professor.
LexEVS Overview Mayo Clinic Rochester, Minnesota June 2009.
Using the Open Metadata Registry (openMDR) to create Data Sharing Interfaces October 14 th, 2010 David Ervin & Rakesh Dhaval, Center for IT Innovations.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
H Using the Open Metadata Registry (OpenMDR) to generate semantically annotated grid services Rakesh Dhaval, MS, Calixto Melean,
LexBIG Release Overview Aug 21, LexBIG Context Project Goals for Sept –Incremental point release of LexBIG infrastructure to support EVS activities.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Value Set Resolution: Build generalizable data normalization pipeline using LexEVS infrastructure resources Explore UIMA framework for implementing semantic.
Clinical Data Interchange Standards Consortium (CDISC) uses NCIt for its Study Data Tabulation Model (SDTM) and other global data standards for medical.
Open Terminology Portal (TOP) Frank Hartel, Ph.D. Associate Director, Enterprise Vocabulary Services National Cancer Institute, Center for Biomedical Informatics.
Networking and Health Information Exchange Unit 5b Health Data Interchange Standards.
Terminology Services in the OpenHIE. Agenda Terminology Services Overview Terminology Services in Rwanda Distributed Terminology System (DTS) Next Steps.
Sharing Ontologies in the Biomedical Domain Alexa T. McCray National Library of Medicine National Institutes of Health Department of Health & Human Services.
SKOS. Ontologies Metadata –Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies –Provide.
Common Terminology Services 2 CTS 2 Submission Team Status Update HL7 Vocabulary Working Group May 17, 2011.
LexGrid Philosophy, Model and Interfaces Harold R Solbrig Division of Biomedical Statistics and Informatics Mayo Clinic.
Using Domain Ontologies to Improve Information Retrieval in Scientific Publications Engineering Informatics Lab at Stanford.
A LexWiki-based Representation and Harmonization Framework for caDSR Common Data Elements Guoqian Jiang, Ph.D. Robert Freimuth, Ph.D. Harold Solbrig Mayo.
- EVS Overview - Biomedical Terminology and Ontology Resources Frank Hartel, Ph.D. Director, Enterprise Vocabulary Services NCI Center for Bioinformatics.
Overview of LexEVS 5.0 LexGrid 2009 Model November, 2009.
Overview of SC 32/WG 2 Standards Projects Supporting Semantics Management Open Forum 2005 on Metadata Registries 14:45 to 15:30 13 April 2005 Larry Fitzwater.
Mining the Biomedical Research Literature Ken Baclawski.
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
1 Service Creation, Advertisement and Discovery Including caCORE SDK and ISO21090 William Stephens Operations Manager caGrid Knowledge Center February.
NeuroLOG ANR-06-TLOG-024 Software technologies for integration of process and data in medical imaging A transitional.
EVS 4.0 Feature Overview EVS API and User Interface pBIO Meeting March 20, 2007 Frank Hartel Gilberto Fragoso
Steven Perry Dave Vieglais. W a s a b i Web Applications for the Semantic Architecture of Biodiversity Informatics Overview WASABI is a framework for.
Satisfying Requirements BPF for DRA shall address: –DAQ Environment (Eclipse RCP): Gumtree ISEE workbench integration; –Design Composing and Configurability,
CaBIG™ Terminology Services Path to Grid Enablement Thomas Johnson 1, Scott Bauer 1, Kevin Peterson 1, Christopher Chute 1, Johnita Beasley 2, Frank Hartel.
Supporting Collaborative Ontology Development in Protégé International Semantic Web Conference 2008 Tania Tudorache, Natalya F. Noy, Mark A. Musen Stanford.
National Cancer Institute 1 1 LexBIG integration caCORE Software User Meeting Aug 7, 2006.
LexEVS 5.0: Migrating from EVS 3.x API to LexEVS API Craig R. Stancl, Kevin J. Peterson, H. Scott Bauer, Traci V. St.Martin, Christopher G. Chute, MD PhD.
LexWiki Framework & Use Cases SMW for Distributed Terminology Development Guoqian Jiang, PhD, Harold Solbrig Mayo Clinic Meeting with Dr. Jakob (WHO) May.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
ITI Infrastructure - Wednesday November 7th, IHE IT infrastructure “Terminology Sharing” Christel Daniel (AP-HP, INSERM, Paris), Ana Esterlich (GIP-DMP),
National Cancer Institute caDSR Briefing for Small Scale Harmonication Project Denise Warzel Associate Director, Core Infrastructure caCORE Product Line.
Semantic Web - caBIG Abstract: 21st century biomedical research is driven by massive amounts of data: automated technologies generate hundreds of.
UNIFIED MEDICAL LANGUAGE SYSTEMS (UMLS)
The UMLS and the Semantic Web
CCNT Lab of Zhejiang University
Networking and Health Information Exchange
Presentation transcript:

Biomedical Informatics The Lexical Grid Project: LexGrid Christopher G. Chute, MD DrPH Professor and Chair, Biomedical Informatics Mayo Clinic College of Medicine Rochester, Minnesota Ontolog Forum 14 December 2006

Biomedical Informatics The Lexical Grid Project: LexGrid Acknowledgements: Harold Solbrig James Buntrock Thomas Johnson Dan Armbrust

Biomedical Informatics © 2006 Mayo Clinic College of Medicine3 Outline - LexGrid Overview Functional Features Problem Framing LexGrid History Present Status Implementations Future

Biomedical Informatics © 2006 Mayo Clinic College of Medicine4 Overview The LexGrid package represents a comprehensive set of software and services to load, publish, and access vocabulary or ontological resources. The package is based upon an open standard HL7 CTS (CTS II intended as more complete) Reference implementations as open source Migration to OHF

Biomedical Informatics © 2006 Mayo Clinic College of Medicine5 LexGrid Interlocking Components Standards - access methods (programming APIs) and formats need to be published and openly available. Tools - standards based tools must be readily available. Content - commonly used vocabularies and ontologies have to be available for access and download.

Biomedical Informatics © 2006 Mayo Clinic College of Medicine6 The Lexical Grid Terminology as a commodity resource Accessible online under a common model through a set of common API's in web-space on web-time cross-linked loosely coupled published individually, when ready exportable locally extendable globally revised open source tooling to browse, edit, etc

Biomedical Informatics © 2006 Mayo Clinic College of Medicine7 Overview purposes Provides a single information model flexible enough to represent yesterday’s, today’s and tomorrow’s terminological or ontological resources Allows resources to be published online, cross-linked, and indexed on demand Provides standardized building blocks and tools that allow applications and users to take advantage of the content where and when it is needed Provide consistency and standardization required to support large-scale vocabulary adoption and use

Biomedical Informatics © 2006 Mayo Clinic College of Medicine8 LexGrid Features Accommodation of multiple vocabulary and ontology distribution formats. Support of multiple data stores to accommodate federated vocabulary distribution. Consistent and standardized access across multiple vocabularies. Rich API for supporting lexical and graph search and traversal. Fully compatible with HL7-CTS implementation. Support for programmatic access via Java,.NET, and web services. Open source tooling and code to facilitate adoption and use.

Biomedical Informatics © 2006 Mayo Clinic College of Medicine9 LexGrid Users Vocabulary service providers. Describes organizations currently supporting externalized API- level interfaces to vocabulary content. Vocabulary integrators. Describes organizations that desire to integrate new vocabulary content or relations to be served locally. Vocabulary users. Describes persons and organizations desiring common, consistent access to vocabulary content for a supporting multiple application development uses.

Biomedical Informatics © 2006 Mayo Clinic College of Medicine10 LexGrid Node Data ServicesServices Java.NET... Import Editors Browsers Query Tools XML Browse and Edit Export Embed LexBIGLexBIG Index LexGrid Conceptual Architecture Components RRF OBO Text Protégé CTS Text OWL XML Lex* Web Clients LexGrid Service Index Registry

Biomedical Informatics © 2006 Mayo Clinic College of Medicine11 LexGrid Node The logical persistence layer for storing and managing vocabulary content. The LexGrid node utilizes relational database management systems for management of data and indexing functions. LexGrid nodes have been successful installed and tested using MySQL, Postgres, UDB/DB2, Oracle, Hypersonic, and LDAP/BDB.

Biomedical Informatics © 2006 Mayo Clinic College of Medicine12 The Import Toolkit(s) Provides an API and a set of administration tools to load, index, publish, and manage vocabulary content for the vocabulary server. Standard formats and models that have been developed include: Rich Release Format (RRF) Ontology Web Language (OWL) LexGrid XML Text Delimited Ontylog XML (Apelon) format Open Biomedical Ontology (OBO)

Biomedical Informatics © 2006 Mayo Clinic College of Medicine13 The Export Toolkit(s) Provides an API and set of administration tools to export content in a standard format from a LexGrid node. Standard formats provided for export include: LexGrid XML OWL

Biomedical Informatics © 2006 Mayo Clinic College of Medicine14 The LexGrid Editor A light weight editor for creating, modifying, and changing vocabulary content. The LexGrid Editor is an Eclipse Based application that supports multi vocabulary query and browsing, interactive views, and logging and auditing. Recent enhancements have provided extensions to accommodate value set creation and management.

Biomedical Informatics © 2006 Mayo Clinic College of Medicine15 LexGrid Principles LexGrid software is based on a model driven architecture. The LexGrid model is maintained in XML Schema format Represents a core component of design. The LexBIG API Java-based API to LexGrid content is formally modeled Accommodates registration of additional load, index, and search functions Provides a conscious separation of service and data classes in order to support deferred query resolution and software iterators

Biomedical Informatics © 2006 Mayo Clinic College of Medicine16 LexGrid Model Lexical Semantics Names (Textual) Definitions Comments Other non-classification property Context Languages and dialects Communities and specialties Localizations Logical Semantics Roles and Relations

Biomedical Informatics © 2006 Mayo Clinic College of Medicine17 LexGrid Model Proposal for standard storage of controlled vocabularies and ontologies Flexible enough to accurately represent a wide variety of vocabularies and other lexically-based resources Defines How vocabularies should be formatted and represented programmatically Several different server storage mechanisms relational database, LDAP and an XML format.

Biomedical Informatics © 2006 Mayo Clinic College of Medicine18 LexGrid Model Coding Scheme Relations Concepts Properties cd codingSchemes describable codingScheme concepts::concepts describable relations::relations describable relations::association relations:: associationInstance associatableElement relations::associationTarget versionableAndDescribable concepts::codedEntry concepts::property concepts::comment concepts::definition concepts::presentation 0..1+concepts0..* +relations 1..* +association 0..*+sourceConcept 0..*+targetConcept 1..*+concept 0..* +property

Biomedical Informatics © 2006 Mayo Clinic College of Medicine19 Model: Code Systems Each service defined to the LexGrid model can encapsulate the definition of one or more vocabularies. Each vocabulary is modeled as an individual code system, known as a codingScheme. Each scheme tracks information used to uniquely identify the code system, along with relevant metadata. The collection of all code systems defined to a service is encapsulated by a single codingSchemes container.

Biomedical Informatics © 2006 Mayo Clinic College of Medicine20 Model: Concepts A code system may define zero or more coded concepts, encapsulated within a single container. A concept represents a coded entity (identified in the model as a codedEntry ) within a particular domain of discourse. Each concept is unique within the code system that defines it. Must be qualified by at least one term or designation, represented in the model as a property. Each property is an attribute, facet, or some other characteristic that may represent or help define the intended meaning of the encapsulating codedEntry. A concept may be the source for and/or the target of zero or more relationships.

Biomedical Informatics © 2006 Mayo Clinic College of Medicine21 Model: Relations Each code system may define one or more containers to encapsulate relationships between concepts. Each named relationship (e.g. “hasSubtype” or “hasPart”) is represented as an association within the LexGrid model. Each relations container must define one or more association. May also further define the nature of the relationship in terms of transitivity, symmetry, reflexivity, forward and inverse names, etc. Multiple instances of each association can be defined, each of which provide a directed relationship between one source and one or more target concepts. Source and target concepts may be contained in the same code system as the association or another if explicitly identified. By default, all source and target concepts are resolved from the code system defining the association. The code system can be overridden by each specific association, relation source ( associationInstance ), or relation target ( associationTarget ).

Biomedical Informatics © 2006 Mayo Clinic College of Medicine22 Available Representations of the LexGrid Model The master representation of the LexGrid model is provided in XML Schema Definition (XSD) format. Conversions to other formal representations are available, including XML Metadata Interchange (XMI) and Unified Modeling Language (UML). Implementation or technology-specific renderings of the model also exist. Relational database schema (MySQL, PostgreSQL, DB2, Oracle, etc) Lightweight Directory Access Protocol (LDAP) schema Programming interfaces generated from the formal representation include Java bean interfaces based on the Eclipse Modeling Framework (EMF) and Castor frameworks.

Biomedical Informatics © 2006 Mayo Clinic College of Medicine23 Disease Understanding Constrained by Knowledge Carolus Linnaeus Carl von Linné Genera Morborum (1763) Underscored Content Difficulty Pathophysiology vs Manifestation e.g. Rabies as psychiatric disease

Biomedical Informatics © 2006 Mayo Clinic College of Medicine24 The Genomic Era The genomic transformation of medicine far exceeds the introduction of antibiotics and aseptic surgery The binding of genomic biology and clinical medicine will accelerate The implications for shared semantics across the basic science and clinical communities are unprecedented The implications for Public Health surveillance and inference are profound

Biomedical Informatics © 2006 Mayo Clinic College of Medicine25 From Practice-based Evidence to Evidence-based Practice Patient Encounters Clinical Databases Registries et al. Clinical Guidelines Medical Knowledge Expert Systems Data Inference KnowledgeManagement Decisionsupport Ontology Shared Semantics Vocabularies & Terminologies

Biomedical Informatics © 2006 Mayo Clinic College of Medicine26 The Historical Center of the Health Data Universe Clinical Data Billable Diagnoses

Biomedical Informatics © 2006 Mayo Clinic College of Medicine27 Copernican Health Data Universe Billable Diagnoses Clinical Data (Niklas Koppernigk) Guidelines Scientific Literature Medical Literature Clinical Data Genomic Characteristics

Biomedical Informatics © 2006 Mayo Clinic College of Medicine28 Continuum from Nomenclature to Classification Patient Data is Highly Detailed Modifiers: Anatomy, Stage, Severity, Extent Qualifiers: Probability, Temporal Status Aggregate Uses Require Categorization Granularity of Classifiers Focused Groups and Strata for CQI/Outcomes Broad Statistical/Fiscal Groups

Biomedical Informatics © 2006 Mayo Clinic College of Medicine29 Familiar Points Along Continuum Modern Health Vocabularies Nomenclature – Highly Detailed Descriptions (SNOMED) Classification – Organized Aggregation of Descriptions into a Rubric (ICDs) Groupings – High Level Categories of Rubrics (DRGs) Detailed Grouped NomenclatureClassificationGroups

Biomedical Informatics © 2006 Mayo Clinic College of Medicine30 Blois, 1988 Medicine and the nature of vertical reasoning Molecular: receptors, enzymes, vitamins, drugs Genes, SNPs, gene regulation Physiologic pathways, regulatory changes Cellular metabolism, interaction, meiosis,… Tissue function, integrity Organ function, pathology Organism (Human), disease Sociology, environment, nutrition, mental health…

Biomedical Informatics © 2006 Mayo Clinic College of Medicine31 The Continuum Of Biomedical Informatics Bioinformatics meets Medical Informatics Chasm of Semantic Despair

Biomedical Informatics © 2006 Mayo Clinic College of Medicine32 Feudal Cognition Intellectual Semantic Baronies Genetic variation – Genomics Haplotypes – Statistical Genomics Molecular – Metabolomics, Proteomics Binding – Molecular simulation Pathways – Physiology and Systems Biology Symptoms – Consumer Health Rx and Px – Clinical Medicine Risk – Public Health, Epidemiology Social impact – Sociology, Health Economics

Biomedical Informatics © 2006 Mayo Clinic College of Medicine33 Molecular Clinical Fine Detail Highly Aggregated Immunology ? Disease Anatomy Pulmonary Disease asthma LungNose pneumonia Nasal Disease allergic rhinitis Airway Nucleotide Molecule Amino Acid Sequence Protein Enzyme Amino Acid TPMTHNMT Thr105Ileallozyme Lysine Immunoglobulin IgE has translation Peptide

Biomedical Informatics © 2006 Mayo Clinic College of Medicine34 Aggregation Logics by domain rule-based aggregations Decision Support and Error Detection Public Health and Surveillance Reimbursement and Management Outcome Research and Epidemiology FindingsInterventionsEvents

Biomedical Informatics © 2006 Mayo Clinic College of Medicine35 Making Shared Context Explicit CONCEPT Referent Refers To Symbolises Stands For “Rose”, “ClipArt” Refers To Symbolises Stands For “Rose”, “ClipArt” CONCEPT Symbol “I see a ClipArt image of a rose” Context Formal Shared Context Terminologies [From Solbrig]

Biomedical Informatics © 2006 Mayo Clinic College of Medicine36 Proliferation of Content “Have it your way” Vocabulary Models Major ontologies SNOMED CT; Gene Ontology; LOINC; NDF-RT UMLS Metathesaurus; NCI Thesaurus HL7 RIM and Vocabulary; DICOM RadLex CDC bioterrorism PHIN standards caBIG DSR / CDEs (Common Data Elements) All created with differing formats and models Mechanisms for content sharing Research Area

Biomedical Informatics © 2006 Mayo Clinic College of Medicine37 History of Terminology Services in the US YATN: yet another terminology service 1996 Mayo, Kaiser, Lexical Technology MetaPhrase – Lexical Technology 1998 LQS: Lexicon Querry Services; 3M 1998 Mayo Autocoder: UI to YATN suite 2000 CTS: Common Terminology Services 2003 HL7 balloted standard 2004 LexGrid: superset CTS, ref. implementation – 04

Biomedical Informatics © 2006 Mayo Clinic College of Medicine38 Mayo’s Work with Problem List Interface Design Premise upon Terminology Server MetaPhrase Prototypes on the Network Iterative Usability Lab Evaluations Mock-ups in VB, Delphi, Java, … Evolve Toward Subset of Functional Needs Problem List Specific Drive Specification and Operation of T Server

Biomedical Informatics © 2006 Mayo Clinic College of Medicine39 Terminology Services for Humans

Biomedical Informatics © 2006 Mayo Clinic College of Medicine40 Common Terminology Services (CTS) An HL7 ANSI standard Defines the minimum set of requirements for interoperability across disparate healthcare applications A specification for accessing terminology content The CTS identifies the minimum set of functional characteristics a terminology resource must possess for use in HL7. A functional model Defining the functional characteristics of vocabulary as a set of Application Programming Interfaces (APIs)

Biomedical Informatics © 2006 Mayo Clinic College of Medicine41 CTS APIs Define the necessary functions for healthcare terminology Decouples terminology from the terminology service. Technology independent Legacy database Institutional infrastructure Provide common interface and reference model I know what you mean by Code System Coded Concept Relationship

Biomedical Informatics © 2006 Mayo Clinic College of Medicine42 Mayo LexGrid Project Ontology Services HL7 ANSI Standard ISO Standard Open specification Provide consistency and standardization required to support large-scale vocabulary adoption and use Common model, tools, formats, and interfaces Standard terminology model (Excel to OWL) Grid-nodal architecture

Biomedical Informatics © 2006 Mayo Clinic College of Medicine43 Examples and Proof of Concept NIH RoadMap: Nat. Center Biomedical Ontologies Mayo LexGrid project [MLG] Clinical and basic science ( Gene Ontology ) communities NCI caBIG – Bioinformatics Grid [MLG] HHS/ONC NHIN National Health Information Network IBM Data Coordination project NLM/HL7 Coordination project; [MLG] CDC PHIN Public Health Information Network [MLG] W3C Semantic Web XML/RDF/OWL ISO metadata standards [MLG]

Biomedical Informatics © 2006 Mayo Clinic College of Medicine44 LexGrid Applications at Mayo for Semantic Annotation and Integration Basis for NLP (Natural Language Processing) entity annotation – clinical notes Harmonize data elements, values sets Getting the data right Information retrieval and navigation Getting the right data Grounding for data governance Foundation for semantic interoperability

Biomedical Informatics © 2006 Mayo Clinic College of Medicine45 Cancer Biomedical Informatics Grid (caBIG) Coordinated infrastructure for Cancer Research Clinical Trials, Integrative Cancer Research, Tissue Banking and Pathology Tools Vocabulary, Common Data Elements, Architecture

Biomedical Informatics © 2006 Mayo Clinic College of Medicine46 LexBIG Vision

Biomedical Informatics © 2006 Mayo Clinic College of Medicine47

Biomedical Informatics © 2006 Mayo Clinic College of Medicine48 LexPHIN CDC Public Health Informatics Network Adoption of the LexGrid Model Replace PHIN Vocabulary Services (VS) Addresses genomic characterization of disease Span semantic chasm with Gene Ontology Organized Value Sets Outbreak Management System Biosurveillance and Biosense aggregation

Biomedical Informatics © 2006 Mayo Clinic College of Medicine49 LexPHIN Model Concepts Value Domains Coding Scheme Relations Versions versions:: history concepts::concepts versionableAndDescribable valueDomains::valueDomain versionableAndDescribable codingSchemes::codingScheme describable relations::relations valueDomains::valueDomains codingSchemes::codingSchemes describable service::service +history concepts0..1 +valueDomain1..* +codingScheme1..* +relations0..* +valueDomains0..1 +codingSchemes0..1

Biomedical Informatics © 2006 Mayo Clinic College of Medicine50 Health Level Seven (HL7) Vocabulary and value domain management Tooling for vocabulary submissions Includes change events for HL7 governance process

Biomedical Informatics © 2006 Mayo Clinic College of Medicine51 HL7 Value Domain Editor

Biomedical Informatics © 2006 Mayo Clinic College of Medicine52 NCBO – A Bridge Across the Chasm

Biomedical Informatics © 2006 Mayo Clinic College of Medicine53 NCBO Tools

Biomedical Informatics © 2006 Mayo Clinic College of Medicine54 Ontology List

Biomedical Informatics © 2006 Mayo Clinic College of Medicine55 Ontology Counts Total Number of Ontologies 52 NCBO Library45 Remote7 Number of Classes * *ontologies which have been parsed and indexed

Biomedical Informatics © 2006 Mayo Clinic College of Medicine56 Ontologies by Category

Biomedical Informatics © 2006 Mayo Clinic College of Medicine57 Expanded Categories

Biomedical Informatics © 2006 Mayo Clinic College of Medicine58 GO Biological Process Metadata

Biomedical Informatics © 2006 Mayo Clinic College of Medicine59 Concept Search

Biomedical Informatics © 2006 Mayo Clinic College of Medicine60 Search Results

Biomedical Informatics © 2006 Mayo Clinic College of Medicine61 MeSH Results

Biomedical Informatics © 2006 Mayo Clinic College of Medicine62 MeSH Hindlimb

Biomedical Informatics © 2006 Mayo Clinic College of Medicine63 BioPortal Stanford University Archana Vembakam and Lynn Murphy

Biomedical Informatics © 2006 Mayo Clinic College of Medicine64 LexGrid Future Issues Federated vocabulary node synchronization and registration/discovery. API extensions to support local vocabulary extensions and provider suggestions. API extensions to support HL7/CTSII API (currently being defined). API extensions to support submission of vocabulary change requests. API extensions to load and map between additional vocabulary formats. ISO and LexGrid integration Provide additional index services Synonymy and normalized search Reasoner or classifier adaptation Automated coding of medical records Provide a light-weight Representational State Transfer (REST) service implementation.

Biomedical Informatics © 2006 Mayo Clinic College of Medicine65 Conclusion Biomedicine concepts have become complex and intertwined Big science model of future research 21 st Century Medicine will require comparable and consistent data (Clinical and Genomic) Ontologies as formal models of concepts provide great opportunity Tools, content, and resources are becoming increasingly available LexGrid is emerging as an integrating force

Biomedical Informatics © 2006 Mayo Clinic College of Medicine66 Resources LexGrid Project LexBIG Forge Site caBIG LexGrid CVS NCBO Project