1 Biomedical Informatics The Lexical Grid Project: LexGrid Christopher G. Chute, MD DrPH Professor and Chair, Biomedical Informatics Mayo Clinic College of Medicine Rochester, Minnesota Ontolog Forum 14 December 2006

2 Acknowledgements: Harold Solbrig, James Buntrock, Thomas Johnson, Dan Armbrust

3 Biomedical Informatics © 2006 Mayo Clinic College of Medicine
Outline - LexGrid
- Overview
- Functional Features
- Problem Framing
- LexGrid History
- Present Status
- Implementations
- Future

4 Overview
- The LexGrid package is a comprehensive set of software and services to load, publish, and access vocabulary or ontological resources.
- The package is based on an open standard: HL7 CTS (CTS II is intended to be more complete).
- Reference implementations are available as open source: http://informatics.mayo.edu
- Migration to OHF

5 LexGrid Interlocking Components
- Standards: access methods (programming APIs) and formats need to be published and openly available.
- Tools: standards-based tools must be readily available.
- Content: commonly used vocabularies and ontologies have to be available for access and download.

6 The Lexical Grid
Terminology as a commodity resource, accessible online under a common model through a set of common APIs:
- in web-space, on web-time
- cross-linked
- loosely coupled
- published individually, when ready
- exportable
- locally extendable
- globally revised
- open source tooling to browse, edit, etc.

7 Overview: Purposes
- Provides a single information model flexible enough to represent yesterday’s, today’s, and tomorrow’s terminological or ontological resources.
- Allows resources to be published online, cross-linked, and indexed on demand.
- Provides standardized building blocks and tools that allow applications and users to take advantage of the content where and when it is needed.
- Provides the consistency and standardization required to support large-scale vocabulary adoption and use.

8 LexGrid Features
- Accommodation of multiple vocabulary and ontology distribution formats.
- Support of multiple data stores to accommodate federated vocabulary distribution.
- Consistent and standardized access across multiple vocabularies.
- Rich API supporting lexical and graph search and traversal.
- Fully compatible with the HL7 CTS implementation.
- Support for programmatic access via Java, .NET, and web services.
- Open source tooling and code to facilitate adoption and use.

9 LexGrid Users
- Vocabulary service providers: organizations currently supporting externalized API-level interfaces to vocabulary content.
- Vocabulary integrators: organizations that want to integrate new vocabulary content or relations to be served locally.
- Vocabulary users: persons and organizations that want common, consistent access to vocabulary content to support multiple application development uses.

10 LexGrid Conceptual Architecture (diagram; only the labels survive)
- Components: LexGrid Node, Data, Services (Java, .NET, ...), LexBIG, Index, LexGrid Service Index Registry
- Functions: Import, Export, Embed, Browse and Edit
- Tools: Editors, Browsers, Query Tools, Web Clients, Lex*
- Formats: RRF, OBO, Text, Protégé, CTS, OWL, XML

11 LexGrid Node
- The logical persistence layer for storing and managing vocabulary content.
- The LexGrid node uses relational database management systems for data management and indexing functions.
- LexGrid nodes have been successfully installed and tested using MySQL, Postgres, UDB/DB2, Oracle, Hypersonic, and LDAP/BDB.

12 The Import Toolkit(s)
Provides an API and a set of administration tools to load, index, publish, and manage vocabulary content for the vocabulary server. Standard formats and models that have been developed include:
- Rich Release Format (RRF)
- Web Ontology Language (OWL)
- LexGrid XML
- Text Delimited
- Ontylog XML (Apelon) format
- Open Biomedical Ontologies (OBO)

13 The Export Toolkit(s)
Provides an API and a set of administration tools to export content in a standard format from a LexGrid node. Standard formats provided for export include:
- LexGrid XML
- OWL

14 The LexGrid Editor
- A lightweight editor for creating and modifying vocabulary content.
- The LexGrid Editor is an Eclipse-based application that supports multi-vocabulary query and browsing, interactive views, and logging and auditing.
- Recent enhancements provide extensions to accommodate value set creation and management.

15 LexGrid Principles
- LexGrid software is based on a model-driven architecture.
  - The LexGrid model is maintained in XML Schema format.
  - The model represents a core component of the design.
- The LexBIG API
  - The Java-based API to LexGrid content is formally modeled.
  - Accommodates registration of additional load, index, and search functions.
  - Provides a conscious separation of service and data classes in order to support deferred query resolution and software iterators.
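The idea of deferred query resolution with iterators can be sketched in Python. This is an illustration of the general technique only, not the LexBIG API: the class `ConceptQuery`, its methods, and the `fake_store` stand-in are all hypothetical names introduced here. Restrictions are collected on a service object, and the data store is read only as the caller pulls results.

```python
from typing import Callable, Iterator, List


class ConceptQuery:
    """Collects restrictions; nothing is evaluated until results are pulled."""

    def __init__(self, source: Callable[[], Iterator[str]]):
        self._source = source  # callable that opens the (hypothetical) data store
        self._filters: List[Callable[[str], bool]] = []

    def restrict(self, predicate: Callable[[str], bool]) -> "ConceptQuery":
        # Record the restriction only; return self so calls can be chained.
        self._filters.append(predicate)
        return self

    def resolve(self) -> Iterator[str]:
        # Generator: each row is fetched only when the caller asks for it.
        for term in self._source():
            if all(f(term) for f in self._filters):
                yield term


reads: List[str] = []  # records every row actually fetched from the store


def fake_store() -> Iterator[str]:
    # Stand-in for a vocabulary node; the terms are placeholders.
    for term in ["asthma", "allergic rhinitis", "pneumonia"]:
        reads.append(term)
        yield term


query = ConceptQuery(fake_store).restrict(lambda t: t.startswith("a"))
results = query.resolve()  # still no reads at this point
first = next(results)      # fetches only as far as the first match
```

Keeping the query description (service class) apart from the returned values (data) is what allows the work to be postponed and the results to be streamed rather than materialized at once.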

16 LexGrid Model
- Lexical Semantics: names (textual), definitions, comments, other non-classification properties
- Context: languages and dialects, communities and specialties, localizations
- Logical Semantics: roles and relations

17 LexGrid Model
- A proposal for standard storage of controlled vocabularies and ontologies.
- Flexible enough to accurately represent a wide variety of vocabularies and other lexically based resources.
- Defines:
  - how vocabularies should be formatted and represented programmatically
  - several different server storage mechanisms: relational database, LDAP, and an XML format

18 LexGrid Model (UML class diagram; only the labels survive)
- Groupings: Coding Scheme, Relations, Concepts, Properties
- Classes: codingSchemes, codingScheme, concepts::concepts, concepts::codedEntry, concepts::property, concepts::comment, concepts::definition, concepts::presentation, relations::relations, relations::association, relations::associationInstance, relations::associationTarget
- Base types shown: describable, versionableAndDescribable, associatableElement
- Role labels and multiplicities: +concepts 0..1, +relations 0..*, +association 1..*, +sourceConcept 0..*, +targetConcept 0..*, +concept 1..*, +property 0..*

19 Model: Code Systems
- Each service defined to the LexGrid model can encapsulate the definition of one or more vocabularies.
- Each vocabulary is modeled as an individual code system, known as a codingScheme.
- Each scheme tracks the information used to uniquely identify the code system, along with relevant metadata.
- The collection of all code systems defined to a service is encapsulated by a single codingSchemes container.

20 Model: Concepts
- A code system may define zero or more coded concepts, encapsulated within a single container.
- A concept represents a coded entity (identified in the model as a codedEntry) within a particular domain of discourse.
  - Each concept is unique within the code system that defines it.
  - Each concept must be qualified by at least one term or designation, represented in the model as a property.
- Each property is an attribute, facet, or some other characteristic that may represent or help define the intended meaning of the encapsulating codedEntry.
- A concept may be the source for and/or the target of zero or more relationships.
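The two constraints on concepts (uniqueness within a code system, and at least one property) can be sketched as plain Python data classes. The class and field names below echo the model's terms (codingScheme, codedEntry, property) but are illustrative assumptions, not the LexGrid schema or any generated interface; the sample content is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Property:
    """A term, definition, comment, or other characteristic of a concept."""
    name: str  # e.g. "presentation", "definition", "comment"
    text: str


@dataclass
class CodedEntry:
    """A coded concept; it must carry at least one property."""
    code: str
    properties: List[Property]

    def __post_init__(self) -> None:
        if not self.properties:
            raise ValueError(f"concept {self.code!r} needs at least one property")


@dataclass
class CodingScheme:
    """One vocabulary (code system) plus identifying metadata."""
    name: str
    version: str
    concepts: Dict[str, CodedEntry] = field(default_factory=dict)

    def add(self, entry: CodedEntry) -> None:
        # Each concept is unique within the code system that defines it.
        if entry.code in self.concepts:
            raise ValueError(f"duplicate code {entry.code!r} in {self.name}")
        self.concepts[entry.code] = entry


# Hypothetical content, not taken from any real vocabulary.
scheme = CodingScheme(name="ExampleScheme", version="1.0")
scheme.add(CodedEntry("C1", [Property("presentation", "Example concept")]))
```

Checking the constraints at construction time is one design choice among several; a loader could equally validate a whole scheme after import.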

21 Model: Relations
- Each code system may define one or more containers to encapsulate relationships between concepts.
- Each named relationship (e.g. “hasSubtype” or “hasPart”) is represented as an association within the LexGrid model.
  - Each relations container must define one or more associations.
  - An association may also further define the nature of the relationship in terms of transitivity, symmetry, reflexivity, forward and inverse names, etc.
- Multiple instances of each association can be defined, each of which provides a directed relationship between one source and one or more target concepts.
- Source and target concepts may be contained in the same code system as the association or, if explicitly identified, in another.
  - By default, all source and target concepts are resolved from the code system defining the association.
  - The code system can be overridden by each specific association, relation source (associationInstance), or relation target (associationTarget).
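The default-and-override rule for resolving source and target code systems can be sketched in Python. The names below are illustrative, not the LexGrid schema. One assumption goes beyond the slide: a target with no override of its own falls back to the association-level setting rather than inheriting its source's override, which the slide does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AssociationTarget:
    code: str
    coding_scheme: Optional[str] = None  # override at the target level


@dataclass
class AssociationInstance:
    source_code: str
    targets: List[AssociationTarget]
    coding_scheme: Optional[str] = None  # override at the source level


@dataclass
class Association:
    name: str  # e.g. "hasSubtype"
    is_transitive: bool = False
    coding_scheme: Optional[str] = None  # override at the association level
    instances: List[AssociationInstance] = field(default_factory=list)


def resolve(defining_scheme: str, assoc: Association) -> List[Tuple[str, str, str, str]]:
    """Return (source scheme, source code, target scheme, target code) rows."""
    rows = []
    # Default: the code system that defines the association.
    base = assoc.coding_scheme or defining_scheme
    for inst in assoc.instances:
        src_scheme = inst.coding_scheme or base
        for tgt in inst.targets:
            # Assumption: targets fall back to the association level.
            tgt_scheme = tgt.coding_scheme or base
            rows.append((src_scheme, inst.source_code, tgt_scheme, tgt.code))
    return rows


# Hypothetical codes and scheme names.
has_subtype = Association(
    name="hasSubtype",
    is_transitive=True,
    instances=[
        AssociationInstance(
            "A",
            [AssociationTarget("B"), AssociationTarget("X", coding_scheme="OtherScheme")],
        )
    ],
)
rows = resolve("SchemeOne", has_subtype)
```

Here one source concept points at two targets: one in the defining code system and one explicitly placed in another.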

22 Available Representations of the LexGrid Model
- The master representation of the LexGrid model is provided in XML Schema Definition (XSD) format.
- Conversions to other formal representations are available, including XML Metadata Interchange (XMI) and Unified Modeling Language (UML).
- Implementation- or technology-specific renderings of the model also exist:
  - Relational database schema (MySQL, PostgreSQL, DB2, Oracle, etc.)
  - Lightweight Directory Access Protocol (LDAP) schema
- Programming interfaces generated from the formal representation include Java bean interfaces based on the Eclipse Modeling Framework (EMF) and Castor frameworks.

23 Disease Understanding Constrained by Knowledge
- Carolus Linnaeus (Carl von Linné), Genera Morborum (1763)
- Underscored content difficulty: pathophysiology vs. manifestation
- e.g. rabies as psychiatric disease

24 The Genomic Era
- The genomic transformation of medicine far exceeds the introduction of antibiotics and aseptic surgery.
- The binding of genomic biology and clinical medicine will accelerate.
- The implications for shared semantics across the basic science and clinical communities are unprecedented.
- The implications for Public Health surveillance and inference are profound.

25 From Practice-based Evidence to Evidence-based Practice (diagram; only the labels survive)
- Nodes: Patient Encounters; Clinical Databases, Registries, et al.; Clinical Guidelines; Medical Knowledge; Expert Systems
- Flows: Data, Inference, Knowledge Management, Decision support
- Shared Semantics: Ontology, Vocabularies & Terminologies

26 The Historical Center of the Health Data Universe (diagram; labels: Clinical Data, Billable Diagnoses)

27 Copernican Health Data Universe (Niklas Koppernigk) (diagram; labels: Billable Diagnoses, Clinical Data, Guidelines, Scientific Literature, Medical Literature, Genomic Characteristics)

28 Continuum from Nomenclature to Classification
- Patient Data is Highly Detailed
  - Modifiers: Anatomy, Stage, Severity, Extent
  - Qualifiers: Probability, Temporal Status
- Aggregate Uses Require Categorization
- Granularity of Classifiers
  - Focused Groups and Strata for CQI/Outcomes
  - Broad Statistical/Fiscal Groups

29 Familiar Points Along Continuum: Modern Health Vocabularies
- Nomenclature: Highly Detailed Descriptions (SNOMED)
- Classification: Organized Aggregation of Descriptions into a Rubric (ICDs)
- Groupings: High-Level Categories of Rubrics (DRGs)
Diagram axis, from Detailed to Grouped: Nomenclature, Classification, Groups

30 Blois, 1988: Medicine and the nature of vertical reasoning
- Molecular: receptors, enzymes, vitamins, drugs
- Genes, SNPs, gene regulation
- Physiologic pathways, regulatory changes
- Cellular metabolism, interaction, meiosis, …
- Tissue function, integrity
- Organ function, pathology
- Organism (Human), disease
- Sociology, environment, nutrition, mental health, …

31 The Continuum of Biomedical Informatics
- Bioinformatics meets Medical Informatics
- Chasm of Semantic Despair

32 Feudal Cognition: Intellectual Semantic Baronies
- Genetic variation: Genomics
- Haplotypes: Statistical Genomics
- Molecular: Metabolomics, Proteomics
- Binding: Molecular simulation
- Pathways: Physiology and Systems Biology
- Symptoms: Consumer Health
- Rx and Px: Clinical Medicine
- Risk: Public Health, Epidemiology
- Social impact: Sociology, Health Economics

33 (Diagram; only the labels survive)
- Axes: Molecular to Clinical; Fine Detail to Highly Aggregated
- Clinical labels: Disease, Anatomy, Pulmonary Disease, asthma, pneumonia, Lung, Nose, Nasal Disease, allergic rhinitis, Airway, Immunology ?
- Molecular labels: Nucleotide, Molecule, Amino Acid Sequence, Protein, Enzyme, Amino Acid, TPMT, HNMT, Thr105Ile allozyme, Lysine, Immunoglobulin, IgE, Peptide
- Edge label: has translation

34 Aggregation Logics: by domain, rule-based aggregations
- Decision Support and Error Detection
- Public Health and Surveillance
- Reimbursement and Management
- Outcome Research and Epidemiology
Diagram labels: Findings, Interventions, Events

35 Making Shared Context Explicit (diagram, from Solbrig; only the labels survive)
- Triangle corners: CONCEPT, Symbol, Referent
- Edge labels: Symbolises, Refers To, Stands For
- Symbols: “Rose”, “ClipArt”
- Example: “I see a ClipArt image of a rose”
- Other labels: Context, Formal Shared Context, Terminologies

36 Proliferation of Content: “Have it your way” Vocabulary Models
- Major ontologies
  - SNOMED CT; Gene Ontology; LOINC; NDF-RT
  - UMLS Metathesaurus; NCI Thesaurus
  - HL7 RIM and Vocabulary; DICOM RadLex
  - CDC bioterrorism PHIN standards
  - caBIG DSR / CDEs (Common Data Elements)
- All created with differing formats and models
- Mechanisms for content sharing: Research Area

37 History of Terminology Services in the US
- YATN: yet another terminology service, 1996 (Mayo, Kaiser, Lexical Technology)
- MetaPhrase: Lexical Technology, 1998
- LQS: Lexicon Query Services; 3M, 1998
- Mayo Autocoder: UI to YATN suite, 2000
- CTS: Common Terminology Services, 2003
  - HL7 balloted standard, 2004
- LexGrid: superset of CTS, reference implementation, 2004
http://informatics.mayo.edu

38 Mayo’s Work with Problem List Interface Design
- Premised upon a Terminology Server
- MetaPhrase Prototypes on the Network
- Iterative Usability Lab Evaluations
- Mock-ups in VB, Delphi, Java, …
- Evolve Toward Subset of Functional Needs
- Problem List Specific
- Drive Specification and Operation of T Server

39 Terminology Services for Humans

40 Common Terminology Services (CTS)
- An HL7 ANSI standard
  - Defines the minimum set of requirements for interoperability across disparate healthcare applications.
- A specification for accessing terminology content
  - The CTS identifies the minimum set of functional characteristics a terminology resource must possess for use in HL7.
- A functional model
  - Defines the functional characteristics of vocabulary as a set of Application Programming Interfaces (APIs).

41 CTS APIs
- Define the necessary functions for healthcare terminology
- Decouple terminology from the terminology service
- Technology independent
  - Legacy database
  - Institutional infrastructure
- Provide a common interface and reference model
- I know what you mean by: Code System, Coded Concept, Relationship
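Decoupling terminology content from the service that delivers it can be sketched in Python as an abstract interface with a swappable backend. The interface below is hypothetical: `TerminologyService`, `lookup_designation`, and the sample content are names invented for this illustration and are not the operations defined by the CTS specification.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class TerminologyService(ABC):
    """Hypothetical minimal interface; method names are illustrative, not CTS's."""

    @abstractmethod
    def lookup_designation(self, code_system: str, code: str) -> Optional[str]:
        """Return the display text for a coded concept, or None if unknown."""


class InMemoryService(TerminologyService):
    """One possible backend; a legacy database could sit behind the same interface."""

    def __init__(self, content: Dict[str, Dict[str, str]]):
        self._content = content

    def lookup_designation(self, code_system: str, code: str) -> Optional[str]:
        return self._content.get(code_system, {}).get(code)


def display(service: TerminologyService, code_system: str, code: str) -> str:
    # Application code depends only on the interface, not on the storage behind it.
    text = service.lookup_designation(code_system, code)
    return text if text is not None else f"<unknown {code_system}:{code}>"


# Placeholder content, not from a real vocabulary.
service = InMemoryService({"ExampleScheme": {"C1": "Example concept"}})
label = display(service, "ExampleScheme", "C1")
```

Because `display` only sees the abstract type, the backing store can change without touching the calling application, which is the sense of "technology independent" on the slide.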

42 Mayo LexGrid Project
- Ontology Services
- HL7 ANSI Standard
- ISO Standard
- Open specification
- Provide consistency and standardization required to support large-scale vocabulary adoption and use
- Common model, tools, formats, and interfaces
- Standard terminology model (Excel to OWL)
- Grid-nodal architecture
http://informatics.mayo.edu

43 Examples and Proof of Concept
- NIH RoadMap: National Center for Biomedical Ontology
  - Mayo LexGrid project [MLG]
  - Clinical and basic science (Gene Ontology) communities
- NCI caBIG: Bioinformatics Grid [MLG]
- HHS/ONC NHIN: National Health Information Network
  - IBM Data Coordination project
- NLM/HL7 Coordination project [MLG]
- CDC PHIN: Public Health Information Network [MLG]
- W3C Semantic Web: XML/RDF/OWL
- ISO 11179 metadata standards [MLG]

44 LexGrid Applications at Mayo for Semantic Annotation and Integration
- Basis for NLP (Natural Language Processing) entity annotation of clinical notes
- Harmonize data elements and value sets: getting the data right
- Information retrieval and navigation: getting the right data
- Grounding for data governance
- Foundation for semantic interoperability

45 Cancer Biomedical Informatics Grid (caBIG)
- Coordinated infrastructure for Cancer Research
- Clinical Trials, Integrative Cancer Research, Tissue Banking and Pathology Tools
- Vocabulary, Common Data Elements, Architecture

46 LexBIG Vision

48 LexPHIN: CDC Public Health Information Network
- Adoption of the LexGrid Model
- Replace PHIN Vocabulary Services (VS)
- Addresses genomic characterization of disease
- Span semantic chasm with Gene Ontology
- Organized Value Sets
- Outbreak Management System
- Biosurveillance and BioSense aggregation

49 LexPHIN Model (UML class diagram; only the labels survive)
- Groupings: Concepts, Value Domains, Coding Scheme, Relations, Versions
- Classes: service::service, codingSchemes::codingSchemes, codingSchemes::codingScheme, concepts::concepts, relations::relations, valueDomains::valueDomains, valueDomains::valueDomain, versions::history
- Base types shown: describable, versionableAndDescribable
- Role labels and multiplicities: +history 0..1, +concepts 0..1, +valueDomain 1..*, +codingScheme 1..*, +relations 0..*, +valueDomains 0..1, +codingSchemes 0..1

50 Health Level Seven (HL7)
- Vocabulary and value domain management
- Tooling for vocabulary submissions
- Includes change events for HL7 governance process

51 HL7 Value Domain Editor

52 NCBO: A Bridge Across the Chasm

53 NCBO Tools

54 Ontology List

55 Ontology Counts
- Total Number of Ontologies: 52 (NCBO Library: 45; Remote: 7)
- Number of Classes: 175296*
* ontologies which have been parsed and indexed

56 Ontologies by Category

57 Expanded Categories

58 GO Biological Process Metadata

59 Concept Search

60 Search Results

61 MeSH Results

62 MeSH Hindlimb

63 BioPortal: Stanford University, Archana Vembakam and Lynn Murphy

64 LexGrid Future Issues
- Federated vocabulary node synchronization and registration/discovery.
- API extensions to support local vocabulary extensions and provider suggestions.
- API extensions to support the HL7 CTS II API (currently being defined).
- API extensions to support submission of vocabulary change requests.
- API extensions to load and map between additional vocabulary formats.
- ISO 11179 and LexGrid integration.
- Provide additional index services:
  - Synonymy and normalized search
  - Reasoner or classifier adaptation
  - Automated coding of medical records
- Provide a lightweight Representational State Transfer (REST) service implementation.

65 Conclusion
- Biomedical concepts have become complex and intertwined
  - Big science model of future research
- 21st-century medicine will require comparable and consistent data (clinical and genomic)
- Ontologies as formal models of concepts provide great opportunity
- Tools, content, and resources are becoming increasingly available
- LexGrid is emerging as an integrating force

66 Resources
- LexGrid Project: http://informatics.mayo.edu/LexGrid
- LexBIG Forge Site: http://gforge.nci.nih.gov/projects/lexbig
- caBIG LexGrid CVS: http://cabigcvs.nci.nih.gov/viewcvs.cgi/lexgrid
- NCBO Project: http://www.bioontology.org

