Experience with Using the UMLS Semantic Network to Coordinate Controlled Terminologies for a Large Clinical Data Repository James J. Cimino Department.


1 Experience with Using the UMLS Semantic Network to Coordinate Controlled Terminologies for a Large Clinical Data Repository
James J. Cimino
Department of Biomedical Informatics
Columbia University College of Physicians and Surgeons
National Library of Medicine, April 8, 2005

2 Overview
Background
History
General principles
Empiric observations: Semantic Network in the Medical Entities Dictionary
Lessons to be learned

3 Clinical Data Architecture
Central repository to collect data from myriad sources
Myriad users of data - some not yet imagined

4 New York Presbyterian Hospital Clinical Information Systems Architecture
[Diagram; component labels: Clinical Database, Medical Entities Dictionary, Database Monitor, Medical Logic Modules, Database Interface, Research, Administrative, Alerts & Reminders, Results Review, ..., Radiology, Laboratory, Discharge Summaries, Reformatter]

5 Clinical Data Architecture
Central repository to collect data from myriad sources
Myriad users of data - some not yet imagined
Patient-oriented, not visit-oriented, database
Relational, not hierarchical, model
Entity-attribute-value model

6 Entity-Attribute-Value Clinical Data Repository
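As a rough illustration of the entity-attribute-value idea named on this slide, the sketch below stores one row per observation, keyed by patient rather than by visit, with the attribute held as a dictionary code. The table name, column names, codes, and values are all assumptions made for the example; the slides do not show the repository's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per observation, keyed by patient rather than visit.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE observation (
        patient_id   INTEGER NOT NULL,  -- entity
        med_code     INTEGER NOT NULL,  -- attribute: a coded dictionary term
        value        TEXT,              -- value, stored as text in this sketch
        observed_at  TEXT NOT NULL
    )
    """
)

# Made-up rows; the codes are placeholders, not real dictionary codes.
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?)",
    [
        (1, 1001, "4.2", "2005-04-01"),
        (1, 1001, "3.3", "2005-04-02"),
        (1, 2002, "98", "2005-04-02"),
        (2, 1001, "3.0", "2005-04-02"),
    ],
)


def values_for(patient_id, med_code):
    """Return one patient's values for one coded attribute, oldest first."""
    cursor = conn.execute(
        "SELECT value FROM observation"
        " WHERE patient_id = ? AND med_code = ?"
        " ORDER BY observed_at",
        (patient_id, med_code),
    )
    return [row[0] for row in cursor]
```

Adding a new kind of observation in this layout means adding a new code, not a new column, which is one reason such a design leans on a central terminology.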

7 Clinical Data Architecture
Central repository to collect data from myriad sources
Myriad users of data - some not yet imagined
Patient-oriented, not visit-oriented, database
Relational, not hierarchical, model
Entity-attribute-value model
Coded data wherever possible
Unify terminology

8 Medical Entities Dictionary: A Central Terminology Repository

9 MED Structure
[Diagram; concept labels: Medical Entity, Laboratory Procedure, CHEM-7, Plasma Glucose Test, Laboratory Specimen, Plasma Specimen, Event, Laboratory Test, Diagnostic Procedure, Glucose, Plasma, Anatomic Substance, Bioactive Substance, Chemical, Carbohydrate; link labels: Substance Sampled, Part of, Has Specimen, Substance Measured]

10 Communicating Terminology Changes
[Diagram; test codes: K#1, K#2, K#3; results: K#3 = 2.6, K#1 = 4.2, K#1 = 3.3, K#2 = 3.2, K#1 = 3.0]

11 Solution: Hierarchical Integration
[Diagram; test codes: K#1, K#2, K#3, grouped with K; results: K#1 = 4.2, K#1 = 3.3, K#2 = 3.2, K#1 = 3.0, K#3 = 2.6]
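One way to read the hierarchical integration slide is that the successive local test codes K#1, K#2, and K#3 all become subclasses of a single class K, so a query phrased in terms of K keeps finding results as codes change. The sketch below assumes that reading; the parent table and function names are hypothetical, and only the codes and result values come from the slide.

```python
# Assumed parent links: each local test code is a subclass of the class "K".
PARENT = {"K#1": "K", "K#2": "K", "K#3": "K"}

# Results as listed on the slide: (test code, value).
RESULTS = [("K#1", 4.2), ("K#1", 3.3), ("K#2", 3.2), ("K#1", 3.0), ("K#3", 2.6)]


def is_a(code, ancestor):
    """True if `code` equals `ancestor` or descends from it."""
    while code is not None:
        if code == ancestor:
            return True
        code = PARENT.get(code)
    return False


def results_for(cls):
    """All results whose test code falls under the given class."""
    return [(code, value) for code, value in RESULTS if is_a(code, cls)]
```

Under these assumptions, `results_for("K#1")` returns only the three K#1 results, while `results_for("K")` returns all five, including those filed under codes introduced later.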

12 Use of the UMLS in Patient Care
James J. Cimino, M.D.
Center for Medical Informatics
Columbia University
Mont Pelerin, Switzerland 1994

13 UMLS Semantic Network
Strict hierarchy
Semantic types: 132 (135)
Semantic relations: 46 (53)
Inheritance of relations: 6233 (6700)

14 UMLS Metathesaurus
Terms from 22 (100+) controlled vocabularies
Total source terms: 311,046
Total strings: 279,237 (5,000,000)
Total concepts: 152,444 (1,000,000)
Relationships: 1,484,994 (16,000,000)

15 Medical Entities Dictionary Semantic Network
Sources: 5
Strings: 108,492
Concepts: 35,281
Semantic relations: 23 pairs
Semantic Links: 145,672

16 Comparisons - Methods
CPMC Entities vs. UMLS Semantic Types
MED Classes vs. UMLS Semantic Types
MED Semantic Links vs. UMLS Semantic Relations
MED Concepts vs. Metathesaurus Concepts
MED Semantic Links vs. Meta Relations

17 Comparisons - Results
[Table; labels as extracted: DB Entities, Classes, Links, CPMC, UMLS, UMLS, Types, Relations, Concepts, Meta, Links; ratings as extracted: ++++, +++, +/-, ++, +++]

18 Summary
Semantic Types provide good coverage
Concepts provide good coverage in certain domains
No technical reason why UMLS could not incorporate clinical vocabulary

19 Where We Are Today - Repository
Patients: 2.6 million
Visits: >10 million since 1996 with archives going back to 1979
Visit diagnoses, locations, procedures, providers, insurance
Lab procedures: 16 million with 130 million results (to 1989)
Radiology procedure reports: 5.7 million
Pathology: 1.4 million
Cardiology procedures: 1.5 million
Resident signout notes: 760,000
Operative Notes: 426,000
Clinical Notes: 400,000
Discharge Summaries: 420,000
Medication orders: >60 million
ObGyn Procedure Reports: 241,000
GI Procedure Reports: 101,000
Neurology Procedure Reports: 54,000
Ideatel BPs: 215,000
Ideatel Glucose: 650,000
Consult Events: 18,000
HEENT Events: 13,000
Hospitalist Notes: 30,000
PFT: 25,000
Provider profiles: 11,000
IDX 1.4 million East Campus

20 Where We Are Today - MED
Domains: 7++ (5)
–HP lab terms
–Misys lab terms
–Cerner lab terms
–Misys Radiology
–Digimedix drugs
–Cerner Drugs
–ICD9-based problem list terms
–Other applications
–Knowledge terms
Size:
–Concept-based: 95,641 (35,281)
–Multiple hierarchy: 141,306
–Synonyms: 239,581 (108,492)
–Translations: 141,717
–Semantic link pairs: 52 (23)
–Semantic links: 225,698 (145,672)
–Attributes: 210,456

21 What does this have to do with the SN?
MED was initially based on UMLS design (creationism)
UMLS SN was the “starter set”
MED is “local UMLS” for CPMC
General principles were established
MED has developed without further conscious attention to the SN (evolution)
MED content represents real-world terminology
What follows are empiric observations, open to criticism; perhaps indefensible

22 General Principles
Everything is a class
Multiple hierarchy
Some relations are definitional
At most one part of a relation pair is definitional
Properties introduced at single points
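Two of these principles, multiple hierarchy and single introduction points, can be sketched together: each class lists several parents, each property is introduced at exactly one class, and a class's full property set is whatever its ancestors introduce. The fragment below is a simplified assumption, not the MED's real structure: the parent links are flattened, "Example Concept" is hypothetical, and only the class and property names are borrowed from later slides.

```python
# Simplified, assumed fragment of a multiple hierarchy: class -> list of parents.
PARENTS = {
    "Medical Entity": [],
    "Chemical": ["Medical Entity"],
    "Measureable Entity": ["Medical Entity"],
    "Example Concept": ["Chemical", "Measureable Entity"],  # hypothetical
}

# Each property is introduced at exactly one class.
INTRODUCES = {
    "Medical Entity": ["NAME", "SYNONYMS"],
    "Chemical": ["PHARMACEUTIC-COMPONENT-OF"],
    "Measureable Entity": ["MEASURED-BY-PROCEDURE"],
}


def ancestors(cls):
    """Return cls and every class reachable through parent links."""
    seen = []
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(PARENTS[current])
    return seen


def properties(cls):
    """Collect the properties introduced by cls and all of its ancestors."""
    found = set()
    for ancestor in ancestors(cls):
        found.update(INTRODUCES.get(ancestor, []))
    return found
```

In this toy fragment, "Example Concept" picks up properties from both of its parents as well as from the root, while "Chemical" alone never sees MEASURED-BY-PROCEDURE.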

23 Observations on the SN in the MED
Arrangement of SN in MED
Multiple hierarchy of STs
Size of ST classes in MED (vs Meta?)
STs as introduction points
Intersections

24 UMLS Semantic Net in the MED
A: T071: Medical Entity [94729]
. A1: T072: Physical Object [5618]
. +*A1.2: T017: Anatomical Structure [577]
. A2: T077: Conceptual Entity [77861]
. *B: T051: Event [55450]
Key: “A1.2”: UMLS tree address; “T071”: semantic type ID; “Event”: MED name; “+”: multiple locations; “*”: discontinuous tree address; “[577]”: number of MED concepts
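The key above describes a small line format, and a parser makes its fields explicit. The sketch below is an assumption about how one might read a single entry: leading dots are taken as depth in the MED hierarchy, and the function name and returned field names are invented for the example.

```python
import re

# One entry: optional "+"/"*" flags, tree address, type ID, name, concept count.
ENTRY = re.compile(
    r"^(?P<flags>[+*]*)"
    r"(?P<address>[A-Z][0-9.]*): "
    r"(?P<type_id>T\d{3}): "
    r"(?P<name>.+) "
    r"\[(?P<count>\d+)\]$"
)


def parse_entry(line):
    """Parse one tree entry; return None if it does not fit the pattern."""
    text = line.strip()
    depth = len(text) - len(text.lstrip("."))  # leading dots, read as depth
    match = ENTRY.match(text.lstrip(". "))
    if match is None:
        return None
    flags = match["flags"]
    return {
        "depth": depth,
        "multiple_locations": "+" in flags,
        "discontinuous": "*" in flags,
        "address": match["address"],
        "type_id": match["type_id"],
        "name": match["name"],
        "count": int(match["count"]),
    }
```

An entry whose count is not numeric, such as one shown with `[???]`, does not match and yields `None` instead of a guessed value.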

25 UMLS Semantic Net in the MED
A: T071: Medical Entity [94729]
. A1: T072: Physical Object [5618]
.. A1.1: T001: Organism [3153]
... A1.1.1: T002: Plant [1]
.... A1.1.1.1: T003: Alga [0]
... A1.1.2: T004: Fungus [273]
... A1.1.3: T005: Virus [169]
... A1.1.4: T006: Rickettsia or Chlamydia [5]
... A1.1.5: T007: Bacterium [992]
... A1.1.6: T194: Archaeon [0]
... A1.1.7: T008: Animal [93]
.... A1.1.7.1: T009: Invertebrate [85]
.... A1.1.7.2: T010: Vertebrate [6]
..... A1.1.7.2.1: T011: Amphibian [0]
..... A1.1.7.2.2: T012: Bird [0]
..... A1.1.7.2.3: T013: Fish [0]
..... A1.1.7.2.4: T014: Reptile [0]
..... A1.1.7.2.5: T015: Mammal [1]
...... A1.1.7.2.5.1: T016: Human [0]
Key: “A1.2”: UMLS tree address; “T071”: semantic type ID; “Event”: MED name; “+”: multiple locations; “*”: discontinuous tree address; “[577]”: number of MED concepts

26 UMLS Semantic Net in the MED
A: T071: Medical Entity [94729]
. +*A1.2: T017: Anatomical Structure [577]
.. A1.2.3: T021: Fully Formed Anatomical Structure [230]
... A1.2.3.1: T023: Body Part, Organ, or Organ Component [204]
... *A1.2.1: T018: Embryonic Structure [2]
... *A1.2.2: T190: Anatomical Abnormality [20]
.... A1.2.2.1: T019: Congenital Abnormality [0]
.... A1.2.2.2: T020: Acquired Abnormality [18]
.. *A1.2.3.2: T024: Tissue [66]
.. *A1.2.3.3: T025: Cell [61]
.. *A1.2.3.4: T026: Cell Component [11]
.. *A1.2.3.5: T028: Gene or Genome [0]
.. *A1.4.2: T031: Body Substance [56]
.. +*A2.1.4.1: T022: Body System [65]
.. +*A2.1.5.1: T030: Body Space or Junction [43]
.. +*A2.1.5.2: T029: Body Location or Region [117]
.. *A1.3: T073: Manufactured Object [16]
... A1.3.1: T074: Medical Device [6]
... A1.3.2: T075: Research Device [0]
... A1.3.3: T200: Clinical Drug [0]
.. A1.4: T167: Substance [???]
... A1.4.1: T103: Chemical [1942]
.... A1.4.1.1: T120: Chemical Viewed Functionally [1828]
..... A1.4.1.1.1: T121: Pharmacologic Substance [1468]
...... +*A1.4.1.1.3.4: T127: Vitamin [20]
...... A1.4.1.1.1.1: T195: Antibiotic [130]
..... A1.4.1.1.3: T123: Biologically Active Substance [530]
...... +A1.4.1.1.3.4: T127: Vitamin [20]
Key: “A1.2”: UMLS tree address; “T071”: semantic type ID; “Event”: MED name; “+”: multiple locations; “*”: discontinuous tree address; “[577]”: number of MED concepts

27 Property Introduction Points
1: Medical Entity [T071]
  MED-CODE
  UMLS-CODE
  NAME
  SUBCLASS-OF -> SUBCLASS (1: Medical Entity [T071])
  SUBCLASS -> SUBCLASS-OF (1: Medical Entity [T071])
  SYNONYMS
  PRINT-NAME
  HAS-PARTS -> PART-OF (1: Medical Entity [T071])
  PART-OF -> HAS-PARTS (1: Medical Entity [T071])
  DEFINITION
  MAIN-MESH
  SUPPLEMENTARY-MESH
  NAME-TOKEN
  DEFAULT-SHORT-DISPLAY-NAME
  DEFAULT-DISPLAY-NAME
  SPEECH-SYNONYM
  SPEECH-SYNTHESIS-NAME
  ENTITY-(HAS-RELATED)-PAGER-NUMBER
  ENTITY-(HAS)-MEDLEE-TARGET-TERM
  HIERARCHY-SELECTOR

28 Medical Properties
7: Body System [T022]
  ACTION-SITE-OF -> ACTION-SITE (98: Health Care Activity (Procedure) [T058])
14: Anatomical Structure [T017]
  SITE-OF-PROBLEM -> HAS-PROBLEM-SITE (30007: Patient Problem)
  OBSERVATION-SITE-OF -> OBSERVATION-SITE (94: Diagnostic Procedure [T060])
43: Chemical [T103]
  PHARMACEUTIC-COMPONENT-OF -> PHARMACEUTIC-COMPONENT (28103: Pharmacy Items (Drugs and Nondrugs))
50: Measureable Entity
  MEASURED-BY-PROCEDURE -> ENTITY-MEASURED (64964: Assessment Procedures)
  LOINC-ANALYTE-NAME
76: Disease or Syndrome [C0391828]
  ETIOLOGY -> CAUSES-DISEASES (135: Etiologic Agent)
  IS-HISTORIC-DISEASE-FOR -> HISTORIC-DISEASE (56164: Factors Related to Past Disease Influencing Health Status)

29 Medical Properties
83: Laboratory Finding or Test Result [T034]
  RESULT-TYPE-->TESTS -> TEST-->RESULT-TYPE (94: Diagnostic Procedure [T060])
86: Finding [T033]
  FINDING-(REFERS-TO)->ORGANISM
93: Laboratory Diagnostic Procedure
  COLLECTED-BY -> COLLECTED-FOR (33023: Specimen Collection [C0200345])
94: Diagnostic Procedure [T060]
  UNITS
  TEST-->RESULT-TYPE -> RESULT-TYPE-->TESTS (83: Laboratory Finding or Test Result [T034])
  OBSERVATION-SITE -> OBSERVATION-SITE-OF (14: Anatomical Structure [T017])
  TEST-(HAS)-ABNORMAL-FLAG -> ABNORMAL-FLAG-(FOR)-TEST (77746: Abnormal Flag Value)
98: Health Care Activity (Procedure) [T058]
  PROCEDURE-(INDICATES)->PT-PROBLEM -> PT-PROBLEM-(INDICATED-BY)->PROCEDURE (30007: Patient Problem)
  ACTION-SITE -> ACTION-SITE-OF (7: Body System [T022])

30 Medical Properties
135: Etiologic Agent
  CAUSES-DISEASES -> ETIOLOGY (76: Disease or Syndrome [C0391828])
1181: Antibiotic Sensitivity Tests
  SENSITIVITY-ANALYTE -> SENSITIVITY-ANALYTE-OF (44440: Antibiotic or Bacterial Enzyme Inhibitor)
32291: Sampleable Entity
  SAMPLED-BY -> SYSTEM-SAMPLED (64970: Sample Entity)
  LOINC-SYSTEM-CODE
44440: Antibiotic or Bacterial Enzyme Inhibitor
  SENSITIVITY-ANALYTE-OF -> SENSITIVITY-ANALYTE (1181: Antibiotic Sensitivity Tests)
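The property listings pair every relation with a reciprocal, for example ETIOLOGY on Disease or Syndrome and CAUSES-DISEASES on Etiologic Agent. A minimal sketch of keeping such pairs in step is shown below; the relation names come from the slides, while the link store, function names, and the two example concepts are hypothetical.

```python
# Reciprocal relation pairs named on the slides.
PAIRS = [
    ("ETIOLOGY", "CAUSES-DISEASES"),
    ("OBSERVATION-SITE", "OBSERVATION-SITE-OF"),
    ("ACTION-SITE", "ACTION-SITE-OF"),
]

# Look up the reciprocal in either direction.
INVERSE = {}
for forward, backward in PAIRS:
    INVERSE[forward] = backward
    INVERSE[backward] = forward

links = set()  # (source, relation, target) triples


def add_link(source, relation, target):
    """Store a semantic link together with its reciprocal."""
    links.add((source, relation, target))
    links.add((target, INVERSE[relation], source))


def targets(source, relation):
    """Concepts reachable from `source` through `relation`."""
    return sorted(t for s, r, t in links if s == source and r == relation)


# Hypothetical concepts, used only to exercise the functions.
add_link("Example Disease", "ETIOLOGY", "Example Agent")
```

Writing both directions at once means a lookup from either end sees the link, at the cost of storing each link twice.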

31 Data Dictionary Properties
59511: Clinical Repository Table
  TABLE-HAS-COLUMN -> COLUMN-IS-IN-TABLE (59512: Clinical Repository Column)
59512: Clinical Repository Column
  COLUMN-IS-IN-TABLE -> TABLE-HAS-COLUMN (59511: Clinical Repository Table)
59528: Generic Column
  COLUMN-HAS-PERMITTED-VALUES -> IS-PERMITTED-VALUE-FOR-COLUMN (67164: Verification Concept for Generic Column)
59729: Data Entry Form Component
  REPEAT-TYPE(DATA-ENTRY-COMPONENT)
  NUMBER-REPEATS(DATA-ENTRY-COMPONENT)
  REPEAT-LAYOUT-TYPE(DATA-ENTRY-COMPONENT)
59732: Form Field Allowable Values
  ALLOWABLE-VALUE-(FOR)->DATA-ENTRY-FIELD -> DATA-ENTRY-FIELD-(HAS)->ALLOWABLE-VALUE (42646: Data Entry Form Field)

32 Controlled Terminology Properties
21762: ICD9 Element
  ICD9-CODE
  ICD9-ENTRY-CODE
  OLD-ICD9-CODE
  ICD9-NAME
23147: American Hospital Formulary Service Class
  AHFS-CLASS-CODE
28104: Drug Enforcement Administration (DEA) Controlled Substance Category
  DEA-CODE

33 Data Modeling Properties
1178: Number or String Result
  EVENT-ID-OF -> EVENT-ID (9876: CPMC Event)
  EVENT-PATIENT-ID-OF -> EVENT-PATIENT-ID (9876: CPMC Event)
  EVENT-ORGANIZATION-OF -> EVENT-ORGANIZATION (9876: CPMC Event)
  EVENT-LOCATION-OF -> EVENT-LOCATION (9876: CPMC Event)
  PARTICIPANT-ID-OF -> PARTICIPANT-ID (30352: Medical Event Participant)
9876: CPMC Event
  EVENT-ID -> EVENT-ID-OF (1178: Number or String Result)
  EVENT-DATE -> EVENT-DATE-OF (30349: Date Result)
  EVENT-PATIENT-ID -> EVENT-PATIENT-ID-OF (1178: Number or String Result)
  EVENT-PARTICIPANT -> PARTICIPANT-OF (30352: Medical Event Participant)
  EVENT-ORGANIZATION -> EVENT-ORGANIZATION-OF (1178: Number or String Result)
  EVENT-LOCATION -> EVENT-LOCATION-OF (1178: Number or String Result)
  EVENT-STATUS -> STATUS-OF (30355: CPMC Status Term)
  EVENT-(HAS)-ORGANIZATION -> ORGANIZATION-(FOR)-EVENT (81475: CPMC Coded Organizations)
30344: CPMC Order
  ORDER-QUANTITY -> ORDER-QUANTITY-OF (30350: Quantity Result)
  ORDER-FREQUENCY -> ORDER-FREQUENCY-OF (32504: Order Frequency)
  ORDER-START-DATE -> ORDER-START-DATE-OF (30349: Date Result)
  ORDER-STOP-DATE -> ORDER-STOP-DATE-OF (30349: Date Result)
30352: Medical Event Participant
  PARTICIPANT-OF -> EVENT-PARTICIPANT (9876: CPMC Event)
  PARTICIPANT-ID -> PARTICIPANT-ID-OF (1178: Number or String Result)
  PARTICIPANT-NAME -> PARTICIPANT-NAME-OF (32653: ID Number Plus Text Result)

34 Application Properties
40441: Display Information [C0010996]
  DEFAULT-DISPLAY-FOR -> HAS-DEFAULT-DISPLAYS (94: Diagnostic Procedure [T060])
  DISPLAYS-ELEMENTS-OF -> ELEMENTS-DISPLAYED-BY (94: Diagnostic Procedure [T060])
  HAS-DISPLAY-PARAMETERS -> IS-DISPLAY-PARAMETER-OF (94: Diagnostic Procedure [T060])
  DISPLAY-PARAMETER-ORDER

35 Document Properties
42645: Data Entry Form
  FORM-(IS-PART-OF)->FORMSET -> FORMSET-(CONTAINS)->FORM (66436: Data Entry Form Sets)
42646: Data Entry Form Field
  DATA-ENTRY-FIELD-(HAS)->ALLOWABLE-VALUE -> ALLOWABLE-VALUE-(FOR)->DATA-ENTRY-FIELD (59732: Form Field Allowable Values)
  FORM-FIELD-(HAS)->FIELD-TYPE -> FIELD-TYPE-(FOR)->FORM-FIELD (66295: Data Entry Field Type)
  FORM-FIELD-(OBEYS)->PREFILL-RULE -> PREFILL-RULE-(FOR)->FORM-FIELD (66311: Prefill Rules)
  FORM-FIELD-MAXIMUM-VALUE
  FORM-FIELD-MINIMUM-VALUE
  FORM-FIELD-MAXIMUM-CHARACTER-COUNT
59732: Form Field Allowable Values
  ALLOWABLE-VALUE-(FOR)->DATA-ENTRY-FIELD -> DATA-ENTRY-FIELD-(HAS)->ALLOWABLE-VALUE (42646: Data Entry Form Field)
66295: Data Entry Field Type
  FIELD-TYPE-(FOR)->FORM-FIELD -> FORM-FIELD-(HAS)->FIELD-TYPE (42646: Data Entry Form Field)
66308: Layout Type
  LAYOUT-TYPE-(FOR)->FORM-STRUCTURE -> FORM-STRUCTURE-(HAS)->LAYOUT-TYPE (66405: Data Entry Form Structure)

36 207 Intersection Classes
Chemical [T103]; Measureable Entity; Etiologic Agent: 1780 cases
Measureable Entity; Laboratory Finding or Test Result [T034]; Finding [T033]; Etiologic Agent; Microbiology Result; Patient Problem; Laboratory Results: 1399 cases
Laboratory Finding or Test Result [T034]; Finding [T033]; Patient Problem; Laboratory Results: 3309 cases
Laboratory Finding or Test Result [T034]; Finding [T033]; Patient Problem; Laboratory Results; New York Hospital (NYH) Laboratory Nomenclature Term: 1601 cases

37 207 Intersection Classes
Laboratory Finding or Test Result [T034]; Finding [T033]; Patient Problem; New York Hospital (NYH) Laboratory Nomenclature Term: 2906 cases
Laboratory Diagnostic Procedure; Diagnostic Procedure [T060]; Health Care Activity (Procedure) [T058]; Event [T051]; Laboratory Diagnostic Batteries; Single-Result Laboratory Test; New York Hospital (NYH) Laboratory Concept; Assessment Procedures: 1197 cases
Laboratory Diagnostic Procedure; Diagnostic Procedure [T060]; Health Care Activity (Procedure) [T058]; Event [T051]; Laboratory Diagnostic Batteries; New York Hospital (NYH) Laboratory Concept; Assessment Procedures: 1822 cases

38 207 Intersection Classes
Laboratory Diagnostic Procedure; Diagnostic Procedure [T060]; Health Care Activity (Procedure) [T058]; Event [T051]; Single-Result Laboratory Test; New York Hospital (NYH) Laboratory Concept; Assessment Procedures: 3200 cases
Laboratory Diagnostic Procedure; Diagnostic Procedure [T060]; Health Care Activity (Procedure) [T058]; Event [T051]; Single-Result Laboratory Test; CPMC Single-Result Laboratory Test; Assessment Procedures: 3197 cases
Health Care Activity (Procedure) [T058]; Event [T051]; ICD9 Element; Verification Concept for Generic Column: 10048 cases
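The slides do not define "intersection class". One plausible reading is that each distinct combination of property-introducing classes shared by a group of concepts counts as one intersection, with the number of "cases" being the number of concepts under that combination. The sketch below tallies combinations under that assumption; the concept identifiers and their ancestor sets are made up, and only the class names echo the slides.

```python
from collections import Counter

# Made-up data: concept -> introduction-point classes found among its ancestors.
INTRO_ANCESTORS = {
    "concept-1": {"Chemical", "Measureable Entity", "Etiologic Agent"},
    "concept-2": {"Chemical", "Measureable Entity", "Etiologic Agent"},
    "concept-3": {"Finding", "Patient Problem"},
    "concept-4": {"Chemical"},
}


def intersection_counts(intro_ancestors):
    """Count concepts per combination of two or more introduction-point classes."""
    return Counter(
        frozenset(classes)
        for classes in intro_ancestors.values()
        if len(classes) > 1  # a single class is not an intersection
    )


counts = intersection_counts(INTRO_ANCESTORS)
```

With this toy data there are two intersections, one covering two concepts and one covering a single concept; the concept under "Chemical" alone is not counted.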

39 Revisiting Recommendations - General
Make “Event” a temporal concept
Conceptual vs. Physical polarization
Directed Acyclic Graph
Merge Network and Metathesaurus

40 Revisiting Recommendations - Specific
Tests have Specimens
Tests have Parts
Separate Medications from Chemicals
Liberalize assignment of Relations

41 Revisiting Summary
Semantic Types provide good coverage
Concepts provide good coverage in certain domains
No technical reason why UMLS could not incorporate clinical vocabulary

42 Lessons to be Learned
The MED is representative of clinical care
MED classes work well as introduction points
Multiple hierarchy works
Semantic Network is largely intact
Unifying organization for anatomy needed
Further study of MED will suggest additional types and relations

