Cornerstone I: Representing Knowledge From Data to Knowledge Through Concept-Oriented Terminologies James J. Cimino
The first step on the path to knowledge is getting things by their right names. -Chinese saying
Overview What is “data to knowledge”? Knowledge representation choices Knowledge-based terminology efforts Medical Entities Dictionary Proof of concepts
What is “data to knowledge”? Start with patient data in the medical record Enhance knowledge by: –gaining a better understanding of the patient –learning relevant knowledge –bringing smart systems to bear to apply knowledge –discovering new knowledge from health data
Knowledge Representation Terminology for representing symbols Format for arranging the symbols
Knowledge Representation Choices Guideline implementation
Guideline Implementation Starren and Xie, SCAMC, 1994 National Cholesterol Education Panel Guideline
Measure Cholesterol & Assess Risk Factors
–Cholesterol <200
–Cholesterol 200 to 239
  –HDL >35, <2 Risks: Provide dietary information; Reevaluate in 2 years
  –HDL <35 or 2 Risks
–Cholesterol >239
Guideline Implementation Starren and Xie, SCAMC, 1994 National Cholesterol Education Panel Guideline Three representations: –PROLOG (first-order logic)
NCEP Guideline in PROLOG
rule_j(PID) :-
    check_lab(PID, hdl, HDL, _), !,
    HDL >= 35,
    total_risk(PID, Risk), !,
    Risk < 2,
    check_lab(PID, cholesterol, C, _),
    C >= 200, C =< 239,
    print_rule_j.
Guideline Implementation Starren and Xie, SCAMC, 1994 National Cholesterol Education Panel Guideline Three representations: –PROLOG (first-order logic) –CLASSIC (frames)
NCEP Guideline in CLASSIC
(CL-DEFINE-CONCEPT 'C-PATIENT
  '(AND (ALL CHOL (AND INTEGER (MIN 200) (MAX 239)))))
(CL-DEFINE-CONCEPT 'G-PATIENT
  '(AND C-PATIENT LOW-RISK-PATIENT
        (ALL HDL (AND INTEGER (MIN 35)))))
Guideline Implementation Starren and Xie, SCAMC, 1994 National Cholesterol Education Panel Guideline Three representations: –PROLOG (first-order logic) –CLASSIC (frames) –CLIPS (production rules)
NCEP Guideline in CLIPS
(defrule C2G2J "Rules to reach box J"
  ?f1 <- (calculated-patient (state c) (done no) (hdl ?hdl) (name ?name))
  (test (>= ?hdl 35))
  =>
  (printout t "Patient " ?name " needs treatment" crlf))
Guideline Implementation Starren and Xie, SCAMC, 1994 National Cholesterol Education Panel Guideline Three representations: –PROLOG (first-order logic) –CLASSIC (frames) –CLIPS (production rules) “All three representations proved adequate for encoding the guideline”
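The same branch of the guideline can also be written as an ordinary predicate. The following is a minimal Python sketch that mirrors the conditions of the PROLOG rule_j (HDL at least 35, fewer than 2 risk factors, cholesterol 200 to 239). The Patient class and the name reaches_box_j are hypothetical and are not taken from Starren and Xie.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record holding only what the rule needs."""
    name: str
    cholesterol: int   # total cholesterol, mg/dl
    hdl: int           # HDL cholesterol, mg/dl
    risk_factors: int  # number of risk factors present


def reaches_box_j(p: Patient) -> bool:
    """Same three tests as rule_j: HDL >= 35, risks < 2, cholesterol 200-239."""
    return p.hdl >= 35 and p.risk_factors < 2 and 200 <= p.cholesterol <= 239


example = Patient("example", cholesterol=220, hdl=40, risk_factors=1)
print(reaches_box_j(example))
```

Like the three original encodings, the sketch holds only the decision logic; where the lab values and the risk count come from is left to the surrounding system.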
Knowledge Representation Choices Guideline implementation Terminologic knowledge
Terminology Representation Choices Frame-based
Frame-Based Representation Serum Glucose Test is-a:Lab Test Measures:Glucose Specimen:Serum Units:“mg/dl”
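As a rough illustration of the frame idea, the sketch below stores the Serum Glucose Test frame as named slots plus an is-a parent. It is written in Python; the Frame class and its methods are hypothetical and do not come from any of the cited systems.

```python
class Frame:
    """A named frame with slots and an optional is-a parent (illustrative only)."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        """Look up a slot locally, then fall back to is-a ancestors."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)

    def is_a(self, other):
        """True if `other` is this frame or one of its ancestors."""
        frame = self
        while frame is not None:
            if frame is other:
                return True
            frame = frame.parent
        return False


lab_test = Frame("Lab Test")
serum_glucose_test = Frame(
    "Serum Glucose Test",
    parent=lab_test,
    measures="Glucose",
    specimen="Serum",
    units="mg/dl",
)
print(serum_glucose_test.get("units"), serum_glucose_test.is_a(lab_test))
```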
Terminology Representation Choices Frame-based Semantic network
Semantic Network Representation
Serum Glucose Test -is-a-> Lab Test
Glucose -is-a-> Chemical
Serum -is-a-> Body Substance
Terminology Representation Choices Frame-based Semantic network Conceptual graphs
Conceptual Graph Representation
[Serum Glucose Test] -
  (is-a) -> [Lab Test]
  (measures) -> [Glucose]
  (specimen) -> [Serum]
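The frame, semantic network, and conceptual graph views all carry the same three relationships. One way to see this is to hold them as subject/relation/object triples and print them in the bracketed linear form. This Python sketch is only illustrative; the function name linear_form is hypothetical.

```python
# The three relationships shown for Serum Glucose Test, as triples.
TRIPLES = [
    ("Serum Glucose Test", "is-a", "Lab Test"),
    ("Serum Glucose Test", "measures", "Glucose"),
    ("Serum Glucose Test", "specimen", "Serum"),
]


def linear_form(head, triples):
    """Render every triple whose subject is `head` in bracketed linear form."""
    lines = [f"[{head}] -"]
    for subject, relation, obj in triples:
        if subject == head:
            lines.append(f"  ({relation}) -> [{obj}]")
    return "\n".join(lines)


print(linear_form("Serum Glucose Test", TRIPLES))
```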
Terminology Representation Choices Frame-based Semantic network Conceptual graphs
Knowledge Representation Terminology for representing symbols Format for arranging the symbols Terminology and format for representing terminologic knowledge
Knowledge-Based Terminology Efforts Jochen Bernauer, SCAMC, 1991
Conceptual graphs to model findings increased_uptake site femur site_attr right during bone_phase
Knowledge-Based Terminology Efforts Jochen Bernauer, SCAMC, 1991 Rector, Nolan and Glowinski, SCAMC, 1993
GALEN project
conditions grammatically haveLocation bodyparts
fractures sensibly haveLocation bones
femurs sensiblyAndNecessarily haveDivision neck
Knowledge-Based Terminology Efforts Jochen Bernauer, SCAMC, 1991 Rector, Nolan and Glowinski, SCAMC, 1993 Campbell and Musen, SCAMC, 1993
Conceptual graphs and SNOMED
Pain + Chest + Radiation to + Left + Arm
[Pain] -
  (located in) -> [Chest]
  (radiating to) -> [Arm] -> (with laterality) -> [Left]
Knowledge-Based Terminology Efforts Jochen Bernauer, SCAMC, 1991 Rector, Nolan and Glowinski, SCAMC, 1993 Campbell and Musen, SCAMC, 1993 Lindberg, Humphreys, McCray, Methods 1993
Unified Medical Language System Lexical group String Concept String Lexical group
Knowledge-Based Terminology Efforts Jochen Bernauer, SCAMC, 1991 Rector, Nolan and Glowinski, SCAMC, 1993 Campbell and Musen, SCAMC, 1993 Lindberg, Humphreys, McCray, Methods 1993 Rocha, Huff, et al., CBM, 1994
VOSER A server architecture for managing terminologic knowledge
Knowledge-Based Terminology Efforts Jochen Bernauer, SCAMC, 1991 Rector, Nolan and Glowinski, SCAMC, 1993 Campbell and Musen, SCAMC, 1993 Lindberg, Humphreys, McCray, Methods 1993 Rocha, Huff, et al., CBM, 1994 Campbell, Cohn, Chute, et al., SCAMC 1996
Convergent Medical Terminology SNOMED/Kaiser/Mayo Galapagos
Knowledge-Based Terminology Efforts Jochen Bernauer, SCAMC, 1991 Rector, Nolan and Glowinski, SCAMC, 1993 Campbell and Musen, SCAMC, 1993 Lindberg, Humphreys, McCray, Methods 1993 Rocha, Huff, et al., CBM, 1994 Campbell, Cohn, Chute, et al., SCAMC 1996 Brown, O’Neil and Price, Methods, 1997
Read Codes Representation with GALEN model
Knowledge-Based Terminology Efforts Jochen Bernauer, SCAMC, 1991 Rector, Nolan and Glowinski, SCAMC, 1993 Campbell and Musen, SCAMC, 1993 Lindberg, Humphreys, McCray, Methods 1993 Rocha, Huff, et al., CBM, 1994 Campbell, Cohn, Chute, et al., SCAMC 1996 Brown, O’Neil and Price, Methods, 1997 Spackman, Campbell, and Côté, SCAMC 1997
SNOMED RT (Reference Terminology) Convergent Medical Terminology Description Logic Format
Knowledge-Based Terminology Efforts Jochen Bernauer, SCAMC, 1991 Rector, Nolan and Glowinski, SCAMC, 1993 Campbell and Musen, SCAMC, 1993 Lindberg, Humphreys, McCray, Methods 1993 Rocha, Huff, et al., CBM, 1994 Campbell, Cohn, Chute, et al., SCAMC 1996 Brown, O’Neil and Price, Methods, 1997 Spackman, Campbell, and Côté, SCAMC 1997 Huff, Rocha, McDonald, et al., JAMIA 1998
Logical Observation Identifiers Names and Codes (LOINC) | GLUCOSE^3H POST 100 G GLUCOSE PO | SCNC | PT | SER/PLAS | QN |
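The pipe-delimited LOINC name above packs several axes into one string: component, property, time aspect, system, and scale. A minimal Python sketch of splitting it into named parts follows; the function name parse_loinc_name and the dictionary keys are illustrative choices, not part of LOINC itself.

```python
# Axis labels for the five parts present in the example name.
LOINC_AXES = ("component", "property", "time_aspect", "system", "scale")


def parse_loinc_name(text):
    """Split a pipe-delimited LOINC name into a dict keyed by axis."""
    parts = [p.strip() for p in text.strip().strip("|").split("|")]
    if len(parts) != len(LOINC_AXES):
        raise ValueError(f"expected {len(LOINC_AXES)} parts, got {len(parts)}")
    return dict(zip(LOINC_AXES, parts))


name = parse_loinc_name(
    "| GLUCOSE^3H POST 100 G GLUCOSE PO | SCNC | PT | SER/PLAS | QN |"
)
for axis, value in name.items():
    print(f"{axis}: {value}")
```

Because each axis is a separate field, a query such as "all quantitative serum or plasma tests" becomes a comparison on two keys rather than a string search over test names.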
Knowledge-Based Terminology Efforts Jochen Bernauer, SCAMC, 1991 Rector, Nolan and Glowinski, SCAMC, 1993 Campbell and Musen, SCAMC, 1993 Lindberg, Humphreys, McCray, Methods 1993 Rocha, Huff, et al., CBM, 1994 Campbell, Cohn, Chute, et al., SCAMC 1996 Brown, O’Neil and Price, Methods, 1997 Spackman, Campbell, and Côté, SCAMC 1997 Huff, Rocha, McDonald, et al., JAMIA 1998 Pharmacy system knowledge base vendors
Pharmacy System Knowledge Base Vendors Manufactured Components Country-Specific Packaged Product Ingredient Ingredient Class is-a Drug Class is-a Not-Fully-Specified Drug is-a Clinical Drug is-a Trademark Drug is-a International Package Identifiers is-a Composite Trademark Drug Composite Clinical Drug is-a
Medical Entities Dictionary (MED) New York Presbyterian Hospital 60,000 concepts (procs, results, drugs, probs) 208,242 synonyms 84,677 hierarchical links 113,906 semantic links 238,040 other attributes 66,404 translations (ICD9-CM, LOINC, MeSH, UMLS)
Central Controlled Terminology
MED Data Structures Semantic network
MED Semantic Network Medical Entity Plasma Glucose Laboratory Specimen Plasma Specimen Anatomic Substance Plasma Substance Sampled Part of Has Specimen Substance Measured Laboratory Procedure CHEM-7 Laboratory Test Event Diagnostic Procedure Substance Bioactive Substance Glucose Chemical Carbohydrate
MED Data Structures Semantic network MUMPS global
MED MUMPS Global ^med(1600) ^med(1600,1)..,4)..,5) <>..,6)..,7) <>..,8)..,12)..,14)..,16)..,17)..,20)..,23)..,50)..,138)..,156)..,161)
MED Data Structures Semantic network MUMPS global DB2
MED DB2 Tables Entities 10 Name 20 UMLS 30 Part-of 40 Specimen Slots Entity-Slots 1 10 Entity 2 10 C mg/dl Entity/Slot/Values Ancestry
MED Data Structures Semantic network MUMPS global DB2 UNIX
MED UNIX Data Structure 1600|SERUM GLUCOSE MEASUREMENT|1|C020241|4|32703|4|50000|12|GLUC|17|mg/dl|
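The flat record above can be read back into a structure with a few lines of code. The Python sketch below assumes the layout is a concept code, a name, and then alternating slot-number/value pairs, with repeated slot numbers meaning multiple values. That layout is inferred from the example record rather than documented here, so the meaning of each slot number is deliberately left uninterpreted. Reading the token before slot 17 as GLUC is likewise an assumption, and parse_med_record is a hypothetical name.

```python
from collections import defaultdict


def parse_med_record(line):
    """Parse 'code|name|slot|value|slot|value|...' into (code, name, slots)."""
    fields = line.strip().rstrip("|").split("|")
    code, name, rest = int(fields[0]), fields[1].strip(), fields[2:]
    if len(rest) % 2 != 0:
        raise ValueError("slot/value fields must come in pairs")
    slots = defaultdict(list)
    # Pair each slot number with the value that follows it.
    for slot, value in zip(rest[0::2], rest[1::2]):
        slots[int(slot)].append(value)
    return code, name, dict(slots)


record = "1600|SERUM GLUCOSE MEASUREMENT|1|C020241|4|32703|4|50000|12|GLUC|17|mg/dl|"
code, name, slots = parse_med_record(record)
print(code, name, slots)
```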
Proof of Concepts Merging data and application knowledge
Merging Data and Application Knowledge Plasma Glucose Test Serum Glucose Test Fingerstick Glucose Test Lab Test Intravascular Glucose Test Chem20 Display Lab Display Class-based, reusable lab summaries
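A class-based summary can be sketched as a filter over the is-a hierarchy: the display names a class, and any result whose test descends from that class is included. The Python below is a minimal illustration. The parent links in IS_A are a guess at the diagram's arrows, the result values are made up, and the function names are hypothetical.

```python
# Assumed is-a links among the tests named on the slide (child -> parent).
IS_A = {
    "Intravascular Glucose Test": "Lab Test",
    "Plasma Glucose Test": "Intravascular Glucose Test",
    "Serum Glucose Test": "Intravascular Glucose Test",
    "Fingerstick Glucose Test": "Lab Test",
}


def is_a(concept, ancestor):
    """Walk parent links upward until `ancestor` is found or the root is passed."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def summarize(results, display_class):
    """Keep only the results whose test belongs to the display's class."""
    return [r for r in results if is_a(r[0], display_class)]


# Illustrative (test, value) pairs; the values are not real data.
results = [
    ("Serum Glucose Test", 95),
    ("Fingerstick Glucose Test", 110),
    ("Plasma Glucose Test", 102),
]
print(summarize(results, "Intravascular Glucose Test"))
```

A new test added under Intravascular Glucose Test would appear in the summary with no change to the display code, which is the sense in which the summaries are reusable.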
DOP Summary
WebCIS Summary
Merging Data and Application Knowledge Plasma Glucose Test Serum Glucose Test Fingerstick Glucose Test Lab Test Intravascular Glucose Test Chem20 Display Lab Display Class-based, reusable lab summaries Expert system for application maintenance
Proof of Concepts Merging data and application knowledge Smarter retrievals from the record
Smarter Retrievals from the Record Repository stores events and results Clinical problems at a different level of granularity Re-use knowledge to map from problems to clinical data Produce problem-specific views of the medical record
Chest X ray Congestive Heart Failure Intravascular CK Test Creatine Kinase Chest X ray 2 View Cardiac Enzyme Angina Lab :1/1/99 Cardiac Enzyme Test Radiology :2/23/99 Chest X Ray Radiology :2/28/96 Head CT Lab :12/28/96 Sickle Cell Test Admission :3/14/96 Stroke Admission :2/14/98 Angina Lab :1/1/99 Blood Type Test Radiology :2/1/97 Knee X Ray Concept-oriented (Heart) Heart Disease Chest Discharge :1/15/99 CHF CHF Discharge :1/15/99 CHF CHF Admission :2/14/98 Angina Lab :1/1/99 Cardiac Enzyme Test Radiology :2/23/99 Chest X Ray
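The problem-specific view can be sketched as a filter: start from a topic, collect the concepts the terminology relates to it, and keep only the events coded with those concepts. In the Python below the events and their dates are copied from the slide as written. The RELATED_TO table is a hypothetical stand-in for the MED's semantic links, and problem_view is an illustrative name.

```python
# Events as listed on the slide: (event type, date as written, concept).
EVENTS = [
    ("Lab", "1/1/99", "Cardiac Enzyme Test"),
    ("Radiology", "2/23/99", "Chest X Ray"),
    ("Radiology", "2/28/96", "Head CT"),
    ("Lab", "12/28/96", "Sickle Cell Test"),
    ("Admission", "3/14/96", "Stroke"),
    ("Admission", "2/14/98", "Angina"),
    ("Lab", "1/1/99", "Blood Type Test"),
    ("Radiology", "2/1/97", "Knee X Ray"),
    ("Discharge", "1/15/99", "CHF"),
]

# Hypothetical stand-in for terminology links from a topic to related concepts.
RELATED_TO = {
    "Heart": {"CHF", "Angina", "Cardiac Enzyme Test", "Chest X Ray"},
}


def problem_view(events, topic):
    """Return only the events whose concept is related to the topic."""
    wanted = RELATED_TO.get(topic, set())
    return [e for e in events if e[2] in wanted]


heart_view = problem_view(EVENTS, "Heart")
for event_type, when, concept in heart_view:
    print(f"{event_type} :{when} {concept}")
```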
Proof of Concepts Merging data and application knowledge Smarter retrievals from the record “Just-in-Time” education
“Just-in-time” Education Medline button Infobuttons
“Just-in-time” Education Medline button Infobuttons Text-to-Web
“Just-in-time” Education Medline button Infobuttons Text-to-Web DXplain Medline Cholesterol Guideline Dietary Interactions PDR Micromedex Clinical Info System Webpath CHORUS Radiol Museum of South Bank Laboratory Test Results Medication Orders X-ray Reports ICD9
Proof of Concepts Merging data and application knowledge Smarter retrievals from the record “Just-in-Time” education Expert systems
Expert Systems Hripcsak, et al., Ann. Int. Med., 1995
Identify chest x-ray reports suspicious for 6 clinical conditions to trigger alerts
Method              Sens     Spec
Laypersons          22-47%   97-99%
Radiologists        73-98%   96-99%
Internists          68-98%   97-99%
Keyword             51-79%   79-92%
NLP/MED/Rule-based  81%      98%
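For reference, the Sens and Spec columns are the usual ratios over a confusion matrix: sensitivity is the fraction of truly positive reports that were flagged, and specificity is the fraction of truly negative reports that were left alone. A short Python sketch follows; the counts in the example are made up for illustration and are not from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly positive reports that were flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of truly negative reports that were not flagged."""
    return true_neg / (true_neg + false_pos)


# Made-up counts, only to show the arithmetic.
print(sensitivity(true_pos=8, false_neg=2))
print(specificity(true_neg=9, false_pos=1))
```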
Expert Systems Hripcsak, et al., Ann. Int. Med., 1995 Clinical decision support system
Clinical Decision Support System Data monitor runs rules against incoming reports Tuberculosis cultures come back 4-8 weeks later One day, hundreds of TB alerts came in
What Happened to the Tuberculosis Alert? No Growth Medical Logic Module No Growth to Date
How We Outsmarted the Lab Medical Logic Module “No Growth” Results No Growth No Growth after... No Growth after 24 Hours No Growth after 48 Hours No Growth after 72 Hours No Growth to Date
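The fix can be sketched as replacing a string comparison with a class test. The Python below assumes the alert logic treated any culture result other than "No Growth" as worth alerting on, which would explain the flood of alerts when the lab began reporting "No Growth to Date". That reading is an inference from the slides. The parent links, the function names, and the positive-result string are hypothetical.

```python
NO_GROWTH_CLASS = '"No Growth" Results'

# Assumed class membership for the terms shown on the slide (child -> parent).
PARENT = {
    "No Growth": NO_GROWTH_CLASS,
    "No Growth after 24 Hours": NO_GROWTH_CLASS,
    "No Growth after 48 Hours": NO_GROWTH_CLASS,
    "No Growth after 72 Hours": NO_GROWTH_CLASS,
    "No Growth to Date": NO_GROWTH_CLASS,
}


def naive_should_alert(result_term):
    """Brittle version: compares against one literal result string."""
    return result_term != "No Growth"


def should_alert(result_term):
    """Class-based version: any member of the no-growth class is ignored."""
    term = result_term
    while term is not None:
        if term == NO_GROWTH_CLASS:
            return False
        term = PARENT.get(term)
    return True


print(naive_should_alert("No Growth to Date"))  # the false alert
print(should_alert("No Growth to Date"))        # suppressed by the class test
```

When the lab introduces another wording, only the terminology needs a new entry under the class; the logic module itself stays unchanged.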
Expert Systems Hripcsak, et al., Ann. Int. Med., 1995 Clinical decision support system DXplain Button
Elhanan, et al., SCAMC 1997 Conversion of test results to clinical findings Pass findings to DXplain Cholesterol Hypercholesterolemia Abnormalities of Serum Cholesterol Serum Serum Specimen Serum Cholesterol Test
Proof of Concepts Merging data and application knowledge Smarter retrievals from the record “Just-in-Time” education Expert systems Data mining
Data Mining Wilcox and Hripcsak, SCAMC 1997
Data Mining Wilcox and Hripcsak, SCAMC 1997 Wilcox and Hripcsak, SCAMC 1998
Compare traditional coding methods with NLP to identify conditions in a set of patient records (x-ray reports)
Method               Sens     Spec
Laypersons           36%      86%
Expert-coded cases   27-37%   95-98%
ICD-9-coded cases    12-29%   86-90%
Physicians           85%      98%
NLP/MED/Rule-based   81%      98%
Wilcox and Hripcsak, SCAMC 1998
Proof of Concepts Merging data and application knowledge Smarter retrievals from the record “Just-in-Time” education Expert systems Data mining Database maintenance and use
Database Maintenance and Use Tables, columns, events all modeled in the MED Allows linkage of data model to controlled terminology Terminologies can be reused Impact of terminology changes on data model can be tracked
Proof of Concepts Merging data and application knowledge Smarter retrievals from the record “Just-in-Time” education Expert systems Data mining Database maintenance and use Terminology maintenance and use
Terminology Maintenance and Use Integrating terminologies from merging hospitals Automated update of medication terminology Detection of errors and inconsistencies
Is it Worth the Trouble? Meed: noun 1 archaic : an earned reward or wage 2 : a fitting return or recompense Date: before 12th century Etymology: from Old English: MED
Summary Putting knowledge in your terminology gets you: –Better ways to get knowledge out of your EMR –Better ways to get knowledge out of resources –Better ways to use other knowledge bases –Better ways to use terminology –Better ways to manage applications –Better ways to manage data and terminology Representation scheme is less important Desiderata for controlled terminology
Desiderata Desirable qualities for terminology
Desiderata Desirable qualities for terminology “Go placidly amid the noise and haste, and remember what peace there may be in silence.” “I’d rather be sailing”