Quality Assurance of the Content of a Large DL-based Terminology using Mixed Lexical and Semantic Criteria: Experience with SNOMED CT. Alan Rector, Luigi Iannone, Robert Stevens.

Alan Rector, Luigi Iannone, Robert Stevens
Presentation transcript:

Quality Assurance of the Content of a Large DL-based Terminology using Mixed Lexical and Semantic Criteria: Experience with SNOMED CT Alan Rector, Luigi Iannone, Robert Stevens

2 “A report from the trenches”
► SNOMED CT: mandated terminology for electronic patient records in the UK and US, with worldwide aspirations
► The result of a merger of two other systems, SNOMED and Clinical Terms v3
  ‣ Long history with much opportunity for error
► Expressed in a Description Logic and now available in OWL (a subset of EL++ without disjoint axioms)
► Has been resistant to independent analysis, although it has many known problems
  ‣ Despite several global QA attempts based on lexical criteria that have identified errors without explaining them

3 It’s very big, and classification matters
► ~400,000 concepts/classes; >1,000,000 axioms
► Much of the richness is only evident in classified form
► Most errors are only present in classified form
[Figure panels: Stated, Classified]

4 …and some classification is horrendously complicated (Skin of Ankle)

5 An experiment of opportunity
► The opportunities
  ‣ Tried to use SNOMED for commercial collaboration on clinical systems
  ‣ Tried to use SNOMED as a contribution to WHO’s revision of the International Classification of Diseases (ICD-11)
  ‣ Problems with both
► Therefore, an experiment to see whether QA & repair were possible
  ‣ Conventional wisdom said that it was not
► However, we had new resources
  ‣ Core Problem List Subset from NLM (8500 most used classes)
  ‣ Software to extract “modules”
  ‣ SNOROCKET classifier for EL++
  ‣ 4-8 GB machines

6 Step 1: Cut it down & find a classifier
► Find a subset
  ‣ UMLS Core Problem List subset: the 8500 most used disease concepts, collected by the US National Library of Medicine by combining sets from 6 major institutions
► Extract a “module” (built into OWL API v3)
  ‣ Use the core subset as the “signature”
  ‣ Guaranteed that all inferences amongst the classes in the “signature” that hold in the whole will hold in the module
  ‣ 35,000 concepts, including most of anatomy
► Find a classifier that can cope, with at least two for checking
  ‣ SNOROCKET (EL++, polynomial-time subset of OWL) (30 sec)
  ‣ Pellet 2.1 (200 sec)
  ‣ FaCT++ (250 sec)
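The idea of growing a module from a signature can be illustrated with a toy reachability closure. This is only a sketch: it is not the OWL API's module extractor, it handles nothing beyond "class mentions other classes", and the class names and the `extract_module` helper are made up for illustration.

```python
# Toy "stated" axioms: each class maps to the classes named on the
# right-hand side of its definition (parents and restriction fillers).
# All names are illustrative, not real SNOMED CT content.
AXIOMS = {
    "Hypertension": {"Disorder of blood vessel", "Systemic arterial structure"},
    "Disorder of blood vessel": {"Disorder", "Blood vessel structure"},
    "Systemic arterial structure": {"Blood vessel structure"},
    "Blood vessel structure": {"Body structure"},
    "Fracture of femur": {"Disorder", "Femur structure"},
    "Femur structure": {"Body structure"},
}


def extract_module(axioms, signature):
    """Collect every axiom reachable from the signature (fixed point)."""
    module = {}
    todo = list(signature)
    while todo:
        cls = todo.pop()
        if cls in module or cls not in axioms:
            continue
        module[cls] = axioms[cls]
        # Symbols used on the right-hand side join the signature.
        todo.extend(axioms[cls])
    return module


module = extract_module(AXIOMS, {"Hypertension"})
print(sorted(module))
```

Starting from a single class, the closure pulls in the anatomy it depends on and leaves unrelated content (here the femur fracture) behind, which is the effect the slide describes at a much larger scale.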

7 Step 2: Pick some areas of interest to clinicians, some with anomalies already spotted
► Myocardial Infarction (heart attack)
  ‣ Should be a kind of Ischemic Heart Disease, but wasn’t
► Hypertension (high blood pressure)
  ‣ Odd to find it a kind of Soft Tissue disorder
► Diabetes
  ‣ Odd to find it as a Disorder of the Abdomen
► Allergies
  ‣ Odd to find some but not all autoimmune disorders classified as Allergies
► …

8 Look at classification: Most initial errors spotted looking upwards
► Look up the hierarchy (with OWLViz)
► Let clinicians find important concepts and check them
  ‣ Face validity, and then look up the hierarchy
► Check any anomalies against the complete SNOMED in a standard browser
  ‣ Guard against artifacts in the various transformations
► Trace anomalies to their root
► Decide which links to add or break
► Decide how to break them
► Edit, classify and check
  ‣ Hierarchies
  ‣ Usages
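"Looking upwards" amounts to listing a class's inferred ancestors and tracing the path to any ancestor that looks wrong. A minimal sketch over a hand-written hierarchy follows; the `PARENTS` table is hypothetical toy data, not the actual SNOMED CT classification, and the helper names are assumptions.

```python
# Toy classified hierarchy: class -> direct superclasses (hypothetical).
PARENTS = {
    "Hypertension": ["Disorder of blood vessel"],
    "Disorder of blood vessel": [
        "Disorder of cardiovascular system",
        "Soft tissue disorder",
    ],
    "Disorder of cardiovascular system": ["Disorder"],
    "Soft tissue disorder": ["Disorder"],
    "Disorder": [],
}


def ancestors(cls, parents):
    """All classes above cls in the hierarchy."""
    seen = set()
    stack = list(parents.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


def paths_to(cls, target, parents):
    """Every upward path from cls to target, to help find the root cause."""
    if cls == target:
        return [[cls]]
    return [
        [cls] + rest
        for parent in parents.get(cls, [])
        for rest in paths_to(parent, target, parents)
    ]


print(sorted(ancestors("Hypertension", PARENTS)))
print(paths_to("Hypertension", "Soft tissue disorder", PARENTS))
```

The path listing shows which link introduces the odd ancestor, which is the link to consider adding or breaking.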

9 OWLViz upwards for Hypertension

10 And check for the desired result

11 Check in standard browser in full SNOMED (snob.eggbird.eu/)

12 Examine definition & formulate solution
Disorder of blood vessel that (Finding site some Systemic arterial structure) and (Has definitional manifestation some Increased blood pressure)
Disorder of blood vessel that (Finding site some Cardiovascular system structure) and (Has definitional manifestation some Increased blood pressure)

13 Then check usages for unwanted results - anything that should relate to arteries instead of Cardiovascular system?

14 Also look down hierarchy: Combine lexical & semantic search
► Hard to spot what is missing
► Hypertensive disorders included some complications as well as kinds of hypertension. Did it contain them all?
► Use OPPL, combining lexical matching, OWL semantics & queries
    ?C:CLASS=MATCH(".*[Hh]ypertensive.*")            ← lexical
    SELECT ?C SubClassOf Thing                        ← open world OWL semantics
    WHERE FAIL ?C SubClassOf "Hypertensive disorder"  ← closed world query
    BEGIN ADD ?C SubClassOf Candidate_hypertensive END;  ← action
► Classify and look at odd cases …
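The same lexical-plus-semantic filter can be approximated outside OPPL. The sketch below is a Python analogue on toy data, not OPPL itself: the class names, the `PARENTS` table and the helper functions are all assumptions made for illustration.

```python
import re

# Toy classified hierarchy: class -> direct superclasses (hypothetical).
PARENTS = {
    "Hypertensive disorder": ["Disorder"],
    "Hypertensive crisis": ["Hypertensive disorder"],
    "Hypertensive retinopathy": ["Disorder of retina"],
    "Hypertensive renal disease": ["Disorder of kidney"],
    "Diabetic retinopathy": ["Disorder of retina"],
    "Disorder of retina": ["Disorder"],
    "Disorder of kidney": ["Disorder"],
    "Disorder": [],
}


def superclasses(cls, parents):
    """cls itself plus all of its ancestors (subsumption is reflexive)."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def candidates(pattern, target, parents):
    """Classes whose name matches pattern but which are NOT under target."""
    regex = re.compile(pattern)
    return sorted(
        cls
        for cls in parents
        if regex.fullmatch(cls) and target not in superclasses(cls, parents)
    )


found = candidates(r".*[Hh]ypertensive.*", "Hypertensive disorder", PARENTS)
print(found)
```

The regular expression plays the role of the lexical `MATCH`, and the "not under target" test plays the role of the closed-world `FAIL` clause; the result is the list of odd cases worth a clinician's attention.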

15 Classify and look at odd cases

16 Look for regularities
► Of hypertensive complications
  ‣ 1 linked to Hypertensive disorder by property due to
  ‣ 1 linked to Hypertensive disorder by property associated with
  ‣ 2 are subclasses of Hypertensive disorder
  ‣ 2 not linked at all
► No class for Hypertensive complication
  ‣ Although there is a class for Diabetic complication
► Regularise
  ‣ Create classes for Hypertension, Hypertensive complication, and Hypertension AND/OR Hypertensive complication
  ‣ Edit all complications to the schema: Disorder due to some Hypertension

17 Which concept should carry the old ID?
► Look at usages of Hypertensive disorder
► All fit Hypertension; none fit Hypertensive complication
► Therefore, label the original ID for Hypertensive disorder as Hypertension
New hierarchy:
‣ Hypertension AND/OR Hypertensive complication ← new ID/concept
    ‣ Hypertension ← old ID/concept
        … kinds of hypertension
    ‣ Hypertensive complication ← new ID/concept
        … kinds of hypertensive complication

18 Looking down hierarchy: Analysis by categorisation
► Even short alphabetic lists are difficult to check
► Break it up logically

19 Always trace errors to root to fix mishmash modelling
► Simple error
  ‣ The axiom that Skin is a kind of Soft tissue was omitted
  ‣ Therefore injuries to skin are not listed as kinds of Soft tissue injuries
► Authors have noticed some cases and tried to compensate
  ‣ Cut of skin of foot is a kind of soft tissue injury, but Cut of the skin of lower limb was NOT a soft tissue injury
► One axiom fixes it all: Skin subClassOf SoftTissue
  ‣ And then a script to find the redundant axioms
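A script to find redundant axioms can be as simple as asking, for each stated subclass link, whether the superclass is still reachable without it. The sketch below is an assumption about how such a script might look, not the authors' script; the edges are toy data, and the edge from "Injury of skin" to "Soft tissue injury" stands in for what a classifier would infer once Skin subClassOf SoftTissue is added.

```python
def reachable(start, goal, edges, skip):
    """True if goal can be reached from start without using the edge skip."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for sub, sup in edges:
            if sub != node or (sub, sup) == skip or sup in seen:
                continue
            if sup == goal:
                return True
            seen.add(sup)
            stack.append(sup)
    return False


def redundant_axioms(edges):
    """Stated links that are already implied by the remaining links."""
    return {e for e in edges if reachable(e[0], e[1], edges, skip=e)}


# Before the fix: the compensating link is the only route to the parent.
before = {
    ("Cut of skin of foot", "Injury of skin"),
    ("Cut of skin of foot", "Soft tissue injury"),  # hand-added compensation
    ("Cut of skin of lower limb", "Injury of skin"),
}
# After the fix: the general link makes the compensation redundant.
after = before | {("Injury of skin", "Soft tissue injury")}

print(redundant_axioms(before))
print(redundant_axioms(after))
```

Before the fix nothing is redundant and the lower-limb case is simply missing; after it, the hand-added compensation shows up as the one link that can be safely removed.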

20 Trace errors to their roots: Incomplete modelling: Example
► Why is Myocardial Infarction not a kind of Ischemic Heart Disease?
  ‣ Ischemia = “lack of blood supply”
  ‣ Myocardium = “heart muscle”
► Infarction is not fully defined in SNOMED. References say: “Tissue death due to ischemia”
► Ischemic heart disease is not fully defined in SNOMED. References say: heart disease due to ischemia
► Ischemic disorder does not exist in SNOMED. Natural closure: Disorder due to some Ischemia (NB: always involves the Cardiovascular system)
► Add the definitions, and Myocardial infarction is classified correctly
► Also discover a long list of ischemic diseases that have not been classified as cardiovascular
► Check lexically for other uses of “ischemic”
  ‣ None found in this subset

21 Error in schema for anatomy: Conflates branches with parts
► Example
  ‣ Injury to artery of the ankle is located in the pelvis and in the abdomen (as well as the ankle)!
► Extends to all nerves & blood vessels
► Requires a generic change
  ‣ Simplest involves about 20 axioms for arteries

22 Schema errors: conflates branches & parts
[Diagram: SEP triples (Structure (or its parts), Entirety, Parts) for the Aorta and the Abdominal Aorta, with nodes Part of trunk, Branch of aorta, Branch of abd aorta, and Artery of Foot]
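Why conflating branches with parts misplaces injuries can be shown with a small traversal. This is an illustrative sketch on made-up relations, not SNOMED's anatomy schema or the authors' repair: the two relation tables are hypothetical and the chain of intermediate arterial branches is collapsed into one step.

```python
# Hypothetical toy relations; intermediate branches are omitted.
PART_OF = {
    "Abdominal aorta": ["Aorta", "Abdomen"],
    "Artery of ankle": ["Ankle"],
}
BRANCH_OF = {
    "Artery of ankle": ["Abdominal aorta"],
}


def containers(structure, *relations):
    """Everything the structure is inferred to lie within, following the
    given relations transitively."""
    seen = set()
    stack = [structure]
    while stack:
        node = stack.pop()
        for relation in relations:
            for target in relation.get(node, []):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
    return seen


# Treating "branch of" as if it were "part of" drags the ankle artery
# into the abdomen; keeping the relations apart does not.
conflated = containers("Artery of ankle", PART_OF, BRANCH_OF)
separated = containers("Artery of ankle", PART_OF)
print(sorted(conflated))
print(sorted(separated))
```

With the relations kept apart, an injury to the artery of the ankle is located only in the ankle, which is the behaviour the generic schema change is meant to restore.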

23 Overgeneralisation: explains many arguments
► The dictionary says “Neuropathy” is a disease of nerves
► But in practice it is a “dysfunction” of nerves
  ‣ Doctors don’t consider tumors or injuries to nerves to be neuropathies
► SNOMED often does not distinguish structural and functional disorders
  ‣ Needs a consistent pattern:

24 Naming issues
► All SNOMED terms have at least two names
  ‣ “Fully qualified name” & “Preferred name”
  ‣ “Fully qualified names” should be consistent but…
► Example: conflicting names
  ‣ “Immune hypersensitivity disorder (disorder)” = “Allergic disorder”
► Structure nodes in SEP triples: “Structure of X”, “X Structure”, X
  ‣ Leads to “Swelling of gums” being a kind of “Swelling of face”

25 Doing everything in a separate module (insofar as possible)
► Perform queries as “probes”
► Keep changes in modules
► Compromise: system of diffs and merges
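A system of diffs and merges can be pictured as set operations over axioms. The sketch below assumes axioms can be compared as plain strings, which ignores real-world issues such as axiom normalisation and identifier changes; the axiom strings and both helper functions are hypothetical.

```python
def diff(base, edited):
    """Axioms added to and removed from base to obtain edited."""
    return {"add": edited - base, "remove": base - edited}


def apply_diff(ontology, changes):
    """Replay a recorded diff on an ontology (for example a newer release)."""
    return (ontology - changes["remove"]) | changes["add"]


base = {
    "Hypertension SubClassOf SoftTissueDisorder",
    "Hypertension SubClassOf DisorderOfBloodVessel",
}
edited = {
    "Hypertension SubClassOf DisorderOfBloodVessel",
    "Skin SubClassOf SoftTissue",
}
changes = diff(base, edited)

# A later release with an unrelated extra axiom keeps that axiom
# while still receiving the repair.
newer_release = base | {"Diabetes SubClassOf Disorder"}
merged = apply_diff(newer_release, changes)
print(sorted(merged))
```

Keeping the repair as a recorded diff, rather than as edits to the main ontology, is what lets it be logged, reviewed, and replayed when the underlying release changes.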

26 Summary: QA of a large DL-based ontology is possible!
► Find a useful subset and use it as a signature to extract a manageable module
► Start with things that are important to your experts
► Look upwards rather than downwards in the first instance
► Follow up analogies and patterns
► When looking downwards, enrich categorisation to reduce noise
  ‣ Combine lexical and semantic techniques
► Analysis by synthesis
  ‣ Test alternative potential changes with the classifier
  ‣ As far as possible in a separate module; scripting where possible
► Tooling gaps / weaknesses
  ‣ Scripting tools need work
  ‣ Combining filtering with imports
  ‣ Diffs & change management: needed, but existing tools are not enough
► Log everything!