OpenGALEN Slide No.: 1 Introduction to Clinical Terminology and Classification AL Rector OpenGALEN CO-ODE The Medical Informatics Group, U of Manchester.

1 OpenGALEN Slide No.: 1 Introduction to Clinical Terminology and Classification AL Rector OpenGALEN CO-ODE The Medical Informatics Group, U of Manchester

2 OpenGALEN Slide No.: 2 Where we come from Best Practice Clinical Terminology Data Entry Clinical Record Decision Support Best Practice Data Entry Electronic Health Records Decision Support & Aggregated Data GALEN Clinical Terminology

3 OpenGALEN Slide No.: 3 Open GALEN: Philosophy Terminology is software  Terminology is the interface between people and machines Re-use is the key  Patient-centred information Terminology must have a purpose  Always ask: “What’s it for?” — Not art for art’s sake  Terminology supports clinical applications - not vice versa – Applications for someone to do something for somebody – Keep the ‘Horse before the Cart’  Always ask: “How will we know if it works?” “How will we know if it fails?”

4 OpenGALEN Slide No.: 4 Open GALEN: Key ideas Separation of kinds of knowledge  Terminology, medical record and information system schemas  Models of meaning; Models of Use  Concepts, language, Coding, Indexing, Pragmatics  Machine level, User level Knowledge is fractal!  There will always be more detail to be added — Therefore terminologies must be extensible Formal logical Support  Too big and complicated to maintain by hand — Extensibility requires rules — Software needs logical rigour

5 OpenGALEN Slide No.: 5 Axes for kinds of Knowledge Machine level Human Level Concepts Language Coding Indexing Pragmatics & User Interface Terminology Medical Records/ Information systems Decision Support rules

6 OpenGALEN Slide No.: 6 9) Interface of EHR, Messaging & Decision Support interface Concept Model (Ontology) Information Model (Patient Data Model) Inference Model (Guideline Model) Dynamic Guideline Knowledge Static Domain Knowledge Patient Specific Records Significant Research Topic Now

7 OpenGALEN Slide No.: 7 Uses of Terminology Clinical  Epidemiology and quality assurance  Reproducibility / Comparability  Indexing Software  Re-use !  Integration and Messaging between systems  Authoring and configuring systems  Data capture and presentation (user interface)  Indexing information and knowledge (meta-data, The Web)

8 OpenGALEN Slide No.: 8 An Old Problem “On those remote pages it is written that animals are divided into: a. those that belong to the Emperor b. embalmed ones c. those that are trained d. suckling pigs e. mermaids f. fabulous ones g. stray dogs h. those that are included in this classification i. those that tremble as if they were mad j. innumerable ones k. those drawn with a very fine camel's hair brush l. others m. those that have just broken a flower vase n. those that resemble flies from a distance" From The Celestial Emporium of Benevolent Knowledge, Borges

9 OpenGALEN Slide No.: 9 History: Origins of existing terminologies Epidemiology  ICD - Farr in 1860s to ICD9 in 1979 — International reporting of morbidity/mortality  ICPC s — Clinically validated epidemiology in primary care  Now expanded for use in Dutch GP software Librarianship  MeSH - NLM from around Index Medicus & Medline  EMTree - from Elsevier in 1950s - EMBase Remuneration  ICD9-CM (Clinical Modification) 1980 — 10 x larger than ICD; aimed at US insurance reimbursement

10 OpenGALEN Slide No.: 10 Traditional Systems Built by people for interpretation by people (Coding clerks)  Most knowledge implicit in rubrics — Must understand medicine to use intelligently  Not built for software On paper for use on paper  Enumerated - top down all possibilities listed — Serial - Single use - Single View Hierarchical Thesauri  Traditional terminological techniques from librarianship — ‘Broader than’ / ‘Narrower than’ (ISO 1087)  no logical foundation Focused on ‘terms’  Language and concepts mixed — Synonyms, preferred terms, etc caused confusion

11 OpenGALEN Slide No.: 11 History (2) Pathology indexing  SNOMED 1970s to 1990 (SNOMED International) — First faceted or combinatorial system  Topography, morphology, aetiology, function  Plus diseases cross referenced to ICD9 Specialty Systems  Mostly similar hierarchical systems — ACRNEMA/SDM - Radiology — NANDA, ICNP… - Nursing —…—…—…—…

12 OpenGALEN Slide No.: 12 History (3) Early computer systems  Read I (4 digit Read) — Aimed at saving space on early computers  1-5 Mbyte / 10,000 patients — Hierarchical modelled on ICD9  Detailed signs and symptoms for primary care  Purchased by UK government in 1990 — Single use  Morbidity indexing  Medical Entities Dictionary (MED) — Jim Cimino

13 OpenGALEN Slide No.: 13 History (4) Aspirations for electronic patient records (EPRs)  Weed’s Problem Oriented Medical Record — Direct entry by health care professionals Aspirations for decision support  Ted Shortliffe (MYCIN), Clem McDonald (Computer based reminders), Perry Miller (Critiquing),.. Aspirations for re-use  Patient centred information Needed common multi-use multi-purpose terminology  None worked

14 OpenGALEN Slide No.: 14 Motivations and Business Models Remuneration  ICD9/10-CM in US for insurance and Medicare for diseases  Current Procedural Terminology (CPT) for surgical procedures Public Health Reporting  ICD9/10 Clinical Recording  Read 1-3, SNOMED-RT/CT  ICPC – International Classification of Primary Care Indexing publications  MeSH – Medical Subject Headings - Basis of indexing MedLine/PubMed  EMTree – basis of indexing EMBASE Support for applications and decision support  GALEN

15 OpenGALEN Slide No.: 15 Summary of Changes at end of 1st Generation From terminologies for people to terminologies for machines From paper to software From single use to multiple re-use for patient centred systems From entry by coding clerks to direct entry by health care professionals From pre-defined reporting for statistics to reliable indexing for decision support

16 OpenGALEN Slide No.: 16 Changes at end of first generation From models of USE to models of MEANING  But tended to lose the model of use — The goal of “useful and usable systems” lost

17 OpenGALEN Slide No.: 17 Problems with ‘First Generation’ Enumerated Systems in coping with these changes

18 OpenGALEN Slide No.: 18 Problems (1) Scaling !!!  More detail and more specialities required scaling up, but... The combinatorial explosion  Example: Burns: — 100 sites x 3 depths → 404 codes  5 subsites/site x chemical or thermal → 7272 –x 3 extents x 3 durations → 116,352  ‘The Persian chessboard’ — 2^64 grains of rice → 100 billion tonnes of rice  nanoseconds → 10,000 years  Read II grew from 20,000 to 250,000 terms in ~100 staff-years — still too small to be useful  but too big to use
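The arithmetic behind the explosion can be sketched in a few lines. The factor sizes are read off the burns example; the plain product is only an illustration and does not reproduce the slide's totals (404, 7272, 116,352), which presumably reflect the actual coding scheme. All variable names are hypothetical.

```python
from math import prod

# Factor sizes taken from the burns example (assumption: every combination
# is treated as meaningful, which a real coding scheme would not do).
factors = {
    "site": 100,
    "depth": 3,
    "subsite_per_site": 5,
    "agent": 2,      # chemical or thermal
    "extent": 3,
    "duration": 3,
}

# Enumerated ("phrase book"): one pre-built code for every combination.
enumerated_codes = prod(factors.values())

# Compositional ("dictionary + grammar"): one entry per elementary value,
# assuming subsite names are shared across sites.
elementary_terms = sum(factors.values())

# The 'Persian chessboard': doubling the grains on each of 64 squares.
chessboard_grains = 2**64 - 1

print(enumerated_codes, elementary_terms, chessboard_grains)
```

The point of the comparison is that the enumerated count grows as a product of the factors while the compositional vocabulary grows only as their sum.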

19 OpenGALEN Slide No.: 19 Benefits Avoid the “Exploding Bicycle” From “phrase book” to “dictionary + grammar” Tame combinatorial explosions  ICD-9 (E826) 8  READ-2 (T30..) 81  READ-3 87  ICD-10 (V10-19) 587 — V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income  and meanwhile elsewhere in ICD-10 — W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity — X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities

20 OpenGALEN Slide No.: 20 Problems (2) Information implicit in the rubrics  “Hypertension excluding pregnancy” — Computers can’t read!  Invisible to software  No explicit information except the hierarchy — Minimal support for software — No opportunity to use software to help Language and concepts confused  Synonyms  Preferred terms  Homonyms  Only simple look up and spelling correction

21 OpenGALEN Slide No.: 21 Problems (3) Mixed Organisation  ‘Heart diseases’ in 13 of 19 chapters of ICD — Tumours, infections, congenital abnormalities, toxic, …  ‘Steroids’ in five chapters of standard drug classifications — Anti-inflammatories, anti-asthmatics, …  Unreliable for indexing or Abstractions — How to say something about ‘all heart diseases’? Fixed organisation  Single hierarchy - Single use — Where to put ‘gout’ - arthritis or metabolic disease?  Back and forth in each edition of ICD — No re-use

22 OpenGALEN Slide No.: 22 Problems 3b Thesauri rather than Classifications A Mixed Hierarchy A correct kind-of (subsumption) hierarchy

23 OpenGALEN Slide No.: 23 Problems (4) ‘Semantic identifiers’  Codes really paths - moving a concept meant changing its code 3 Cardiovascular disorders … 3.4 Disorders of Artery Disorders of coronary artery... … Coronary thrombosis … Easy to process but...  Reorganisation requires changing codes  Codes cannot be permanent
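A minimal sketch of why path-style codes cannot be permanent, contrasted with opaque identifiers plus an explicit parent table. The codes "3" and "3.4" come from the slide; the deeper codes and all "C00n" identifiers are invented for illustration, since the slide's own sub-codes are not recoverable.

```python
# Path-style codes: the code itself encodes the position in the hierarchy.
# "3.4.1" and "3.4.1.1" are hypothetical.
path_codes = {
    "3": "Cardiovascular disorders",
    "3.4": "Disorders of artery",
    "3.4.1": "Disorders of coronary artery",
    "3.4.1.1": "Coronary thrombosis",
}

def parent_of_path(code):
    """The parent is implied by the code: drop the last segment."""
    head, _, _ = code.rpartition(".")
    return head or None

# Non-semantic alternative: opaque identifiers and a separate parent table.
concepts = {
    "C001": "Cardiovascular disorders",
    "C002": "Disorders of artery",
    "C003": "Disorders of coronary artery",
    "C004": "Coronary thrombosis",
}
parents = {"C002": "C001", "C003": "C002", "C004": "C003"}

def move(concept_id, new_parent_id):
    """Reorganise the hierarchy without touching the identifier."""
    parents[concept_id] = new_parent_id

# Re-filing 'Coronary thrombosis' changes only the parent table;
# with path codes the same move would force a new code.
move("C004", "C001")
```

With the path scheme, any record that stored "3.4.1.1" would point at the wrong place after a reorganisation; with opaque identifiers stored data stays valid.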

24 OpenGALEN Slide No.: 24 Problems (5) Maintenance  20 Years from ICD9 to ICD10  ~100 person-years from Read 1 to Read 3  Mega francs/guilders/crowns/marks on European coding schemes  Thousands of unpaid hours of committee time — Impossible / meaningless decisions take longest  You can search forever for something that is not there — Multiple uses compete -  Must choose one use –Most successful were clear about their purpose - ICD, ICPC, MeSH Codes change meaning with version changes  Old data misleading!

25 OpenGALEN Slide No.: 25 Problems (6) Version specific artefacts  “Not otherwise specified” (NOS) — Used to move a general concept ‘down’  Not elsewhere classified (NEC) — Catch all - Nowhere else in coding system e.g. ‘Tumour not elsewhere classified’  dependent on version,  “Other” — Catch all - Not listed below, e.g. “Other diseases of the cardiovascular system”  dependent on version Not used consistently

26 OpenGALEN Slide No.: 26 Problem (7): Language is slippery: Two hands or Four?

27 OpenGALEN Slide No.: 27 Language/Concepts are slippery Human cognition makes it look easy  Logic fails to capture it — Classification is easy until you try to do it  Trying since Aristotle in the West and Ancient Chinese in the East Words/Concepts mean what a community decides they mean  Does a chimpanzee have four hands?  Is a prion alive?  Is surgery on the ovary a kind of ‘Endocrine surgery’? Easier to agree on the concrete than the abstract  Easy to agree on useful abstractions and generalisations — Harder to agree on how to name them

28 OpenGALEN Slide No.: 28 Problems (8) There is no re-use - there is no standard  The ‘grand challenge’: A common controlled vocabulary for medicine — But re-use requires multiple different views  People’s needs differ / People do and find different things — By profession  Doctors and specialties, nurses, physiotherapists, dentists… — By situation  Inpatient, outpatient, primary care, community… — By task  Diagnosis, management, prescribing,  patient care, public health, quality assurance, management, planning — By country and community  US, UK, France, Germany, Japan, Korea,...

29 OpenGALEN Slide No.: 29 Summary of Problems 1st Generation Enumerated Systems Enumerated Single Hierarchies  List all possibilities in advance — Cannot cope with fractal knowledge  Most knowledge implicit — Invisible to software  Can’t agree on common concepts and classification — Unreliable for indexing Difficult to use for healthcare professionals  No support for user interface Can’t build and maintain big classifications  Language and concepts don’t translate easily to logic and software

30 OpenGALEN Slide No.: 30 Cimino’s Desiderata (1) Concept orientation  Separate language (terms) and concepts (codes) Concept permanence  Never re-use a code (‘retire’ it) Nonsemantic concept identifiers  Separate the code from the path Polyhierarchy  Allow one concept to be classified in multiple ways — Gout can be both a metabolic disease and an arthritis
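Polyhierarchy and concept permanence can be sketched with a parent table that allows several parents per concept. The concept names and the `retire` helper are hypothetical; this is an illustration of the desiderata, not any particular terminology's data model.

```python
# Each concept may list several parents (polyhierarchy).
parents = {
    "disease": [],
    "metabolic_disease": ["disease"],
    "arthritis": ["disease"],
    "gout": ["metabolic_disease", "arthritis"],
}
retired = set()

def ancestors(concept):
    """All concepts reachable by following parent links upward."""
    seen = set()
    stack = list(parents[concept])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def retire(concept):
    """Concept permanence: flag as retired, never delete or re-use the identifier."""
    retired.add(concept)

print(sorted(ancestors("gout")))
```

A query for "all arthritis" and a query for "all metabolic disease" both find gout, so the question of where to file it no longer has to be settled once for all uses.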

31 OpenGALEN Slide No.: 31 Cimino’s Desiderata (2) Formal Definitions  i.e. ‘Be compositional’ Reject ‘Not elsewhere classified’  concept permanence and NEC Multiple granularities  Organ, tissue, cellular, molecular  Grades, types, classes of diseases  Special clinical criteria Multiple consistent views  Allow different organisations — e.g. functional, anatomical, pathological

32 OpenGALEN Slide No.: 32 Cimino’s Desiderata (3) Represent context  Family history, risk, source of information Evolve gracefully  Allow controlled changes Recognise redundancy (equivalence)  ‘Carcinoma’ + ‘Lung’ ?=? ‘Carcinoma of the lung’ — How would we know?  How could a machine know?
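One way a machine could answer the equivalence question is to expand every named concept to a formal definition and compare canonical forms. The sketch below assumes an explicit `site` attribute and a hypothetical `definitions` table; without formal definitions of this kind the comparison is not possible.

```python
# Hypothetical definitions: pre-coordinated name -> (base kind, criteria).
definitions = {
    "CarcinomaOfLung": ("Carcinoma", frozenset({("site", "Lung")})),
}

def normalise(expr):
    """Expand a named concept to its definition; canonicalise a composition."""
    if isinstance(expr, str):
        return definitions.get(expr, (expr, frozenset()))
    base, criteria = expr
    return (base, frozenset(criteria))

def equivalent(a, b):
    """Two expressions are equivalent if their canonical forms are identical."""
    return normalise(a) == normalise(b)

post_coordinated = ("Carcinoma", {("site", "Lung")})
print(equivalent(post_coordinated, "CarcinomaOfLung"))
```

This handles only exact matches after one level of expansion; recognising that one expression is a more specific kind of another needs a classifier.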

33 OpenGALEN Slide No.: 33 Solution 0: You are worrying about the wrong problem International Classification of Primary Care (ICPC)  Focus on repeatability and quality across languages for a small (<2000) number of codes

34 OpenGALEN Slide No.: 34 Solution Generation 1 Megaterm + Crossmapping = UMLS Clinical Applications Medical Records Data entry Decision support UMLS Medical Records Data entry Decision support MEGA- TERM Coding & Classification ICD-9 ICD-10 MeSH ACRNEMA ICPC SNOMED Axes READ OPCS

35 OpenGALEN Slide No.: 35 Cross mapped and typed terminologies & vocabularies

36 OpenGALEN Slide No.: 36 The UMLS Knowledge Sources Metathesaurus  Cross mappings Language resources  NORM – stemming and term recognition UMLS Semantic Net  170 types attached to categorise concepts — Disease, anatomical part, micro-organism, etc.

37 OpenGALEN Slide No.: 37

38 OpenGALEN Slide No.: 38 Solution 1 Cross-mapping & UMLS Unified Medical Language System (UMLS) from US National Library of Medicine  De facto common registry for vocabularies — Concept Unique Identifiers (CUIs) and Lexical Unique Identifiers (LUIs) are de facto the common nomenclature  NB must use a CUI + LUI to get unique identification  Licence terms — Class I – free for use — Class III – heavily restricted — (Class II – almost nonexistent)

39 OpenGALEN Slide No.: 39 Solution 1 Cross-mapping & UMLS An invaluable resource, but...  No better than the vocabularies which are mapped — Limited detail for patient care — Unreliable for indexing or abstraction of knowledge — Best for relating everything to MeSH for indexing literature  Still limited by combinatorial explosion — Still can’t cope with fractal knowledge  Not extensible - no help in building or extending terminologies  No help in reorganising existing terminologies to re-use for new purposes  Top down  Information still implicit — Minimal help with software  No help with data capture, user interfaces

40 OpenGALEN Slide No.: 40 Solution IIa: Build what you need as you need it LOINC – dominant coding system for laboratory systems (“Logical Observation Identifiers Names and Codes”) Clinical LOINC contains increasing amounts of clinical references  Fully Class I included in UMLS Closely linked to HL7 and HL7 vocabulary committee

41 OpenGALEN Slide No.: 41

42 OpenGALEN Slide No.: 42 Build and Control what you need only HL7 Messaging standard  Controls the codes that hold messages together  Uses codes from elsewhere as ‘payload’  See — (Possibly the world’s worst web site) — Some material members only

43 OpenGALEN Slide No.: 43 Solutions Generations 2-3 Compositional Systems Beat the combinatorial explosion  Build concepts out of pieces - Lego — Dictionary and grammar rather than phrasebook  But hard

44 OpenGALEN Slide No.: 44 Solution Generation 1.5: Faceted Faceted systems: SNOMED International — Inflammation + Lung + Infection + Pneumococcus  Pneumococcal pneumonia Limit combinatorial explosion, but…  Rigid - a limited number of axes / facets / chapters  Each facet has the problems of a first generation enumerated system — Much knowledge still implicit  No way to know how identifiers relate — No explicit relations, only ‘+’ — No way to recognise redundancy / equivalence — No help with data capture or user interface / No way to recognise nonsense  Carcinoma + Hair + Donkey + Emotional  ????  Still can’t cope with fractal knowledge — Limited extensibility: limited help with building, extending or reorganising  Still Top Down
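The "no way to recognise nonsense" point can be illustrated by contrasting a bare '+' combination with explicit, typed relations checked against a table of permitted links. The categories, relation names and `SANCTIONED` table below are hypothetical and deliberately coarse; they are not SNOMED's or GALEN's actual constraints.

```python
# Hypothetical category for each term.
CATEGORY = {
    "Inflammation": "Morphology",
    "Carcinoma": "Morphology",
    "Lung": "BodySite",
    "Hair": "BodySite",
    "Pneumococcus": "Microorganism",
    "Donkey": "Animal",
    "Emotional": "Function",
}

# Hypothetical table of which relation may link which categories.
SANCTIONED = {
    ("Morphology", "site", "BodySite"),
    ("Morphology", "cause", "Microorganism"),
}

def is_sensible(base, criteria):
    """True only if every (relation, value) pair is permitted for the base's category."""
    return all(
        (CATEGORY[base], relation, CATEGORY[value]) in SANCTIONED
        for relation, value in criteria
    )

pneumonia = ("Inflammation", [("site", "Lung"), ("cause", "Pneumococcus")])
nonsense = ("Carcinoma", [("site", "Hair"), ("cause", "Donkey")])
print(is_sensible(*pneumonia), is_sensible(*nonsense))
```

A purely faceted system has nothing to check because '+' carries no relation; once relations are named, even a coarse table rejects some combinations.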

45 OpenGALEN Slide No.: 45 Generation 2: Enumerated Compositional Read III with qualifiers — Inflammation: site: lung, cause: pneumococcus  Pneumococcal Pneumonia More semantics but…  Limited qualifiers - limited views - limited re-use  Limited help with data capture - User interface difficult  Much information still implicit - limited software support — No way to recognise redundancy / equivalence / errors — Organisation still mixed - indexing better but still unreliable  Limited separation of language and concepts  Still can’t cope with fractal knowledge — Limited extensibility; limited help with building and reorganising terminologies  Top down

46 OpenGALEN Slide No.: 46 Logic Based Ontologies: The basics [Figure with panels: Primitive skeleton, Descriptions, Definitions, Reasoning, Validating] Concepts shown: Thing, Structure, Heart, MitralValve, Encrustation, Feature, pathological, red. Expressions shown: MitralValve * ALWAYS partOf: Heart; Encrustation * ALWAYS feature: pathological; Encrustation + involves: MitralValve; Thing + feature: pathological; Structure + feature: pathological + involves: Heart; red + (feature: pathological); red + partOf: Heart
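The expressions on this slide can be turned into a toy classifier. The sketch below is a simplified structural subsumption check, not GALEN's actual reasoner: the skeleton layout (for example, Encrustation as a kind of Structure) and the rule that `involves` propagates along `partOf` are assumptions made for illustration, and all function names are hypothetical.

```python
# Primitive skeleton (assumed layout): child -> parent.
IS_A = {
    "Structure": "Thing", "Heart": "Structure", "MitralValve": "Structure",
    "Encrustation": "Structure", "Feature": "Thing",
    "pathological": "Feature", "red": "Feature",
}
# Descriptions: statements that ALWAYS hold for a concept.
ALWAYS = {
    "MitralValve": {("partOf", "Heart")},
    "Encrustation": {("feature", "pathological")},
}

def kinds(concept):
    """The concept followed by all its ancestors in the skeleton."""
    out = [concept]
    while concept in IS_A:
        concept = IS_A[concept]
        out.append(concept)
    return out

def wholes(concept):
    """The concept plus everything it is necessarily part of, transitively."""
    out, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in out:
            out.add(current)
            for kind in kinds(current):
                stack.extend(v for a, v in ALWAYS.get(kind, ()) if a == "partOf")
    return out

def criteria(expr):
    """Explicit criteria plus those inherited from ALWAYS statements on the base."""
    base, explicit = expr
    inherited = set()
    for kind in kinds(base):
        inherited |= ALWAYS.get(kind, set())
    return set(explicit) | inherited

def satisfies(have, need):
    """Does criterion `have` entail criterion `need`?"""
    (attr, value), (need_attr, need_value) = have, need
    if attr != need_attr:
        return False
    # Assumed rule: involving a part counts as involving the whole.
    candidates = wholes(value) if attr == "involves" else {value}
    return any(need_value in kinds(c) for c in candidates)

def subsumes(general, specific):
    """True if every instance of `specific` must also be an instance of `general`."""
    if general[0] not in kinds(specific[0]):
        return False
    return all(
        any(satisfies(have, need) for have in criteria(specific))
        for need in criteria(general)
    )

pathological_heart_structure = ("Structure", {("feature", "pathological"), ("involves", "Heart")})
mitral_encrustation = ("Encrustation", {("involves", "MitralValve")})
print(subsumes(pathological_heart_structure, mitral_encrustation))
```

Nobody asserted that an encrustation of the mitral valve is a pathological structure involving the heart; the classification follows from the two ALWAYS statements, which is the sense in which the hierarchy is computed rather than enumerated.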

47 OpenGALEN Slide No.: 47 CT Vocabulary “Reference Terminology” vs “Interface Terminologies”  Reference terminology = enumerated hierarchy of formally defined terms  Interface terminology = navigation structure for user interface — Explicitly excluded from SNOMED-RT “Terming”, “Coding”, and “Grouping”  Terming - finding the lexical string  Coding - finding the correct unique code (concept)  Grouping - putting codes into groupers for epidemiological or other purposes
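The three steps named on this slide can be read as a small pipeline. Every string, code and grouper below is a hypothetical illustration rather than content from any real terminology.

```python
# Terming: map what the user typed to a preferred lexical string.
TERMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
}
# Coding: map the preferred string to a unique concept code.
CODES = {"myocardial infarction": "X0001"}
# Grouping: map the code to a reporting group.
GROUPERS = {"X0001": "Ischaemic heart disease"}

def terming(text):
    """Find the lexical string; unknown input is passed through normalised."""
    key = text.strip().lower()
    return TERMS.get(key, key)

def coding(term):
    """Find the unique code (concept) for a preferred term."""
    return CODES[term]

def grouping(code):
    """Place a code into its grouper for statistical reporting."""
    return GROUPERS[code]

print(grouping(coding(terming("  Heart attack "))))
```

Keeping the three tables separate reflects the separation of language, concepts and use: synonyms can change without touching codes, and groupers can be swapped per reporting purpose.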

48 OpenGALEN Slide No.: 48 Generation 2.5 Pre-coordinated Formal Compositions SNOMED-CT  Formal collaboration between College of American Pathologists (CAP/SNOMED) and NHS — Formal logical model for classifying a fixed list of definitions — Simple fixed ontology (7 links)  Now officially adopted and probably available for both NHS and related academic uses GALEN derived terminologies  UK Drug Ontology  Procedure classifications

49 OpenGALEN Slide No.: 49 Generation III Fully compositional post coordinated  Not yet in use or fully available — GALEN-like  Will probably arrive with Semantic Web

50 OpenGALEN Slide No.: 50 Other Key Resources Anatomy  Digital Anatomist Foundational Model of Anatomy — University of Washington (  Comprehensive model of STRUCTURAL anatomy  Transformed into formal representation in Freiburg –Feasibility rather than production  Mouse — The Edinburgh Mouse Atlas Project ( Bioinformatics  GO - The Gene Ontology  MGED – Microarray Gene Expression Data  OMIM – Online Mendelian Inheritance in Man Drugs  Proprietary databases – First Databank, Micromed  UK Drug Dictionary (UKCPRS) National Cancer Institute CaCore Ontologies

51 OpenGALEN Slide No.: 51 Current Status (1) UMLS is the central coordinating force  Any terminology needs links to CUIs and LUIs — Many people using CLASS I licensed terms only  Links to MeSH and PubMed ICD9/10-CM used for reporting of diseases for insurance and Medicare in the US  ICD-10 used for official reporting in UK CPT and OPCS used for reporting of procedures in US and UK respectively SNOMED-CT purchased by US and mandated in UK  As yet few convincing

52 OpenGALEN Slide No.: 52 Current Status (2) ICPC widely used in primary care on continent, especially in the Netherlands LOINC used for lab systems; HL7 for messaging Variants of SNOMED used for pathology many places Many specialist systems  SNOMED-DICOM-Microglossary (SDM) for imaging — Unrelated to SNOMED  Several nursing systems A variety of open source resources appearing

53 OpenGALEN Slide No.: 53 Current Status (3) Commercial world dominated by proprietary systems  MedCin  All based on “Model of Use”

54 OpenGALEN Slide No.: 54 The Semantic Web and OWL “Ontologies” – fancy word for terminologies  Means many things to many people W3C has produced a standard language for compositional “logic based” ontologies, OWL  “OIL” + “DAML”  “DAML+OIL”  “OWL” — See — See — See Rapid proliferation of open source tools and resources  No longer a biomedical problem only  Serious computer scientists finally involved
