
Slide 1: Ontologies, Clinical and Genomic Information
How to say what we mean and mean what we say
Opportunities & Pitfalls
Alan Rector, Jeremy Rogers, Chris Wroe
Information Management Group / Bio Health Informatics Group
Department of Computer Science, University of Manchester
rector@cs.man.ac.uk
www.clinical-escience.org  www.co-ode.org  www.opengalen.org  protege.stanford.org

Slide 2: What Is An “Ontology”?
Ontology (Socrates & Aristotle 400-360 BC)
– The study of being
Word borrowed by computing for the explicit description of the conceptualisation of a domain:
– concepts (“entities”)
– properties and attributes of concepts
– constraints on properties and attributes
– individuals (often, but not always)
An ontology defines
– a common vocabulary
– a shared understanding
– a classification

Slide 3: Sharing info, sharing meaning
Metadata
– Data describing the content and meaning of resources and services
– But everyone must speak the same language…
Terminologies
– Shared and common vocabularies for search engines, agents, curators, authors and users
– But everyone must mean the same thing…
Ontologies
– Shared and common understanding of a domain
– Essential for search, exchange and discovery
[Diagram: many service providers]

Slide 4: Measure the world… quantitative models (not ontologies)
Quantitative
– Numerical data: 2 mm, 2.4 V, between 4 and 5 feet
– Unambiguous tokens
– Main problem is accuracy at initial capture
– Numerical analysis (e.g. statistics) well understood
Examples:
– How big is this breast lump?
– What is the average age of patients with cancer?
– How much time elapsed between original referral and first appointment at the hospital?

Slide 5: Describe the world – ontologies
Qualitative
– Descriptive data: cold, colder, blueish, not pink, drunk
– Ambiguous tokens: what’s wrong with being drunk? Ask a glass of water.
– Accuracy poorly defined
More examples
– How pleomorphic are the cells in the biopsy?
– What is a protein’s function?
– What is the derivation of a tissue?

Slide 6: Why Develop an Ontology? Naming, Classifying, Indexing
To share common understanding of the structure of descriptive information
– among people
– among software agents
– between people and software
To enable reuse of domain knowledge
– to introduce standards to allow interoperability
To index and annotate other resources
Semantic interoperability
Foundation of the Semantic Web/Grid

Slide 7: More Reasons
To make domain assumptions explicit
– easier to change domain assumptions (consider a genetics knowledge base)
– easier to understand and update legacy data
To separate domain knowledge from operational knowledge
– re-use domain and operational knowledge separately (e.g., configuration based on constraints)
To manage the combinatorial explosion

Slide 8: A semantic continuum [Mike Uschold, Boeing Corp]
– Implicit: shared human consensus
– Informal (explicit): text descriptions, e.g. Pump: “a device for moving a gas or liquid from one place or container to another”
– Formal (for humans): semantics hardwired; used at runtime
– Formal (for machines): semantics processed and used at runtime, e.g. (pump has (superclasses (…)))
Further to the right: less ambiguity, better inter-operation, more robust (less hardwiring), but more difficult.

Slide 9: An Ontology should be just the Beginning
[Diagram: ontologies linked to databases (declare structure), knowledge bases (provide domain description), software agents, problem-solving methods and domain-independent applications; together, the “Semantic Web”]

Slide 10: What an Ontology Isn’t (“It won’t make the coffee”)
A database
– Ontologies are about categories/classes/types/concepts/entities, not instances
– ABOUT diseases, genes, proteins, …
– NOT ABOUT specific patients, samples, studies, …
A database/EHR schema
– An ontology is about meaning rather than storage
– Although ontology technologies are a means for merging schemas
A decision support/protocol management system
– The entities used in the rules, not the rules
A metadata schema
– The entities used in the metadata, not the schema itself
A lexicon
– Meaning rather than language
– But every ontology needs language tools

Slide 11: Ontology Technologies
Description logics (DLs), OWL
– Designed to provide logical support for automatic classification and consistency checking
– Designed for sharing and software engineering
– Leverage the Semantic Web / Grid community
– But not everything in OWL is an ontology
RDF(S)
Specialised for groups
– DAGEdit and other OBO tools; FMA explorer, …
UML
– Carefully developed UML models convey much information for an ontology
– But support only very simple inference and checking

Slide 12: Why it’s hard (1)
Language is slippery & local; rigour & logic are hard
– Classification is too easy for people (to do badly)
– But logical/computational properties are unintuitive
– Combinatorial explosions
– Philosophical & “religious” differences
Information capture
– Data quality
– Tools & environments
Different points of view
– Oncology, Cardiology, …
– Adult, developmental, aetiological, …
– Clinical, genetic, genomic, …

Slide 13: Why it’s hard (2)
Need a combined model of meaning
– The EHR/database holding the ontology PLUS the ontology held
Hard to scope: easy to do too much
– “Just in time” ontology
– Better in the bio than the medical community
Software engineering methods poorly understood

Slide 14: Classification is easy for people (to do badly)
“On those remote pages it is written that animals are divided into:
a. those that belong to the Emperor
b. embalmed ones
c. those that are trained
d. suckling pigs
e. mermaids
f. fabulous ones
g. stray dogs
h. those that are included in this classification
i. those that tremble as if they were mad
j. innumerable ones
k. those drawn with a very fine camel’s hair brush
l. others
m. those that have just broken a flower vase
n. those that resemble flies from a distance”
From The Celestial Emporium of Benevolent Knowledge, Borges

Slide 15: Avoiding combinatorial explosions
The “Exploding Bicycle”
From “phrase book” to “dictionary + grammar”
– 1980: ICD-9 (E826): 8
– 1990: READ-2 (T30..): 81
– 1995: READ-3: 87
– 1996: ICD-10 (V10-19 Australian): 587
V31.22 Injury or accident to the occupant of three-wheeled motor vehicle in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
And meanwhile, elsewhere in ICD-10:
– W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
– X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
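
A rough way to see why the “phrase book” explodes while a “dictionary + grammar” does not: pre-coordinated codes grow with the product of the number of choices on each axis, while a compositional vocabulary grows with their sum. The sketch below assumes five hypothetical axes loosely modelled on the V31.22 example; the axis names, value lists and the `compose` helper are illustrative inventions, not part of ICD or READ.

```python
from itertools import product
from math import prod

# Hypothetical axes loosely modelled on the ICD-10 V31.22 example; the value
# lists are illustrative only and are not taken from any real classification.
AXES = {
    "vehicle": ["pedal cycle", "motorcycle", "three-wheeled motor vehicle", "car", "bus"],
    "counterpart": ["pedestrian", "pedal cycle", "car", "fixed object"],
    "person": ["driver", "passenger", "person on outside of vehicle"],
    "traffic": ["traffic accident", "nontraffic accident"],
    "activity": ["while working for income", "while engaged in sports activity", "while resting"],
}


def phrase_book_size(axes):
    """Pre-coordinated 'phrase book': one code for every combination of values."""
    return prod(len(values) for values in axes.values())


def dictionary_size(axes):
    """Compositional 'dictionary': one entry per value; the grammar combines them."""
    return sum(len(values) for values in axes.values())


def compose(**choices):
    """The 'grammar': build one description from a value on each chosen axis."""
    for axis, value in choices.items():
        if value not in AXES[axis]:
            raise ValueError(f"{value!r} is not a known {axis!r} value")
    return "; ".join(f"{axis}: {value}" for axis, value in choices.items())


if __name__ == "__main__":
    # Enumerating every pre-coordinated code confirms the product formula.
    assert len(list(product(*AXES.values()))) == phrase_book_size(AXES)
    print(phrase_book_size(AXES), "pre-coordinated codes")  # 360
    print(dictionary_size(AXES), "dictionary entries")      # 17
    print(compose(vehicle="three-wheeled motor vehicle", counterpart="pedal cycle",
                  traffic="nontraffic accident"))
```

With these made-up axes the phrase book needs 360 codes against 17 dictionary entries; adding a sixth vehicle type would add 72 codes (to 432) but only one dictionary entry.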

Slide 16: The ontology nested in the EHR
The EHR (HL7 RIM): [moodCode=“Event” subject=“Relative” code={ }]
The ontology (SNOMED-CT): diabetes (subject person_in_family)
The combined meaning: what is legal? Required? Mandatory? …

Slide 17: Developing Software Engineering Methodologies for Ontologies
Building a life cycle
– Use/test cases & exemplars
– Identifying problems, alternative solutions, exploring consequences, deciding amongst alternatives
– Specifying solutions in human- and machine-readable form
– Setting conformance tests for specifications
– Building reference implementations
– Monitoring for problems
– Recording of problems and changes

Slide 18: Logic-based Ontologies: Conceptual Lego
[Building blocks: hand, extremity, body, acute, chronic, abnormal, normal, ischaemic, deletion, bacterial, polymorphism, cell, protein, gene, infection, inflammation, lung, expression]

Slide 19: Logic-based Ontologies: Conceptual Lego
“SNPolymorphism of CFTRGene causing Defect in MembraneTransport of Chloride Ion causing Increase in Viscosity of Mucus in CysticFibrosis…”
“Hand which is anatomically normal”

Slide 20: Logical Constructs build complex concepts from modularised primitives
Primitives: Genes, Species, Protein, Function, Disease
– Gene in humans
– Protein coded by gene in humans
– Function of Protein coded by gene in humans
– Disease caused by abnormality in Function of Protein coded by gene in humans
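
The layering on this slide can be mimicked with a small immutable expression type in which each complex concept wraps a simpler one. This is a sketch only: the `Concept` class, the `that` method and the relation labels (“in”, “coded by”, “of”, “caused by abnormality in”) are hypothetical, and nothing here performs any logical reasoning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A primitive concept, optionally refined by (relation, filler) pairs."""
    base: str
    restrictions: tuple = ()

    def that(self, relation: str, filler: "Concept") -> "Concept":
        """Return a more specific concept; self is left unchanged."""
        return Concept(self.base, self.restrictions + ((relation, filler),))


def render(concept: Concept) -> str:
    """Spell out a (possibly nested) concept expression as an English phrase."""
    parts = [concept.base]
    for relation, filler in concept.restrictions:
        parts.append(f"{relation} {render(filler)}")
    return " ".join(parts)


def primitives(concept: Concept) -> set:
    """Collect the primitive building blocks used anywhere in an expression."""
    found = {concept.base}
    for _, filler in concept.restrictions:
        found |= primitives(filler)
    return found


# Modular primitives, one per independent hierarchy on the slide
GENE, HUMANS, PROTEIN, FUNCTION, DISEASE = (
    Concept(name) for name in ("gene", "humans", "Protein", "Function", "Disease")
)

# Each step re-uses the previous expression instead of coining a new primitive
gene_in_humans = GENE.that("in", HUMANS)
protein_of_gene = PROTEIN.that("coded by", gene_in_humans)
function_of_protein = FUNCTION.that("of", protein_of_gene)
disease_of_function = DISEASE.that("caused by abnormality in", function_of_protein)

if __name__ == "__main__":
    print(render(disease_of_function))
    # Disease caused by abnormality in Function of Protein coded by gene in humans
```

The five primitives stay in separate, independently maintainable lists, and the compound concepts are built on demand rather than enumerated in advance.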

Slide 21: Normalising (untangling) Ontologies
[Diagram: Structure, Function and Part-whole, shown tangled and then as separate hierarchies]

Slide 22: A simplified example: build a simple tree that is easy to maintain

Slide 23: Let the classifier organise it

Slide 24: If you want more abstractions, just add new definitions (re-use existing data)
“Diseases linked to abnormal proteins”

Slide 25: And let the classifier work again

Slide 26: And again, even for a quite different category
“Diseases linked to genes described in the mouse”

Slide 27: Untangling and Enrichment
Using a classifier to make life easier

Hand-built hierarchy (tangled):
Substance
- Protein
- - ProteinHormone
- - - Insulin
- Steroid
- - SteroidHormone
- - - Cortisol
- Hormone
- - ProteinHormone
- - - Insulin
- - SteroidHormone
- - - Cortisol
- Catalyst
- - Enzyme
- - - ATPase

Normalised primitive trees (asserted):
- PhysiologicalRole
- - HormoneRole
- - CatalystRole
- Substance
- - Protein
- - - Insulin
- - - ATPase
- - Steroid
- - - Cortisol

Definitions and descriptions:
Hormone ≡ Substance & playsRole someValuesFrom HormoneRole
ProteinHormone ≡ Protein & playsRole someValuesFrom HormoneRole
SteroidHormone ≡ Steroid & playsRole someValuesFrom HormoneRole
Catalyst ≡ Substance & playsRole someValuesFrom CatalystRole
Enzyme ≡ Protein & playsRole someValuesFrom CatalystRole
Insulin ⊑ playsRole someValuesFrom HormoneRole
Cortisol ⊑ playsRole someValuesFrom HormoneRole
ATPase ⊑ playsRole someValuesFrom CatalystRole

Classified hierarchy (inferred):
Substance
- Protein
- - ProteinHormone
- - - Insulin
- - Enzyme
- - - ATPase
- Steroid
- - SteroidHormone^
- - - Cortisol
- Hormone
- - ProteinHormone^
- - - Insulin^
- - SteroidHormone^
- - - Cortisol^
- Catalyst
- - Enzyme^
- - - ATPase^
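
To show what “let the classifier organise it” means mechanically, here is a toy classifier for exactly the fragment on this slide: named classes, a single-parent asserted tree, and definitions of the form Base & playsRole someValuesFrom Filler. It is a simple saturation procedure written for illustration (the dictionary and function names are hypothetical), not a complete description logic reasoner; a real OWL ontology would be classified with an off-the-shelf OWL reasoner.

```python
# Told (asserted) hierarchy: every primitive has exactly one parent, i.e. a tree.
TOLD_PARENTS = {
    "Substance": set(),
    "Protein": {"Substance"},
    "Steroid": {"Substance"},
    "Insulin": {"Protein"},
    "ATPase": {"Protein"},
    "Cortisol": {"Steroid"},
    "PhysiologicalRole": set(),
    "HormoneRole": {"PhysiologicalRole"},
    "CatalystRole": {"PhysiologicalRole"},
}

# Necessary conditions on primitives: Name ⊑ playsRole someValuesFrom Filler
TOLD_ROLES = {"Insulin": {"HormoneRole"}, "Cortisol": {"HormoneRole"}, "ATPase": {"CatalystRole"}}

# Definitions: Name ≡ Base & playsRole someValuesFrom Filler
DEFINITIONS = {
    "Hormone": ("Substance", "HormoneRole"),
    "ProteinHormone": ("Protein", "HormoneRole"),
    "SteroidHormone": ("Steroid", "HormoneRole"),
    "Catalyst": ("Substance", "CatalystRole"),
    "Enzyme": ("Protein", "CatalystRole"),
}


def classify():
    """Saturate subsumers and role fillers for every class until nothing changes."""
    names = set(TOLD_PARENTS) | set(DEFINITIONS)
    subsumers = {n: {n} for n in names}
    fillers = {n: set() for n in names}
    changed = True
    while changed:
        changed = False
        for c in names:
            before = (len(subsumers[c]), len(fillers[c]))
            for d in list(subsumers[c]):
                # Inherit the told parents and told role restrictions of every subsumer.
                subsumers[c] |= TOLD_PARENTS.get(d, set())
                fillers[c] |= TOLD_ROLES.get(d, set())
                if d in DEFINITIONS:  # a definition is also a necessary condition
                    base, filler = DEFINITIONS[d]
                    subsumers[c].add(base)
                    fillers[c].add(filler)
            for d, (base, filler) in DEFINITIONS.items():  # ...and a sufficient one
                if base in subsumers[c] and any(filler in subsumers[f] for f in fillers[c]):
                    subsumers[c].add(d)
            if (len(subsumers[c]), len(fillers[c])) != before:
                changed = True
    return subsumers


def direct_parents(subsumers, name):
    """Most specific strict subsumers (assumes no two classes are equivalent)."""
    strict = subsumers[name] - {name}
    return {p for p in strict if not any(q != p and p in subsumers[q] for q in strict)}


if __name__ == "__main__":
    inferred = classify()
    for cls in ("Insulin", "Cortisol", "ATPase", "ProteinHormone", "Enzyme"):
        print(cls, "->", sorted(direct_parents(inferred, cls)))
```

Run as a script, it reports ProteinHormone as the only direct parent of Insulin, SteroidHormone for Cortisol, Enzyme for ATPase, and both Hormone and Protein for ProteinHormone, recovering the multiple inheritance of the hand-built tree from single-parent assertions. Adding a further hypothetical definition such as `DEFINITIONS["ActiveSubstance"] = ("Substance", "PhysiologicalRole")` and calling `classify()` again places Hormone and Catalyst directly under it without editing the asserted trees, which is the “just add new definitions” pattern from the earlier slides.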

Slide 28: Ontologies and Reference Information Resources
An ontology is just one part
– Naming: definitions & necessary conditions
– Classification
– Indexing
Knowledge bases
– What we know about those entities: what is true in general
Databases
– What we know about individuals
– Instance stores: specialised databases that link to ontologies
Plus
– Lexicons
– Metadata
– Mappings
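
The division of labour between an ontology (classes) and a database or instance store (individuals) can be sketched as two separate structures joined only by class names. Everything below is hypothetical: the class fragment, the `Record` type, the patient identifiers and the `instances_of` query; a real instance store would sit on a database and a reasoner rather than Python lists.

```python
from dataclasses import dataclass

# Ontology: classes only (hypothetical fragment); no patients appear here.
PARENT = {
    "disease": None,
    "respiratory disease": "disease",
    "asthma": "respiratory disease",
    "diabetes": "disease",
}


@dataclass(frozen=True)
class Record:
    """Database row about an individual; it points into the ontology by class name."""
    patient_id: str
    diagnosis: str


# Database: individuals only (hypothetical identifiers)
RECORDS = [
    Record("p1", "asthma"),
    Record("p2", "diabetes"),
    Record("p3", "respiratory disease"),
]


def is_a(cls, ancestor):
    """True if cls equals ancestor or lies below it in the ontology."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT[cls]
    return False


def instances_of(ancestor, records=RECORDS):
    """Instance-store style query: individuals whose class is subsumed by ancestor."""
    return [r.patient_id for r in records if is_a(r.diagnosis, ancestor)]


if __name__ == "__main__":
    print(instances_of("respiratory disease"))  # ['p1', 'p3']
    print(instances_of("disease"))              # ['p1', 'p2', 'p3']
```

Because the records store only class names, the ontology can be refined (for example, adding a subclass of asthma) without rewriting any patient data.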

Slide 29: [Diagram: data store (data on individuals); knowledge base (prototypical knowledge); ontology (definitional knowledge); metadata (annotation); linguistic knowledge]

Slide 30: Example 1: Indexing Drug Contraindications (or guidelines or information or …)
[Diagram terms: use of beta blocker in asthma; beta blocker; asthma; serious contraindication; mild contraindication; cardioselective; cardioselective beta blocker; use of cardioselective beta blocker in asthma]
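
One way to read this diagram is as an index keyed by pairs of ontology terms, where a query for a specific drug and condition retrieves the most specific entries that subsume it. The sketch assumes that the serious contraindication attaches to beta blockers in general and the mild one to cardioselective beta blockers, and that the most specific entry wins; the hierarchy, the extra “severe asthma” class and the `lookup` function are illustrative only, not clinical guidance.

```python
# Hypothetical single-parent hierarchies for the terms on the slide
# ("severe asthma" is an extra, illustrative subclass not on the slide).
PARENT = {
    "drug": None,
    "beta blocker": "drug",
    "cardioselective beta blocker": "beta blocker",
    "disease": None,
    "asthma": "disease",
    "severe asthma": "asthma",
}

# Index entries keyed by (drug class, condition); the severities are an assumed reading
INDEX = {
    ("beta blocker", "asthma"): "serious contraindication",
    ("cardioselective beta blocker", "asthma"): "mild contraindication",
}


def subsumes(general, specific):
    """True if general is specific itself or one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT[specific]
    return False


def lookup(drug, condition):
    """Return the advice attached to the most specific index entries that apply."""
    matches = [(d, c) for (d, c) in INDEX if subsumes(d, drug) and subsumes(c, condition)]
    most_specific = [
        m for m in matches
        if not any(o != m and subsumes(m[0], o[0]) and subsumes(m[1], o[1]) for o in matches)
    ]
    return sorted(INDEX[m] for m in most_specific)


if __name__ == "__main__":
    print(lookup("cardioselective beta blocker", "severe asthma"))  # ['mild contraindication']
    print(lookup("beta blocker", "asthma"))                          # ['serious contraindication']
```

Returning only the most specific matches lets a general warning be overridden by a more specific one without deleting it; a system that must show every applicable warning would return all of `matches` instead.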

Slide 31: Example 2: Indexing data entry forms
Fractal tailoring of forms for clinical trials
[Diagram: Hypertension; Idiopathic Hypertension; In our company’s studies; In Phase 2 studies; tailored to “Idiopathic Hypertension in our company’s Phase 2 study”]
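
“Fractal tailoring” can be approximated by indexing pieces of a form by condition and by study context, then assembling a form from every piece indexed at or above the requested point. The `Fragment` type, the context labels and all field names are hypothetical; the slide does not specify how the original system stored its forms.

```python
from dataclasses import dataclass

# Hypothetical condition hierarchy (single parent)
CONDITION_PARENT = {"hypertension": None, "idiopathic hypertension": "hypertension"}


def condition_subsumes(general, specific):
    """True if general is specific itself or one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = CONDITION_PARENT[specific]
    return False


@dataclass(frozen=True)
class Fragment:
    """A piece of a form, indexed by a condition and by study contexts."""
    condition: str
    contexts: frozenset  # empty means "any study"
    fields: tuple


# Hypothetical fragments and field names, for illustration only
FRAGMENTS = [
    Fragment("hypertension", frozenset(), ("blood pressure readings",)),
    Fragment("idiopathic hypertension", frozenset(), ("exclusion of secondary causes",)),
    Fragment("hypertension", frozenset({"our company"}), ("company adverse-event section",)),
    Fragment("hypertension", frozenset({"phase 2"}), ("phase 2 dosing section",)),
]


def build_form(condition, contexts):
    """Gather every fragment indexed at or above the requested condition and contexts."""
    contexts = frozenset(contexts)
    fields = []
    for fragment in FRAGMENTS:
        if condition_subsumes(fragment.condition, condition) and fragment.contexts <= contexts:
            fields.extend(fragment.fields)
    return fields


if __name__ == "__main__":
    for field in build_form("idiopathic hypertension", {"our company", "phase 2"}):
        print("-", field)
```

Asking for idiopathic hypertension in the company’s phase 2 studies collects all four fragments, while plain hypertension with no study context yields only the blood pressure section.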

Slide 32: Example 3: PEN&PAD
Fractal tailoring of ‘fail soft’ forms
What is it sensible to say about …?


Slide 34: Technical Barriers to linking ontologies
Overlap
– Linking independent ontologies is easy
– Overlap ALWAYS brings differences in meaning
– To integrate, separate
Appropriate levels of abstraction
– Genetics/genomics is changing disease classification
– “Anti-angina drugs”, “Ingredients conjugated in the liver”
Feedback
– New biology → new clinical classifications → …
– Discipline required to keep separations
Views
– Anatomy: tissues (developmental) vs structures vs functions

Slide 35: Nontechnical barriers to linking ontologies
Organisational barriers
– How to keep separation and scope of individual ontologies
– All enterprises tend to expand and encroach
Discipline barriers
Task barriers
– Fit for one purpose is not fit for all purposes
Language barriers
– Between communities as well as languages
IP barriers
Process
– Collaborative distributed vs centralised
– Authority
– Life cycle and rate of change:
  GO runs at web speed: seconds to days
  SNOMED runs at e-publishing speed: 6 months to 3 years
  ICD runs at print/committee speed: 10-20 years

Slide 36: “Good ontologies”
Fitness for purpose
– What’s it for?
– Defined scope
“Ownership” by users
– A language belongs to its community
Human factors
– Understandability, Reliability!
Evaluation criteria
– How do we know if it meets its purpose?
Evolution
“Process not Product!”

Slide 37: “Good ontologies”
Internal structure
– Consistency
– Modularity & normalisation
Software engineering issues: architecture & tools
– It’s software! It evolves! It’s a standard!
– Conformance and regression testing matter
Philosophical clarity
– Class-instance divide correct
  “Instances” are different in ontologies and databases
  Ontologies are about a view of the world, not about how to store information in a database
– Clear distinction between part-whole and kind-of
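
Treating an ontology as software suggests keeping a regression suite of subsumptions that must, or must not, hold across releases. The sketch below assumes a single-parent, acyclic hierarchy stored as a dictionary; the expectations, the two versions and `regression_failures` are hypothetical examples.

```python
# Hypothetical regression suite: (child, ancestor, should_hold) triples that each
# release of the ontology is expected to preserve.
EXPECTED = [
    ("asthma", "respiratory disease", True),
    ("asthma", "disease", True),
    ("diabetes", "respiratory disease", False),
]


def ancestors(parent, cls):
    """Reflexive-transitive ancestors of cls in a single-parent, acyclic hierarchy."""
    seen = set()
    while cls is not None:
        seen.add(cls)
        cls = parent.get(cls)
    return seen


def regression_failures(parent, expected):
    """Return every expectation that the given hierarchy version violates."""
    failures = []
    for child, ancestor, should_hold in expected:
        if (ancestor in ancestors(parent, child)) != should_hold:
            failures.append((child, ancestor, should_hold))
    return failures


# Two hypothetical versions: v2 "simplifies" by moving asthma directly under disease
V1 = {"disease": None, "respiratory disease": "disease",
      "asthma": "respiratory disease", "diabetes": "disease"}
V2 = dict(V1, asthma="disease")

if __name__ == "__main__":
    print("v1 failures:", regression_failures(V1, EXPECTED))  # []
    print("v2 failures:", regression_failures(V2, EXPECTED))
```

Version 1 passes; version 2 fails the expectation that asthma is a respiratory disease, flagging the change for review before release.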

Slide 38: Grounding cost vs Cleanup cost
What do we need to share?
– What is broken?
How much do we need to know to communicate?
– Easy to build too much, and very costly!
– “Just in time ontology”
Use logic
Use the web
Bio / OBO does well
Medicine so far doing badly

Slide 39: Important Ontologies & related standards
OBO (Open Biomedical Ontologies)
– Gene Ontology
– MGED family
– …
UMLS
– Massive resource for cross referencing
– Uses CUIs & LUIs (“Concept Unique IDs”, “Lexical Unique IDs”)
SNOMED-CT
– SNOMED-International
Anatomy
– Digital Anatomist FMA, Mouse Developmental, Mouse Adult
– SAEL: Standard Anatomy Entry List
NCICB
– CaCORE ontology
National minimum data sets: controlled vocabularies
HL7, LOINC, DICOM, CDISC, …
OpenGALEN: source for experimentation and development
Bio databases: at least implicit controlled vocabularies
– Swiss-Prot, OMIM, Ensembl, PRINTS, …
…

Slide 40: Summary: Planning for Naming, Classifying, Indexing
What is it for? Is there a gap? What is needed?
– What are the use cases? Criteria for success?
– Does it exist already?
– Is an ontology the answer? Is an ontology needed for the answer?
– What else is needed? A reference knowledge source?
– What is the MINIMUM that one can do?
Who will own it?
– Can we build it collaboratively?
– What is the authority?
How will it evolve?
– What is the pace of change? Can we do it “just in time”?
– Can we evaluate and test it, again and again?

