Developing Medical Informatics Ontologies with Protégé


1 Developing Medical Informatics Ontologies with Protégé
Natasha F. Noy Samson W. Tu Stanford University {noy,

2 Tutorial materials
The Protégé application: copy the Protégé-2000 directory into your “Program Files” or “Applications” folder. The tutorial example: copy the “Wine” folder to your hard disk. Examples of Medical Informatics ontologies: copy the “Medical Informatics” examples to your disk. Slides from the tutorial: AMIA2003-Protege-Tutorial.ppt.

3 A shared ONTOLOGY of wine and food
Which wine should I serve with seafood today? French wines and wine regions; California wines and wine regions; both described through a shared ontology of wine and food.

4 Outline
Ontology development basics: What is an ontology and why do we need one? A step-by-step guide to ontology development. An overview of Protégé. Advanced issues in knowledge modeling. Medical Informatics ontologies: examples and design decisions. Additional resources: Protégé plugins and applications.

5 What Is An Ontology
An ontology is an explicit description of a domain: concepts; properties and attributes of concepts; constraints on properties and attributes; individuals (often, but not always). An ontology defines a common vocabulary and a shared understanding.

6 Ontology Examples
Taxonomies on the Web: Yahoo! categories. Catalogs for on-line shopping: Amazon.com product catalog. Domain-specific standard terminology: SNOMED Clinical Terms (terminology for clinical medicine); UNSPSC (terminology for products and services).

7 What Is “Ontology Engineering”?
Ontology Engineering: defining terms in the domain and relations among them. Defining concepts in the domain (classes). Arranging the concepts in a hierarchy (subclass-superclass hierarchy). Defining which attributes and properties (slots) classes can have, and constraints on their values. Defining individuals and filling in slot values.

8 Why Develop an Ontology?
To share common understanding of the structure of information: among people; among software agents. To enable reuse of domain knowledge: to avoid “re-inventing the wheel”; to introduce standards to allow interoperability.

9 More Reasons
To make domain assumptions explicit: easier to change domain assumptions (consider a genetics knowledge base); easier to understand and update legacy data. To separate domain knowledge from the operational knowledge: re-use domain and operational knowledge separately (e.g., configuration based on constraints).

10 An Ontology Is Often Just the Beginning
[Diagram labels: Ontologies; Databases; Knowledge bases; Domain-independent applications; Software agents; Problem-solving methods. Edge labels: Declare structure; Provide domain description.]

11 Wines and Wineries

12 Wines Versus Drugs
Wine and Dish versus Drug and Disease. Which wine characteristics should I consider when choosing a wine? Which drug characteristics should I consider when prescribing a drug? Is Bordeaux a red or white wine? Is lisinopril an ACE Inhibitor or a Beta Blocker?

13 Wines Versus Drugs
Wine and Dish versus Drug and Disease. Does Cabernet Sauvignon go well with seafood? Is lisinopril indicated for hypertension? Which characteristics of a wine affect its appropriateness for a dish? Which characteristics of a drug affect its appropriateness for a disease?

14 Ontology-Development Process
In this tutorial: determine scope, consider reuse, enumerate terms, define classes, define properties, define constraints, create instances. In reality, an iterative process: the same steps (determine scope, consider reuse, enumerate terms, define classes, define properties, define constraints, create instances) are revisited repeatedly and in varying order.

15 Ontology Engineering versus Object-Oriented Modeling
An ontology: reflects the structure of the world; is often about structure of concepts; actual physical representation is not an issue. An OO class structure: reflects the structure of the data and code; is usually about behavior (methods); describes the physical representation of data (long int, char, etc.).

16 Preliminaries - Tools
Protégé-2000: is a graphical ontology-development tool; supports a rich knowledge model; is open-source and freely available. Some other available tools: Ontolingua and Chimaera; OntoEdit; OilEd.

17 Determine Domain and Scope
What is the domain that the ontology will cover? What are we going to use the ontology for? What types of questions should the information in the ontology answer (competency questions)? Answers to these questions may change during the lifecycle.

18 Competency Questions Which wine characteristics should I consider when choosing a wine? Is Bordeaux a red or white wine? Does Cabernet Sauvignon go well with seafood? What is the best choice of wine for grilled meat? Which characteristics of a wine affect its appropriateness for a dish? Does a flavor or body of a specific wine change with vintage year? What were good vintages for Napa Zinfandel?

19 Consider Reuse
Why reuse other ontologies? To save the effort; to interact with the tools that use other ontologies; to use ontologies that have been validated through use in applications.

20 What to Reuse?
Ontology libraries: Protégé ontology library (protege.stanford.edu/ontologies.html); DAML ontology library; Ontolingua ontology library. Upper ontologies: IEEE Standard Upper Ontology (suo.ieee.org); Cyc.

21 What to Reuse? (II)
General ontologies: DMOZ; WordNet. Domain-specific ontologies: UMLS Semantic Net; GO (Gene Ontology); GLIF; HL7.

22 Enumerate Important Terms
What are the terms we need to talk about? What are the properties of these terms? What do we want to say about the terms?

23 Enumerating Terms - The Wine Ontology
wine, grape, winery, location, wine color, wine body, wine flavor, sugar content white wine, red wine, Bordeaux wine food, seafood, fish, meat, vegetables, cheese

24 Define Classes and the Class Hierarchy
A class is a concept in the domain: a class of wines; a class of wineries; a class of red wines. A class is a collection of elements with similar properties. Instances of classes: a glass of California wine you’ll have for lunch.

25 Class Inheritance Classes usually constitute a taxonomic hierarchy (a subclass-superclass hierarchy) A class hierarchy is usually an IS-A hierarchy: an instance of a subclass is an instance of a superclass If you think of a class as a set of elements, a subclass is a subset

26 Class Inheritance - Example
Apple is a subclass of Fruit: every apple is a fruit. Red wine is a subclass of Wine: every red wine is a wine. Chianti wine is a subclass of Red wine: every Chianti wine is a red wine.
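A minimal Python sketch of the IS-A reading above. Python classes are used here only as a stand-in for ontology classes (an assumption for illustration; the tutorial itself works with Protégé frames), and the class names simply mirror the slide.

```python
# Hypothetical stand-ins for the ontology classes named on the slide.
class Wine:
    pass


class RedWine(Wine):
    pass


class Chianti(RedWine):
    pass


# An instance of a subclass is also an instance of every superclass.
bottle = Chianti()
is_red_wine = isinstance(bottle, RedWine)
is_wine = isinstance(bottle, Wine)

# Read as sets: the set of Chianti wines is a subset of the set of wines.
chianti_is_kind_of_wine = issubclass(Chianti, Wine)
```

The question to ask of any proposed subclass link is the one `isinstance` answers here: is every instance of the subclass also an instance of the superclass?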

27 Levels in the Hierarchy
Top level Middle level Bottom level

28 Modes of Development top-down – define the most general concepts first and then specialize them bottom-up – define the most specific concepts and then organize them in more general classes combination – define the more salient concepts first and then generalize and specialize them

29 Documentation Classes (and slots) usually have documentation
Describing the class in natural language Listing domain assumptions relevant to the class definition Listing synonyms Documenting classes and slots is as important as documenting computer code!

30 Define Properties of Classes – Slots
Slots in a class definition describe attributes of instances of the class and relations to other instances. Each wine will have color, sugar content, producer, etc.

31 Properties (Slots)
Types of properties: “intrinsic” properties (flavor and color of wine); “extrinsic” properties (name and price of wine); parts (ingredients in a dish); relations to other objects (producer of wine, i.e., a winery). Simple and complex properties: simple properties (attributes) contain primitive values (strings, numbers); complex properties contain (or point to) other objects (e.g., a winery instance).

32 Slots for the Class Wine

33 Slot and Class Inheritance
A subclass inherits all the slots from the superclass: if a wine has a name and flavor, a red wine also has a name and flavor. If a class has multiple superclasses, it inherits slots from all of them: Port is both a dessert wine and a red wine. It inherits “sugar content: high” from the former and “color: red” from the latter.
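The Port example can be sketched with Python multiple inheritance. This is only an analogy under stated assumptions: class attributes play the role of slots with fixed values, and the attribute names (`sugar_content`, `color`) are illustrative choices, not names taken from the tutorial's ontology files.

```python
# Hypothetical classes; class attributes stand in for slots.
class Wine:
    name = None
    flavor = None


class DessertWine(Wine):
    sugar_content = "high"


class RedWine(Wine):
    color = "red"


class Port(DessertWine, RedWine):
    """Inherits slots from both superclasses."""


port = Port()
inherited = {
    "sugar_content": port.sugar_content,  # from DessertWine
    "color": port.color,                  # from RedWine
    "has_name_slot": hasattr(port, "name"),  # from Wine, via both parents
}
```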

34 Property Constraints
Property constraints (facets) describe or limit the set of possible values for a slot: the name of a wine is a string; the wine producer is an instance of Winery; a winery has exactly one location.

35 Facets for Slots at the Wine Class

36 Common Facets
Slot cardinality: the number of values a slot has. Slot value type: the type of values a slot has. Minimum and maximum value: a range of values for a numeric slot. Default value: the value a slot has unless explicitly specified otherwise.

37 Common Facets: Slot Cardinality
Cardinality N means that the slot must have N values. Minimum cardinality: minimum cardinality 1 means that the slot must have a value (required); minimum cardinality 0 means that the slot value is optional. Maximum cardinality: maximum cardinality 1 means that the slot can have at most one value (single-valued slot); maximum cardinality greater than 1 means that the slot can have more than one value (multiple-valued slot).
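The cardinality rules above amount to a bounds check on the number of values a slot holds. A minimal sketch follows; the function name `check_cardinality` and its parameters are hypothetical, and `maximum=None` is used here to mean "no upper bound".

```python
from typing import Optional, Sequence


def check_cardinality(values: Sequence,
                      minimum: int = 0,
                      maximum: Optional[int] = None) -> bool:
    """Return True if the number of slot values lies within [minimum, maximum]."""
    count = len(values)
    if count < minimum:
        return False  # a required value is missing
    if maximum is not None and count > maximum:
        return False  # too many values for this slot
    return True


# "A winery has exactly one location": minimum 1 and maximum 1.
one_location_ok = check_cardinality(["Napa"], minimum=1, maximum=1)
no_location_ok = check_cardinality([], minimum=1, maximum=1)

# An optional, multiple-valued slot: minimum 0, no maximum.
grapes_ok = check_cardinality(["Merlot", "Cabernet Franc"], minimum=0)
```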

38 Common Facets: Value Type
String: a string of characters (“Château Lafite”). Number: an integer or a float (15, 4.5). Boolean: a true/false flag. Enumerated type: a list of allowed values (high, medium, low). Complex type: an instance of another class; specify the class to which the instances belong. For example, the Wine class is the value type for the slot “produces” at the Winery class.
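A value-type facet can be pictured as a small checker that accepts or rejects a candidate slot value. The sketch below is an illustration only: `ValueTypeFacet` and its fields are hypothetical names, not part of Protégé, and it covers just the string, enumerated, and complex (instance) cases from the slide.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


class Wine:
    """Stand-in for the Wine class of the ontology."""


@dataclass(frozen=True)
class ValueTypeFacet:
    value_type: type                                   # e.g. str, or a class such as Wine
    allowed_values: Optional[Tuple[Any, ...]] = None   # set only for enumerated types

    def accepts(self, value: Any) -> bool:
        """Check the value against the type and, if present, the enumeration."""
        if not isinstance(value, self.value_type):
            return False
        return self.allowed_values is None or value in self.allowed_values


name_facet = ValueTypeFacet(str)                                # String
sugar_facet = ValueTypeFacet(str, ("high", "medium", "low"))    # Enumerated type
produces_facet = ValueTypeFacet(Wine)                           # Complex type
```

With these definitions, `name_facet.accepts("Château Lafite")` holds, while `sugar_facet.accepts("sweet")` does not, because "sweet" is outside the enumerated list.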

39 Facets and Class Inheritance
A subclass inherits all the slots from the superclass. A subclass can override the facets to “narrow” the list of allowed values: make the cardinality range smaller; replace a class in the range with a subclass. Example: French wine producer is-a Wine producer; French winery is-a Winery.
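The two kinds of narrowing named above can each be expressed as a small check. This is a sketch under assumptions: the function names are hypothetical, cardinality is modeled as a `(minimum, maximum)` pair, and `None` as a maximum means "unbounded".

```python
from typing import Optional, Tuple

Cardinality = Tuple[int, Optional[int]]  # (minimum, maximum); None = unbounded


class Winery:
    pass


class FrenchWinery(Winery):
    pass


def narrows_range(parent_range: type, child_range: type) -> bool:
    """A subclass may replace a class in the range only with one of its subclasses."""
    return issubclass(child_range, parent_range)


def narrows_cardinality(parent: Cardinality, child: Cardinality) -> bool:
    """The child's cardinality interval must lie inside the parent's."""
    parent_min, parent_max = parent
    child_min, child_max = child
    if child_min < parent_min:
        return False
    if parent_max is None:
        return True
    return child_max is not None and child_max <= parent_max


# French wine producer narrows the producer range from Winery to FrenchWinery.
range_ok = narrows_range(Winery, FrenchWinery)
# Widening in the other direction is not a valid override.
range_widened = narrows_range(FrenchWinery, Winery)
```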

40 Create Instances
Create an instance of a class: the class becomes a direct type of the instance; any superclass of the direct type is a type of the instance. Assign slot values for the instance frame: slot values should conform to the facet constraints; knowledge-acquisition tools often check that.

41 Creating an Instance: Example

42 Outline
Ontology development basics: What is an ontology and why do we need one? A step-by-step guide to ontology development. An overview of Protégé. Advanced issues in knowledge modeling. Medical Informatics ontologies: examples and design decisions. Additional resources: Protégé plugins and applications.

43 Historical background: early days
ONCOCIN (1980s) Clinical decision-support system (CDSS) for management of patients enrolled in cancer clinical trials OPAL (~1985) A graphical user interface to encode cancer clinical trials for ONCOCIN based on a model of cancer trials Protégé (Mark Musen dissertation) A system to define model of trials for any domain, to generate OPAL for eONCOCIN (CDSS for any trial domain)

44 Historical background: 1990s
Protégé-II (early 1990s) A knowledge engineering environment (on NeXTStep platform) to define model and generate GUI editor for any domain ProtegeWin (mid 1990s) Windows version that emphasized usability External user groups

45 Historical background: late 1990s – present
Protégé-2000 (late 1990s – 2003) Java-based version that emphasized formal knowledge model, interoperability with other formalisms (e.g. Ontolingua, RDF) Development of extensible plugin architecture Open source Protégé, v2.0 (to be released in 2003) Multi-user development Built-in support for XML Semantic Web support

46 Protégé-2000 An extensible and customizable toolset for constructing knowledge bases (KBs) and for developing applications that use these KBs Outstanding features Automatic generation of graphical-user interfaces, based on user-defined models, for acquiring domain instances Extensible knowledge model and architecture Scalability to very large knowledge bases

47 Protégé system development methodology
Protégé-2000 support. In this tutorial: determine scope, consider reuse, enumerate terms, define classes, define properties, define constraints, create instances. In reality, an iterative process: the same steps (determine scope, consider reuse, enumerate terms, define classes, define properties, define constraints, create instances) are revisited repeatedly and in varying order.

48 Default interface
Tabs partition different work areas. Buttons and widgets for manipulating slots. Area for manipulating the class hierarchy.

49 GUI Components (Demo)
Tabs partition different work areas: Classes tab for defining and editing classes; Forms tab for custom-tailoring GUI forms for defining and editing instances; Instances tab for defining and editing instances; Classes & Instances tab for working with both classes and instances. Widgets for creating, editing, and viewing values of a slot (or a group of slots): text-field or text-area widget for a slot with string value type; diagram widget for a set of slots defining a graph; slot widgets check facet constraint violations (red rectangles). Buttons and menus for performing operations.

50 Classes, slots, facets and instances are all frames

51 Protégé-2000 basic types
Any, Boolean, Class, Instance, Float, String, Integer, Symbol (enumerated constants)

52 Multiple Inheritance A class can have multiple superclasses

53 Slots in Protégé Slots are first-class objects in Protégé
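Slots are defined at the top level, independently of any class. There can be only one slot with a given name (e.g., name) in the knowledge base, but it can be attached to several classes: for example, the same name slot is attached to both Person and Newspaper.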
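The idea of slots as first-class objects can be sketched as a registry in which a slot is defined once and then attached to any number of classes. Everything below (`Slot`, `KnowledgeBase`, `define_slot`, `attach`) is a hypothetical toy model written for illustration; it is not the Protégé API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class Slot:
    name: str
    value_type: type = str


@dataclass
class KnowledgeBase:
    slots: Dict[str, Slot] = field(default_factory=dict)
    attached: Dict[str, Set[str]] = field(default_factory=dict)

    def define_slot(self, name: str, value_type: type = str) -> Slot:
        """Define a slot at the top level; slot names are unique in the KB."""
        if name in self.slots:
            raise ValueError(f"slot {name!r} already exists")
        self.slots[name] = Slot(name, value_type)
        return self.slots[name]

    def attach(self, class_name: str, slot_name: str) -> Slot:
        """Attach an existing slot to a class and return the shared slot object."""
        slot = self.slots[slot_name]  # raises KeyError if the slot is undefined
        self.attached.setdefault(class_name, set()).add(slot_name)
        return slot


kb = KnowledgeBase()
kb.define_slot("name")
person_name = kb.attach("Person", "name")
newspaper_name = kb.attach("Newspaper", "name")
```

Both classes end up sharing one slot object, which is the contrast with object-oriented attributes that are declared separately inside each class.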

54 Facets: property constraints
Facets describe or limit the set of possible values for a slot Color can be either red, white, or rosé The value of the winery slot is an instance of the winery class There can be more than one grape from which the wine is made

55 Common Facets
Slot cardinality: the number of values a slot has. Slot value type: the type of values a slot has. Minimum and maximum value: a range of values for a numeric slot. Default value: the initial value for a slot when the instance is created.

56 Instances tab

57 Creating instances of classes
Create an instance of selected class Copy selected instance

58 Wrong and missing slot values

59 Forms tab Change browser key Change slot widgets Change layout

60 Where to go for help Protégé user’s guide FAQ
ml elopment/ontology101.html FAQ

61 Outline
Ontology development basics: What is an ontology and why do we need one? A step-by-step guide to ontology development. An overview of Protégé. Advanced issues in knowledge modeling. Medical Informatics ontologies: examples and design decisions. Additional resources: Protégé plugins and applications.

62 Going Deeper
Breadth-first coverage, then depth-first coverage, of the same steps: determine scope, consider reuse, enumerate terms, define classes, define properties, define constraints, create instances.

63 Defining Classes and a Class Hierarchy
Things to remember: There is no single correct class hierarchy But there are some guidelines The question to ask: “Is each instance of the subclass an instance of its superclass?”

64 Siblings in a Class Hierarchy
All the siblings in the class hierarchy must be at the same level of generality Compare to section and subsections in a book

65 The Perfect Family Size
If a class has only one child, there may be a modeling problem If the only Red Burgundy we have is Côtes d’Or, why introduce the subhierarchy? Compare to bullets in a bulleted list

66 The Perfect Family Size (II)
If a class has more than a dozen children, additional subcategories may be necessary However, if no natural classification exists, the long list may be more natural
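The two family-size guidelines lend themselves to a simple lint pass over a class hierarchy. The sketch below assumes the hierarchy is available as a mapping from a class name to the list of its direct subclasses; the function name, the threshold parameter, and the sample data are illustrative, and the warnings are hints rather than errors, since a long list may be the more natural model.

```python
from typing import Dict, List, Tuple


def family_size_warnings(children_of: Dict[str, List[str]],
                         max_children: int = 12) -> List[Tuple[str, str]]:
    """Return (class_name, message) pairs for classes with suspicious family sizes."""
    warnings = []
    for parent, children in children_of.items():
        if len(children) == 1:
            warnings.append((parent, "only one child: possible modeling problem"))
        elif len(children) > max_children:
            warnings.append((parent, "many children: consider subcategories"))
    return warnings


# Illustrative hierarchy fragment based on the slide's example.
hierarchy = {
    "Wine": ["Red wine", "White wine", "Rosé wine"],
    "Red Burgundy": ["Côtes d'Or"],
}
found = family_size_warnings(hierarchy)
```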

67 Single and Plural Class Names
A “wine” is not a kind-of “wines”. A wine is an instance of the class Wines (an instance is related to its class by instance-of). Class names should be either all singular or all plural.

68 Classes and Their Names
Classes represent concepts in the domain, not their names The class name can change, but it will still refer to the same concept Synonym names for the same concept are not different classes Many systems allow listing synonyms as part of the class definition

69 A Completed Hierarchy of Wines

70 When to introduce a new class?
Subclasses of a class usually have additional properties (new slots), have additional slot restrictions (new facet values), or participate in different relationships.

71 But
In terminological hierarchies, new classes do not have to introduce new properties.

72 A new class or a property value?
Do concepts with different slot values become restrictions for different slots? How important is the distinction for the domain? A class of an instance should not change often

73 Metaclasses: Templates For Class Definitions
Metaclasses enable us to add attributes to class definitions By default, we have: Class name Documentation Slots

74 Metaclasses (II)
Additional attributes: synonyms; UMLS CUI; Latin name; other class-level properties.
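Python's own metaclasses give a rough analogy for metaclasses as templates that add class-level attributes. The sketch below is an illustration, not how Protégé implements them: `TermMetaclass`, the keyword names `synonyms` and `umls_cui`, and the placeholder identifier are all assumptions made for this example.

```python
class TermMetaclass(type):
    """Hypothetical metaclass that gives every class extra class-level attributes."""

    def __new__(mcls, name, bases, namespace, synonyms=(), umls_cui=None):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.synonyms = tuple(synonyms)  # attribute of the class, not of its instances
        cls.umls_cui = umls_cui
        return cls

    def __init__(cls, name, bases, namespace, **kwargs):
        # Swallow the extra keywords; type.__init__ does not use them.
        super().__init__(name, bases, namespace)


class AnatomicalStructure(metaclass=TermMetaclass):
    pass


# "CUI-PLACEHOLDER" is a made-up value, not a real UMLS identifier.
class Esophagus(AnatomicalStructure, synonyms=("Gullet",), umls_cui="CUI-PLACEHOLDER"):
    pass
```

The synonyms belong to the class `Esophagus` itself, which matches the slide's point that metaclasses describe properties of class definitions rather than of individual instances.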

75 Best Wineries

76 Back to the Slots: Allowed Values
A slot links a class (its DOMAIN) to allowed values (its RANGE). When defining a domain or range for a slot, find the most general class or classes. Consider the produces slot for a Winery: instead of Range: Red wine, White wine, Rosé wine, use Range: Wine. Consider the flavor slot: instead of Domain: Red wine, White wine, Rosé wine, use Domain: Wine.

77 Defining Domain and Range
A class and a superclass – replace with the superclass All subclasses of a class – replace with the superclass Most subclasses of a class – consider replacing with the superclass
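The first two replacement rules above are mechanical enough to sketch in code. The function below is a hypothetical helper, written with Python classes standing in for ontology classes; it applies "a class and a superclass" and "all subclasses of a class", and deliberately leaves the "most subclasses" case to human judgment, as the slide does.

```python
from typing import Iterable, Set


class Wine:
    pass


class RedWine(Wine):
    pass


class WhiteWine(Wine):
    pass


class RoseWine(Wine):
    pass


def simplify(classes: Iterable[type]) -> Set[type]:
    """Generalize a domain or range list to the most general classes."""
    result = set(classes)
    while True:
        # Rule 1: a class listed together with its superclass is redundant.
        result = {c for c in result
                  if not any(o is not c and issubclass(c, o) for o in result)}
        # Rule 2: if all direct subclasses of a class are listed, use the superclass.
        parents = {b for c in result for b in c.__bases__ if b is not object}
        for parent in parents:
            children = set(parent.__subclasses__())
            if children <= result:
                result = (result - children) | {parent}
                break
        else:
            return result


generalized = simplify([RedWine, WhiteWine, RoseWine])
```

Note that `__subclasses__()` reports every direct subclass currently defined, so the second rule only fires when the list really covers all of them.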

78 Inverse Slots
Maker and Producer are inverse slots.

79 Inverse Slots (II)
Inverse slots contain redundant information, but: allow acquisition of the information in either direction; enable additional verification; allow presentation of information in both directions. The actual implementation differs from system to system: Are both values stored? When are the inverse values filled in? What happens if we change the link to an inverse slot?
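One possible answer to the implementation questions above is to store both directions and update them together. The sketch below shows that design choice only; it is an assumption for illustration, not a description of how Protégé handles inverse slots, and the slot names `maker` and `produces` are chosen for this example.

```python
class Winery:
    def __init__(self, name: str):
        self.name = name
        self.produces = []  # inverse of Wine.maker


class Wine:
    def __init__(self, name: str):
        self.name = name
        self.maker = None   # inverse of Winery.produces


def set_maker(wine: Wine, winery: Winery) -> None:
    """Set wine.maker and keep the inverse produces lists consistent."""
    if wine.maker is not None:
        wine.maker.produces.remove(wine)  # unlink from the previous winery
    wine.maker = winery
    if winery is not None:
        winery.produces.append(wine)


old_winery = Winery("Old Winery")
new_winery = Winery("New Winery")
zinfandel = Wine("Zinfandel")

set_maker(zinfandel, old_winery)
set_maker(zinfandel, new_winery)  # changing one side updates both wineries
```

Because both directions are stored, a change through one slot has to repair the other, which is exactly the "what happens if we change the link" question raised on the slide.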

80 Default Values
Default value: a value the slot gets when an instance is created. A default value can be changed. The default value is a common value for the slot, but is not a required value. For example, the default value for wine body can be FULL.
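A default value behaves much like a default field value in a Python dataclass: it is filled in at creation time and can be overridden or changed later. The class and field names below are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Wine:
    name: str
    body: str = "FULL"  # default value: common, but not required


default_wine = Wine("Sample Zinfandel")
light_wine = Wine("Sample Riesling", body="LIGHT")  # default overridden at creation

# A default can also be changed after the instance exists.
changed_wine = Wine("Sample Merlot")
changed_wine.body = "MEDIUM"
```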

81 Limiting the Scope An ontology should not contain all the possible information about the domain No need to specialize or generalize more than the application requires No need to include all possible properties of a class Only the most salient properties Only the properties that the applications require

82 Limiting the Scope (II)
An ontology of wine, food, and their pairings probably will not include: bottle size; label color; my favorite food and wine. An ontology of biological experiments will contain: Biological organism; Experimenter. Is the class Experimenter a subclass of Biological organism?

83 BREAK

84 Outline
Ontology development basics. Medical Informatics ontologies: examples and design decisions: Foundational Model of Anatomy (FMA); Gene Ontology (GO); Health Level 7 (HL7) Data Types and Top-Level RIM Classes; Guideline Interchange Format (GLIF). Additional resources: Protégé plugins and applications.

85 Foundational Model of Anatomy (FMA)
Developed at University of Washington as part of the Digital Anatomist project Represents declaratively knowledge about human anatomy Canonical Independent of a specific viewpoint Machine-readable, symbolic representation

86 FMA in Protégé
Represents structures ranging from macromolecular complexes to body parts. Contains ~70,000 distinct concepts, ~110,000 terms, and 140 relations.

87 FMA: Knowledge-Model Features
Metaclasses to define class-level properties Attributed relations Different types of part-whole, location, and other spatial relations Synonyms

88 FMA: Demo
Top-level distinctions: Physical vs Conceptual entity; Material vs Non-Material Physical entity; Anatomical Structure. Structural organization: Cell, Cell part, Body part, Organ system, Organ, Organ part, Human body, Macromolecule. Example: Esophagus.

89 Outline
Ontology development basics. Medical Informatics ontologies: examples and design decisions: Foundational Model of Anatomy (FMA); Gene Ontology (GO); Health Level 7 (HL7) Data Types and Top-Level RIM Classes; Guideline Interchange Format (GLIF). Additional resources: Protégé plugins and applications.

90 Gene Ontology (GO) A controlled vocabulary for describing genes and gene products Has three organizing components: Molecular function Biological process Cellular component An annotation links gene or gene product to several of the GO components

91 Outline
Ontology development basics. Medical Informatics ontologies: examples and design decisions: Foundational Model of Anatomy (FMA); Gene Ontology (GO); Health Level 7 (HL7) Data Types and Top-Level RIM Classes; Guideline Interchange Format (GLIF). Additional resources: Protégé plugins and applications.

92 HL7
ANSI-accredited standard development organization. Produces standards for clinical and administrative data in medicine. Version 2.x messaging standard: widely used. Version 3: message-development methodology; Reference Information Model; shared information structure and data types; integrated vocabulary.

93 RIM Core Classes
[Diagram: the core classes Entity, Role, Participation, and Act, together with Role Relationship and Act Relationship, connected by associations with multiplicities 1 and 0..*. Act classes: Procedure, Observation, SubstanceAdm, Financial act, Referral, Encounter, Supply, WorkingList, ActContext. Entity classes: Organization, Living Subject, Material, Place, Health Chart. Role classes: Patient, Employee, Practitioner, Assigned Practitioner, Specimen.]

94 Representing HL7 RIM in Protégé-2000
HL7 data types as Protégé classes Terminological structures (ConceptDescriptor) as Protégé metaclasses Attributes and associations as slots Restrictions on attributes as facet constraints

95 Outline
Ontology development basics. Medical Informatics ontologies: examples and design decisions: Foundational Model of Anatomy (FMA); Gene Ontology (GO); Health Level 7 (HL7) Data Types and Top-Level RIM Classes; Guideline Interchange Format (GLIF). Additional resources: Protégé plugins and applications.

96 Guideline Interchange Format (GLIF)
Product of the Intermed project: a collaboration among Columbia, Harvard, and Stanford. A format for sharing clinical guidelines independent of platforms and systems. Designed to support multiple vocabularies and medical knowledge bases. Designed to work with different patient information models.

97 …GLIF Model
Flowchart representation of a temporal sequence of clinical steps. [Diagram: a Guideline (name, author) has parts that are Guideline Steps; Guideline Step has the specializations Action Step, Decision Step, Branch Step, Synchronization Step, and Patient State Step.]

98 GLIF in Protégé-2000

99 Outline
Ontology development basics. Medical Informatics ontologies: examples and design decisions. Additional resources: Protégé plugins and applications: knowledge-driven applications; reasoning services; visualization; search and navigation; ontology management.

100 Knowledge-Driven Applications
A Protégé-2000 knowledge base is accessible through an API. The Protégé-2000 GUI application uses the same API. Protégé-2000 classes, instances, slots, and facets are instances of the Java Cls, Instance, Slot, and Facet interfaces. Java applications make use of protege.jar just like any other program library. An application program can be embedded in the Protégé GUI application as a tab.

101 Knowledge-Driven Applications: Athena
Stanford/VA DSS based on hypertension guideline Installation at VA clinics in northern California and N. Carolina Protégé tab version as debugging tool

102 Reasoning Services: Jess Rule-Based Programming
JessTab: integrates Jess with Protégé-2000. Protégé instances are mapped to facts in Jess, and facts are mapped to instances. Changes to mapped facts in Jess are reflected in Protégé; changes in Protégé are reflected in Jess. Example rule:
    (defrule R4 " "
      (object (is-a Assertion) (concept "renal_abnormality") (value TRUE))
      =>
      (make-instance (str-cat Assertion (gensym*)) of Assertion
        (concept "abnormal_urologic_anatomy") (value TRUE)))
In the rule, Assertion is a Protégé class, concept and value are Protégé slots, and make-instance creates a new Protégé instance.
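For readers unfamiliar with Jess syntax, the effect of rule R4 above can be paraphrased in plain Python. This is only a reading aid under stated assumptions: it does not use Jess or JessTab, it does not model Jess's pattern-matching engine or the generated instance names from `gensym*`, and the `Assertion` dataclass and `rule_r4` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Assertion:
    concept: str
    value: bool


def rule_r4(assertions: Iterable[Assertion]) -> Set[Assertion]:
    """If renal_abnormality is asserted TRUE, also assert abnormal_urologic_anatomy."""
    derived = set(assertions)
    if Assertion("renal_abnormality", True) in derived:
        derived.add(Assertion("abnormal_urologic_anatomy", True))
    return derived


facts = {Assertion("renal_abnormality", True)}
after_rule = rule_r4(facts)
```

In the Jess version the new assertion becomes a Protégé instance; here it is simply another member of the returned set.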

103 Demonstration of JessTab
Application: MiniMycin, to diagnose infectious disease. Ontology: Assertion and Identity.

104 Reasoning Services
Protégé Axiom Language (PAL), Clips, Algernon, Prolog, F-Logic

105 Visualization: Jambalaya

106 Visualization: OntoViz

107 Search and Navigation

108 Search and Navigation

109 Ontology Management: PromptDiff

110 Where To Go From Here
Protégé web site (http://protege.stanford.edu): documentation (User’s Guide, Tutorial); protege-discussion mailing list; ontology library; contribute ontologies and plugins.

