1 The Refined Semantic Network
James Geller, Yehoshua Perl
New Jersey Institute of Technology


2 Sources
This presentation is based on a pending proposal on Auditing and Extending the UMLS, on [Gu et al., JAMIA 2000], and on [MEDINFO YEARBOOK 2001].

3 Fundamental Observation
The UMLS requires that one or several Semantic Types be assigned to each concept. This assignment provides the semantics of the concepts. We call the set of all concepts to which a Semantic Type S has been assigned the Extent of S.
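To make the notion of an extent concrete, here is a minimal Python sketch, assuming the assignment is available as a mapping from each concept to its set of Semantic Types. The toy data reuses concept names and type combinations from the EEH example later in this talk; the dictionary layout and the function name `extents` are our own illustration, not a UMLS file format.

```python
from collections import defaultdict

# Toy assignment (concept -> assigned Semantic Types), using combinations
# reported for the EEH example; a real UMLS release is far larger.
assignments = {
    "Classroom environment": {"EEH"},
    "Poor Sanitation": {"EEH", "Finding"},
    "Acid rain": {"EEH", "Hazardous or Poisonous Substance"},
    "Industrial waste": {"EEH", "Manufactured Object",
                         "Hazardous or Poisonous Substance"},
}


def extents(assignments):
    """Invert concept -> types into type -> extent (the set of its concepts)."""
    ext = defaultdict(set)
    for concept, types in assignments.items():
        for semantic_type in types:
            ext[semantic_type].add(concept)
    return dict(ext)


if __name__ == "__main__":
    for semantic_type, concepts in sorted(extents(assignments).items()):
        print(f"{semantic_type}: {sorted(concepts)}")
```

Because every concept here carries EEH, the extent of EEH is the whole toy set, while Manufactured Object has a one-concept extent.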

4 Problems (1)
Assigning Semantic Types to new concepts is a complex manual task because medical concepts are complex, ambiguous, and often homonymous. Categorization depends heavily on an editor’s specialty, background, and priorities, and is thus not fully predictable.

5 Problems (2)
The extents of most Semantic Types are not uniform. The extent of a Semantic Type may contain concepts whose “second” assignments differ from each other. A desire has also been expressed to make the Semantic Network (SN) deeper [McCray and Nelson 1995].

6 Example (concrete)
We look at all 61 concepts to which the Semantic Type Environmental Effect of Humans (henceforth EEH) has been assigned, i.e., the extent of EEH.

7 Are these concepts similar?
Are Classroom environment, Sanitation problem, Acid rain, and Industrial waste really similar to each other? Only EEH is assigned to 54 of these 61 concepts. The remaining 7 concepts are assigned EEH together with combinations drawn from 3 additional Semantic Types. That is why we say that the extent of EEH is not uniform.

8 Concepts of non-uniform semantics
- EEH & Finding: 2 concepts: Poor Sanitation, Sanitation Problem
- EEH & Hazardous or Poisonous Substance: 4 concepts: Acid rain, Radioactive fallout, Radioactive waste, Smoke
- EEH & Manufactured Object & Hazardous or Poisonous Substance: 1 concept: Industrial waste

9 Abstract Example with 4 Semantic Types
Boxes show extents: W = [g, h]; X = [a, b, d, e, f, g]; Y = [b, g]; Z = [c, d, e, g].

10 Problem even in the simple case
Even here, one has to look into all boxes to see whether a concept occurs in them.
Definition (intersection of two extents): The intersection of two extents contains all and only the concepts that occur in BOTH extents. We use the symbol & for it.
Example: [a, b, c] & [b, c, d] = [b, c]
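The definition maps directly onto set intersection, which Python also writes with `&`. The sketch below encodes the extents of slide 9 as we read them from the transcript and shows why, without refinement, finding all types of a concept means scanning every box. The helper names `intersect` and `boxes_containing` are hypothetical.

```python
# Extents of the abstract example (slide 9, as read from the transcript).
boxes = {
    "W": {"g", "h"},
    "X": {"a", "b", "d", "e", "f", "g"},
    "Y": {"b", "g"},
    "Z": {"c", "d", "e", "g"},
}


def intersect(*extents):
    """Concepts that occur in ALL of the given extents."""
    if not extents:
        raise ValueError("need at least one extent")
    result = set(extents[0])
    for extent in extents[1:]:
        result &= set(extent)
    return result


def boxes_containing(concept, boxes):
    """Without refinement, every box has to be inspected."""
    return sorted(name for name, extent in boxes.items() if concept in extent)


print(sorted(intersect({"a", "b", "c"}, {"b", "c", "d"})))  # ['b', 'c']
print(boxes_containing("g", boxes))  # ['W', 'X', 'Y', 'Z']
```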

11 Example of Venn Diagram for X, Y, Z, W (figure)

12 Our Solution to All 3 Problems
Identify all existing intersections. Display every concept exactly once, either in its pure “original box” or in a new “box” that corresponds to a unique intersection.

13 Original Semantic Types (non-uniform semantics): W = [g, h]; X = [a, b, d, e, f, g]; Y = [b, g]; Z = [c, d, e, g]
PURE Semantic Types (simple semantics): W = [h]; X = [a, f]; Y = []; Z = [c]
Intersection Types (compound semantics): W & X & Y & Z = [g]; X & Y = [b]; X & Z = [d, e]
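One way to sketch Semantic Refinement, assuming all extents fit in memory: compute each concept's signature (the exact set of types whose extents contain it) and group concepts by signature. A one-element signature yields a Pure Semantic Type, a longer one an Intersection Type named by chaining its components with " & ". Sorting component names alphabetically is our own convention. On the slide 9 data this reproduces slide 13; Y gets no pure extent because both of its concepts fall into intersections.

```python
from collections import defaultdict

# Extents of the abstract example (slide 9, as read from the transcript).
boxes = {
    "W": {"g", "h"},
    "X": {"a", "b", "d", "e", "f", "g"},
    "Y": {"b", "g"},
    "Z": {"c", "d", "e", "g"},
}


def refine(boxes):
    """Group every concept under the exact set of types whose extents contain it.

    Returns refined type name -> extent. Each concept lands in exactly one
    refined extent; empty pure extents (like Y here) simply do not appear.
    """
    signature = defaultdict(set)  # concept -> names of the types containing it
    for name, extent in boxes.items():
        for concept in extent:
            signature[concept].add(name)
    refined = defaultdict(set)
    for concept, types in signature.items():
        refined[" & ".join(sorted(types))].add(concept)
    return dict(refined)


if __name__ == "__main__":
    # Pure types first, then intersections by number of components.
    for name, extent in sorted(refine(boxes).items(),
                               key=lambda item: (item[0].count("&"), item[0])):
        kind = "intersection" if "&" in name else "pure"
        print(f"{kind:12} {name}: {sorted(extent)}")
```

Grouping by signature visits each (type, concept) pair once, so the cost is linear in the total size of the original extents.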

14 Intersection Types
Intersection Types are “new” Semantic Types constructed by intersecting the extents of their component Semantic Types. The “name” of an Intersection Type is formed by chaining the names of its component Semantic Types together with &-signs.

15 Semantic Refinement
We call the process of constructing all necessary Pure Semantic Types and Intersection Types Semantic Refinement. Concepts are reassigned so that every concept occurs in exactly one extent. After Semantic Refinement, every Semantic Type has a uniform extent.
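The requirement that every concept occurs in exactly one extent can be checked mechanically. A minimal sketch using the refined extents of the abstract example: `is_partition` verifies uniqueness, and `original_extent` shows that reassignment loses nothing, since each original extent is the union of the refined extents that list it as a component. Both function names are illustrative only.

```python
from collections import Counter

# Refined extents of the abstract example (slide 13).
refined = {
    "W": {"h"},
    "X": {"a", "f"},
    "Z": {"c"},
    "X & Y": {"b"},
    "X & Z": {"d", "e"},
    "W & X & Y & Z": {"g"},
}


def is_partition(refined):
    """True if every concept occurs in exactly one refined extent."""
    counts = Counter(c for extent in refined.values() for c in extent)
    return all(n == 1 for n in counts.values())


def original_extent(type_name, refined):
    """Union of all refined extents that have type_name as a component."""
    return set().union(*(extent for name, extent in refined.items()
                         if type_name in name.split(" & ")))


print(is_partition(refined))                   # True
print(sorted(original_extent("Y", refined)))   # ['b', 'g']
```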

16 Pure Semantic Types have extents of simple concepts (they are uniform). Intersection Types have extents of compound concepts (they are also uniform!).

17 Advantages
The extents of pure and intersection types now have uniform semantics, that is, every extent contains highly similar concepts. Small sets of concepts are easy to review and also more suspicious: it is easier to see that a concept “does not belong” to a small set, or even that a concept is “missing.”
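Because small extents are both easy to review and more suspicious, an auditor might simply queue them smallest first. A minimal sketch using the EEH intersections of slide 8 (before the audit); the threshold value and the name `review_queue` are our assumptions, since the slides fix no cutoff.

```python
# Refined EEH intersection types from slide 8 (before the audit).
refined = {
    "EEH & Finding": {"Poor Sanitation", "Sanitation Problem"},
    "EEH & Hazardous or Poisonous Substance": {
        "Acid rain", "Radioactive fallout", "Radioactive waste", "Smoke"},
    "EEH & Hazardous or Poisonous Substance & Manufactured Object": {
        "Industrial waste"},
}

# Hypothetical cutoff: extents with at most this many concepts get reviewed.
REVIEW_THRESHOLD = 2


def review_queue(refined, threshold=REVIEW_THRESHOLD):
    """Names of refined types with at most `threshold` concepts, smallest first."""
    small = [(len(extent), name) for name, extent in refined.items()
             if len(extent) <= threshold]
    return [name for _, name in sorted(small)]


for name in review_queue(refined):
    print(name, sorted(refined[name]))
```

The single-concept intersection holding Industrial waste comes first, which is exactly the case examined in the EEH auditing example.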

18 What does this have to do with the Semantic Network?
Every Intersection Type S of types X, Y, Z, … should be added to the Semantic Network as a child of several appropriate Semantic Types; we allow multiple parents. The result is the Refined Semantic Network (RSN).
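The slides do not spell out which Semantic Types count as “appropriate” parents. One plausible rule, stated here as an assumption: an Intersection Type's parents are the most specific existing types whose component sets are proper subsets of its own. The sketch applies this hypothetical rule to the abstract example; `rsn_parents` is an illustrative name.

```python
# Semantic Types of the abstract example plus its Intersection Types;
# components of an Intersection Type are separated by " & ".
types = ["W", "X", "Y", "Z", "X & Y", "X & Z", "W & X & Y & Z"]


def components(name):
    return frozenset(name.split(" & "))


def rsn_parents(types):
    """Assumed rule: an Intersection Type's parents are the most specific
    existing types whose component sets are proper subsets of its own.
    Pure types are skipped; they keep their original SN parents."""
    comp = {t: components(t) for t in types}
    parents = {}
    for t, ct in comp.items():
        if len(ct) == 1:
            continue
        candidates = [u for u, cu in comp.items() if cu < ct]
        parents[t] = sorted(u for u in candidates
                            if not any(comp[u] < comp[v] for v in candidates))
    return parents


for child, ps in rsn_parents(types).items():
    print(f"{child} -> {ps}")
```

Under this rule W & X & Y & Z gets three parents (W, X & Y, X & Z), so the RSN is a DAG rather than a tree.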

19 The Refined Semantic Network of the Semantic Types W, X, Y, Z (figure; nodes: W, X, Y, Z, X & Y, X & Z, W & X & Y & Z)

20 Subnetwork of SN with EEH Intersections and all their ancestors (figure; Semantic Types shown: Thing, Event, Entity, Phenomenon or Process, Conceptual Entity, Physical Object, Finding, Manufactured Object, Substance, Human-Caused Phenomenon or Process, EEH, Chemical, Chemical Viewed Functionally, Hazardous or Poisonous Substance; Intersection Types shown: EEH & Finding, EEH & Hazardous or Poisonous Substance, Manufactured Object & Hazardous or Poisonous Substance, EEH & Manufactured Object & Hazardous or Poisonous Substance)

21 The RSN supports auditing
Auditing has helped us find mistakes in the UMLS. Removing mistakes typically simplifies both the UMLS and the RSN itself, because wrong intersections disappear.

22 EEH Auditing Example
The intersection of the extents of the three Semantic Types EEH, Manufactured Object, and Hazardous or Poisonous Substance contained only one concept: Industrial Waste. Industrial smog and Factory smoke are not considered Manufactured Objects, and our audit suggested that Industrial Waste should not be one either.

23 More strange intersections
We found concepts assigned both Human-caused phenomenon or process and Manufactured object. Nothing can be a process and an object at the same time. Creating the RSN exposed this error, which was caused by homonyms, e.g., Video recording as the process and as its result.

24 Wrong Categorizations
By reviewing the pure semantic types and intersection types we found various errors. Drinking water problem and PBC Airborne level are missing a Finding assignment. Smoke is assigned Hazardous or Poisonous Substance, but its subconcepts Factory smoke and Second hand smoke lack that assignment.
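The Smoke case suggests an audit that compares a concept's Semantic Types with those of its parent concept. A minimal sketch, assuming a child-to-parent map of concepts is available. Not every type must be inherited, so the output is only a list of candidates for review. Smoke's types follow slide 8; the types shown for its subconcepts are hypothetical, and `missing_inherited` is an illustrative name.

```python
# Toy data modeled on slide 24. Smoke's types follow slide 8; the types
# shown for its two subconcepts are hypothetical.
is_a = {  # child concept -> parent concept
    "Factory smoke": "Smoke",
    "Second hand smoke": "Smoke",
}
types_of = {
    "Smoke": {"EEH", "Hazardous or Poisonous Substance"},
    "Factory smoke": {"EEH"},
    "Second hand smoke": {"EEH"},
}


def missing_inherited(is_a, types_of):
    """(child, type) pairs where the parent has a Semantic Type the child lacks.
    These are review candidates, not automatic errors."""
    flags = []
    for child, parent in sorted(is_a.items()):
        missing = types_of.get(parent, set()) - types_of.get(child, set())
        for semantic_type in sorted(missing):
            flags.append((child, semantic_type))
    return flags


for child, semantic_type in missing_inherited(is_a, types_of):
    print(f"{child}: missing {semantic_type}")
```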

25 Classroom Environment and College Environment should not be assigned EEH at all. These and other errors were exposed by reviewing the extents, which should be semantically uniform. After these errors are corrected, the EEH concepts look very different.

26 Venn Diagrams before/after audit (figure; before: EEH, Finding, Hazardous or Poisonous Substance, Manufactured Object; after: EEH, Manufactured Object, Hazardous or Poisonous Substance, Finding, Substance)

27 Revised Subnetwork of SN with EEH Intersections and all their ancestors (figure; same Semantic Types as in slide 20; Intersection Types shown: EEH & Substance, EEH & Finding, Manufactured Object & Hazardous or Poisonous Substance, EEH & Hazardous or Poisonous Substance)

28 Exclusive Semantic Types
We found 143 concepts that are classified as both Organic Chemical and Inorganic Chemical! Of those, 82 are also assigned additional semantic types.
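Conflicts like Organic Chemical vs. Inorganic Chemical, or the process-vs-object clash on slide 23, can be found by checking each concept's types against a list of mutually exclusive pairs. A minimal sketch; the pair list and the concept "Hypothetical compound" are our assumptions, while Video recording's two types come from slide 23.

```python
# Pairs treated as mutually exclusive in this sketch; the list is our assumption.
EXCLUSIVE_PAIRS = [
    ("Organic Chemical", "Inorganic Chemical"),
    ("Human-caused Phenomenon or Process", "Manufactured Object"),
]

assignments = {
    "Hypothetical compound": {"Organic Chemical", "Inorganic Chemical"},
    "Video recording": {"Human-caused Phenomenon or Process",
                        "Manufactured Object"},
    "Acid rain": {"EEH", "Hazardous or Poisonous Substance"},
}


def exclusive_violations(assignments, pairs=EXCLUSIVE_PAIRS):
    """(concept, type_a, type_b) for every concept assigned both types of a pair."""
    return [(concept, a, b)
            for concept, types in sorted(assignments.items())
            for a, b in pairs
            if a in types and b in types]


for hit in exclusive_violations(assignments):
    print(hit)
```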

29 Redundant Categorizations
Many concepts are assigned both a Semantic Type S and a parent or ancestor T of S. This redundant categorization is a no-no [McCray and Nelson, 1995] [Peng et al., AMIA 2002]. Example from 1998: Desertification was assigned both EEH and PHENOMENON OR PROCESS, a redundant categorization. It was removed after our report.
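Redundant categorizations can be detected by walking up the SN from each assigned type and checking whether an ancestor is also assigned. A minimal sketch, assuming the original SN tree is given as a child-to-parent map; the fragment below is our reading of the subnetwork figures (slides 20 and 27), with Thing as the root, and the function names are illustrative.

```python
# Fragment of the SN tree (child -> parent), as read from the subnetwork figures.
sn_parent = {
    "EEH": "Human-caused Phenomenon or Process",
    "Human-caused Phenomenon or Process": "Phenomenon or Process",
    "Phenomenon or Process": "Event",
    "Event": "Thing",
}


def ancestors(semantic_type, parent=sn_parent):
    """All proper ancestors of a type in the SN tree."""
    found = set()
    while semantic_type in parent:
        semantic_type = parent[semantic_type]
        found.add(semantic_type)
    return found


def redundant_pairs(types, parent=sn_parent):
    """(S, T) pairs where both are assigned and T is an ancestor of S."""
    return sorted((s, t) for s in types for t in ancestors(s, parent)
                  if t in types)


# Desertification, as assigned in 1998.
print(redundant_pairs({"EEH", "Phenomenon or Process"}))
```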

30 Auditing simplifies the RSN
After the assignments of those 143 Organic/Inorganic Chemical concepts were corrected, 13 invalid Intersection Types disappeared. The RSN becomes simpler because it has fewer Intersection Types. In a sample of 100 intersections with only one concept, only 15 were deemed legal [Gu, JAMIA 2000].

31 Renaming Intersection Types
Instead of Environmental Effect of Humans & Hazardous or Poisonous Substance, we would rather have a designer rename it to Environmentally Hazardous or Poisonous Substance. Similarly, an intersection of Body Part & Manufactured Object is a Prosthesis.
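Renaming can be kept separate from refinement by looking up a designer-chosen name and falling back to the default &-chained name. A minimal sketch using the two renamings on this slide; the `RENAMED` table and `display_name` are hypothetical, and keying on a frozenset makes the lookup independent of component order.

```python
# Designer-chosen names for legitimate intersection types (from this slide).
RENAMED = {
    frozenset({"Environmental Effect of Humans",
               "Hazardous or Poisonous Substance"}):
        "Environmentally Hazardous or Poisonous Substance",
    frozenset({"Body Part", "Manufactured Object"}): "Prosthesis",
}


def display_name(components):
    """Designer name if one exists, else the default &-chained name."""
    key = frozenset(components)
    return RENAMED.get(key, " & ".join(sorted(key)))


print(display_name(["Manufactured Object", "Body Part"]))
print(display_name(["Finding", "Environmental Effect of Humans"]))
```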

32 Subnetwork with simplified names (figure; same Semantic Types as in slide 20; renamed Intersection Types: Human-produced Environmental Substance, Environmental Finding, Manufactured Hazardous or Poisonous Substance, Environmentally Hazardous or Poisonous Substance)

33 Overall Results for UMLS 1998
Level | Number of Pure Semantic Types at Level | Number of Intersection Types
1 | 1 | 0
2 | 2 | 0
3 | 4 | 0
4 | 20 | 0
5 | 41 | 56
6 | 23 | 203
7 | 23 | 163
8 | 17 | 187
9 | 2 | 234
10 | 0 | 212
11 | 0 | 89
12 | 0 | 16
13 | 0 | 3
14 | 0 | 1

34 Concept Distribution, UMLS 98
Number of concepts per intersection type | How many intersection types have that many concepts
1 | 421
2 | 147
3 | 102
4 | 65
5 | 35
6 | 41
7 | 32
8 | 15
9 | 13
… | …
3947 | 1
4582 | 1
6705 | 1
19349 | 1
41564 | 1
Many of the intersection types with 1 to 3 concepts will disappear [Gu, AIM 2004].
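A distribution like this one is a histogram of extent sizes over the refined types. A minimal sketch, shown on the abstract example rather than on UMLS data; the function name and the `intersections_only` flag are our own, and an intersection is recognized here simply by an "&" in its name.

```python
from collections import Counter

# Refined extents of the abstract example (slide 13); a real run would use
# the refined UMLS types instead.
refined = {
    "W": {"h"},
    "X": {"a", "f"},
    "Z": {"c"},
    "X & Y": {"b"},
    "X & Z": {"d", "e"},
    "W & X & Y & Z": {"g"},
}


def size_distribution(refined, intersections_only=True):
    """Map extent size -> number of refined types with that many concepts."""
    sizes = Counter(len(extent) for name, extent in refined.items()
                    if not intersections_only or "&" in name)
    return dict(sorted(sizes.items()))


print(size_distribution(refined))                            # {1: 2, 2: 1}
print(size_distribution(refined, intersections_only=False))  # {1: 4, 2: 2}
```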

35 Streamlining Categorizations
Currently, UMLS editors may assign any combination of semantic types to new concepts, even combinations that don’t make sense. We propose that concepts may be assigned only existing pure and intersection types. If a new intersection type is desired, it has to be “approved” by the NLM.
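The proposed restriction amounts to a membership check at editing time: a new concept's set of types must match an existing pure or intersection type. A minimal sketch; the `APPROVED` set and the function `check_assignment` are hypothetical, and a real editor tool would load the approved types from the RSN.

```python
# Hypothetical set of approved refined types (pure and intersection),
# each stored as the frozenset of its component Semantic Types.
APPROVED = {
    frozenset({"EEH"}),
    frozenset({"EEH", "Finding"}),
    frozenset({"EEH", "Hazardous or Poisonous Substance"}),
}


def check_assignment(types, approved=APPROVED):
    """Return 'ok' if the combination is an existing refined type,
    otherwise flag it as needing NLM approval."""
    if frozenset(types) in approved:
        return "ok"
    return "needs NLM approval: " + " & ".join(sorted(types))


print(check_assignment(["Finding", "EEH"]))
print(check_assignment({"Human-caused Phenomenon or Process",
                        "Manufactured Object"}))
```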

36 Summary (1)
We propose to change the SN as follows:
- Allow a DAG structure, to enable intersections (with multiple parents).
- Create the “lower half” of the RSN by our method of Semantic Refinement. That takes care of deepening!
- Use various auditing techniques to eliminate all wrong intersections.

37 Summary (2)
- Rename legitimate intersection types.
- The RSN limits the choices of a UMLS editor to reasonable intersections. This will prevent future UMLS mistakes.
- The RSN streamlines categorization, making it more accurate and easier.

