1 Enriching and Designing Metaschemas for the UMLS Semantic Network Department of Computer Science New Jersey Institute of Technology Yehoshua Perl James.

Presentation transcript:

1 Enriching and Designing Metaschemas for the UMLS Semantic Network Department of Computer Science New Jersey Institute of Technology Yehoshua Perl James Geller

2 Problem 1: The SN’s tree structure is restrictive: each semantic type has at most one parent in the current SN, so multiple parents cannot be expressed. Example: Gene or Genome –Current parent: Fully Formed Anatomical Structure –Fact: Gene or Genome is also a kind of Molecular Sequence. –Result: this subsumption knowledge is omitted.

3 Problem 1 (cont’d) Disadvantages: We have no direct access to the subsumption knowledge. We have difficulties in reasoning and decision making. The relationship modeling for Gene or Genome is limited, because it cannot inherit valid relationships from Molecular Sequence.

4 Problem 2 The SN is very complex because of its many relationships, which makes it hard for users to orient themselves: 135 semantic types, 133 IS-A relationships, and about 7,000 semantic relationship occurrences. It is difficult to gain knowledge from the picture of the SN. The following slide shows about 1/4 of the SN, with many relationships abbreviated by numbers.

5

6 Proposed Solutions For the problem of the SN’s restrictive structure: expand the SN into a multiple-subsumption structure with a Directed Acyclic Graph (DAG) hierarchy, called the Enriched Semantic Network (ESN), which accommodates multiple inheritance of semantic relationships. For the problem of comprehending the SN (and the ESN): create a Metaschema as a higher-level abstraction of the SN, and do the same for the ESN. The role of the Metaschema for the SN is similar to the role of the SN for the underlying META.
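A minimal Python sketch of what multiple inheritance in a DAG hierarchy buys, using the Gene or Genome example from Problem 1. The relationship names (rel_a, rel_b) and their targets are made-up placeholders, not relationships taken from the SN, and the function names are illustrative only.

```python
from typing import Dict, List, Set, Tuple

Relationship = Tuple[str, str]  # (relationship name, target semantic type)

# IS-A links of a tiny DAG: child -> list of parents.
# The second parent of "Gene or Genome" is the link the slides propose adding.
DAG_PARENTS: Dict[str, List[str]] = {
    "Gene or Genome": ["Fully Formed Anatomical Structure", "Molecular Sequence"],
    "Fully Formed Anatomical Structure": [],
    "Molecular Sequence": [],
}

# The same hierarchy restricted to one parent per type, as in the current SN.
TREE_PARENTS: Dict[str, List[str]] = {
    **DAG_PARENTS,
    "Gene or Genome": ["Fully Formed Anatomical Structure"],
}

# Relationships introduced at each type (placeholder names, assumptions).
INTRODUCED: Dict[str, Set[Relationship]] = {
    "Fully Formed Anatomical Structure": {("rel_a", "Target A")},
    "Molecular Sequence": {("rel_b", "Target B")},
}


def ancestors(t: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every type reachable from t by following IS-A links upward."""
    seen: Set[str] = set()
    stack = list(parents.get(t, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def all_relationships(t: str, parents: Dict[str, List[str]],
                      introduced: Dict[str, Set[Relationship]]) -> Set[Relationship]:
    """Relationships introduced at t plus those inherited from all ancestors."""
    rels = set(introduced.get(t, set()))
    for a in ancestors(t, parents):
        rels |= introduced.get(a, set())
    return rels


in_tree = all_relationships("Gene or Genome", TREE_PARENTS, INTRODUCED)
in_dag = all_relationships("Gene or Genome", DAG_PARENTS, INTRODUCED)
```

With a single parent, Gene or Genome sees only the placeholder relationship from Fully Formed Anatomical Structure; with the extra IS-A link it also picks up the one from Molecular Sequence.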

7 Problem 1: Expand the SN to the ESN Objective: Expand the SN from two trees to a DAG. Methods: (1) Identify viable IS-A links by imposing connectivity on a partition of the SN [McCray, Burgun, Bodenreider, MedInfo ’01]. (2) Identify viable IS-A links by string matching between semantic types’ names and definitions.

8 Method 1: Imposing Connectivity [McCray, Burgun, Bodenreider, MedInfo ’01] presented a partition of the SN consisting of 15 groups of semantic types. The partition is based on a semantic approach: subject areas are identified externally, and semantic types are then placed in those areas. Six principles for a partition are presented. One of them is Semantic Validity: the groups must be semantically coherent.

9 Semantic Validity Judging semantic validity: We check whether the types in a group are hierarchically related to each other (by IS-A links) to form a connected subgraph of the SN (“Connectivity Property”). Because the SN’s IS-A hierarchy consists of two trees, such a connected subgraph in the current SN must form a tree with a unique root.

10 Semantic Validity (cont’d) Some groups are disconnected. They have multiple roots, so not all semantic types in the group are subsumed under one category. E.g., the Genes and Molecular Sequences group: T085 Molecular Sequence; T088 Carbohydrate Sequence; T087 Amino Acid Sequence; T086 Nucleotide Sequence; T028 Gene or Genome.
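The connectivity check can be sketched in a few lines of Python. A member of a group is a root of that group when none of its parents is also in the group; in a tree-shaped hierarchy such as the current SN, the group is connected exactly when it has one such root. The IS-A links below are an assumption read off the slide layout (the three sequence types under Molecular Sequence), Gene or Genome's parent is the one named under Problem 1, and the function names are illustrative.

```python
from typing import Dict, List, Set


def group_roots(group: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Members of the group that have no parent inside the group."""
    return {t for t in group
            if not any(p in group for p in parents.get(t, []))}


def has_unique_root(group: Set[str], parents: Dict[str, List[str]]) -> bool:
    """True when all members hang under one root within the group."""
    return len(group_roots(group, parents)) == 1


GROUP = {
    "Molecular Sequence",
    "Carbohydrate Sequence",
    "Amino Acid Sequence",
    "Nucleotide Sequence",
    "Gene or Genome",
}

# child -> parents; Molecular Sequence's own parent lies outside the group
# and is left out of this sketch.
SN_PARENTS: Dict[str, List[str]] = {
    "Molecular Sequence": [],
    "Carbohydrate Sequence": ["Molecular Sequence"],
    "Amino Acid Sequence": ["Molecular Sequence"],
    "Nucleotide Sequence": ["Molecular Sequence"],
    "Gene or Genome": ["Fully Formed Anatomical Structure"],
}

# Adding the IS-A link Gene or Genome -> Molecular Sequence.
ESN_PARENTS: Dict[str, List[str]] = {
    **SN_PARENTS,
    "Gene or Genome": ["Fully Formed Anatomical Structure", "Molecular Sequence"],
}

roots_before = group_roots(GROUP, SN_PARENTS)
roots_after = group_roots(GROUP, ESN_PARENTS)
```

Before the extra link the group has two roots (Molecular Sequence and Gene or Genome); after it, only Molecular Sequence remains a root.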

11 Identify IS-A based on Imposing Connectivity Step 1: Analyze disconnected groups in the partition. Step 2: (a) Convert each disconnected group into a new connected group (sometimes several connected groups). (b) Identify viable IS-A links during the conversion procedure. (c) Present four kinds of transformations: IS-A addition, Root-addition, Split, and Root-moving.

12 Four Transformations (1) “IS-A Addition” Transformation Identify and add IS-A links to transform a disconnected group into a connected one. (2) “Root-addition” Transformation Create a new semantic type that will be an ancestor of all roots in the group. A disconnected group must have multiple roots, so these roots need to be subsumed under one common category. Make the new semantic type the root of the new group by adding IS-A links to it from all roots in the group.

13 Four Transformations (cont’d) (3) “Split” Transformation Split a group into several smaller connected groups. Each of the smaller groups is either a tree or can be transformed into a tree by using other transformations. (4) “Root-moving” Transformation Find the lowest common ancestor of all roots of the disconnected group. Make this lowest common ancestor the root of the new group.
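Two of the transformations lend themselves to a short Python sketch: Root-moving needs the lowest common ancestor of the group's roots, and Root-addition needs one new IS-A link per root. The toy tree and all type names below (Root, X, Y, Z, W, New Type) are hypothetical, and the helper functions are illustrative rather than the authors' procedure.

```python
from typing import Dict, List, Optional, Tuple


def path_to_root(t: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """The chain t, parent(t), ... up to the root of the tree containing t."""
    path = [t]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def lowest_common_ancestor(types: List[str],
                           parent: Dict[str, Optional[str]]) -> Optional[str]:
    """Root-moving: the deepest type lying on every path to the tree root."""
    paths = [path_to_root(t, parent) for t in types]
    common = set(paths[0])
    for p in paths[1:]:
        common &= set(p)
    for node in paths[0]:  # paths[0] runs from deepest to shallowest
        if node in common:
            return node
    return None  # the types sit in different trees


def root_addition(roots: List[str], new_type: str) -> List[Tuple[str, str]]:
    """Root-addition: IS-A links (child, parent) from every root to new_type."""
    return [(r, new_type) for r in roots]


# Hypothetical single-parent hierarchy: child -> parent (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "Root": None, "X": "Root", "W": "Root", "Y": "X", "Z": "X",
}

lca_yz = lowest_common_ancestor(["Y", "Z"], PARENT)
lca_yw = lowest_common_ancestor(["Y", "W"], PARENT)
new_links = root_addition(["Y", "W"], "New Type")
```

Root-moving reuses a type that already exists in the hierarchy, while Root-addition introduces a new one; which is appropriate for a given group is a modeling decision the code does not make.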

14 Root-addition Transformation Example

15 Root-addition Transformation Example We utilized the analysis of anatomy concepts of the Digital Anatomist Foundational Model (DAFM). DAFM was developed at the U. of Washington [C. Rosse, et al. AMIA ’95, JAMIA ’98].

16 Anatomical Entity Group: T017 Anatomical Structure; T030 Body Space or Junction; T022 Body System; T023 Body Part, Organ, or Organ Component; T026 Cell Component; T025 Cell; T029 Body Location or Region; T024 Tissue; T021 Fully Formed Anatomical Structure; T018 Embryonic Structure; T031 Body Substance.

17 IS-A addition and Split Transformation Example [Figure; semantic types shown: T046 Pathologic Function; T191 Neoplastic Process; T048 Mental or Behavioral Dysfunction; T049 Cell or Molecular Dysfunction; T047 Disease or Syndrome; T050 Experimental Model of Disease; T190 Anatomical Abnormality; T020 Acquired Abnormality; T184 Sign or Symptom; T019 Congenital Abnormality; T033 Finding. Labels shown: Pathologic Function, Anatomical Abnormality, Finding.]

18 Method 2: String Matching Definition (CP-pair): a pair (T1; T2) is a CP-pair if T1 is a child of T2. Definition (String match): a string match from a semantic type T1 to another semantic type T2 is a triple (T1; T2; S) such that S is a string appearing both in the definition of T1 and in the name of T2. S is called the common string. In the definition, lexical normalization is used to convert adjectives and other word forms to noun form.

19 Observation: among the 133 CP-pairs of semantic types, 88 have matches from children to their respective parents. If there is a match from one semantic type to another that is not connected to it by an IS-A path, the match may imply an IS-A relationship between them. Method: Find string matches between any two semantic types having no IS-A path between them.

20 Example Enzyme: a complex chemical, usually a protein, that is produced by living cells and which catalyzes specific biochemical reactions. Three matches: (Enzyme; Amino Acid, Peptide, or Protein; “protein”), (Enzyme; Cell; “cell”), (Enzyme; Cell Component; “cell”). The match between Enzyme and Chemical is not considered, because Chemical is an ancestor of Enzyme in the SN. Viable IS-A: Enzyme IS-A Amino Acid, Peptide, or Protein.
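The Enzyme example can be reproduced with a small Python sketch of string matching. The lexical normalization here is a crude stand-in (lowercasing plus stripping a plural "s"), not the normalization the authors used; the stop-word list, the candidate list (including Tissue as a non-matching control), and the function names are likewise assumptions made for illustration.

```python
import re
from typing import List, Set, Tuple

STOP_WORDS = {"a", "an", "and", "of", "or", "the"}  # assumed, not from the slides


def tokens(text: str) -> List[str]:
    """Lowercase alphabetic words of a name or definition."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(word: str) -> str:
    """Crude stand-in for lexical normalization: drop a plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def string_matches(t1: str, definition: str, candidates: List[str],
                   connected: Set[str]) -> List[Tuple[str, str, str]]:
    """Triples (T1; T2; S): S occurs in T1's definition and in T2's name.

    Types already connected to T1 by an IS-A path are skipped.
    """
    def_words = {normalize(w) for w in tokens(definition)}
    found = []
    for t2 in candidates:
        if t2 == t1 or t2 in connected:
            continue
        for w in tokens(t2):
            s = normalize(w)
            if s not in STOP_WORDS and s in def_words:
                found.append((t1, t2, s))
    return found


ENZYME_DEFINITION = (
    "a complex chemical, usually a protein, that is produced by living "
    "cells and which catalyzes specific biochemical reactions"
)
CANDIDATES = [
    "Amino Acid, Peptide, or Protein", "Cell", "Cell Component",
    "Chemical", "Tissue",
]

matches = string_matches("Enzyme", ENZYME_DEFINITION, CANDIDATES,
                         connected={"Chemical"})
```

The sketch only proposes candidates; as the slides state, each match still has to be reviewed by a domain expert before an IS-A link is accepted.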

21 Matching Results All matches were reviewed by a domain expert. Only five valid matches indicate new viable IS-A links: Enzyme IS-A Amino Acid, Peptide, or Protein; Receptor IS-A Cell Component; Vitamin IS-A Pharmacologic Substance; Vitamin IS-A Organic Chemical; Gene or Genome IS-A Molecular Sequence.

22

23 ESN’s Relationship Structure ESN is different from SN: –Allows semantic type to inherit more relationships from its new parent (“multiple inheritance”). –Has 21 semantic types having multiple parents/ancestors –Expands the relationship model of these 21 types ESN’s relationship structure: –Preserves existing relationships in the SN (6,977) –Includes new relationships inherited from new parents/ancestors

24 Validity of Newly Inherited Relationships Observations: New relationships come from the four new semantic types or from semantic types having multiple parents or ancestors. –4 new semantic types, 12 new relationships for them –414 newly inherited relationships involving the 21 semantic types having multiple parents/ancestors. –Question: are all the 414 relationships valid? –For each of the 21 semantic types, we checked the validity of the new relationships inherited from its new parent/ancestor.

25 Validity Check Example –Injury or Poisoning has the new parent Disease or Syndrome. –It has 112 new relationships inherited from Disease or Syndrome. –After review, 92 are valid and retained in the ESN; 20 are invalid and blocked in the ESN.
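One way to picture blocking is as a set of relationships that a type refuses to inherit, so that they stop propagating to it and, through it, to its descendants. The Python sketch below assumes that reading; the relationship names and the descendant type are hypothetical, and the slides do not spell out the blocking mechanism at this level of detail.

```python
from typing import Dict, List, Set


def effective(t: str,
              parents: Dict[str, List[str]],
              introduced: Dict[str, Set[str]],
              blocked: Dict[str, Set[str]]) -> Set[str]:
    """Relationships of t: its own plus inherited ones, minus those blocked at t."""
    rels = set(introduced.get(t, set()))
    for p in parents.get(t, []):
        rels |= effective(p, parents, introduced, blocked)
    return rels - blocked.get(t, set())


# child -> parents; only the new parent named on the slide is modeled.
PARENTS: Dict[str, List[str]] = {
    "Disease or Syndrome": [],
    "Injury or Poisoning": ["Disease or Syndrome"],
    "Hypothetical Descendant": ["Injury or Poisoning"],  # made-up type
}

# Placeholder relationship names, not taken from the SN.
INTRODUCED: Dict[str, Set[str]] = {
    "Disease or Syndrome": {"rel_valid", "rel_invalid"},
}

# A single blocking at Injury or Poisoning.
BLOCKED: Dict[str, Set[str]] = {
    "Injury or Poisoning": {"rel_invalid"},
}

at_injury = effective("Injury or Poisoning", PARENTS, INTRODUCED, BLOCKED)
at_descendant = effective("Hypothetical Descendant", PARENTS, INTRODUCED, BLOCKED)
```

Under this reading, one blocking placed high in the hierarchy removes the relationship from every type below it, which is consistent with a small number of blockings covering many invalid relationships.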

26 ESN Relationship Structure Summary Among the 414 newly inherited relationships, 314 are valid and are inherited by 12 semantic types; 100 are invalid. Only seven blockings suffice to prevent the 100 invalid relationships. The ESN has 7,303 (6,977 + 12 + 314) relationship occurrences. Among the 139 semantic types in the ESN, 16 (12 + 4 new) have different relationship structures.

27 ESN Summary ESN’s IS-A hierarchy: 139 semantic types, 150 IS-A links 21 semantic types have multiple parents/ancestors ESN’s relationship structure: 7,303 semantic relationship occurrences (5% more)
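The totals quoted in the summary follow from the counts given on the earlier slides, as this short Python check shows. The variable names are illustrative; every number comes from the slides.

```python
# Counts stated on the slides.
sn_types = 135
new_types = 4
sn_relationships = 6977          # existing relationships preserved in the ESN
new_type_relationships = 12      # relationships of the four new semantic types
newly_inherited = 414            # inherited via new parents/ancestors
invalid_inherited = 100          # blocked after review

valid_inherited = newly_inherited - invalid_inherited
esn_types = sn_types + new_types
esn_relationships = sn_relationships + new_type_relationships + valid_inherited

# Relative growth of the relationship structure, in percent.
increase_pct = 100 * (esn_relationships - sn_relationships) / sn_relationships
```

The increase works out to a little under 4.7%, which the slide rounds to 5%.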

28 Problem 2: SN/ESN’s comprehension The SN is still too hard to understand: 135 semantic types, 133 IS-A links, and about 7,000 semantic relationships (6,977). Solution: build a higher-level abstraction for the SN/ESN, referred to as a Metaschema.

29 Metaschema

30 Metaschema Requirements and Derivation Metaschema: A set of meta-semantic types (MSTs) Hierarchical meta-child-of relationships between MSTs Meta-relationships between MSTs A Metaschema of the SN (ESN) will represent a partition of the SN (ESN).

31 Metaschema Requirements and Derivation (cont’d) Procedure to build a metaschema: Step 1: Partition the SN (ESN) into disjoint semantic-type groups. Step 2: Define a meta-semantic type (MST) to represent each semantic-type group. Step 3: Derive hierarchical meta-child-of relationships between meta-semantic types. Step 4: Derive meta-relationships between meta-semantic types.
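The four steps can be sketched in Python under a simple lifting rule: an IS-A link or relationship that crosses from one group to another induces a meta-child-of link or meta-relationship between the corresponding MSTs. This rule is an assumption made for illustration; the slides do not state the exact derivation rule. All type, group, and relationship names below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Steps 1 and 2: a partition, given as semantic type -> group; each group
# name doubles as the name of its meta-semantic type (MST).
GROUP_OF: Dict[str, str] = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

# IS-A links (child, parent) and relationships (source, name, target).
IS_A: List[Tuple[str, str]] = [("A2", "A1"), ("B1", "A1"), ("B2", "B1")]
RELATIONSHIPS: List[Tuple[str, str, str]] = [
    ("B2", "rel_x", "A2"),   # crosses groups
    ("A1", "rel_y", "A2"),   # stays inside group A
]


def meta_child_of(is_a: List[Tuple[str, str]],
                  group_of: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Step 3: lift IS-A links that cross group boundaries to the MST level."""
    return {(group_of[c], group_of[p])
            for c, p in is_a if group_of[c] != group_of[p]}


def meta_relationships(rels: List[Tuple[str, str, str]],
                       group_of: Dict[str, str]) -> Set[Tuple[str, str, str]]:
    """Step 4: lift relationships that cross group boundaries to the MST level."""
    return {(group_of[s], r, group_of[t])
            for s, r, t in rels if group_of[s] != group_of[t]}


hierarchy = meta_child_of(IS_A, GROUP_OF)
meta_rels = meta_relationships(RELATIONSHIPS, GROUP_OF)
```

Whether relationships inside a single group should also appear at the MST level is a design choice; this sketch drops them to keep the abstraction small.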

32 Partition Example

33 Metaschema example

34 Meta-relationship Example

35 ESN’s two metaschemas Q-metaschema (Qualified Metaschema) Basis: the partition into 19 disjoint semantic-type groups obtained when we expanded the SN to the ESN [Zhang, JBI 2003]. C-metaschema (Cohesive Metaschema) Basis: the cohesive partition, which places all semantic types exhibiting the same relationship set into one semantic-type group [M. Halper, et al. AMIA 2001][Perl JBI 2003].
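The grouping criterion behind the cohesive partition, as described on this slide, can be sketched in Python by keying each semantic type on its relationship set. The type and relationship names are hypothetical, and the sketch covers only the "same relationship set" criterion stated here, not any further conditions the cited papers may impose.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def cohesive_partition(rel_sets: Dict[str, Set[str]]) -> List[List[str]]:
    """Group semantic types that exhibit exactly the same relationship set."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for t in sorted(rel_sets):
        groups[frozenset(rel_sets[t])].append(t)
    return sorted(groups.values())


# Hypothetical types with placeholder relationship names.
REL_SETS: Dict[str, Set[str]] = {
    "T_a": {"r1", "r2"},
    "T_b": {"r2", "r1"},   # same set as T_a, so same group
    "T_c": {"r1"},
}

partition = cohesive_partition(REL_SETS)
```

Using a frozenset as the dictionary key makes the grouping independent of the order in which relationships are listed.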

36 Q-metaschema hierarchy

37 Q-metaschema including meta-relationships

38 C-metaschema hierarchy