Semantic Interpretation of Medical Text
Barbara Rosario, SIMS; Steve Tu, UC Berkeley
Advisor: Marti Hearst, SIMS

Presentation transcript:


Semantic Interpretation of Medical Text
More accurate representation of the content of the input text
Enhance text with information (concepts, relationships) drawn from a medical knowledge source
Determine the semantic meaning of the words (and bigger constructs) and the relationships between them

Combine Statistical and Symbolic Methods
Use of knowledge bases, semantic hierarchies, medical knowledge, rules
Use of statistical methods and machine learning techniques

Statistical methods
Disambiguation
Detection of semantic patterns
Classification of semantically related constructs
Degrees (weights, probabilities)

First Experiment: Noun Compounds and MeSH
Interpretation of noun compounds is crucially semantic
Noun compounds extracted from a collection of titles and abstracts of medical journals found in Medline
MeSH (Medical Subject Headings) concepts for the labels

Input: Medline Text File -> Preprocessing -> Tagger -> Noun Compound Extraction -> Semantic Labeling (using MeSH) -> Output: Semantically Labelled Noun Compounds
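The noun compound extraction step of the pipeline can be sketched as follows. This is a minimal illustration and not the authors' implementation: the slides do not name the tagger or the tag set, so the Penn Treebank noun tags and the function name `extract_noun_compounds` are assumptions.

```python
from typing import List, Tuple

# Assumption: the tagger emits Penn Treebank tags; the slides do not say which tag set was used.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_noun_compounds(tagged: List[Tuple[str, str]], min_len: int = 2) -> List[List[str]]:
    """Return maximal runs of consecutive nouns that are at least min_len words long."""
    compounds: List[List[str]] = []
    run: List[str] = []
    for word, tag in tagged:
        if tag in NOUN_TAGS:
            run.append(word)
        else:
            # A non-noun ends the current run; keep it only if it is long enough.
            if len(run) >= min_len:
                compounds.append(run)
            run = []
    if len(run) >= min_len:
        compounds.append(run)
    return compounds


# Hypothetical tagged title, used only for illustration.
tagged_title = [
    ("Acute", "JJ"), ("migraine", "NN"), ("headache", "NN"),
    ("recurrence", "NN"), ("in", "IN"), ("adults", "NNS"),
]
compounds = extract_noun_compounds(tagged_title)
```

Single nouns such as "adults" are dropped because a compound needs at least two nouns.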

MeSH Tree Structures (main)
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]

MeSH Tree Structures (node A expanded)
1. Anatomy [A]
    Body Regions [A01] +
    Musculoskeletal System [A02] +
    Digestive System [A03] +
    Respiratory System [A04] +
    Urogenital System [A05] +
    Endocrine System [A06] +
    Cardiovascular System [A07] +
    Nervous System [A08] +
    Sense Organs [A09] +
    Tissues [A10] +
    Cells [A11] +
    Fluids and Secretions [A12] +
    Animal Structures [A13] +
    Stomatognathic System [A14] +
    Hemic and Immune Systems [A15] +
    Embryonic Structures [A16] +
Body Regions [A01]
    Abdomen [A01.047]
        Groin [A ]
        Inguinal Canal [A ]
        Peritoneum [A ] +
        Retroperitoneal Space [A ]
        Umbilicus [A ]
    Axilla [A01.133]
    Back [A01.176] +
    Breast [A01.236] +
    Buttocks [A01.258]
    Extremities [A01.378] +
    Head [A01.456] +
    Neck [A01.598]
    Pelvis [A01.673] +
    Perineum [A01.719]
    Skin [A01.835] +
    Thorax [A01.911] +
    Viscera [A01.960]

Mapping Nouns to MeSH Concepts
Ex: migraine headache recurrence
migraine: C, C, C
headache: C, C, C
recurrence: C

More Noun Compounds
migraine headache recurrence    C    C    C
blood plasma perfusion          A    A    E
migraine headache pain          C    C    G
brain stem neurons              A    E    A
rat liver mitochondria          B    A    A
plasma arginine vasopressin     A    D    D
rat thyroid cells               B    A    A11
growth hormone secretion        G    D    A
blood urea nitrogen             A    D    D
breast cancer cells             A    C04  A11
cancer cell lines               C04  A11  G
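The labeling shown in these examples amounts to a lexicon lookup from each noun to a MeSH tree. A minimal sketch follows, assuming a toy dictionary built from a few rows above; `MESH_TREE` and `label_compound` are hypothetical names, only the top-level tree letter is kept because the full tree numbers did not survive in this transcript, and the sketch ignores the fact that one noun can map to several MeSH entries.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon: noun -> top-level MeSH tree letter, copied from the slide examples.
MESH_TREE: Dict[str, str] = {
    "migraine": "C", "headache": "C", "recurrence": "C",
    "blood": "A", "plasma": "A", "perfusion": "E",
    "rat": "B", "liver": "A", "mitochondria": "A",
}


def label_compound(nouns: List[str]) -> List[Optional[str]]:
    """Map each noun to its MeSH tree letter; nouns missing from the lexicon get None."""
    return [MESH_TREE.get(noun.lower()) for noun in nouns]


labels = label_compound(["rat", "liver", "mitochondria"])
```

Returning None for unknown nouns keeps non-medical words visible to later steps instead of silently dropping them.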

Attachment and Semantic Interpretation
Attachment classification
“acute migraine treatment” [[N N] N] (LA)
“intra-nasal migraine treatment” [N [N N]] (RA)
To bootstrap semantic interpretation
Decision tree (Quinlan )

Levels of Description
migraine headache recurrence (LA): C C C
Feature vector
Only Tree: C, C, C
Level 1: C, 10, C, 23, C, 23
Level 2: C, , C, , C,
Level 3: C, , C, , C,
Level 4: C, , C, , C,
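One way to read the levels of description is as progressively longer prefixes of each noun's MeSH tree number. The sketch below works under that assumption; `describe` and `feature_vector` are hypothetical names, level 0 is taken to correspond to "Only Tree", and the tree numbers in the usage line come from the MeSH listings and examples elsewhere in these slides.

```python
from typing import List, Optional


def describe(tree_number: str, level: int) -> List[Optional[str]]:
    """Split a tree number such as 'A01.047' into its letter plus the first `level` numeric parts."""
    letter = tree_number[0]
    parts = [p for p in tree_number[1:].split(".") if p]
    kept = parts[:level]
    # Pad with None when the tree number is shallower than the requested level.
    padding: List[Optional[str]] = [None] * (level - len(kept))
    return [letter] + kept + padding


def feature_vector(tree_numbers: List[str], level: int) -> List[Optional[str]]:
    """Concatenate the per-noun descriptions into one flat feature vector."""
    features: List[Optional[str]] = []
    for number in tree_numbers:
        features.extend(describe(number, level))
    return features


# "breast cancer cells": Breast [A01.236]; C04 and A11 as shown in the compound examples.
level1 = feature_vector(["A01.236", "C04", "A11"], level=1)
```

Padding keeps every vector the same length at a given level, which a decision tree learner with fixed attributes needs.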

Decision Tree Classification
Columns: Training before pruning | Training after pruning | Testing before pruning | Testing after pruning
Only Tree   15.8 %16.4%17.3%
Level %11.8%15.4 %
Level 2     7.9%  | 8.6%  | 21.2% | 17.3%
Level 3     7.9%  | 10.5% | 26.9% | 17.3%
Level 4     8.6%  | 9.9%  | 25.0% | 19.2%

Expressiveness of Decision Trees
first noun tree = B: ra (33.0/3.7)
first noun tree = E: ra (2.0/1.6)
first noun tree = F: la (0.0)
first noun tree = G: la (4.0/0.3)
first noun tree = A:
|   second noun tree = B: la (0.0)
|   second noun tree = D: la (4.0/0.3)
|   second noun tree = E: la (10.0/0.4)
|   second noun tree = F: la (0.0)
|   second noun tree = G: la (6.0/1.6)
|   second noun tree = A:
|   |   first tree position <= 4 : ra (7.0/1.6)
|   |   first tree position > 4 : la (36.0/5.8)
|   second noun tree = C:
|   |   third noun tree = A: ra (9.0/0.3)
|   |   third noun tree = B: la (0.0)
|   |   third noun tree = D: la (1.0/0.3)
|   |   third noun tree = E: la (5.0/0.3)
|   |   third noun tree = F: la (0.0)
|   |   third noun tree = G: ra (2.0/1.6)
|   |   third noun tree = C:
|   |   |   third tree position <= 21 : ra (5.0/2.6)
|   |   |   third tree position > 21 : la (5.0/0.3)
first noun tree = C: …..
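The tree fragment above reads directly as nested rules. The sketch below transcribes only the branches shown on the slide. `classify_attachment` is a hypothetical name, "tree position" is assumed to mean the number that follows the tree letter (for example 4 in C04), and the elided "first noun tree = C" branch returns None rather than a guess.

```python
from typing import Optional


def classify_attachment(first: str, second: str, third: str,
                        first_pos: Optional[int] = None,
                        third_pos: Optional[int] = None) -> Optional[str]:
    """Return 'la' or 'ra' for the branches shown on the slide, else None."""
    if first in ("B", "E"):
        return "ra"
    if first in ("F", "G"):
        return "la"
    if first == "A":
        if second in ("B", "D", "E", "F", "G"):
            return "la"
        if second == "A":
            if first_pos is None:
                return None  # this branch needs the first noun's tree position
            return "ra" if first_pos <= 4 else "la"
        if second == "C":
            if third in ("A", "G"):
                return "ra"
            if third in ("B", "D", "E", "F"):
                return "la"
            if third == "C":
                if third_pos is None:
                    return None  # this branch needs the third noun's tree position
                return "ra" if third_pos <= 21 else "la"
    # 'first noun tree = C' and any other case are not shown in the transcript.
    return None


breast_cancer_cells = classify_attachment("A", "C", "A")
```

For "breast cancer cells" (A, C04, A11) the rules give "ra", and for an A, C, E compound such as "breast cancer treatment" they give "la".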

Semantic Interpretation Use decision tree paths for the detection of clusters of noun compounds with the same semantic interpretation

Ex: ACA:
breast cancer cells       A  C04  A11  ra
bladder cancer cells      A  C04  A11  ra
colon carcinoma cells     A  C    A11  ra
prostate tumor cells      A  C04  A11  ra
prostate cancer tissue    A  C04  A10  ra
lung cancer cells         A  C04  A11  ra
colon cancer cells        A  C04  A11  ra
brain tumor tissue        A  C04  A10  ra
colon cancer tissues      A  C04  A10  ra
bladder tumor cells       A  C04  A11  ra
Interpretation: noun3 exhibits noun2 in noun1

Ex: ACE:
muscle disease diagnosis          A    C    E01  la
breast cancer prognosis           A    C04  E    la
breast cancer treatment           A    C04  E02  la
hip fracture treatment            A    C    E02  la
cell cancer treatment             A11  C04  E02  la
brain tumor treatment             A    C04  E02  la
colon adenocarcinoma xenograft    A    C    E
colon carcinoma xenograft         A    C    E
colon carcinoma xenografts        A    C    E
neck cancer xenografts            A    C04  E
Interpretation:
1: noun3 diagnoses noun2 in noun1
2: noun3 treats noun2 in noun1
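The ACA and ACE groups can be approximated by clustering compounds on their sequence of top-level tree letters together with the attachment label. This is a simplification of the decision tree paths the slides refer to, and `tree_pattern` and `cluster` are hypothetical names. The rows are copied from the two examples.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (compound, MeSH labels as shown on the slides, attachment); a few rows from the ACA/ACE examples.
LABELLED: List[Tuple[str, List[str], str]] = [
    ("breast cancer cells", ["A", "C04", "A11"], "ra"),
    ("lung cancer cells", ["A", "C04", "A11"], "ra"),
    ("muscle disease diagnosis", ["A", "C", "E01"], "la"),
    ("breast cancer treatment", ["A", "C04", "E02"], "la"),
    ("hip fracture treatment", ["A", "C", "E02"], "la"),
]


def tree_pattern(labels: List[str]) -> str:
    """Reduce labels such as ['A', 'C04', 'A11'] to the letter pattern 'ACA'."""
    return "".join(label[0] for label in labels)


def cluster(rows: List[Tuple[str, List[str], str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group compounds that share a letter pattern and an attachment label."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for compound, labels, attachment in rows:
        groups[(tree_pattern(labels), attachment)].append(compound)
    return dict(groups)


groups = cluster(LABELLED)
```

Grouping at the letter level alone puts "diagnosis" and "treatment" compounds together, which is why the ACE example carries two interpretations; separating them would need the deeper labels (E01 versus E02).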

From MeSH to UMLS
Unified Medical Language System, a project at the U.S. National Library of Medicine
3 UMLS Knowledge Sources:
Metathesaurus
Semantic Network
SPECIALIST lexicon and programs

Metathesaurus
Most extensive of the UMLS sources
730,000 concepts representing more than 1,500,000 strings in over 60 vocabularies and classifications
Organized by concept or meaning. In essence, its purpose is to link alternative names and views of the same concept together and to identify useful relationships between different concepts.
Relationships in the Metathesaurus come from the sources themselves or are created by the Metathesaurus editors.

Semantic Network
Consistent categorization of all concepts represented in the UMLS Metathesaurus and the important relationships between them.
Every concept has been assigned a semantic type.
The semantic types (134) are the nodes in the Network, and the relationships between them are the links (54).
High-level semantic structure

"Biologic Function" Hierarchy

Noun Compounds, again
Very preliminary studies…
Can we use the information in the Semantic Network for the semantic interpretation of noun compounds?
Are semantic types and relationships good descriptors?
Are they useful for disambiguation and classification?

Mapping of Noun Compounds
NC: peptide CRF receptor antagonists
C |C |C |C |
Amino Acid, Peptide, or Protein|Hormone|Receptor|Pharmacologic Substance|
A |A |A |A |
rel_12.1 (Amino Acid, Peptide, or Protein, Hormone) = interacts_with: A R3.1.5 A
rel_13.1 (Amino Acid, Peptide, or Protein, Receptor) = interacts_with: A R3.1.5 A
rel_14.1 (Amino Acid, Peptide, or Protein, Pharmacologic Substance) = interacts_with: A R3.1.5 A
rel_23.1 (Hormone, Receptor) = interacts_with: A R3.1.5 A
rel_24.1 (Hormone, Pharmacologic Substance) = interacts_with: A R3.1.5 A
rel_34.1 (Receptor, Pharmacologic Substance) = interacts_with: A R3.1.5 A

Mapping of Noun Compounds
NC: day hospital treatment
C |C |C ,C |
Temporal Concept|Health Care Related Organization|Functional Concept;Therapeutic or Preventive Procedure|
A2.1.1|A2.7.1|A2.1.4;B |
rel_12.1 (Temporal Concept, Health Care Related Organization) = NOT found in SemNet
rel_13.1 (Temporal Concept, Functional Concept) = NOT found in SemNet
rel_13.2 (Temporal Concept, Therapeutic or Preventive Procedure) = NOT found in SemNet
rel_23.1 (Health Care Related Organization, Functional Concept) = NOT found in SemNet
rel_23.2 (Health Care Related Organization, Therapeutic or Preventive Procedure) = location_of: R2.1

Mapping of Noun Compounds
NC: brain serotonin metabolism
C |C |C ,C |
Body Part, Organ, or Organ Component|Neuroreactive Substance or Biogenic Amine|Organism Function;Functional Concept|
A |A |B ;A2.1.4|
rel_12.1 (Body Part, Organ, or Organ Component, Neuroreactive Substance or Biogenic Amine) = produces R3.2.1
rel_13.1 (Body Part, Organ, or Organ Component, Organism Function) = location_of R2.1
rel_13.2 (Body Part, Organ, or Organ Component, Functional Concept) = NOT found in SemNet
rel_23.1 (Neuroreactive Substance or Biogenic Amine, Organism Function) = disrupts R3.1.3
rel_23.2 (Neuroreactive Substance or Biogenic Amine, Functional Concept) = NOT found in SemNet
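The rel_ij.k listings in these mapping examples can be reproduced by pairing the semantic types of noun i with those of noun j and looking each pair up in the Semantic Network. A minimal sketch, assuming a tiny relation table copied from the slides; `SEMNET` and `candidate_relations` are hypothetical names and the table is not the real Semantic Network.

```python
from itertools import combinations, product
from typing import Dict, List, Optional, Tuple

# Toy excerpt: (type of earlier noun, type of later noun) -> relation, copied from the slides.
SEMNET: Dict[Tuple[str, str], str] = {
    ("Hormone", "Receptor"): "interacts_with",
    ("Health Care Related Organization", "Therapeutic or Preventive Procedure"): "location_of",
    ("Body Part, Organ, or Organ Component", "Neuroreactive Substance or Biogenic Amine"): "produces",
    ("Body Part, Organ, or Organ Component", "Organism Function"): "location_of",
    ("Neuroreactive Substance or Biogenic Amine", "Organism Function"): "disrupts",
}


def candidate_relations(types_per_noun: List[List[str]]) -> Dict[str, Optional[str]]:
    """For every noun pair i < j and every type combination k, look up the relation (None if absent)."""
    found: Dict[str, Optional[str]] = {}
    for (i, types_i), (j, types_j) in combinations(enumerate(types_per_noun, start=1), 2):
        for k, (a, b) in enumerate(product(types_i, types_j), start=1):
            found[f"rel_{i}{j}.{k}"] = SEMNET.get((a, b))
    return found


# "day hospital treatment": the third noun is ambiguous between two semantic types.
day_hospital_treatment = candidate_relations([
    ["Temporal Concept"],
    ["Health Care Related Organization"],
    ["Functional Concept", "Therapeutic or Preventive Procedure"],
])
```

Only rel_23.2 finds a relation here, which is how the relationships can pick "Therapeutic or Preventive Procedure" over "Functional Concept" for "treatment".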

Mapping Words - Semantic Types, Semantic Relationships
Semantic types correctly assigned (on 246 noun compounds, 738 nouns): 59%
Semantic types disambiguated by the relationships:
Doesn’t disambiguate: 42.7%
Disambiguates incorrectly: 17.3%
Disambiguates correctly: 40%

(Some of) Future Work
Explore the UMLS sources in more depth
What forms the best basis for automatic semantic interpretation of noun phrases?
Semantic types?
Metathesaurus concepts? (and what parts of them)
Just MeSH concepts?
Machine learning algorithms to help choose a good representation of medical terms

Future Work
Machine learning algorithms for classification
Can we (and how) generalize patterns found for noun compounds to other syntactic structures?
How can we best formally represent semantics?
How can we combine symbolic rules with statistical methods?
How can we deal with non-medical words? Can the system help us disambiguate them?
Should we use other ontologies (e.g., WordNet)?