MetaMap UMLS Concept Mapping Program Pawel Matykiewicz and Others.


Outline
– Input Formats (1)
– The Algorithm (2)
– MetaMap Options (3)
– Output Formats (4)
– Creators (5)

Input Formats (1)
– ASCII-only input
– Unformatted English free text
– MEDLINE citations
– Input records delimited by a blank line
– Single-line delimited input (via the job Scheduler):
    heart attack
    lung cancer
– Single-line delimited input with ID (Scheduler):
    |heart attack
    |lung cancer
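The single-line-with-ID format above can be prepared with a few lines of Python. This is a minimal sketch, not part of MetaMap: the function name `to_id_delimited` and the record IDs `rec1`/`rec2` are hypothetical, and the `ID|text` layout is an assumption read off the slide, where the text follows a `|`.

```python
def to_id_delimited(records):
    """Format (record_id, text) pairs as one 'record_id|text' line each.

    Assumption: the ID precedes the '|' and the text follows it.
    """
    lines = []
    for record_id, text in records:
        # Collapse newlines and repeated whitespace so each record stays on one line.
        flat = " ".join(text.split())
        # The slide states that input must be ASCII only.
        if not flat.isascii():
            raise ValueError(f"record {record_id!r} contains non-ASCII characters")
        # A '|' inside the ID would make the line ambiguous.
        if "|" in record_id:
            raise ValueError(f"record id {record_id!r} must not contain '|'")
        lines.append(f"{record_id}|{flat}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical IDs, with the two phrases from the slide.
    print(to_id_delimited([("rec1", "heart attack"), ("rec2", "lung cancer")]), end="")
```

Rejecting non-ASCII text outright, rather than transliterating it, keeps the sketch honest about what it changes; a real pipeline might prefer to convert such characters first.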

Input Should Have Syntactic Structure
Lack of structure → long phrases → combinatorial explosion in mappings
Example (abbreviation list):
– protein-4
FN3 fibronectin type III domain
GSH glutathione
GST glutathione S-transferase
hIL-6 human interleukin-6
HSA human serum albumin
IC(50) half-maximal inhibitory concentration
Ig immunoglobulin
IMAC immobilized metal affinity chromatography
K(D) equilibrium constant
– from filamentous bacteriophage f1
PCR polymerase chain reaction
PDB Protein Data Bank
PSTI human pancreatic secretory trypsin inhibitor
RBP retinol-binding protein
SPR surface plasmon resonance
TrxA

Be careful of bulleted lists!
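One way to act on this warning is to turn each list item into its own blank-line-delimited record, so that a list is not parsed as one long phrase. The sketch below is an illustration only; `list_to_records` and `BULLET_CHARS` are hypothetical names, and the set of bullet characters is an assumption about what the input might contain.

```python
# Characters treated as leading bullet markers (an assumption, extend as needed).
BULLET_CHARS = "-*•–·"


def list_to_records(lines):
    """Turn list items into separate records delimited by blank lines."""
    items = []
    for line in lines:
        # Drop surrounding whitespace, then any leading bullet markers.
        item = line.strip().lstrip(BULLET_CHARS).strip()
        if item:
            items.append(item)
    # A blank line between items makes each one an independent input record.
    return "\n\n".join(items) + ("\n" if items else "")


if __name__ == "__main__":
    print(list_to_records(["• heart attack", "– lung cancer"]), end="")
```

Note that stripping a leading "-" would also alter an item that legitimately starts with a minus sign, so the marker set should be adjusted to the data at hand.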

The Algorithm (2)
Parsing
– Using the SPECIALIST minimal commitment parser, the SPECIALIST lexicon, and the MedPost part-of-speech tagger
Variant generation
– Using the SPECIALIST lexicon and Lexical Variant Generation (LVG)
Candidate retrieval
– From the Metathesaurus
Candidate evaluation
Mapping construction

Parsing
Text
– Ocular complications of myasthenia gravis.
Tagging
– Ocular  complications  of    myasthenia  gravis  .
  adj/2   noun           prep  noun        noun/2  pd
Simplified phrases
– [mod(ocular), head(complications)]
– [prep(of), head(myasthenia gravis), punc(.)]

Variant Generation
Variants of adjective ocular (total 13, 9 occur in UMLS):
– ocular{[adj], 0=[]}
– eye{[noun], 2="s"}
– eyes{[noun], 3="si"}
– optic{[adj], 4="ss"}
– ophthalmic{[adj], 4="ss"}
– ophthalmia{[noun], 7="ssd"}
– ophthalmias{[noun], 8="ssdi"}
– ophthalmiac{[noun], 7="ssd"}
– ophthalmiacs{[noun], 8="ssdi"}
– oculus{[noun], 3="d"}
– oculi{[noun], 4="di"}
– ocularity{[noun], 3="d"}
– ocularities{[noun], 4="di"}
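The number in each entry appears to be a distance computed from the letters of the history string that follows it. The sketch below assumes a per-letter cost of s = 2, d = 3 and i = 1, values inferred from the examples on this slide rather than stated on it; reading the letters as synonym, derivation and inflection is likewise an assumption, and `variant_distance` is a hypothetical name.

```python
# Per-step costs inferred from the slide's examples (an assumption):
# "s" = 2, "d" = 3, "i" = 1.
STEP_COSTS = {"s": 2, "d": 3, "i": 1}


def variant_distance(history):
    """Sum the cost of every step in a variant history string such as "ssd"."""
    return sum(STEP_COSTS[step] for step in history)


if __name__ == "__main__":
    # History strings copied from the slide, with the distances shown there.
    slide_examples = {"": 0, "s": 2, "si": 3, "ss": 4, "ssd": 7, "ssdi": 8, "d": 3, "di": 4}
    for history, expected in slide_examples.items():
        print(repr(history), variant_distance(history), expected)
```

Under these costs every distance on the slide is reproduced, which supports the additive reading, though history letters that do not appear on the slide are not covered.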

Candidate Evaluation
Phrase: Ocular complications
Meta Candidates (8):
  861 Complications (Complication) [patf]
  861 complications (Complication Aspects) [patf]
  777 Complicated [ftcn]
  694 Ocular (Eye) [bpoc]
  638 Eye (Entire Eye) [bpoc]
  611 Optic (Optics) [ocdi]
  611 Ophthalmic [spco]
  588 Ophthalmia (Endophthalmitis) [dsyn]

Mapping Construction (WSD)
Cerebral blood flow (CBF) in newborn infants is often below levels necessary to sustain brain viability in adults.
– Cerebrovascular Circulation
– CEREBRAL BLOOD FLOW IMAGING
– Adult
– Infant, Newborn
– Frequent
– Levels (qualifier value)
– Brain
– Viable
– Entire brain
– Sustained

MetaMap Options (3)
Word Sense Disambiguation (WSD, -y)
– Based on Susanne Humphrey’s Journal Descriptor Indexing (Humphrey et al., 1998, 2006)
– Provides modest improvement in results
Negation (--negex)
– Important for clinical text
– Based on Wendy Chapman’s NegEx algorithm (Chapman et al., 2001)
Behavior options
Output/Display options

Behavior Options (1/4)
Data model options
  -A --strict_model (the default; focused on concepts likely to be found in text)
  -C --relaxed_model (includes most Metathesaurus content)
Major options highlighted earlier
  -y --word_sense_disambiguation
  --negex
Other major options
  -Q --quick_composite_phrases (experimental, for well-behaved larger phrases: pain on the left side of the chest)
  -i --ignore_word_order

Behavior Options (2/4)
Browse mode options (example below)
  -z --term_processing
  -o --allow_overmatches
  -g --allow_concept_gaps
  -m --hide_mappings
Inference mode options (example below)
  -Y --prefer_multiple_concepts

Behavior Options (3/4)
Parsing/lexical options (not often used)
  -t --no_tagging
  -d --no_derivational_variants
  -D --all_derivational_variants
  -a --all_acros_abbrs
  -u --unique_acros_abbrs_only
List truncation options (reduce tenuous matches and save processing time)
  -r --threshold

Behavior Options (4/4)
Source/ST limitation options
  -R --restrict_to_sources
  -e --exclude_sources
  -J --restrict_to_sts
  -k --exclude_sts
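The behavior options listed on these slides can be assembled into a command line programmatically. The following is a sketch under stated assumptions: the executable name `metamap`, the helper name `build_metamap_args`, and the convention that `-r` takes an integer while `-R` and `-J` take comma-separated lists are all assumptions to check against the installed version. The flags themselves are the ones shown on the slides.

```python
import shlex


def build_metamap_args(relaxed=False, wsd=False, negex=False,
                       threshold=None, sources=(), semtypes=()):
    """Assemble an argument list from a few of the options on the slides."""
    args = ["metamap"]  # ASSUMPTION: name of the executable on PATH
    if relaxed:
        args.append("-C")            # --relaxed_model (strict model is the default)
    if wsd:
        args.append("-y")            # --word_sense_disambiguation
    if negex:
        args.append("--negex")
    if threshold is not None:
        # ASSUMPTION: --threshold takes an integer score cutoff
        args += ["-r", str(threshold)]
    if sources:
        # ASSUMPTION: --restrict_to_sources takes a comma-separated list
        args += ["-R", ",".join(sources)]
    if semtypes:
        # ASSUMPTION: --restrict_to_sts takes a comma-separated list
        args += ["-J", ",".join(semtypes)]
    return args


if __name__ == "__main__":
    argv = build_metamap_args(wsd=True, negex=True, semtypes=["dsyn", "fndg"])
    # Print the command instead of running it, since no installation is assumed.
    print(shlex.join(argv))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting issues.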

Behavior Example
“Superficial injury of chest wall without infection”
– prefer multiple concepts: “Superficial”, “Injury”, “Chest”, “Wall”, “Infection” (time elapsed 0.39 sec)
– quick composite phrases: “Superficial”, “Injury of chest wall”, “Superficial injury of chest”, “Wall”, “Infection” (time elapsed 0.88 sec)
– term processing: “Superficial injury of chest wall NOS, infected” (time elapsed 0.98 sec)
– term processing, relaxed model: “Superficial injury of chest wall without infection” (time elapsed 2.88 sec)

Output Formats (4)
– Human-readable output
– MetaMap Machine Output (MMO)
– XML output
– Colorized MetaMap output (MetaMap 3D)
– Fielded (MMI) output

Output Formats: Human Readable
Phrase: "heart attack"
Meta Candidates (8):
  1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]
   861 Heart [Body Part, Organ, or Organ Component]
   861 Attack, NOS (Onset of illness) [Finding]
   861 Attack (Attack device) [Medical Device]
   861 attack (Attack behavior) [Social Behavior]
   861 Heart (Entire heart) [Body Part, Organ, or Organ Component]
   861 Attack (Observation of attack) [Finding]
   827 Attacked (Assault) [Injury or Poisoning]
Meta Mapping (1000):
  1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]
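Candidate lines in the human-readable output follow a regular shape: a score, the matched string, an optional preferred name in parentheses, and the semantic type in brackets. The regular expression below is a sketch built only from the lines shown on this slide; `Candidate` and `parse_candidate` are hypothetical names, and a matched string that itself ends in a parenthesized qualifier would be misread.

```python
import re
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    score: int
    matched: str
    preferred: Optional[str]
    semtypes: str  # kept as raw text: full type names may contain commas


# score, matched string, optional "(preferred name)", then "[semantic type]"
CANDIDATE_RE = re.compile(
    r"^\s*(?P<score>\d+)\s+(?P<matched>.+?)"
    r"(?:\s+\((?P<preferred>.+)\))?"
    r"\s+\[(?P<semtypes>[^\]]+)\]\s*$"
)


def parse_candidate(line):
    """Parse one candidate line; return None if the line has another shape."""
    m = CANDIDATE_RE.match(line)
    if m is None:
        return None
    return Candidate(int(m["score"]), m["matched"], m["preferred"], m["semtypes"])


if __name__ == "__main__":
    print(parse_candidate("1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]"))
    print(parse_candidate("861 Heart [Body Part, Organ, or Organ Component]"))
```

The semantic type is left unsplit because a name such as "Body Part, Organ, or Organ Component" contains commas; for anything beyond quick inspection, the machine, XML or MMI formats are likely a more dependable parsing target.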

Output Formats: Machine Output
Prolog terms (pretty-printed & condensed!)
candidates([
  ev(-1000, 'C ', 'Heart attack', 'Myocardial Infarction', [heart,attack], [dsyn], [[[1,2],[1,2],0]], yes, no, ['MEDLINEPLUS'], [0/12]),
  ev(-861, 'C ', 'Heart', 'Heart', [heart], [bpoc], [[[1,1],[1,1],0]], yes, no, ['AIR'], [0/5]),
  ev(-861, 'C ', 'Attack, NOS', 'Onset of illness', [attack], [fndg], [[[2,2],[1,1],0]], yes, no, ['MTH'], [6/6]),
  ev(-861, 'C ', 'Attack', 'Attack device', [attack], [medd], [[[2,2],[1,1],0]], yes, no, ['MTH','MMSL'], [6/6]),
  ev(-861, 'C ', attack, 'Attack behavior', [attack], [socb], [[[2,2],[1,1],0]], yes, no, ['MTH','PSY','AOD'], [6/6]),
  ev(-861, 'C ', 'Heart', 'Entire heart', [heart], [bpoc], [[[1,1],[1,1],0]], yes, no, ['MTH','SNOMEDCT'], [0/5]),
  ev(-861, 'C ', 'Attack', 'Observation of attack', [attack], [fndg], [[[2,2],[1,1],0]], yes, no, ['MTH','SNOMEDCT'], [6/6]),
  ev(-827, 'C ', 'Attacked', 'Assault', [attacked], [inpo], [[[2,2],[1,1],1]], yes, no, ['ICD10AM'], [6/6])]).

Output Formats: Formatted XML
C Heart attack Myocardial Infarction heart attack dsyn yes no MEDLINEPLUS 0 12

Output Formats: Colorized Output

MetaMap Fielded MMI Output
|MM|430.78|Homocystine|C |[aapp,bacs]|["Homocystine"-ab-3-"Homocysteine","Homocystine"-ab-2-"homocysteine","Homocystine"-ab-1-"Homocysteine","Homocystine"-ti-1-"homocysteine"]|TI;AB|406:12|227:12|74:12|35:
Fields:
– (PMID)
– MM (Path Name)
– (Score)
– Homocystine (UMLS Concept Found Preferred Name)
– C (UMLS Concept Unique Identifier)
– [aapp,bacs] (List of Semantic Type(s))
– ["Homocystine"-ab-3-"Homocysteine","Homocystine"-ab-2-"homocysteine","Homocystine"-ab-1-"Homocysteine","Homocystine"-ti-1-"homocysteine"] (List of Entry Term Quartets)
– TI;AB (Location(s), boost scores for TI)
– 406:12|227:12|74:12|35:12 (List of Positional Information Groups [start:length])
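A fielded line like the one above can be split into named fields with plain string handling. This sketch follows the field order given on the slide and treats everything after the location field as `start:length` groups, as the sample suggests; other MetaMap versions may lay out the positional information differently. The function name `parse_mmi_line`, the PMID `00000000` and the CUI `C0000000` are hypothetical placeholders, because the real identifiers are not present in this transcript.

```python
def parse_mmi_line(line):
    """Split one fielded MMI line into a dict, following the slide's field order."""
    parts = line.rstrip("\n").split("|")
    # Assumption: none of the first eight fields contains a literal '|'.
    pmid, path, score, name, cui, semtypes, triggers, location = parts[:8]
    positions = []
    for group in parts[8:]:
        start, length = group.split(":")
        positions.append((int(start), int(length)))
    return {
        "pmid": pmid,
        "path": path,
        "score": float(score),
        "preferred_name": name,
        "cui": cui,
        "semtypes": semtypes.strip("[]").split(","),
        "triggers": triggers,            # left unparsed in this sketch
        "locations": location.split(";"),
        "positions": positions,          # list of (start, length) pairs
    }


if __name__ == "__main__":
    # PMID and CUI are hypothetical placeholders, not real identifiers.
    sample = ('00000000|MM|430.78|Homocystine|C0000000|[aapp,bacs]|'
              '["Homocystine"-ti-1-"homocysteine"]|TI;AB|406:12|227:12')
    record = parse_mmi_line(sample)
    print(record["score"], record["semtypes"], record["positions"])
```

The entry term quartets are left as a raw string because the quoted terms could themselves contain commas or hyphens, which would need a more careful parser than a simple split.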

Creators (5)
National Library of Medicine (NIH):
– Alan (Lan) R. Aronson
– Dina Demner-Fushman
– François-Michel Lang
– James G. Mork

Conclusions
– Good for PubMed abstracts
– Not so good for Clarity progress notes