U. S. National Library of Medicine NLM Indexing Initiative Tools for NLP: MetaMap and the Medical Text Indexer Natural Language Processing: State of the.

Presentation transcript:

U. S. National Library of Medicine
NLM Indexing Initiative Tools for NLP: MetaMap and the Medical Text Indexer
Natural Language Processing: State of the Art, Future Directions
April 23, 2012
Alan R. Aronson

Outline
- Introduction
- MetaMap
  - Overview
  - Linguistic roots
  - Recent Word Sense Disambiguation (WSD) efforts
- The NLM Medical Text Indexer (MTI)
  - Overview
  - MTI as First-line Indexer (MTIFL)
  - Recent improvements
  - Gene indexing

MetaMap/MTI Example
- MetaMap identifies biomedical concepts in text
- Medical Text Indexer (MTI) summarizes text using MetaMap and the Medical Subject Headings (MeSH) vocabulary

MetaMap Overview
- Named-entity recognition program
- Identify UMLS Metathesaurus concepts in text
- Linguistic rigor
- Flexible partial matching
- Emphasis on thoroughness rather than speed

The MetaMap Algorithm
- Parsing: using SPECIALIST minimal commitment parser, SPECIALIST lexicon, MedPost part-of-speech tagger
- Variant generation: using SPECIALIST lexicon, Lexical Variant Generation (LVG)
- Candidate retrieval: from the Metathesaurus
- Candidate evaluation
- Mapping construction

MetaMap Evaluation Function
Weighted average of:
- centrality (is the head involved?)
- variation (average of all variation)
- coverage (how much of the text is matched?)
- cohesiveness (in how many pieces?)
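A minimal sketch of a weighted average over the four components, scaled to the 0 to 1000 range used for MetaMap scores elsewhere in this talk. The slide does not state the weights, so the values in `WEIGHTS` are illustrative assumptions, as are the class and function names and the assumption that each component is already normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class CandidateScores:
    """Component scores for one candidate concept, each assumed to lie in [0, 1]."""
    centrality: float    # 1.0 if the candidate involves the head of the phrase, else 0.0
    variation: float     # average closeness of the matched variants to the text
    coverage: float      # how much of the text is matched
    cohesiveness: float  # higher when the match falls into fewer pieces


# Hypothetical weights (not given on the slide): coverage and cohesiveness
# are counted twice as heavily as centrality and variation.
WEIGHTS = {"centrality": 1.0, "variation": 1.0, "coverage": 2.0, "cohesiveness": 2.0}


def weighted_score(scores: CandidateScores, weights=WEIGHTS, max_score: int = 1000) -> int:
    """Return the weighted average of the components, scaled to [0, max_score]."""
    total_weight = sum(weights.values())
    weighted_sum = sum(w * getattr(scores, name) for name, w in weights.items())
    return round(max_score * weighted_sum / total_weight)


if __name__ == "__main__":
    perfect = CandidateScores(1.0, 1.0, 1.0, 1.0)
    no_head = CandidateScores(0.0, 1.0, 1.0, 1.0)
    print(weighted_score(perfect), weighted_score(no_head))
```

Under these assumed weights, a candidate that misses the head of the phrase loses one sixth of the maximum score even when every other component is perfect.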

MetaMap Processing Example
Phrase: Inferior vena caval stent filter (PMID )
Each candidate line gives: MetaMap score (≤ 1000), Metathesaurus Concept Unique Identifier (CUI), Metathesaurus string, UMLS semantic type.
Candidate concepts:
- 909 C : Inferior Vena Cava Filter [medd]
- 804 C : Filter [mnob]
- 804 C : Filter [medd]
- 804 C : Filter [inpr]
- 804 C : Filter [cnce]
- 804 C : Filter [medd]
- 804 C : FILTER [medd]
- 717 C : Inferior vena caval [blor]
- 673 C : Vena caval [bpoc]
- 637 C : Stent [medd]
- 637 C : Stent [medd]
- 637 C : Vena [bpoc]
Listed without scores:
- C : Filters [mnob]
- C : Optical filter [medd]
- C : filter information process [inpr]
- C : Filter (function) [cnce]
- C : Filter Device Component [medd]
- C : Filter - medical device [medd]
- C : Stent, device [medd]
- C : Stent Device Component [medd]

MetaMap Final Mappings
Phrase: Inferior vena caval stent filter
Final mappings (subsets of candidate sets):
Meta Mapping (911):
- 909 C : Inferior Vena Cava Filter [medd]
- 637 C : Stent [medd]

Word Sense Disambiguation (WSD)
Example: Kids with colds may also have a sore throat, cough, headache, mild fever, fatigue, muscle aches, and loss of appetite.
Candidate MetaMap mappings for cold:
- C : Cold (Cold sensation)
- C : Cold (Cold temperature)
- C : Cold (Common cold)

Knowledge-based WSD
- Compare UMLS candidate concept profile vectors to context of ambiguous word
- Concept profile vectors’ words from definition, synonyms and related concepts
- Candidate concept with highest similarity is predicted

Common cold           Cold temperature
Weight  Word          Weight  Word
265     infect        258     temperature
126     disease       86      hypothermia
41      fever         72      effect
40      cough         48      hot

Knowledge-based WSD
Kids with colds may also have a sore throat, cough, headache, mild fever, fatigue, muscle aches, and loss of appetite.

Common cold           Cold temperature
Weight  Word          Weight  Word
265     infect        258     temperature
126     disease       86      hypothermia
41      fever         72      effect
40      cough         48      hot
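The comparison of concept profile vectors with the context can be sketched as follows, using the weights shown in the table and the example sentence. Cosine similarity is an assumption here, since the slides only say "highest similarity"; the tokenizer is deliberately naive and does no stemming, and all function names are hypothetical.

```python
import math
import re

# Profile weights copied from the slide; only the top four words per concept are shown there.
PROFILES = {
    "Common cold": {"infect": 265, "disease": 126, "fever": 41, "cough": 40},
    "Cold temperature": {"temperature": 258, "hypothermia": 86, "effect": 72, "hot": 48},
}

SENTENCE = ("Kids with colds may also have a sore throat, cough, headache, "
            "mild fever, fatigue, muscle aches, and loss of appetite.")


def context_vector(text: str) -> dict:
    """Bag-of-words counts for the context (lowercased, no stemming)."""
    counts = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        counts[token] = counts.get(token, 0) + 1
    return counts


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(text: str, profiles=PROFILES) -> str:
    """Return the candidate concept whose profile is most similar to the context."""
    context = context_vector(text)
    return max(profiles, key=lambda concept: cosine(profiles[concept], context))


if __name__ == "__main__":
    print(disambiguate(SENTENCE))
```

In the example sentence only "cough" and "fever" overlap with a profile, and both belong to the common cold profile, so that sense is selected.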

Automatically Extracted Corpus WSD
MEDLINE contains numerous examples of ambiguous words in context, though not disambiguated.
Ambiguous word: cold
Candidate concepts, each with a PubMed query built from its unambiguous synonyms:
- common cold (CUI:C): "common cold"[tiab] OR "acute nasopharyngitis"[tiab] …
- cold temperature (CUI:C): "cold temperature"[tiab] OR "low temperature"[tiab] …
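A minimal sketch of how such queries could be assembled from a list of unambiguous synonyms. The function name is hypothetical, and the synonym lists contain only the two terms visible on the slide for each concept; the slide's own lists are truncated.

```python
def build_tiab_query(synonyms):
    """Join synonyms into a PubMed title/abstract query: "term"[tiab] OR ..."""
    return " OR ".join(f'"{term}"[tiab]' for term in synonyms)


# Only the synonyms visible on the slide; the full lists are not shown there.
SYNONYMS = {
    "common cold": ["common cold", "acute nasopharyngitis"],
    "cold temperature": ["cold temperature", "low temperature"],
}

QUERIES = {concept: build_tiab_query(terms) for concept, terms in SYNONYMS.items()}

if __name__ == "__main__":
    for concept, query in QUERIES.items():
        print(f"{concept}: {query}")
```

Citations retrieved by each query can then serve as training examples labeled with that query's concept, since the synonyms themselves are unambiguous.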

WSD Method Results
- Corpus method has better accuracy than UMLS method
- MSH WSD data set created using MeSH indexing
  - 203 ambiguous words
  - 81 semantic types
  - 37,888 ambiguity cases
- Indirect evaluation with summarization and MTI correlates with direct evaluation
Results table: methods UMLS and Corpus, data sets NLM WSD and MSH WSD

MEDLINE Citation Example

MTI
- MetaMap Indexing: actually found in text
- Restrict to MeSH: maps UMLS concepts to MeSH
- PubMed Related Citations: not necessarily found in text
Indexer feedback (March 20, 2012):
- Received 2,330 indexer feedback items
- Incorporated 40% into MTI
- Hibernation should only be indexed for animals, not for "stem cell hibernation"
- Clove (spice) should not be mapped to the verb "cleave"

MTI Uses
- Assisted indexing of MEDLINE by Index Section
- Assisted indexing of Cataloging and History of Medicine Division records
- Automatic indexing of NLM Gateway meeting abstracts
- First-line indexing (MTIFL) since February

MTI as First-Line Indexer (MTIFL)
“Normal” MTI Processing:
1. MTI: processes, recommends MeSH
2. Indexer: reviews, selects
3. Reviser: reviews, selects, adjusts, approves
4. Indexing displays in PubMed as usual

MTI as First-Line Indexer (MTIFL)
MTIFL MTI Processing (23 MEDLINE Journals), elements shown:
- MTI: processes, indexes MeSH
- Indexer: reviews, selects
- Reviser: reviews, selects, adjusts, approves
- Indexing displays in PubMed as usual
- Index Section compares MTI and Reviser indexing

CheckTags Machine Learning Results

CheckTag            F1 before ML   F1 with ML   Improvement
Middle Aged          1.01%         59.50%
Aged                11.72%         54.67%
Child, Preschool     6.11%         45.40%
Adult               19.49%         56.84%
Male                38.47%         71.14%
Aged, 80 and over    1.50%         30.89%
Young Adult          2.83%         31.63%
Female              46.06%         73.84%
Adolescent          24.75%         42.36%
Humans              79.98%         91.33%
Infant              34.39%         44.69%
Swine               71.04%         74.75%

k citations for training and 100k citations for testing
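The gain for each CheckTag can be derived from the two F1 columns. The sketch below computes it as an absolute difference in percentage points, which is an assumption about how improvement is measured; the F1 values are copied from the table and the function name is hypothetical.

```python
# (F1 before ML, F1 with ML) in percent, copied from the CheckTags table.
F1_SCORES = {
    "Middle Aged": (1.01, 59.50),
    "Aged": (11.72, 54.67),
    "Child, Preschool": (6.11, 45.40),
    "Adult": (19.49, 56.84),
    "Male": (38.47, 71.14),
    "Aged, 80 and over": (1.50, 30.89),
    "Young Adult": (2.83, 31.63),
    "Female": (46.06, 73.84),
    "Adolescent": (24.75, 42.36),
    "Humans": (79.98, 91.33),
    "Infant": (34.39, 44.69),
    "Swine": (71.04, 74.75),
}


def absolute_gain(before: float, after: float) -> float:
    """Gain in percentage points (assumed definition of improvement)."""
    return round(after - before, 2)


GAINS = {tag: absolute_gain(*scores) for tag, scores in F1_SCORES.items()}

if __name__ == "__main__":
    # Print the CheckTags from largest to smallest gain.
    for tag, gain in sorted(GAINS.items(), key=lambda item: item[1], reverse=True):
        print(f"{tag:<20}{gain:>7.2f}")
```

Sorting by this gain reproduces the row order of the table, with Middle Aged first and Swine last.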

MTI - How are we doing?
- Focus on Precision versus Recall
- Fruition of 2011 Changes

The Gene Indexing Assistant (GIA)
An automated tool to assist the indexer in identifying and creating GeneRIFs:
- Evaluate the article
- Identify genes
- Make links to Entrez Gene
- Suggest GeneRIF annotation
Anticipated benefits:
- Increase in speed
- Increase in comprehensiveness

The NLM Indexing Initiative Team
- Alan R. Aronson (Project Leader)
- James G. Mork (Staff)
- François-Michel Lang (Staff)
- Willie J. Rogers (Staff)
- Antonio J. Jimeno-Yepes (Postdoctoral Fellow)
- J. Caitlin Sticco (Library Associate Fellow)