The NLM Indexing Initiative
Alan R. Aronson, PhD
Lister Hill Center, National Library of Medicine
American Society of Indexers Annual Meeting, May 15, 2004


Indexing Initiative (II) Project Goals
  Investigate automated and semi-automated indexing methodologies
  Develop methods that result in acceptable retrieval performance
  Concept-based algorithms
  Extensive use of UMLS resources

II Project Phases
  1. Initially, an independent collection of projects addressing
       Indexing methods
       Evaluation
       Policy
  2. Development of a prototype indexing system for testing indexing methods
  3. Deployment of the Medical Text Indexer (MTI) system to NLM indexing environments

The Medical Text Indexer (MTI)
[Diagram: processing flow from "Title + Abstract" to "Ordered list of MeSH Terms". Labeled elements: MetaMap, Trigram Phrase Matching, PubMed Related Citations, Phrases, Phrasex, UMLS Concepts, Rel. Cits., Extract MeSH, Restrict to MeSH, MeSH Headings, Postprocessing.]
[The same diagram repeats on the next five slides, titled in turn: MetaMap Indexing; Trigram Phrase Matching; PubMed Related Citations; Restrict to MeSH; Postprocessing.]

Phrase-based Indexing Methods
  MetaMap Indexing
    Perform MetaMap processing on input text
      Parse text into phrases
      Generate variants
      Retrieve Metathesaurus candidates
      Evaluate the candidates
      Construct final mapping
    Rank all concepts discovered
  Trigram phrase matching
    Form phrases based on character trigrams
    Match against Metathesaurus
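
The slides do not say how trigram phrase matching scores a match, so the following is only a minimal sketch of the general idea: break each string into overlapping character trigrams and compare the resulting sets. The Dice coefficient, the padding scheme, the 0.5 threshold, and the three-term `VOCAB` list are all assumptions made for illustration, not details of the MTI implementation.

```python
def char_trigrams(text: str) -> set[str]:
    """Return the set of character trigrams of a lowercased, space-padded string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over the two trigram sets (an assumed scoring choice)."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def best_matches(phrase: str, vocabulary: list[str], threshold: float = 0.5):
    """Return (score, term) pairs at or above the threshold, best first."""
    scored = [(trigram_similarity(phrase, term), term) for term in vocabulary]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Hypothetical stand-in for Metathesaurus strings.
VOCAB = ["Local anaesthetic", "Bupivacaine", "Calcium Channels"]

if __name__ == "__main__":
    for score, term in best_matches("local anesthetic", VOCAB):
        print(f"{term}: {score:.2f}")
```

Because the comparison works on character fragments rather than whole words, the American spelling "anesthetic" still lands close to the British "anaesthetic" without any variant table.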

MetaMap Example
  Text: “The local anesthetic bupivacaine is cardiotoxic …”
  Phrases: “The local anesthetic bupivacaine”, “is”, “cardiotoxic”, …
  Variants: anesthetics, anaesthetic, anesthesia, …
  Candidates: ‘Bupivacaine’, ‘Local anaesthetic’, ‘Local anaesthetic, NOS’, …
  Mappings: ‘Bupivacaine’ and ‘Local anaesthetic’ or ‘Local anaesthetic, NOS’
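
A toy walk-through of the variant and candidate steps for this phrase might look like the sketch below. It is not MetaMap: the `VARIANTS` table, the `CONCEPTS` list, the stopword set, and the coverage-only score are hypothetical stand-ins, and the real system's variant generation and candidate evaluation are considerably richer.

```python
# Hypothetical variant table, concept strings, and stopwords for illustration only.
VARIANTS = {
    "anesthetic": {"anesthetic", "anesthetics", "anaesthetic", "anesthesia"},
}
CONCEPTS = ["Bupivacaine", "Local anaesthetic", "Local anaesthetic, NOS", "Calcium"]
STOPWORDS = {"the", "a", "an", "of"}


def expand(word: str) -> set[str]:
    """Return the word together with any known variants."""
    return VARIANTS.get(word, {word})


def candidates(phrase: str) -> list[tuple[float, str]]:
    """Score each concept by the fraction of phrase words it covers via variants."""
    words = [w for w in phrase.lower().split() if w not in STOPWORDS]
    results = []
    for concept in CONCEPTS:
        concept_words = set(concept.lower().replace(",", "").split())
        covered = [w for w in words if expand(w) & concept_words]
        if covered:
            results.append((len(covered) / len(words), concept))
    # Highest coverage first; ties broken alphabetically.
    return sorted(results, key=lambda r: (-r[0], r[1]))


if __name__ == "__main__":
    for score, concept in candidates("The local anesthetic bupivacaine"):
        print(f"{score:.2f}  {concept}")
```

In this sketch ‘Local anaesthetic’ and ‘Local anaesthetic, NOS’ tie, which mirrors the "or" in the slide's final mapping.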

PubMed Related Citations Indexing
  Find the closest neighbors (related citations) to the input text
  Extract the MeSH headings from the neighbors
  Example
    Text: “Bupivacaine inhibition of L-type calcium current in ventricular cardiomyocytes of hamster. …”
    Extracted MeSH: ‘Calcium Channels’, ‘Calcium Channel Blockers’
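
The two steps can be sketched as a nearest-neighbor lookup followed by a vote over the neighbors' headings. The slides do not describe how PubMed computes related citations, so plain word-overlap (Jaccard) similarity is substituted here; the three-entry `INDEXED` corpus and its headings are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical already-indexed citations: (title text, MeSH headings).
INDEXED = [
    ("Calcium channel blockade by local anesthetics in cardiac cells",
     ["Calcium Channels", "Calcium Channel Blockers"]),
    ("L-type calcium current inhibition in ventricular myocytes",
     ["Calcium Channels", "Calcium Channels, L-Type"]),
    ("Dietary fiber intake and colon cancer risk",
     ["Dietary Fiber", "Colonic Neoplasms"]),
]


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity; a stand-in for the real related-citations measure."""
    return len(a & b) / len(a | b) if a | b else 0.0


def suggest_from_neighbors(text: str, corpus, k: int = 2) -> list[str]:
    """Take the k most similar citations and rank their headings by vote count."""
    query = tokens(text)
    ranked = sorted(corpus, key=lambda c: jaccard(query, tokens(c[0])), reverse=True)
    votes = Counter(h for _, headings in ranked[:k] for h in headings)
    return [heading for heading, _ in votes.most_common()]


if __name__ == "__main__":
    title = ("Bupivacaine inhibition of L-type calcium current "
             "in ventricular cardiomyocytes of hamster.")
    print(suggest_from_neighbors(title, INDEXED))
```

Headings shared by several neighbors rise to the top, and headings from unrelated citations never enter the vote.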

Restrict to MeSH
  Find the semantically closest MeSH headings using UMLS relationships:
    Synonyms
    Associated expressions
    Hierarchical relationships (child, parent)
    Other relationships
  ‘Acute adenoviral follicular conjunctivitis’ restricts to ‘Adenoviridae Infections’ and ‘Conjunctivitis, Viral’
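
One way to picture "semantically closest" is a breadth-first walk over relationship links that stops at the first level where MeSH headings appear. The sketch below assumes exactly that and handles only parent links; the `BROADER` table, including the intermediate concept ‘Adenoviral conjunctivitis’, is hypothetical and is not taken from the UMLS.

```python
# Hypothetical parent links between concepts (not real UMLS data).
BROADER = {
    "Acute adenoviral follicular conjunctivitis": ["Adenoviral conjunctivitis"],
    "Adenoviral conjunctivitis": ["Adenoviridae Infections", "Conjunctivitis, Viral"],
    "Conjunctivitis, Viral": ["Conjunctivitis"],
}
# Hypothetical subset of MeSH headings.
MESH = {"Adenoviridae Infections", "Conjunctivitis, Viral", "Conjunctivitis"}


def restrict_to_mesh(concept: str) -> list[str]:
    """Return the nearest MeSH headings reachable from the concept, level by level."""
    if concept in MESH:
        return [concept]
    seen = {concept}
    frontier = [concept]
    while frontier:
        found, next_frontier = [], []
        for node in frontier:
            for parent in BROADER.get(node, []):
                if parent in seen:
                    continue
                seen.add(parent)
                if parent in MESH:
                    found.append(parent)
                else:
                    next_frontier.append(parent)
        if found:
            # Stop at the closest level that yields any MeSH heading.
            return sorted(found)
        frontier = next_frontier
    return []


if __name__ == "__main__":
    print(restrict_to_mesh("Acute adenoviral follicular conjunctivitis"))
```

Stopping at the first productive level keeps the more general ‘Conjunctivitis’ out of the result, since two closer headings were already found.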

Postprocessing (1 of 2)
  Clustering of results from basic methods
  Indexing rules and lookup lists
    ‘Eclampsia’ -> ‘Female’ and ‘Pregnancy’
    ‘Hamsters’ -> ‘Animal’
    G05 treecode -> ‘genetics’
    “pediatric(s)” -> ‘Child’
  Exclusions (e.g., ‘TEST’, ‘Disease’)
  Further promotion of title headings and chemicals
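
The rule and exclusion examples above lend themselves to simple lookup tables. The sketch below encodes three of the slide's rules and its two exclusions; the table names, the function, and the order of application are assumptions, and the G05 treecode rule is left out because it would need MeSH tree numbers that this toy does not model.

```python
import re

# Heading-triggered additions, taken from the slide's examples.
TRIGGER_RULES = {
    "Eclampsia": ["Female", "Pregnancy"],
    "Hamsters": ["Animal"],
}
# Text-triggered additions: "pediatric(s)" in the text adds 'Child'.
TEXT_RULES = [(re.compile(r"\bpediatrics?\b", re.IGNORECASE), "Child")]
# Headings never recommended, per the slide's exclusion examples.
EXCLUDED = {"TEST", "Disease"}


def apply_rules(headings: list[str], text: str) -> list[str]:
    """Drop excluded headings, then add headings implied by rules (assumed order)."""
    result = [h for h in headings if h not in EXCLUDED]
    for heading in list(result):
        for implied in TRIGGER_RULES.get(heading, []):
            if implied not in result:
                result.append(implied)
    for pattern, implied in TEXT_RULES:
        if pattern.search(text) and implied not in result:
            result.append(implied)
    return result


if __name__ == "__main__":
    print(apply_rules(["Eclampsia", "Disease"], "A pediatric case report"))
```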

Postprocessing (2 of 2)
  UMLS/MeSH heuristics
    Remove MM heading with unrelated semantic type
    Remove RC heading if no more general MM heading
    Remove a chemical MM heading when no other terms are chemical in nature
  MM – MetaMap recommendation
  RC – Related Citations recommendation
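
As an illustration of how such a heuristic could be applied, the sketch below implements one possible reading of the third rule only: a chemical heading that came from MetaMap is dropped when no other recommendation is chemical. The `Recommendation` class and its `is_chemical` flag are hypothetical; in practice that flag would have to be derived from UMLS semantic types, which is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    heading: str
    source: str        # "MM" (MetaMap) or "RC" (Related Citations)
    is_chemical: bool  # hypothetical flag standing in for a semantic-type check


def drop_lone_chemical(recs: list[Recommendation]) -> list[Recommendation]:
    """Remove a chemical MM heading when no other recommendation is chemical."""
    kept = []
    for rec in recs:
        others_chemical = any(o.is_chemical for o in recs if o is not rec)
        if rec.source == "MM" and rec.is_chemical and not others_chemical:
            continue
        kept.append(rec)
    return kept


if __name__ == "__main__":
    recs = [
        Recommendation("Calcium", "MM", True),
        Recommendation("Heart", "MM", False),
        Recommendation("Hamsters", "RC", False),
    ]
    print([r.heading for r in drop_lone_chemical(recs)])
```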

A MEDLINE Citation
TI - Bupivacaine inhibition of L-type calcium current in ventricular cardiomyocytes of hamster.
AB - BACKGROUND: The local anesthetic bupivacaine is cardiotoxic when accidentally injected into the circulation. Such cardiotoxicity might involve an inhibition of cardiac L-type Ca2+ current (ICa,L). This study was designed to define the mechanism of bupivacaine inhibition of ICa,L. … CONCLUSIONS: The inhibition of ICa,L appears, in part, to result from bupivacaine predisposing L-type Ca channels to the inactivated state. Data from washout suggest that there may be two mechanisms of inhibition at work. Bupivacaine may bind with low affinity to the Ca channel and also affect an unidentified metabolic component that modulates Ca channel function.

Assigned MeSH and Suggested MTI Terms
Assigned MeSH (10)
  *Anesthetics, Local
  Animal
  *Bupivacaine
  *Calcium Channels
  Calcium Channels, L-Type
  Dose-Response Relationship, Drug
  Hamsters
  *Heart
  Male
  Support, Non-U.S. Gov’t
Suggested MTI Terms (11)
  1. Calcium
  2. Heart Ventricle
  3. Bupivacaine
  4. Calcium Channels
  5. Calcium Channel Blockers
  6. Calcium Channels, L-Type
  7. Cells
  8. Calcium Channels, T-Type
  9. Anesthetics, Local
  Hamsters
  Animal
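
To make the comparison between the two lists concrete, the sketch below counts exact heading matches for this one citation and derives precision and recall from them. Exact string matching is an assumption about how a match is counted, and the asterisks (which mark major topics in MEDLINE) are dropped so the strings line up.

```python
# Headings copied from the slide, with major-topic asterisks removed.
ASSIGNED = {
    "Anesthetics, Local", "Animal", "Bupivacaine", "Calcium Channels",
    "Calcium Channels, L-Type", "Dose-Response Relationship, Drug",
    "Hamsters", "Heart", "Male", "Support, Non-U.S. Gov’t",
}
SUGGESTED = {
    "Calcium", "Heart Ventricle", "Bupivacaine", "Calcium Channels",
    "Calcium Channel Blockers", "Calcium Channels, L-Type", "Cells",
    "Calcium Channels, T-Type", "Anesthetics, Local", "Hamsters", "Animal",
}


def precision_recall(suggested: set[str], assigned: set[str]) -> tuple[float, float]:
    """Precision = matches / suggested; recall = matches / assigned."""
    hits = suggested & assigned
    return len(hits) / len(suggested), len(hits) / len(assigned)


if __name__ == "__main__":
    precision, recall = precision_recall(SUGGESTED, ASSIGNED)
    matches = len(SUGGESTED & ASSIGNED)
    print(f"matches={matches} precision={precision:.2f} recall={recall:.2f}")
    # prints: matches=6 precision=0.55 recall=0.60
```

Six of the eleven suggestions match assigned headings exactly. Near misses such as ‘Heart Ventricle’ versus ‘Heart’ count as errors under this strict comparison.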

MTI Deployment: Fully Automated Indexing
  MTI indexing of collections which will not be manually indexed, deployed September 2002
  Meeting abstracts collections available from the NLM Gateway
    HIV/AIDS: International Conference on AIDS
    Health services research: AcademyHealth and its predecessors
    Space life sciences: American Society for Gravitational and Space Biology (ASGSB) bulletin
    …

Evaluation: Fully Automated Indexing
  Retrieval experiments together with
  Continued system development to improve accuracy
    Incorporation of feedback
    Basic MTI components
    Word Sense Disambiguation (WSD) research

MTI Deployment: Semi-automated Indexing
  MTI recommendations presented to indexers within the Data Creation and Maintenance System (DCMS), deployed August 2002 after experiment
  MTI indexing (as of March 2004):
    ~1.5M MEDLINE citations processed
    accessed for ~28% of MEDLINE articles
    average daily accesses: ~600

MTI Indexing Experiment
  Ten volunteers each indexed a journal issue using MTI recommendations
  Questionnaires for each article indexed plus summary questionnaire
  Analysis
    Average of 8 useful terms per article (3 main)
    Precision = .29, Recall = .55
    Adequate coverage? 37% yes, 53% partial, 10% no

Experiment Feedback
  Make suggested terms hot links to the MeSH browser
  Gray out selected terms
  Show entry term, not heading, if found
  Provide interactive access to MTI

Evaluation: Semi-Automated Indexing
  Comparison of final indexing with MTI suggestions
  Further feedback after implementation of indexers’ recommendations
  Evaluation contract (in planning)

Status of MTI
  Current research
    Word sense disambiguation (WSD)
    Extension to the full text of articles
  Future efforts
    Evaluation contract
    Possible use of MTI to review indexing

Indexing Initiative Contributors
  LHNCBC: Alan R. Aronson, Olivier Bodenreider, Clifford W. Gay, William T. Hole, Susanne M. Humphrey, James G. Mork, Alexa T. McCray, Thomas C. Rindflesch, Will J. Rogers, Sonya E. Shooshan
  NCBI: Won Kim, W. John Wilbur
  OCCS: John Butler, John M. Rozier
  LO: Ione Auston, Nadine Benton, Andrea Demsey, Lou S. Knecht, James R. Marcetich, Stuart J. Nelson, Marina P. Rappoport, Jane L. Rosov, Catherine R. Selden, Sara J. Tybaert, Joe D. Thomas, Carolyn B. Tilley, Janice M. Ward
  SIS: H. Florence Chang, Tamas E. Doszkocs, George (Mike) F. Hazard