Semantic indexing in PubMed, CERN Workshop on Innovations in Scholarly Communication (OAI8)


1 Semantic indexing in PubMed
CERN Workshop on Innovations in Scholarly Communication (OAI8)
Geneva, Switzerland, June 20, 2013
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland, USA

2 Orientation
- NLM is the world's largest biomedical library
  - Located in Bethesda, Maryland, near Washington, DC
- PubMed provides access to MEDLINE, NLM’s bibliographic database of over 20M citations
  - MEDLINE covers 5600 journals and adds almost 1M new citations each year
  - PubMed is part of the Entrez system of the National Center for Biotechnology Information (NCBI)

3 Outline
- Anatomy of a MEDLINE citation
- Types of PubMed searches
  - Simple text search
  - Search based on MeSH indexing
- Automatic indexing
- Beyond topics

4 Anatomy of a MEDLINE citation
- Title
- Abstract
- Indexing

5 MeSH main heading[/subheading(s)] [+ * for major topic]
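
The indexing format above can be sketched as a small parser. This is only an illustration: the names `MeshHeading` and `parse_heading` are hypothetical, and the code assumes that subheadings are separated by `/` and that the asterisk may appear either before or after the term it marks, since the slide does not say which.

```python
from dataclasses import dataclass, field


@dataclass
class MeshHeading:
    descriptor: str                 # main heading
    descriptor_major: bool = False  # True if the heading itself carries a *
    # Each subheading is stored as (name, is_major_topic).
    subheadings: list = field(default_factory=list)


def _strip_star(part):
    """Return (text without asterisk, whether an asterisk was present)."""
    part = part.strip()
    is_major = part.startswith("*") or part.endswith("*")
    return part.strip("*").strip(), is_major


def parse_heading(text):
    """Parse 'main heading[/subheading(s)]' with optional * markers.

    Assumption: '/' never occurs inside a heading or subheading name.
    """
    parts = text.split("/")
    name, major = _strip_star(parts[0])
    subs = [_strip_star(p) for p in parts[1:]]
    return MeshHeading(name, major, subs)


if __name__ == "__main__":
    print(parse_heading("Ciprofloxacin/adverse effects*"))
    print(parse_heading("*Mood Disorders"))
```

Keeping the major-topic flag per subheading, rather than per heading only, lets one indexing term mark some contexts as central and others as secondary.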

6 Types of PubMed searches

7 Non-semantic search
- PubMed does not require the use of MeSH for querying
  - Supports “Google-like” text searches
    - “No librarian required”
  - But can identify MeSH terms even if they are not labeled as such

8 Non-semantic search: Example
- Find articles about the cheese Gruyère
  - Gruyère

9 MeSH (semantic) search
- Medical Subject Headings (MeSH)
  - Controlled vocabulary developed at NLM for indexing and retrieval of MEDLINE citations
  - ~26,000 descriptors (main headings)
  - <100 qualifiers (subheadings)
  - 214,000 supplementary concept records
- Hierarchical structure (“tree numbers”)
  - Supports query expansion (“explosion”)
    - Search for a descriptor or any of its descendants

10 Simple MeSH search: Example
- Find articles about drug-induced psychoses
  - "Psychoses, Substance-Induced"[Mesh]
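
A query like the one above can also be sent to PubMed programmatically. The sketch below only builds a request URL for NCBI's E-utilities ESearch service, which the slides do not mention; the helper name `esearch_url` and the default `retmax` value are illustrative choices, and no network request is made.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=20):
    """Build an ESearch URL for a PubMed query string.

    The query is URL-encoded, so quotes, brackets and commas in MeSH
    syntax are passed through safely.
    """
    params = {
        "db": "pubmed",     # search the PubMed database
        "term": term,       # the query, exactly as typed in PubMed
        "retmax": retmax,   # maximum number of PMIDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url('"Psychoses, Substance-Induced"[Mesh]'))
```

The same helper works for a plain text search such as `Gruyère`, because the query syntax is identical in the web interface and in the `term` parameter.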

11 Search with “Explosion”
- By default, PubMed retrieves articles indexed with a descriptor or any of its descendants
- Use mesh:noexp to prevent “explosion” from happening

12 “Explosion”: Example
- Find articles about fluoroquinolones (or descendants)
  - "fluoroquinolones"[Mesh]
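
A minimal sketch of how “explosion” can follow from tree numbers, assuming that a descendant's tree number extends its ancestor's with further dot-separated segments. The tree numbers in `TOY_TREE` are invented for illustration and are not real MeSH tree numbers; `explode` is a hypothetical helper, not PubMed's implementation.

```python
# Toy hierarchy: descriptor name -> list of (made-up) tree numbers.
# A descriptor may sit at several places in the tree, hence the list.
TOY_TREE = {
    "Quinolones": ["X01.100"],
    "Fluoroquinolones": ["X01.100.200"],
    "Ciprofloxacin": ["X01.100.200.300"],
    "Ofloxacin": ["X01.100.200.400"],
    "Penicillins": ["X01.500"],
}


def explode(descriptor, tree):
    """Return the descriptor plus every descendant, sorted by name.

    A descriptor counts as a descendant when one of its tree numbers
    starts with one of the root's tree numbers followed by a dot.
    """
    roots = tree[descriptor]
    result = {descriptor}
    for name, numbers in tree.items():
        if any(n.startswith(r + ".") for n in numbers for r in roots):
            result.add(name)
    return sorted(result)


if __name__ == "__main__":
    # Analogous to "fluoroquinolones"[Mesh]
    print(explode("Fluoroquinolones", TOY_TREE))
    # Analogous to "fluoroquinolones"[Mesh:noexp]: just the descriptor itself
    print(["Fluoroquinolones"])
```

Matching on the prefix plus a dot avoids treating a sibling such as `X01.1000` as a descendant of `X01.100`.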

13 Search leveraging synonymy in MeSH
- MeSH descriptors include related concepts (entry terms)
  - Synonyms
  - Closely related terms (clustered for indexing and retrieval purposes)
- All terms from a descriptor and its entry terms are used for retrieval in PubMed

15 Entry terms for “Addison Disease”

16 Search leveraging UMLS synonymy
- Unified Medical Language System (UMLS)
  - Terminology integration system
  - ~130 biomedical terminologies
  - Synonymous terms clustered into concepts
- UMLS synonymy used in PubMed
  - Query translation happens “behind the scenes”
  - E.g., search on “primary adrenocortical insufficiency”
    - Retrieves articles about “Addison’s disease”
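
The “behind the scenes” translation can be pictured as a synonym lookup followed by query rewriting. The sketch below is a toy: the `SYNONYMS` table is hand-written from the slide's example, `translate` is a hypothetical function, and the output format is an assumption, not PubMed's actual translation algorithm.

```python
# Toy synonym table: normalized free-text phrase -> MeSH descriptor.
# A real system would draw on MeSH entry terms and UMLS concepts
# rather than a hand-written dictionary.
SYNONYMS = {
    "primary adrenocortical insufficiency": "Addison Disease",
    "addison's disease": "Addison Disease",
}


def translate(phrase):
    """Rewrite a free-text phrase into a PubMed-style query.

    If the phrase maps to a descriptor, search the descriptor OR the
    original text; otherwise fall back to a plain text search.
    """
    key = " ".join(phrase.lower().split())  # lowercase, collapse whitespace
    text_clause = f'"{key}"[All Fields]'
    descriptor = SYNONYMS.get(key)
    if descriptor is None:
        return text_clause
    return f'"{descriptor}"[Mesh] OR {text_clause}'


if __name__ == "__main__":
    print(translate("primary adrenocortical insufficiency"))
    print(translate("Gruyère"))
```

Keeping the original phrase as a text clause means that citations not yet indexed with MeSH can still be retrieved.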

18 No entry term for “Heart attack”

19 Query translation

20 Subheading restrictions
- Subheadings represent the context of use of a particular descriptor
  - Ciprofloxacin/Adverse effects
  - Mood Disorders/Chemically induced
- Assigned during indexing
- Can be queried in PubMed

21 Subheading restrictions: Example
- Find articles about drugs involved in adverse events
  - "Chemicals and Drugs Category"/adverse effects[MeSH]

22 Recapitulative example
- Find articles about drugs involved in adverse events and drug-induced manifestations
  - (("Chemicals and Drugs Category"[Mesh]) AND (adverse effects[sh] OR contraindications[sh] OR mortality[sh])) AND (chemically induced[sh] OR (("Drug-Induced Liver Injury"[Mesh:noexp]) OR ("Drug Eruptions"[Mesh:noexp]) OR ("Epidermal Necrolysis, Toxic"[Mesh]) OR ("Drug-Induced Liver Injury, Chronic"[Mesh]) OR ("Erythema Nodosum"[Mesh]) OR ("Serotonin Syndrome"[Mesh]) OR ("Hand-Foot Syndrome"[Mesh]) OR ("Neuroleptic Malignant Syndrome"[Mesh]) OR ("MPTP Poisoning"[Mesh]) OR ("Dyskinesia, Drug-Induced"[Mesh]) OR ("Neurotoxicity Syndromes"[Mesh:noexp]) OR ("Psychoses, Substance-Induced"[Mesh]) OR ("Akathisia, Drug-Induced"[Mesh]))) AND (medline[sb])
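
Long Boolean queries like this one are easier to maintain when assembled from parts. The sketch below rebuilds a shortened version of the query; the helper names (`mesh`, `sh`, `any_of`, `all_of`) are hypothetical, and only a few of the descriptors from the slide are included to keep the example brief.

```python
def mesh(term, explode=True):
    """Quote a descriptor and tag it; explode=False adds :noexp."""
    tag = "Mesh" if explode else "Mesh:noexp"
    return f'"{term}"[{tag}]'


def sh(name):
    """Tag a subheading for a search not tied to one descriptor."""
    return f"{name}[sh]"


def any_of(*clauses):
    """Join clauses with OR inside one pair of parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses):
    """Join clauses with AND inside one pair of parentheses."""
    return "(" + " AND ".join(clauses) + ")"


# Drugs involved in adverse events.
drugs = all_of(
    mesh("Chemicals and Drugs Category"),
    any_of(sh("adverse effects"), sh("contraindications"), sh("mortality")),
)

# Drug-induced manifestations (abridged list of descriptors).
manifestations = any_of(
    sh("chemically induced"),
    mesh("Drug-Induced Liver Injury", explode=False),
    mesh("Drug Eruptions", explode=False),
    mesh("Psychoses, Substance-Induced"),
)

# Restrict to the MEDLINE subset, as in the slide.
query = all_of(drugs, manifestations, "medline[sb]")

if __name__ == "__main__":
    print(query)
```

Generating the parentheses in `any_of` and `all_of` keeps them balanced by construction, which is the most common source of errors when such queries are edited by hand.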

23 Automatic indexing

24 Automatic indexing: Motivation
- Indexing by humans is costly and has limited reproducibility
- Natural language processing can effectively support named entity recognition
- Automatic indexing can produce
  - Suggestions for human indexers
  - Final indexing for some journals

25 Automatic indexing: Principles
- Hybrid approach
  - Concepts extracted from title and abstract
    - Mapped from UMLS to MeSH
  - MeSH descriptors extracted from related citations
- Post-processing
  - Clustering and ranking
  - Integrate indexing rules
    - E.g., “rule of 3”
      - Index with a higher-level descriptor rather than with 3 or more lower-level descriptors
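
The “rule of 3” can be illustrated as a post-processing step over candidate descriptors. This is a simplified sketch, not the Medical Text Indexer's implementation: `apply_rule_of_three` and the `parent_of` mapping are hypothetical, and each descriptor is assumed to have at most one parent.

```python
from collections import defaultdict


def apply_rule_of_three(candidates, parent_of, threshold=3):
    """Replace groups of sibling descriptors by their shared parent.

    candidates: descriptor names in ranked order.
    parent_of:  dict mapping a descriptor to its parent (toy hierarchy).
    When `threshold` or more candidates share a parent, they are all
    replaced by that parent; order of first appearance is preserved.
    """
    by_parent = defaultdict(list)
    for d in candidates:
        parent = parent_of.get(d)
        if parent is not None:
            by_parent[parent].append(d)

    # Children that must be replaced, mapped to the replacing parent.
    replaced = {}
    for parent, children in by_parent.items():
        if len(children) >= threshold:
            for c in children:
                replaced[c] = parent

    result = []
    for d in candidates:
        target = replaced.get(d, d)
        if target not in result:  # avoid emitting the parent repeatedly
            result.append(target)
    return result


if __name__ == "__main__":
    parents = {
        "Ciprofloxacin": "Fluoroquinolones",
        "Ofloxacin": "Fluoroquinolones",
        "Norfloxacin": "Fluoroquinolones",
    }
    print(apply_rule_of_three(
        ["Ciprofloxacin", "Ofloxacin", "Norfloxacin", "Humans"], parents))
```

With only two of the three drugs among the candidates, the list is returned unchanged, since the threshold is not reached.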

26 Automatic indexing: Workflow
- Medical Text Indexer (MTI)

27 Automatic indexing: Applications
- MEDLINE indexing
  - Support MEDLINE indexing at NLM
    - 3600 new citations processed every weeknight
    - Suggestions displayed in the indexing environment
  - “First-line” indexing
    - For 75 journals
    - MTI recommendations are treated as if made by an indexer
    - Simply reviewed by a senior indexer
- Cataloging and History of Medicine
  - Assisted indexing

28 Beyond topics

29 Beyond concepts… relations
- Also known as
  - Facts
  - Predications
  - Nano-publications
  - …
- Relation extraction
  - Usually based on natural language processing (NLP)
    - E.g., SemRep
  - Relations stored in (subject, predicate, object) form
    - With provenance information
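
A (subject, predicate, object) relation with provenance might be represented as below. The field names, the `support` helper, the placeholder PMIDs and the example sentences are all invented for illustration; they do not reflect SemRep's actual output format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Predication:
    subject: str
    predicate: str
    obj: str
    pmid: str      # provenance: citation the relation was extracted from
    sentence: str  # provenance: source sentence


def support(predications):
    """Map each (subject, predicate, object) triple to the PMIDs asserting it."""
    by_triple = defaultdict(set)
    for p in predications:
        by_triple[(p.subject, p.predicate, p.obj)].add(p.pmid)
    return dict(by_triple)


# Invented examples; the PMIDs are placeholders, not real identifiers.
EXAMPLES = [
    Predication("Levodopa", "TREATS", "Parkinson Disease", "PMID-A",
                "Levodopa is used to treat Parkinson disease."),
    Predication("Levodopa", "TREATS", "Parkinson Disease", "PMID-B",
                "Patients with Parkinson disease received levodopa."),
    Predication("Levodopa", "CAUSES", "Dyskinesia", "PMID-B",
                "Levodopa induced dyskinesia in some patients."),
]

if __name__ == "__main__":
    for triple, pmids in support(EXAMPLES).items():
        print(triple, sorted(pmids))
```

Grouping by triple while keeping the set of source citations is what makes it possible to count how many documents support a fact and to trace it back to the text.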

30 Experimental application: Semantic MEDLINE
- Multi-document summarization
- Based on a database of 60M predications extracted from MEDLINE
- Entities normalized to the UMLS Metathesaurus
- Relations aligned with the UMLS Semantic Network
- Interfaced with PubMed (for retrieving PMIDs) on a given topic
  - Forms the basis for summarization

32 Relation extraction: Applications
- Enhanced information retrieval
  - Indexing on relations in addition to concepts or main heading/subheading associations
- Multi-document summarization
  - Extract and visualize the facts from 250 recent abstracts on the treatment of Parkinson’s disease
- Question answering
  - Clinical and biological questions
- Knowledge discovery
  - Connect facts from heterogeneous resources

33 Medical Ontology Research
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland, USA

