An Automatic Retrieval System for Expert and Consumer Users. Rena Peraki, Euripides G.M. Petrakis, Angelos Hliaoutakis. Intelligent Systems Laboratory, www.intelligence.tuc.gr.

Presentation transcript:

An Automatic Retrieval System for Expert and Consumer Users
Rena Peraki, Euripides G.M. Petrakis, Angelos Hliaoutakis
Intelligent Systems Laboratory, Technical University of Crete (TUC), Chania, Crete, Greece

Problem Definition
Medical information systems are designed for experts!
 – Use complex terms in their searches
 – Domain-specific answers
Must also serve naive consumers
 – Do simple searches using natural language terms
 – Easy to read and comprehend information
Investigate methods for the categorization of information by user profile
BIBE 2012, Larnaca, Cyprus

Current Practices
Medscape, MedlinePlus and MedHunt rely on the manual translation and categorization of information for consumers
 – Slow; does not scale up to large collections
In MEDLINE of the U.S. NLM, documents are indexed by experts and for experts only
 – No categorization by user profile
 – 10-12 MeSH terms per document (pathology, disease, treatment, drugs, etc.)
 – Over 15 million documents: manual indexing is slow
 – Need to automate this process

Objectives
Investigate methods for automatic document indexing in MEDLINE
The resulting index terms are subsequently used for filtering documents by user profile
Main idea: categorize terms into simple terms comprehensible to consumers and more involved terms suitable for experts

Resources
Automatic indexing in MEDLINE:
 – MMTx [U.S. NLM]: focuses on UMLS rather than MeSH
 – AMTEx [DKE, 2009]: MeSH terms, faster and more accurate than MMTx
Dictionaries for biomedical and health-related concepts:
 – UMLS Metathesaurus, MeSH
Dictionaries for general English words:
 – WordNet, Specialist

MMTx (MetaMap Transfer)
Developed by the U.S. NLM
Maps text to UMLS Metathesaurus concepts
 – but MEDLINE indexing is based on MeSH
 – MeSH is a subset of the Metathesaurus
Suffers from term overgeneration
Unrelated terms are added to the final candidate list
The list must be cleaned up to keep only MeSH terms
Topic drift

The AMTEx method [DKE 2009]
Main idea: initial term extraction is based on a hybrid linguistic/statistical approach, the C/NC value
Extracts general single- and multi-word terms (noun phrases)
Mainly multi-word terms: “heart disease”, “coronary artery disease”
Extracted terms are validated against MeSH
Faster than MMTx, with improved precision, while producing merely a fifth of MMTx's term output
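The C-value half of the C/NC approach can be sketched as below. This is a simplified illustration of the commonly cited C-value formula (term length in words, frequency, and a penalty for terms that occur nested inside longer candidates), not the AMTEx implementation. The function name `c_value`, the toy candidates and their frequencies are assumptions made for the example, and the NC-value context-weighting step is left out.

```python
import math

def c_value(freq):
    """Score candidate terms. `freq` maps a term (tuple of words) to its frequency.

    Hypothetical helper for illustration only; the frequencies are made up.
    """
    def contains(longer, shorter):
        # True if `shorter` occurs as a contiguous word sequence inside `longer`.
        n = len(shorter)
        return len(longer) > n and any(
            longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

    scores = {}
    for term, f in freq.items():
        # Frequencies of the longer candidates that contain this term.
        nesting = [f_b for b, f_b in freq.items() if contains(b, term)]
        if nesting:
            # Nested term: discount by the mean frequency of its containers.
            f = f - sum(nesting) / len(nesting)
        scores[term] = math.log2(len(term)) * f
    return scores

toy = {
    ("coronary", "artery", "disease"): 4,
    ("artery", "disease"): 6,
    ("heart", "disease"): 5,
}
scores = c_value(toy)
```

With the plain formula a single-word term scores zero because log2(1) = 0, which fits the emphasis on multi-word terms.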

Example
Input: Full text article
MEDLINE index terms: “Aged”, “Data Collection”, “Humans”, “Knee”, “Middle Aged”, “Osteoarthritis, Knee/complications”, “Osteoarthritis, Knee/diagnosis”, “Pain/classification”, “Pain/etiology”, “Prospective Studies”, “Research Support, Non-U.S. Gov’t”
MMTx terms: “osteoarthritis knee”, “retention”, “peat”, “rheumatology”, “acetylcholine”, “lysine acetate”, “potassium acetate”, “questionnaires”, “target population”, “population”, “selection bias”, “creativeness”, “reproduction”, “cohort studies”, “europe”, “couples”, “naloxone”, “sample size”, “arthritis”, “data collection”, “mail”, “health status”, “respondents”, “ontario”, “universities”, “dna”, “baseline survey”, “medical records”, “informatics”, “general practitioners”, “gender”, “beliefs”, “logistic regression”, “female”, “marital status”, “employment status”, “comprehension”, “surveys”, “age distribution”, “manual”, “occupations”, “manuals”, “persons”, “females”, “minor”, “minority groups”, “incentives”, “business”, “ability”, “comparative study”, “odds ratio”, “biomedical research”, “pubmed”, “copyright”, “coding”, “longitudinal studies”, “immunoelectrophoresis”, “skin diseases”, “government”, “norepinephrine”, “social sciences”, “survey methods”, “tyrosine”, “new zealand”, “azauridine”, “gold”, “nonrespondents”, “cycloheximide”, “rheum”, “jordan”, “cadmium”, “radiopharmaceuticals”, “community”, “disease progression”, “history”
AMTEx terms: “health surveys”, “pain”, “review publication type”, “data collection”, “osteoarthritis knee”, “knee”, “science”, “health services needs and demand”, “population”, “research”, “questionnaires”, “informatics”, “health”

Term & Document Categorization

New Vocabularies
Vocabulary of General Terms (VGT): general (WordNet) terms
Vocabulary of Consumer Terms (VCT): 7,165 consumer (MeSH) terms
Vocabulary of Expert Terms (VET): 16,719 expert (MeSH) terms

Document Categorization
Documents are represented by vectors of terms extracted by AMTEx or MMTx, or assigned by human experts
The more VET (VCT) terms a document contains, the higher its probability of being suitable for experts (consumers)
 – E.g., a document with VET% = 0.62 has a 62% probability of being suitable for experts
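One way to read VET% and VCT% is as the fraction of a document's index terms found in each vocabulary. The sketch below works under that assumption; the slides do not spell out the exact formula. The function name `profile_scores` and the tiny vocabularies are hypothetical, and the assignment of these particular terms to VET or VCT is made up for the example.

```python
def profile_scores(doc_terms, vet, vct):
    """Return (VET%, VCT%) for one document.

    Assumption: each score is the share of the document's terms that
    belong to the expert (VET) or consumer (VCT) vocabulary.
    """
    terms = [t.lower() for t in doc_terms]
    if not terms:
        return 0.0, 0.0
    n_expert = sum(t in vet for t in terms)
    n_consumer = sum(t in vct for t in terms)
    return n_expert / len(terms), n_consumer / len(terms)

# Toy vocabularies (hypothetical membership, for illustration only).
VET = {"osteoarthritis, knee", "immunoelectrophoresis"}
VCT = {"pain", "knee"}

doc = ["Pain", "Knee", "Osteoarthritis, Knee", "Immunoelectrophoresis", "Humans"]
vet_score, vct_score = profile_scores(doc, VET, VCT)
```

Terms found in neither vocabulary (here "Humans") still count in the denominator, so the two scores need not sum to 1.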

Evaluation
Precision and recall measures: a good method has high values of both
Datasets: OHSUMED, 348,566 MEDLINE abstracts that come with 64 queries and their relevant answers
Ground truth: the set of MeSH index terms assigned to documents by experts
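For indexing, precision and recall compare the terms a method extracts for a document against the MeSH terms assigned by experts. The sketch below shows the set-based computation; the function name `precision_recall` and the toy term sets are assumptions for illustration and are not results from the evaluation.

```python
def precision_recall(extracted, ground_truth):
    """Set-based precision and recall of extracted terms vs. expert terms."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    hits = len(extracted & ground_truth)
    # Precision: share of extracted terms that the experts also assigned.
    precision = hits / len(extracted) if extracted else 0.0
    # Recall: share of expert-assigned terms that the method found.
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Toy example (made-up sets, not data from the talk).
method_terms = {"pain", "knee", "data collection", "science"}
expert_terms = {"pain", "knee", "data collection", "humans", "aged"}
p, r = precision_recall(method_terms, expert_terms)
```

A method that outputs many terms tends to raise recall at the cost of precision, which is why both values are reported together.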

AMTEx vs MMTx
AMTEx: faster than MMTx, with improved precision, while producing merely a fifth of MMTx's term output
[Table: Data Set | Method | Number of Terms | Precision | Recall | Time (hours); rows: OHSUMED (AMTEx, MMTx), PMC (AMTEx, MMTx); values not preserved in the transcript]

Categorization by User Profile
How good is the method at retrieving answers for consumers and experts?
We run retrievals for consumers & experts
 – 15 of the 64 queries contain no expert terms and are suitable for consumers
 – The remaining queries are suitable for experts
 – Documents are represented by document vectors of MeSH, MMTx, or AMTEx terms
 – The retrieval method is the Vector Space Model (VSM)
 – The VSM similarity score of each document is multiplied by its respective VET or VCT score
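The re-weighting step can be sketched as cosine similarity scaled by a per-document profile score. This is a minimal sketch under assumptions: the slides name the Vector Space Model but not the term weighting, so plain weights are used here, and the names `cosine`, `rank` and the toy vectors and scores are hypothetical.

```python
import math

def cosine(q, d):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def rank(query, docs, profile_score):
    """Rank documents by VSM similarity multiplied by their VET or VCT score."""
    scored = [(doc_id, cosine(query, vec) * profile_score[doc_id])
              for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy consumer query: d1 matches the query better, but d2 has a much
# higher (made-up) VCT score and therefore ranks first.
query = {"knee": 1.0, "pain": 1.0}
docs = {"d1": {"knee": 1.0, "pain": 1.0}, "d2": {"knee": 1.0}}
vct = {"d1": 0.2, "d2": 0.9}
ranking = rank(query, docs, vct)
```

For an expert query the same function would be called with VET scores in place of VCT scores.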

Consumers Retrieval Task

Experts Retrieval Task

Results Analysis
The results indicate
 – a tendency of human experts to assign simple terms to documents, and
 – a selective ability of AMTEx to extract complex terms suitable for experts

Conclusions & Future Work
We investigated methods for:
 – Automatic document indexing
 – Categorization by user profile
AMTEx is well suited for both problems
Future work: more elaborate document categorization methods (machine learning, fuzzy)
More term and document categories
 – According to the UMLS Semantic Network (pathology, treatment)
 – User categories (e.g., specialty)

Questions and answers

AMTEx Outline
INPUT: Document Collection
C/NC value: Multi-word Term Extraction & Term Ranking
MeSH Term Validation
Single-word Term Extraction: non-MeSH multi-word terms are broken down & validated against MeSH
Variant Generation
Term Expansion (MeSH)
MeSH Thesaurus Resource
OUTPUT: MeSH Term Lists
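The validation and single-word extraction steps of the outline can be sketched as below. This is a rough illustration under assumptions, not the AMTEx code: `validate_against_mesh` is a hypothetical name, the MeSH subset is a toy set, matching is exact lowercase string comparison, and variant generation and term expansion are omitted.

```python
def validate_against_mesh(candidates, mesh_terms):
    """Keep candidates that are MeSH terms; break down the rest into single words.

    `candidates` is assumed to be ordered by rank, and that order is preserved.
    """
    accepted = []
    for term in candidates:
        term = term.lower()
        if term in mesh_terms:
            matches = [term]
        else:
            # Non-MeSH multi-word term: validate each word on its own.
            matches = [word for word in term.split() if word in mesh_terms]
        for match in matches:
            if match not in accepted:
                accepted.append(match)
    return accepted

# Toy MeSH subset and ranked candidates (for illustration only).
MESH = {"data collection", "knee", "pain"}
ranked = ["Data Collection", "chronic knee pain", "baseline survey"]
index_terms = validate_against_mesh(ranked, MESH)
```

Here "chronic knee pain" is not in the toy set, so it contributes "knee" and "pain", while "baseline survey" yields nothing and is dropped.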

MeSH: Medical Subject Headings
The NLM thesaurus of medical & biological terms
Organized in IS-A hierarchies
 – more than 15 taxonomies & more than 22,000 terms
 – a term may appear in multiple taxonomies
No PART-OF relationships
Terms are organized into synonym sets called entry terms, including stemmed term forms

Fragment of the MeSH IS-A Hierarchy
[Figure: tree fragment with nodes Root, Nervous system diseases, Neurologic manifestations, Cranial nerve diseases, pain, headache, neuralgia, Facial neuralgia]
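An IS-A hierarchy in which a term may sit under more than one parent can be held as a child-to-parents map. The sketch below uses the node names from the fragment, but the parent links are assumptions made for illustration, since the transcript preserves only the node labels and not the edges. The names `PARENTS` and `ancestors` are hypothetical.

```python
def ancestors(term, parents):
    """Collect every term reachable by following IS-A links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Assumed edges (child -> parents); not taken from the slide itself.
PARENTS = {
    "nervous system diseases": ["root"],
    "neurologic manifestations": ["nervous system diseases"],
    "cranial nerve diseases": ["nervous system diseases"],
    "pain": ["neurologic manifestations"],
    "headache": ["pain"],
    "neuralgia": ["pain"],
    # A term with two parents, i.e. it appears in more than one branch.
    "facial neuralgia": ["neuralgia", "cranial nerve diseases"],
}

facial = ancestors("facial neuralgia", PARENTS)
headache = ancestors("headache", PARENTS)
```

Storing parents as a list per term is what allows a single term to appear in several branches without being duplicated.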