Large-Scale Evaluation of a Medical Cross-Language Information Retrieval System Kornél Markó 1,2, Philipp Daumke 1,2, Stefan Schulz 2, Rüdiger Klar 2,

Presentation transcript:

Large-Scale Evaluation of a Medical Cross-Language Information Retrieval System
Kornél Markó 1,2, Philipp Daumke 1,2, Stefan Schulz 2, Rüdiger Klar 2, Udo Hahn 3
1 Averbis GmbH, Freiburg, Germany
2 Medical Informatics Department, University Medical Center Freiburg, Germany
3 Jena University Language and Information Engineering Lab (JULIE), Germany

Medical Information Retrieval
Medical Information: Design? Accessibility? Bridging the gap between user groups and information sources?
– Heterogeneous User Groups: Physicians, Scientists, Nurses, Laymen (Experts vs. Laymen); User Interface
– Language Variability: Linguistic Morphology, Cross-Lingual
– Heterogeneous Document Types: Discharge Summaries, Pathology Reports, Scientific Publications, etc.

User Interface
The same information need can be entered as different queries:
– risk factor high blood pressure
– risk factors hypertension
– risk factor hypertension



Multilingualism Korrelation von Hypertonie und Läsion der Weißen Substanz… “Correlation of high blood pressure and lesion of the white substance”


Linguistic Morphology
Linguistic phenomena adversely influence medical text retrieval!
– Inflection: leukocyte vs. leukocytes, appendix vs. appendices
– Derivation: leukocyte vs. leukocytic
– Composition: leuk|em|ia, para|sympath|ectomy, Magen|schleim|haut|entzünd|ung
– Acronyms: AIDS, SARS, OECD
– Orthographic variants: Kolonkarzinom, Coloncarcinom; Esophagus, Oesophagus
– Synonyms: high blood pressure / hypertension, prophylaxis / prevention

MorphoSaurus
Subword-based, multilingual semantic indexer for document retrieval
Subwords are atomic, conceptual or linguistic units:
– Stems: stomach, gastr, diaphys
– Prefixes: anti-, bi-, hyper-
– Suffixes: -ary, -ion, -itis
– Infixes: -o-, -s-
Equivalence classes contain synonymous subwords and their translations:
– #derma = { derm, cutis, skin, haut, kutis, pele, cutis, piel, … }
– #inflamm = { inflamm, -itic, -itis, -phlog, entzuend, -itis, -itisch, inflam, flog, inflam, flog, ... }
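The equivalence-class idea can be illustrated with a small lookup table. This is only a sketch: the subwords come from the slide, but the dictionary layout and the names `EQUIVALENCE_CLASSES`, `build_subword_index` and `same_concept` are assumptions made for illustration, not the MorphoSaurus data format.

```python
# Hypothetical miniature thesaurus: class identifier -> member subwords.
# Members are copied from the slide; the structure itself is an assumption.
EQUIVALENCE_CLASSES = {
    "#derma": {"derm", "cutis", "skin", "haut", "kutis", "pele", "piel"},
    "#inflamm": {"inflamm", "-itic", "-itis", "-phlog", "entzuend",
                 "-itisch", "inflam", "flog"},
}


def build_subword_index(classes):
    """Invert the thesaurus so each subword points to its class identifier."""
    index = {}
    for class_id, subwords in classes.items():
        for subword in subwords:
            index[subword] = class_id
    return index


SUBWORD_TO_CLASS = build_subword_index(EQUIVALENCE_CLASSES)


def same_concept(subword_a, subword_b):
    """Return True if both subwords are known and share an equivalence class."""
    class_a = SUBWORD_TO_CLASS.get(subword_a)
    return class_a is not None and class_a == SUBWORD_TO_CLASS.get(subword_b)
```

With this table, the English subword "skin" and the German subword "haut" resolve to the same identifier, which is what allows matching across languages.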

Morphosaurus Structure
Segmentation:
– Myo | kard | itis
– Herz | muskel | entzünd | ung
– Inflamm | ation of the heart muscle
Equivalence classes (Eq Class) and their subwords:
– MUSCLE: muscle, myo, muskel, muscul
– INFLAMM: inflamm, -itis, inflam, entzünd
– HEART: herz, heart, card, corazon, card
Indexation:
– #muscle #heart #inflamm
– #heart #muscle #inflamm
– #inflamm #heart #muscle
Thesaurus: ~ equivalence classes (MIDs)
Lexicon entries:
– English: ~
– German: ~
– Portuguese: ~
– Spanish: ~
– French: ~
– Swedish: ~10.000
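The segmentation and indexation steps on this slide can be sketched as follows. The greedy longest-match strategy is an assumption chosen for brevity; the slides do not say how MorphoSaurus actually segments words. The toy `LEXICON` uses the subwords shown on the slide, plus "kard" and the class-less suffixes "ung" and "ation", which are added here only so that the three example phrases can be processed.

```python
# Toy lexicon: subword -> equivalence class identifier (None = no class).
# "kard", "ung" and "ation" are illustrative additions, not from the slide.
LEXICON = {
    "myo": "#muscle", "muskel": "#muscle", "muscle": "#muscle", "muscul": "#muscle",
    "herz": "#heart", "heart": "#heart", "card": "#heart", "kard": "#heart",
    "corazon": "#heart",
    "inflamm": "#inflamm", "inflam": "#inflamm", "itis": "#inflamm",
    "entzünd": "#inflamm",
    "ung": None, "ation": None,
}


def segment(word):
    """Split a word into known subwords, preferring the longest match."""
    word = word.lower()
    pieces, pos = [], 0
    while pos < len(word):
        match = max((s for s in LEXICON if word.startswith(s, pos)),
                    key=len, default=None)
        if match is None:
            pos += 1  # no subword starts here: skip one character
        else:
            pieces.append(match)
            pos += len(match)
    return pieces


def index_text(text):
    """Map a text to the sequence of equivalence class identifiers it contains."""
    class_ids = []
    for token in text.split():
        for piece in segment(token):
            class_id = LEXICON[piece]
            if class_id is not None:
                class_ids.append(class_id)
    return class_ids
```

Under these assumptions, `index_text("Myokarditis")`, `index_text("Herzmuskelentzündung")` and `index_text("Inflammation of the heart muscle")` produce the same three identifiers in different orders, mirroring the three indexation lines on the slide.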

Morpho-Semantic Normalization

Disambiguation
– Maximum likelihood estimator
– Co-occurrence information from large heterogeneous corpora
Example:
– The American Heart Association recommends aspirin use for any patient who had a heart attack.
  #america #heart #associat #advice #aspirin #utilis {#patient, #patience} #heart #attack.
– In Deutschland leben erwachsene Patienten mit angeborenen Herzfehlern. (In Germany, adult patients live with congenital heart defects.)
  #german #life #adult #patient #heredit #heart #failure.
#patient should be preferred over #patience, since „Patient“ is unambiguous in German and also co-occurs with #heart
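A minimal sketch of co-occurrence-based disambiguation follows. The slide names only a maximum likelihood estimator and co-occurrence information; the add-one-smoothed scoring below is an illustrative stand-in, and the counts in `COOCCURRENCE` are invented toy numbers, not corpus statistics.

```python
import math

# Invented toy counts: how often each candidate class co-occurs with a
# context class. Real values would come from large corpora.
COOCCURRENCE = {
    "#patient": {"#heart": 40, "#aspirin": 12, "#attack": 9},
    "#patience": {"#heart": 1, "#virtue": 6},
}
VOCABULARY = {"#heart", "#aspirin", "#attack", "#virtue"}


def log_likelihood(candidate, context):
    """Sum of log P(context class | candidate) with add-one smoothing."""
    counts = COOCCURRENCE.get(candidate, {})
    total = sum(counts.values()) + len(VOCABULARY)
    return sum(math.log((counts.get(class_id, 0) + 1) / total)
               for class_id in context if class_id in VOCABULARY)


def disambiguate(candidates, context):
    """Pick the candidate reading that makes the observed context most likely."""
    return max(candidates, key=lambda cand: log_likelihood(cand, context))


# Context of the ambiguous English token "patient" in the slide's example.
context = ["#america", "#heart", "#associat", "#aspirin", "#heart", "#attack"]
chosen = disambiguate(["#patient", "#patience"], context)
```

With these toy counts `chosen` is `"#patient"`, because the context contains #heart, #aspirin and #attack, which co-occur far more often with #patient than with #patience.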

Subword-based Search MorphoSaurus

Subword-Based Search Korrelation von Hypertonie und Läsion der Weißen Substanz… #correl #hyper #tens #lesion #whit #matter


Features
Columns: Type, Query, Found; system columns: Others, Similarity, MorphoSaurus
– Typing error: Brutskrebs → Brustkrebs
– Spelling error: Appoplex → Apoplex
– Phonetic search: Syfilis → Syphilis
– Singular/plural: Risiko → Risiken
– Parts of speech: leukozytär → Leukozyt
– Inversions: Chronische Bronchitis → Bronchitis, chronisch
– Word composition: Brustkrebs → Karzinom des Brustdrüsengewebes ---
– Word decomposition: Darmkrebsrisikoreduzierung → Reduktion des Risikos von Darmkrebs ---
– Synonyms: Schlaganfall → Apoplex ---
– Abbreviations: WHO → Weltgesundheitsorganisation ---
– Layman / expert: Darmkrebs → Kolonkarzinom ---
– Multilingualism: Herzmuskelentzündung → Inflammation of the heart muscle, myokarditis ---

Evaluation
Gold standards: OHSUMED, ImageCLEFMed
OHSUMED Corpus (Hersh et al., 1994)
– Subset of MEDLINE
– ~233,000 English documents
– 106 English user queries
ImageCLEFMed Corpus (Clough et al., 2005)
– Multilingual Image Retrieval Task 2006
– ~ Medical Images and captions
– 30 queries
Query-document pairs have been manually judged for relevance
Non-English queries obtained by translation to German, Portuguese, Spanish and Swedish by domain experts
Search Engine: Lucene
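Manual relevance judgments of this kind are typically used to score ranked result lists. The slides do not state which measure was used, so the average precision function below is an assumption for illustration only; the document identifiers in the usage lines are hypothetical.

```python
def average_precision(ranked_doc_ids, relevant_doc_ids):
    """Average of the precision values at each rank where a relevant
    document is retrieved, divided by the number of relevant documents."""
    relevant = set(relevant_doc_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# Hypothetical ranking for one query; d1 and d3 are judged relevant.
score = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```

Here the relevant documents appear at ranks 1 and 3, so the score is (1/1 + 2/3) / 2. Averaging such scores over all queries would give one number per retrieval condition, which could then be compared against a baseline.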

Evaluation
Baseline: monolingual text retrieval
– (stemmed) English user queries
– (stemmed) English texts
Query translation (QTR)
– Google translator
– Multilingual dictionary compiled from UMLS
Morphosaurus Indexing (MSI)
– Interlingual representation of both user queries and documents
– MSI-D incorporating disambiguation module

Results: OHSUMED

Results: ImageCLEFMed

Conclusions
Cross-Language Document Retrieval
– Based on morphological and semantic normalization of both user queries and documents
– Matching of search/document terms on a language-independent, interlingual layer
Language-independent indexing
– reaches, on average, more than 92% of an English-English baseline on heterogeneous document collections
– significantly outperforms query translation
– is independent of particular search engine architectures
Morphosaurus incorporates six languages:
– German, English, Portuguese, Spanish, French, Swedish
In use in commercial systems

Contact
Stefan Schulz, Medical Informatics Department, University Hospital Center Freiburg, Germany
Kornél Markó, Averbis GmbH, Freiburg, Germany