1 Multimedia I: Image Retrieval in Biomedicine William Hersh, MD Professor and Chair Department of Medical Informatics & Clinical Epidemiology Oregon Health & Science University

2 Acknowledgements
Funding
– NSF Grant ITR
Collaborators
– Jeffery Jensen, Jayashree Kalpathy-Cramer, OHSU
– Henning Müller, University of Geneva, Switzerland
– Paul Clough, University of Sheffield, England
– Cross-Language Evaluation Forum (Carol Peters, ISTI-CNR, Pisa, Italy)

3 Overview of talk
Brief review of information retrieval evaluation
Issues in indexing and retrieval of images
ImageCLEF medical image retrieval project
– Test collection description
– Results and analysis of experiments
Future directions

4 Image retrieval
Biomedical professionals increasingly use images for research, clinical care, and education, yet we know very little about how they search for them
Most image retrieval work has focused on either text annotation retrieval or image processing, but not on combining the two
The goal of this work is to increase our understanding of image retrieval and our ability to do it

5 Image retrieval issues and challenges
Image retrieval is a “poor stepchild” of text retrieval, with less understanding of how people use systems and how well they work
Images are not always “standalone,” e.g.,
– May be part of a series of images
– May be annotated with text
Images are “large” relative to text
Images may be compressed, which may result in loss of content (e.g., lossy compression)

6 Review of evaluation of IR systems
System-oriented – how well the system performs
– Historically focused on relevance-based measures
  Recall – # relevant retrieved / # relevant in collection
  Precision – # relevant retrieved / # retrieved by search
– When output is ranked, both can be aggregated in a measure like mean average precision (MAP)
User-oriented – how well the user performs with the system
– e.g., performing a task, user satisfaction, etc.
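As a concrete illustration of the relevance-based measures above, here is a minimal Python sketch (names and data are illustrative, not from the talk) that computes set-based recall and precision for a single query:

```python
def recall_precision(retrieved, relevant):
    """Set-based recall and precision for a single query.

    retrieved: IDs returned by the system
    relevant:  IDs judged relevant for the topic
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# 3 of the 5 relevant items appear in a result set of 6 -> recall 0.6, precision 0.5
print(recall_precision(["a", "b", "c", "d", "e", "f"], ["a", "c", "e", "x", "y"]))
```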

7 System-oriented IR evaluation
Historically assessed with test collections, which consist of
– Content – fixed yet realistic collections of documents, images, etc.
– Topics – statements of information need that can be fashioned into queries entered into retrieval systems
– Relevance judgments – made by expert humans as to which content items should be retrieved for which topics
Summary statistics are calculated over all topics
– Primary measure is usually MAP
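A test collection's relevance judgments are often distributed as a simple "qrels" file of topic/document/relevance lines in the TREC tradition; the sketch below assumes that layout (the actual ImageCLEF file format may differ) and loads it into a per-topic set of relevant IDs:

```python
from collections import defaultdict

def load_qrels(path):
    """Load TREC-style relevance judgments, one 'topic iteration doc relevance' line each.

    Returns {topic_id: set of IDs judged relevant}; unjudged or non-relevant
    items are simply absent.  (The exact file layout is an assumption here.)
    """
    relevant = defaultdict(set)
    with open(path) as handle:
        for line in handle:
            topic_id, _iteration, doc_id, judgment = line.split()
            if int(judgment) > 0:
                relevant[topic_id].add(doc_id)
    return relevant
```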

8 Calculating MAP in a test collection
Average precision (AP) for a topic – precision is calculated at the rank of each relevant image; relevant images that are never retrieved contribute zero
Example with 5 relevant images, 3 of which are retrieved at ranks 1, 3, and 6:
– 1: REL, precision 1/1 = 1.0
– 2: NOT REL
– 3: REL, precision 2/3 = 0.67
– 4: NOT REL
– 5: NOT REL
– 6: REL, precision 3/6 = 0.5
– 7: NOT REL
– 2 relevant images never retrieved, each contributing 0
– AP = (1.0 + 0.67 + 0.5 + 0 + 0) / 5 = 0.43
Mean average precision (MAP) is the mean of average precision over all topics in a test collection
The result is an aggregate measure, but the number itself is only of comparative value
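The slide's calculation can be written directly in code. This is a minimal sketch of average precision and MAP (function names are ours, not from the talk); the example at the bottom reproduces the 0.43 value from the worked example above:

```python
def average_precision(ranking, relevant):
    """Average precision for one topic: mean of the precision at each relevant
    item's rank; relevant items that are never retrieved contribute zero."""
    relevant = set(relevant)
    hits, precisions = 0, []
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs_by_topic, relevant_by_topic):
    """MAP: mean of average precision over all topics in the test collection."""
    return sum(average_precision(runs_by_topic[t], relevant_by_topic[t])
               for t in relevant_by_topic) / len(relevant_by_topic)

# The slide's example: 5 relevant images, retrieved at ranks 1, 3, and 6;
# the remaining 2 relevant images are never retrieved.
ranking = ["r1", "x1", "r2", "x2", "x3", "r3", "x4"]
relevant = {"r1", "r2", "r3", "r4", "r5"}
print(round(average_precision(ranking, relevant), 2))  # (1.0 + 0.67 + 0.5 + 0 + 0) / 5 = 0.43
```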

9 Some well-known system-oriented evaluation forums
Text REtrieval Conference (TREC, trec.nist.gov; Voorhees, 2005)
– Many “tracks” of interest, such as Web searching, question-answering, cross-language retrieval, etc.
– Non-medical, with the exception of the Genomics Track (Hersh, 2006)
Cross-Language Evaluation Forum (CLEF, www.clef-campaign.org)
– Spawned from the TREC cross-language track; European-based
– One track on image retrieval (ImageCLEF), which includes medical image retrieval tasks (Hersh, 2006)
Both operate on an annual cycle:
– Release of document/image collection
– Experimental runs and submission of results
– Relevance judgments
– Analysis of results

10 Image retrieval – indexing
Two general approaches (Müller, 2004)
– Textual or semantic – by annotation, e.g.,
  Narrative description
  Controlled terminology assignment
  Other types of textual metadata, e.g., modality, location
– Visual or content-based – identification of features, e.g., colors, texture, shape, segmentation
Our ability to “understand” the content of images is less developed than for textual content
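As one hedged example of content-based indexing, a global color histogram is among the simplest visual features of the kind listed above. The sketch below uses Pillow and NumPy; the library choice and function name are ours, not something described in the talk:

```python
import numpy as np
from PIL import Image

def color_histogram(path, bins=8):
    """Global RGB color histogram as a normalized feature vector.

    Each channel is quantized into `bins` levels, giving a bins**3-dimensional
    descriptor; a crude but common baseline for content-based indexing.
    """
    pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=[(0, 256)] * 3)
    hist = hist.ravel()
    return hist / hist.sum()
```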

11 Image retrieval – searching
Searching is based on the type of indexing
– Textual – typically uses features of text retrieval systems, e.g.,
  Boolean queries
  Natural language queries
  Forms for metadata
– Visual – the usual goal is to identify images with comparable features, i.e., “find me images similar to this one”
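"Find me images similar to this one" can then be approximated by comparing feature vectors. A minimal sketch using cosine similarity over the histogram features from the previous example (all names and data shapes are illustrative):

```python
import numpy as np

def rank_by_similarity(query_vector, index):
    """Rank indexed images by cosine similarity to a query feature vector.

    index: {image_id: 1-D numpy feature vector}
    Returns [(image_id, score), ...] sorted best-first.
    """
    q = query_vector / (np.linalg.norm(query_vector) + 1e-12)
    scored = [
        (image_id, float(np.dot(q, vec / (np.linalg.norm(vec) + 1e-12))))
        for image_id, vec in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Usage with the color_histogram() sketch above (paths are hypothetical):
# index = {image_id: color_histogram(path) for image_id, path in image_paths.items()}
# results = rank_by_similarity(color_histogram("query.jpg"), index)
```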

12 Example of visual image retrieval

13 ImageCLEF medical image retrieval
Aims to simulate general searching over a wide variety of medical images
Uses the standard IR approach, with a test collection consisting of
– Content
– Topics
– Relevance judgments
Has operated through three cycles of CLEF
– First year used the Casimage image collection
– Second and third years used the current image collection
– New topics were developed and relevance judgments performed for each cycle
Web site –

14 ImageCLEF medical collection library organization
The library contains collections; each collection contains cases; each case carries a case-level annotation and one or more images; each image may carry its own image-level annotation
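One way to picture this library organization is as nested records. The sketch below is purely illustrative (the field names are our assumptions, not the actual ImageCLEF schema):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ImageRecord:
    image_id: str
    path: str
    annotation: Optional[str] = None          # image-level annotation, if present

@dataclass
class Case:
    case_id: str
    annotation: Optional[str] = None          # case-level annotation (e.g., a report)
    images: List[ImageRecord] = field(default_factory=list)

@dataclass
class Collection:
    name: str                                  # e.g., "Casimage", "MIR", "PEIR", "PathoPIC"
    cases: List[Case] = field(default_factory=list)

@dataclass
class Library:
    collections: List[Collection] = field(default_factory=list)
```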

15 ImageCLEF medical test collection
Collections (predominant image type, cases, images, annotations, size):
– Casimage – mixed image types; annotations in English (177) and French; size in the GB range
– Mallinckrodt Institute of Radiology (MIR) – nuclear medicine; annotations in English; size in the MB range
– Pathology Education Instructional Resource (PEIR) – pathology; 32,319 images; annotations in English; size in the GB range
– PathoPIC – pathology; 7,805 images; annotations in German (7,805) and English; size in the MB range

16 Example case from Casimage
ID: 4272
Description: A large hypoechoic mass is seen in the spleen. CDFI reveals it to be hypovascular and to distort the intrasplenic blood vessels. This lesion is consistent with a metastatic lesion. Urinary obstruction is present on the right, with pelvicaliceal and ureteral dilatation secondary to a soft tissue lesion at the junction of the ureter and bladder. This is another secondary lesion of the malignant melanoma. Surprisingly, these lesions are not hypervascular on Doppler or on CT. Metastases are also visible in the liver.
Diagnosis: Metastasis of spleen and ureter, malignant melanoma
Clinical Presentation: Workup in a patient with malignant melanoma. Intravenous pyelography showed no excretion of contrast on the right.

17 Annotations vary widely
– Casimage – case and radiology reports
– MIR – image reports
– PEIR – metadata based on the Health Education Assets Library (HEAL)
– PathoPIC – image descriptions, longer in German and shorter in English

18 Topics
Each topic has
– Text in 3 languages
– Sample image(s)
– Category – judged amenable to visual, mixed, or textual retrieval methods
2005 – 25 topics
– 11 visual, 11 mixed, 3 textual
2006 – 30 topics
– 10 each of visual, mixed, and textual

19 Example topic (2005, #20)
Show me microscopic pathologies of cases with chronic myelogenous leukemia.
Zeige mir mikroskopische Pathologiebilder von chronischer Leukämie.
Montre-moi des images de la leucémie chronique myélogène.

20 Relevance judgments
Done in the usual IR manner, with pooling of results from many searches on the same topic
Pool generation – top N results from each run
– Where N = 40 (2005) or 30 (2006)
– About 900 images judged per topic
Judgment process
– Judged by physicians in the OHSU biomedical informatics program
– Required about 3-4 hours per judge per topic
Kappa measure of interjudge agreement = (“good”)
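Pooling itself is simple to express in code. A minimal sketch of building the judgment pool for one topic from the top-N results of every submitted run (the data shapes are assumptions; the default depth of 40 matches the 2005 setting described above):

```python
def build_pool(runs, topic_id, depth=40):
    """Union of the top-`depth` results from every run for one topic.

    runs: {run_name: {topic_id: ranked list of image IDs, best first}}
    Only images in this pool go to the human judges; anything outside the
    pool is treated as not relevant when the runs are scored.
    """
    pool = set()
    for results_by_topic in runs.values():
        pool.update(results_by_topic.get(topic_id, [])[:depth])
    return pool
```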

21 ImageCLEF medical retrieval task results – 2005 (Hersh, JAMIA, 2006)
Each participating group submitted one or more runs, with ranked results for each of the 25 topics
A variety of measures was calculated for each topic, along with the mean over all 25 (measures on the next slide)
Initial analysis focused on the best results in different categories of runs

22 Measurement of results
– Retrieved
– Relevant retrieved
– Mean average precision (MAP, an aggregate of ranked recall and precision)
– Precision at a fixed number of images retrieved (10, 30, 100)
– (And a few others…)
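Precision at a fixed cutoff is the simplest of these measures to compute; a minimal sketch (names are illustrative):

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved images that were judged relevant."""
    return sum(1 for item in ranking[:k] if item in relevant) / k

# Typical cutoffs for one topic, as used in the analysis that follows:
# [precision_at_k(ranking, relevant, k) for k in (10, 30, 100)]
```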

23 Categories of runs
Query preparation
– Automatic – no human modification
– Manual – with human modification
Query type
– Textual – searching only via textual annotations
– Visual – searching only by visual means
– Mixed – textual and visual searching

24 Retrieval task results
– Best results overall
– Best results by query type
– Comparison by topic type
– Comparison by query type
– Comparison of measures

25 Number of runs by query type (out of 134)

Query type    Automatic    Manual
Visual        28           3
Textual       14           1
Mixed         86           2

26 Best results overall
Institute for Infocomm Research (Singapore) and IPAL-CNRS (France) – run IPALI2R_TIan
Used a combination of image and text processing
– The latter focused on mapping terms to semantic categories, e.g., modality, anatomy, pathology, etc.
MAP – 0.28
Precision at
– 10 images – 0.62 (6.2 images)
– 30 images – 0.53 (18 images)
– 100 images – 0.32 (32 images)

27 Results for top 30 runs – not much variation

28 Best results (MAP) by query type

Query type    Automatic       Manual
Visual        I2Rfus.txt      i2r-vk-avg.txt
Textual       IPALI2R_Tn      OHSUmanual.txt
Mixed         IPALI2R_TIan    OHSUmanvis.txt

Automatic mixed runs were best (including those not shown)

29 Best results (MAP) by topic type (for each query type) Visual runs clearly hampered by textual (semantic) queries

30 Relevant and MAP by topic – a great deal of variation across the visual, mixed, and textual topics

31 Interesting “quirk” in results from OHSU runs
The manual mixed run starts out well but falls off rapidly, ending with a lower MAP
The MAP measure rewards recall and may not be the best measure for this task
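The quirk can be reproduced with a toy example. The rankings below are made up for illustration (they are not the OHSU runs): run A front-loads a few relevant images and then finds no more, while run B starts more slowly but eventually retrieves every relevant image, so A wins on early precision while B wins on average precision:

```python
def precision_at_k(ranking, relevant, k):
    return sum(1 for item in ranking[:k] if item in relevant) / k

def average_precision(ranking, relevant):
    hits, precisions = 0, []
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant)

relevant = {f"rel{i}" for i in range(20)}   # 20 relevant images for this toy topic

# Run A: five relevant images right at the top, then no more.
run_a = [f"rel{i}" for i in range(5)] + [f"junkA{i}" for i in range(95)]

# Run B: slower start, but all 20 relevant images retrieved by rank 41.
run_b, remaining = [], [f"rel{i}" for i in range(20)]
for rank in range(1, 101):
    if rank % 2 == 1 and 3 <= rank <= 41:
        run_b.append(remaining.pop(0))
    else:
        run_b.append(f"junkB{rank}")

for name, run in (("A (front-loaded)", run_a), ("B (steadier)", run_b)):
    print(name, "P@10 =", precision_at_k(run, relevant, 10),
          "AP =", round(average_precision(run, relevant), 2))
# A (front-loaded) P@10 = 0.5 AP = 0.25
# B (steadier) P@10 = 0.4 AP = 0.46
```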

32 Also much variation by topic in OHSU runs

33 ImageCLEF medical retrieval task results – 2006
Primary measure – MAP
Results reported in the track overview on the CLEF Web site (Müller, 2006) and in the following slides
– Runs submitted
– Best results overall
– Best results by query type
– Comparison by topic type
– Comparison by query type
– Comparison of measures
– Interesting finding from OHSU runs

34 Categories of runs
Query type – human preparation
– Automatic – no human modification
– Manual – human modification of query
– Interactive – human modification of query after viewing output (not designated in 2005)
System type – feature(s)
– Textual – searching only via textual annotations
– Visual – searching only by visual means
– Mixed – textual and visual searching
– (NOTE: Topic types have these category names too)

35 Runs submitted by category

Query type     Visual    Mixed    Textual    Total
Automatic
Manual
Interactive    1         2        1          4
Total

36 Best results overall
Institute for Infocomm Research (Singapore) and IPAL-CNRS (France) (Lacoste, 2006)
Used a combination of image and text processing
– The latter focused on mapping terms to semantic categories, e.g., modality, anatomy, pathology, etc.
MAP –
Precision at
– 10 images – 0.62 (6.2 images)
– 30 images – 0.58 (17.4 images)
– 100 images – 0.40 (40 images)

37 Best performing runs by system and query type Automated textual or mixed query runs best

38 Results for all runs Variation between MAP and precision for different systems

39 Best performing runs by topic type for each system type
– Mixed queries most robust across all topic types
– Visual queries least robust to non-visual topics

40 Relevant and MAP by topic – substantial variation across all topics and topic types (visual, mixed, textual)

41 Interesting finding from OHSU runs in 2006, similar to 2005
The mixed run had higher precision despite a lower MAP
Could precision at the top of the output be more important for the user?

42 Conclusions
A variety of approaches are effective in image retrieval, similar to IR with other content
Systems that use only visual retrieval are less robust than those that use only textual retrieval
– A possibly fruitful area of research is the ability to predict which queries are amenable to which retrieval approaches
A broader understanding of system use is needed, followed by better test collections and experiments based on that understanding
– MAP might not be the best performance measure for the image retrieval task

43 Limitations
This test collection
– Topics are artificial – they may not be realistic or representative
– Annotation of images may not be representative or reflect best practice
Test collections generally
– Relevance is situational
– No users are involved in the experiments

44 Future directions
ImageCLEF 2007
– Continue work on the annual cycle
– Funded for another year from the NSF grant
– Expanding the image collection and adding new topics
User experiments with the OHSU image retrieval system
– Aim to better understand real-world tasks and the best evaluation measures for those tasks
Continued analysis of data
– Improved text retrieval of annotations
– Improved merging of image and text retrieval
– Look at methods of predicting which queries are amenable to different approaches