PRES: A Score Metric for Evaluating Recall-Oriented IR Applications
Walid Magdy, Gareth Jones
Dublin City University
SIGIR, 22 July 2010

Recall-Oriented IR
Examples: patent search and legal search
Objective: find all possible relevant documents
Search: takes much longer
Users: professionals and more patient
IR Campaigns: NTCIR, TREC, CLEF
Evaluation: mainly MAP!!!

Current Evaluation Metrics
For a topic with 4 relevant docs, where the first 100 retrieved docs are to be checked:

System    Relevant ranks        AP      R      F1      F4
System1   {1}                   0.25    0.25   0.25    0.25
System2   {50, 51, 53, 54}      0.047   1      0.09    0.46
System3   {1, 2, 3, 4}          1       1      1       1
System4   {1, 98, 99, 100}      0.273   1      0.43    0.864

(F1 and F4 combine AP and recall, with beta = 1 and beta = 4 respectively.)
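The numbers in this table can be reproduced with a short script. This is a minimal sketch, not from the original slides, and it assumes F1 and F4 are the weighted harmonic means of AP and recall with beta = 1 and beta = 4:

    # Reproduce AP / Recall / F1 / F4 for the four example systems
    # (4 relevant docs, first 100 retrieved docs checked).

    def average_precision(relevant_ranks, num_relevant):
        # Mean of precision at each relevant rank; unretrieved relevant docs count as zero.
        ranks = sorted(relevant_ranks)
        return sum((i + 1) / r for i, r in enumerate(ranks)) / num_relevant

    def f_beta(ap, recall, beta):
        # Weighted harmonic mean; a larger beta puts more weight on recall.
        return (1 + beta ** 2) * ap * recall / (beta ** 2 * ap + recall)

    systems = {
        "System1": [1],
        "System2": [50, 51, 53, 54],
        "System3": [1, 2, 3, 4],
        "System4": [1, 98, 99, 100],
    }

    for name, ranks in systems.items():
        ap = average_precision(ranks, 4)
        recall = len(ranks) / 4
        print(f"{name}: AP={ap:.3f} R={recall:.2f} "
              f"F1={f_beta(ap, recall, 1):.3f} F4={f_beta(ap, recall, 4):.3f}")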

Normalized Recall (R_norm)
R_norm is the area between the actual case and the worst case, taken as a proportion of the area between the best case and the worst case:

R_norm = 1 - (Σ r_i - Σ i) / (n (N - n)),   with the sums running over i = 1..n

N: collection size
n: number of relevant docs
r_i: the rank at which the i-th relevant document is retrieved
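A minimal sketch of this formula in code; the example numbers are hypothetical, and the run is assumed to reach 100% recall, as R_norm requires:

    # Normalized recall: area between the actual and worst-case recall curves,
    # as a proportion of the area between the best and worst cases.

    def r_norm(relevant_ranks, num_relevant, collection_size):
        actual = sum(relevant_ranks)                       # sum of r_i
        ideal = num_relevant * (num_relevant + 1) / 2      # best case: ranks 1..n
        return 1 - (actual - ideal) / (num_relevant * (collection_size - num_relevant))

    # Hypothetical example: 4 relevant docs in a 1000-doc collection,
    # retrieved at ranks 1, 98, 99 and 100.
    print(r_norm([1, 98, 99, 100], 4, 1000))   # ~0.93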

Applicability of R_norm
R_norm requires the following:
1. Known collection size (N)
2. Number of relevant documents (qrels) (n)
3. Retrieving documents until reaching 100% recall (r_i)
Workaround:
– Un-retrieved relevant docs are considered as the worst case
– For large-scale document collections: R_norm ≈ Recall
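A rough numerical illustration (not from the slides) of why this workaround makes R_norm approach plain recall as the collection grows: here 2 of 4 relevant docs are retrieved near the top and the other 2 are placed at the worst-case end of the collection.

    # As N grows, the penalty for the two missed documents approaches (n - m)/n,
    # so R_norm tends towards recall = m/n = 0.5 in this example.

    def r_norm(relevant_ranks, num_relevant, collection_size):
        ideal = num_relevant * (num_relevant + 1) / 2
        return 1 - (sum(relevant_ranks) - ideal) / (num_relevant * (collection_size - num_relevant))

    for N in (1_000, 100_000, 10_000_000):
        ranks = [3, 10, N - 1, N]                 # 2 found early, 2 assumed at the very end
        print(N, round(r_norm(ranks, 4, N), 5))   # 0.49749, 0.49998, 0.5 -> recall = 0.5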

R_norm Modification
PRES: Patent Retrieval Evaluation Score
The worst case is bounded by the user's maximum effort N_max (the maximum number of documents the user is willing to check) instead of the collection size:
N_worst_case = N_max + n

PRES = 1 - ((Σ r_i)/n - (n + 1)/2) / N_max

where relevant documents not retrieved within the first N_max results are assigned their worst-case ranks just after N_max.
For recall = 1:  n/N_max ≤ PRES ≤ 1
For recall = R:  nR²/N_max ≤ PRES ≤ R
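A minimal sketch of PRES as reconstructed above; placing the missed relevant documents just after N_max follows the worst-case definition described here, and the two prints check the stated bounds for recall = 1 (n = 4, N_max = 100):

    def pres(retrieved_relevant_ranks, num_relevant, n_max):
        found = sorted(r for r in retrieved_relevant_ranks if r <= n_max)
        # Relevant docs not retrieved within N_max get worst-case ranks N_max + i.
        missing = [n_max + i for i in range(len(found) + 1, num_relevant + 1)]
        total = sum(found) + sum(missing)
        return 1 - (total / num_relevant - (num_relevant + 1) / 2) / n_max

    print(pres([97, 98, 99, 100], 4, 100))   # lower bound for recall = 1: n/N_max = 0.04
    print(pres([1, 2, 3, 4], 4, 100))        # upper bound: 1.0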

PRES Performance
For a topic with 4 relevant docs, where the first 100 retrieved docs are to be checked (n = 4, N_max = 100):

System    Relevant ranks        AP      R / R_norm   F4      PRES
System1   {1}                   0.25    0.25         0.25    0.25
System2   {50, 51, 53, 54}      0.047   1            0.46    0.505
System3   {1, 2, 3, 4}          1       1            1       1
System4   {1, 98, 99, 100}      0.273   1            0.864   0.28
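As a usage check, the PRES column of this table can be reproduced with the same sketch of the formula:

    def pres(retrieved_relevant_ranks, num_relevant, n_max):
        found = sorted(r for r in retrieved_relevant_ranks if r <= n_max)
        missing = [n_max + i for i in range(len(found) + 1, num_relevant + 1)]
        return 1 - ((sum(found) + sum(missing)) / num_relevant
                    - (num_relevant + 1) / 2) / n_max

    for name, ranks in [("System1", [1]), ("System2", [50, 51, 53, 54]),
                        ("System3", [1, 2, 3, 4]), ("System4", [1, 98, 99, 100])]:
        print(name, round(pres(ranks, 4, 100), 3))   # 0.25, 0.505, 1.0, 0.28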

Average Performance
48 runs in CLEF-IP 2009, N_max = 1000
PRES vs MAP vs Recall: change in scores and change in ranking across runs
[Charts and a table from the slide: per-run MAP, Recall and PRES scores, and the correlation between the three metrics; the values are not recoverable from this transcript.]
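The per-run scores and correlation figures are lost from the transcript, but one common way to compute the kind of ranking correlation reported here is Kendall's tau over the runs' scores. The runs and scores below are made-up placeholders, not the CLEF-IP 2009 results:

    # Kendall's tau between the ranking of runs by PRES and by MAP / Recall.
    from scipy.stats import kendalltau

    map_scores    = [0.10, 0.08, 0.12, 0.05, 0.09]   # placeholder values
    recall_scores = [0.60, 0.72, 0.55, 0.40, 0.65]
    pres_scores   = [0.45, 0.50, 0.48, 0.30, 0.47]

    for name, scores in [("MAP", map_scores), ("Recall", recall_scores)]:
        tau, _ = kendalltau(pres_scores, scores)
        print(f"PRES vs {name}: Kendall tau = {tau:.2f}")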

PRES
Designed for recall-oriented applications
Gives a higher score to systems achieving higher recall and a better average relative ranking
Designed for laboratory testing
Dependent on the user's potential/effort (N_max)
To be applied in CLEF-IP 2010

PRESeval
Get PRESeval from:

[Closing cartoon contrasting a MAP system, a Recall system and a PRES system.]
Thank you