
1 Retrieval Performance Evaluation
Modern Information Retrieval by R. Baeza-Yates and B. Ribeiro-Neto, Addison-Wesley, 1999 (Chapter 3)

2 Recall and Precision
• Recall
• Precision
• Goal: high recall and high precision
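The defining formulas appeared as figures in the original slides. In the book's usual notation, with R the set of relevant documents for a query, A the answer set retrieved by the system, and Ra = R ∩ A the relevant documents actually retrieved, they read:

```latex
\text{Recall} = \frac{|R_a|}{|R|}, \qquad \text{Precision} = \frac{|R_a|}{|A|}
```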

3 Recall and Precision

4 Precision vs. Recall Figure
• Rq = {d3, d5, d9, d25, d39, d44, d56, d71, d89, d123}
• Aq = {d123, d84, d56, d6, d8, d9, d511, d129, d187, d25, d38, d48, d250, d113, d3}
  » R=10%, P=100%
  » R=20%, P=66%
  » R=50%, P=33.3%
  » R>50%, P=0%
• Precision at 11 standard recall levels
  » 0%, 10%, 20%, …, 100%
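A small Python sketch (not part of the original slides; the function and variable names are illustrative) that reproduces these recall/precision points from the ranked answer set of example 1:

```python
# Relevant set Rq and ranked answer set Aq from the example above.
relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187",
           "d25", "d38", "d48", "d250", "d113", "d3"]

def recall_precision_points(ranking, relevant):
    """(recall, precision) observed after each retrieved relevant document."""
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points

print(recall_precision_points(ranking, relevant))
# approximately [(0.1, 1.0), (0.2, 0.67), (0.3, 0.5), (0.4, 0.4), (0.5, 0.33)]
```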

5 Average Precision Values
• To evaluate the retrieval performance of an algorithm over all test queries, we average the precision at each recall level
  » average precision at the recall level r
  » Nq is the number of queries used
  » Pi(r) is the precision at recall level r for query i
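The averaging formula itself was shown as an image in the original slide; reconstructed from the description above, with P̄(r) denoting the average precision at recall level r:

```latex
\bar{P}(r) \;=\; \sum_{i=1}^{N_q} \frac{P_i(r)}{N_q}
```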

6 Precision Interpolation
• Rq = {d3, d56, d129}
• Aq = {d123, d84, d56, d6, d8, d9, d511, d129, d187, d25, d38, d48, d250, d113, d3}
  » R=33%, P=33%
  » R=66%, P=25%
  » R=100%, P=20%
• Let rj, j in {0, 1, 2, …, 10}, be a reference to the j-th standard recall level.
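The interpolation rule was likewise shown as a formula image. The usual convention, the one used in TREC-style evaluations and the one that reproduces this example, takes at each standard recall level the maximum precision observed at any recall level greater than or equal to it:

```latex
P(r_j) \;=\; \max_{r \ge r_j} P(r), \qquad j \in \{0, 1, \ldots, 10\}
```

For this example that yields an interpolated precision of 33% at recall levels 0%–30%, 25% at 40%–60%, and 20% at 70%–100%.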

7 Additional Approach
• Average precision at document cutoff points
  » For instance, we can compute the average precision when 5, 10, 15, 20, 30, 50, or 100 relevant documents have been seen.
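A minimal sketch of this cutoff measure as the slide phrases it, i.e. the precision observed once the n-th relevant document has been seen; the function name is an assumption, not from the slides:

```python
def precision_after_n_relevant(ranking, relevant, n):
    """Precision at the rank where the n-th relevant document is seen."""
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            if hits == n:
                return hits / rank
    return None  # fewer than n relevant documents were retrieved
```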

8 Single Value Summaries
• Average Precision at Seen Relevant Documents
  » The idea is to generate a single-value summary of the ranking by averaging the precision figures obtained after each new relevant document is observed
  » e.g., for example 1: (1 + 0.66 + 0.5 + 0.4 + 0.33)/5 ≈ 0.58
  » This measure favors systems which retrieve relevant documents quickly
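A minimal sketch of this single-value summary (illustrative names, not from the slides), which for example 1 reproduces the figure above:

```python
def avg_precision_at_seen_relevant(ranking, relevant):
    """Average of the precision values observed at each seen relevant document."""
    precisions, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Example 1: relevant documents appear at ranks 1, 3, 6, 10, 15, so the
# summary is (1 + 2/3 + 1/2 + 2/5 + 1/3) / 5 ≈ 0.58.
```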

9 Single Value Summaries (Cont.)
• R-Precision
  » The idea here is to generate a single-value summary of the ranking by computing the precision at the R-th position in the ranking, where R is the total number of relevant documents
  » e.g., for example 1: R-Precision is 0.4
  » e.g., for example 2: R-Precision is 0.3
  » The R-precision measure is useful for observing the behavior of an algorithm for each individual query
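A corresponding sketch for R-precision (again with illustrative names):

```python
def r_precision(ranking, relevant):
    """Precision at rank R, where R is the total number of relevant documents."""
    R = len(relevant)
    retrieved_relevant = sum(1 for doc in ranking[:R] if doc in relevant)
    return retrieved_relevant / R

# Example 1: R = 10, 4 relevant documents in the top 10 -> 0.4
# Example 2: R = 3,  1 relevant document  in the top 3  -> 0.33
```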

10 Single Value Summaries (Cont.)
• Precision Histograms
  » Use R-precision measures to compare the retrieval history of two algorithms through visual inspection
  » RP_A/B(i) = RP_A(i) − RP_B(i), where RP_A(i) and RP_B(i) are the R-precision values of algorithms A and B for the i-th query
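A sketch of the per-query comparison behind the histogram, reusing r_precision from the previous sketch; the run and judgment data structures are assumptions for illustration:

```python
def r_precision_differences(runs_a, runs_b, relevant_by_query):
    """RP_A/B(i) = RP_A(i) - RP_B(i) for every query i.

    runs_a / runs_b map a query id to that algorithm's ranked document list;
    relevant_by_query maps a query id to its set of relevant documents.
    A positive value favors algorithm A, a negative value favors algorithm B.
    """
    return {
        qid: r_precision(runs_a[qid], rel) - r_precision(runs_b[qid], rel)
        for qid, rel in relevant_by_query.items()
    }
```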

11 Reference Collections
• Small collections
  » The ADI Collection (documents on information science)
  » INSPEC (abstracts on electronics, computers, and physics)
  » Medlars (medical articles)
  » The CACM Collection
  » The ISI Collection
• Large collections
  » The TREC Collection

12 The TREC Collection
• Initiated by Donna Harman at NIST (National Institute of Standards and Technology) in the 1990s
• Co-sponsored by the Information Technology Office of DARPA as part of the TIPSTER Text Program

13 The Document Collection at TREC
• Resources
  » WSJ: Wall Street Journal
  » AP: Associated Press (news wire)
  » ZIFF: Computer Selects (articles), Ziff-Davis
  » FR: Federal Register
  » DOE, SJMN, PAT, FT, CR, FBIS, LAT
• Size
  » TREC-3: 2 GB
  » TREC-6: 5.8 GB
  » US$200 in 1998

14 TREC document example

15 The Example Information Requests (Topics)
• 350 topics for the first six TREC conferences
• Topics:
  » 1-150: TREC-1 and TREC-2
    – long-standing information needs
  » 151-200: TREC-3
    – simpler structure
  » 201-250: TREC-4
    – even shorter
  » 251-300: TREC-5
  » 301-350: TREC-6

16 TREC Topic Example

17 The Relevant Documents for Each Topic
• Pooling Method
  » The set of relevant documents for each example information request (topic) is obtained from a pool of possibly relevant documents
  » The pool is created by taking the top K documents (usually, K=100) in the rankings generated by the various participating retrieval systems
  » The documents in the pool are then shown to human assessors, who ultimately decide on the relevance of each document
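A minimal sketch of pool construction under the description above; the data structures are illustrative, not TREC's actual run file format:

```python
def build_pool(runs, k=100):
    """Union of the top-k documents from each participating system's ranking.

    runs: one ranked list of document ids per system, all for the same topic.
    Only pooled documents are shown to the assessors; unjudged documents
    are treated as non-relevant in the evaluation.
    """
    pool = set()
    for ranking in runs:
        pool.update(ranking[:k])
    return pool
```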

18 The Tasks at the TREC Conferences
• Ad hoc task
• Routing task
• TREC-6
  » Chinese
  » Filtering
  » Interactive
  » NLP
  » Cross languages
  » High precision
  » Spoken document
  » Very large corpus

19 Evaluation Measures at the TREC Conference
• Summary table statistics
  » the number of topics, the number of relevant documents retrieved, etc.
• Recall-precision averages
  » precision at the 11 standard recall levels
• Document level averages
  » precision at 5, 10, 20, and 100 retrieved documents, and at rank R
• Average precision histogram
  » R-precision per topic