Cumulated Gain-Based Evaluation of IR Techniques

Presentation transcript:

Cumulated Gain-Based Evaluation of IR Techniques Liu Bingbing

Motivation There are many different kinds of IR techniques, but which one is better? And how should these techniques be evaluated?

Outline Introduction Cumulated gain-based measurements Case study: comparison of some TREC-7 results at different relevance levels Discussion

Outline Introduction Cumulated gain-based measurements Case study: comparison of some TREC-7 results at different relevance levels Discussion

Background Highly relevant documents should be identified and ranked first. It is therefore necessary to develop measures that evaluate how well different IR techniques achieve this.

Old measures Documents are judged only as relevant or irrelevant, so highly and marginally relevant documents are given equal credit; graded relevance judgments are not exploited.

New measures Cumulated gain (CG), discounted cumulated gain (DCG), normalized CG (nCG), and normalized DCG (nDCG)

Outline Introduction Cumulated gain-based measurements Case study: comparison of some TREC-7 results at different relevance levels Discussion

Principles Highly relevant documents are more important than marginally relevant ones. Documents found later in the ranking are less valuable to the user.

Relationship (diagram): the gain vector G yields CG and DCG; comparing these against the best possible vector BV yields the normalized measures n(D)CG.

Direct Cumulated Gain (CG) For example: G' = <3, 2, 3, 0, 0, 1, 2, 2, 3, 0, …> CG' = <3, 5, 8, 8, 8, 9, 11, 13, 16, 16, …>
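The CG vector is simply the running sum of the graded relevance scores in ranked order: CG[1] = G[1] and CG[i] = CG[i-1] + G[i]. A minimal Python sketch (my own code, not from the original slides; the name cumulated_gain is mine) that reproduces the example above:

    def cumulated_gain(gains):
        # Running sum of graded relevance scores in ranked order:
        # CG[1] = G[1], CG[i] = CG[i-1] + G[i].
        cg, total = [], 0
        for g in gains:
            total += g
            cg.append(total)
        return cg

    G = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
    print(cumulated_gain(G))  # [3, 5, 8, 8, 8, 9, 11, 13, 16, 16]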

Discounted Cumulated Gain (DCG) For example (with discounting base b = 2): G' = <3, 2, 3, 0, 0, 1, 2, 2, 3, 0, …> DCG' = <3, 5, 6.89, 6.89, 6.89, 7.28, 7.99, 8.66, 9.61, 9.61, …>
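DCG cumulates the same gains but, from rank b onwards, divides each gain by log_b(rank), so relevance found late contributes less. A sketch (my own code, following the Järvelin and Kekäläinen definition in which ranks below the log base are left undiscounted) that reproduces the vector above with b = 2:

    import math

    def discounted_cumulated_gain(gains, b=2):
        # No discount before rank b; from rank b on, each gain is
        # divided by log_b(rank).
        dcg, total = [], 0.0
        for rank, g in enumerate(gains, start=1):
            total += g if rank < b else g / math.log(rank, b)
            dcg.append(total)
        return dcg

    G = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
    print([round(x, 2) for x in discounted_cumulated_gain(G)])
    # [3.0, 5.0, 6.89, 6.89, 6.89, 7.28, 7.99, 8.66, 9.61, 9.61]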

Best possible vectors (the theoretical ideal)

A sample ideal gain vector BV = <3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, …> gives CG' = <3, 6, 9, 11, 13, 15, 16, 17, 18, 19, 19, 19, 19, …> and DCG' = <3, 6, 7.89, 8.89, 9.75, 10.52, 10.88, 11.21, 11.53, 11.83, 11.83, 11.83, …> (base b = 2)
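The ideal vector is obtained by sorting the relevance grades of the topic's recall base in decreasing order and padding with zeros; the sample above corresponds to a recall base with three documents at level 3, three at level 2 and four at level 1. A small sketch (my own code; the name ideal_gain_vector is mine):

    def ideal_gain_vector(recall_base_grades, length):
        # Sort the recall base's grades in decreasing order and pad
        # with zeros up to the requested vector length.
        ideal = sorted(recall_base_grades, reverse=True) + [0] * length
        return ideal[:length]

    BV = ideal_gain_vector([3, 3, 3, 2, 2, 2, 1, 1, 1, 1], 13)
    # BV == [3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0]
    # Cumulating BV gives <3, 6, 9, 11, 13, 15, 16, 17, 18, 19, 19, 19, 19>.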

Relative to the Ideal: the Normalized (D)CG Measures norm-vect(V, I) = <v1/i1, v2/i2, …, vk/ik> For example: nCG = norm-vect(CG, CG_I) and nDCG = norm-vect(DCG, DCG_I), where CG_I and DCG_I are the vectors of the ideal ranking.
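Normalization divides each (D)CG value by the corresponding value of the ideal ranking, so a perfect run scores 1.0 at every rank. A sketch of the norm-vect operation (my own code), using the CG vectors from the earlier slides:

    def norm_vect(v, ideal_v):
        # Position-wise division by the ideal vector's values.
        return [x / y for x, y in zip(v, ideal_v)]

    CG   = [3, 5, 8, 8, 8, 9, 11, 13, 16, 16]    # from the example run
    CG_I = [3, 6, 9, 11, 13, 15, 16, 17, 18, 19]  # from the ideal vector
    nCG = norm_vect(CG, CG_I)
    # nCG[0] = 1.0, nCG[1] ≈ 0.83, ..., nCG[9] ≈ 0.84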

Comparison to earlier measures Average search length (ASL): estimates the average position of a relevant document. Expected search length (ESL): the average number of documents that must be examined to retrieve a given number of relevant documents. Measures such as these either do not take the degree of document relevance into account or depend on the size of the retrieved list, among other shortcomings.

The strengths of the new measures (CG, DCG, nCG, nDCG) They take the degree of document relevance into account, do not depend on the size of the recall base or on outliers, and are straightforward to interpret.

In addition, DCG has further advantages: it weights down gains found later in the ranking and thereby models user persistence.

Outline Introduction Cumulated gain-based measurements Case study: comparison of some TREC-7 results at different relevance levels Discussion

Data source TREC-7: 50 queries from topic statements; 51,800 documents (1.9 GB of data). We used the result lists for 20 topics submitted by five participants in the TREC-7 ad hoc manual track.

Relevance judgments The new graded judgments are reliable, and they are stricter than the original binary TREC judgments.

Cumulated gain results (figures): (a) binary weighting; (b) non-binary weighting

Discounted cumulated gain results (figure)

Normalized (D)CG Vectors and Statistical Testing

About the case study For example, suppose the recall base for a topic contains documents 1, 6 and 10 at relevance level 3, documents 3 and 4 at level 2, documents 2, 5 and 8 at level 1, and documents 7 and 9 at level 0. The ideal ranking 1, 6, 10, 3, 4, 2, 5, 8, 7, 9 then gives Ideal = <3, 3, 3, 2, 2, 1, 1, 1, 0, 0>, while a run beginning 3, 1, 4, 2, 6, … gives the gain vector A = <2, 3, 2, 1, 3, …>.
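In code, this construction amounts to mapping each retrieved document id to its relevance grade and sorting the recall base to get the ideal. A sketch (my own code, using the grades from the example above; the tail of the run beyond the first five documents is made up for illustration):

    # Relevance grades from the example: docs 1, 6, 10 -> 3; 3, 4 -> 2;
    # 2, 5, 8 -> 1; 7, 9 -> 0.
    judgments = {1: 3, 2: 1, 3: 2, 4: 2, 5: 1, 6: 3, 7: 0, 8: 1, 9: 0, 10: 3}

    def gain_vector(ranked_ids, judgments):
        # Replace each retrieved document id by its graded relevance;
        # unjudged documents count as non-relevant (gain 0).
        return [judgments.get(d, 0) for d in ranked_ids]

    ideal = sorted(judgments.values(), reverse=True)  # [3, 3, 3, 2, 2, 1, 1, 1, 0, 0]
    run = [3, 1, 4, 2, 6, 10, 5, 7, 8, 9]             # a system's ranked output
    print(gain_vector(run, judgments))                # [2, 3, 2, 1, 3, 3, 1, 0, 1, 0]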

Outline Introduction Cumulated gain-based measurements Case study: comparison of some TREC-7 results at different relevance levels Discussion

Several parameters Last rank considered; gain values; discounting factor
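The discounting factor (the log base b) models user persistence: a small base such as 2 discounts late gains steeply (an impatient user), while a large base discounts them mildly (a persistent user). A sketch (my own code) comparing the example run's DCG at rank 10 for two bases:

    import math

    def dcg(gains, b):
        # Cumulate gains; from rank b on, divide each gain by log_b(rank).
        total, out = 0.0, []
        for rank, g in enumerate(gains, start=1):
            total += g if rank < b else g / math.log(rank, b)
            out.append(total)
        return out

    G = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
    print(round(dcg(G, b=2)[-1], 2))   # 9.61 -- steep discount (impatient user)
    print(round(dcg(G, b=10)[-1], 2))  # 16.0 -- ranks 1-9 undiscounted, equals CG here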

Limitations The measures do not take order effects on relevance judgments or document overlap into account, they deal with a single relevance dimension only, and they cannot handle dynamic changes.

Benefits They take the degree of document relevance into account and model user persistence.