Assessing the Retrieval, A.I Lab, 2007.01.20, 박동훈

Contents
4.1 Personal Assessment of Relevance
4.2 Extending the Dialog with RelFbk
4.3 Aggregated Assessment: Search Engine Performance
4.4 RAVE: A Relevance Assessment Vehicle
4.5 Summary

4.1 Personal Assessment of Relevance
Cognitive assumptions:
– Users trying to do "object recognition"
– Comparison with respect to a prototypic document
– Reliability of user opinions?
– Relevance scale
– RelFbk is nonmetric

Relevance Scale

RelFbk is nonmetric: users naturally provide only preference information, not a (metric) measurement of how relevant a retrieved document is!

4.2 Extending the Dialog with RelFbk RelFbk Labeling of the Retr Set

Query Session, Linked by RelFbk

4.2.1 Using RelFbk for Query Refinement
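Only the slide title survives in the transcript. As an illustration, the sketch below shows a standard Rocchio-style update, the technique most commonly used for RelFbk-driven query refinement; the vector representation, the weights alpha/beta/gamma, and the function name are my own assumptions, not taken from the slides.

```python
import numpy as np

def rocchio_refine(query_vec, rel_docs, nonrel_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query refinement from relevance feedback (RelFbk).

    query_vec   : term-weight vector of the original query
    rel_docs    : vectors of retrieved documents the user marked relevant
    nonrel_docs : vectors of retrieved documents the user marked irrelevant
    """
    q_new = alpha * np.asarray(query_vec, dtype=float)
    if len(rel_docs):
        q_new += beta * np.mean(rel_docs, axis=0)      # pull toward relevant docs
    if len(nonrel_docs):
        q_new -= gamma * np.mean(nonrel_docs, axis=0)  # push away from irrelevant docs
    return np.maximum(q_new, 0.0)                      # clip negative term weights
```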

4.2.2 Document Modifications due to RelFbk (Fig 4.7)
Change the documents!? Make a document look more like the queries it successfully matches (judged relevant) and less like the queries it mismatches (judged irrelevant).
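By analogy with the query-side update, and only as a hedged sketch rather than the book's exact rule from Fig 4.7, the document-side adaptation can nudge a document's representation toward queries for which it was judged relevant and away from queries for which it was judged irrelevant:

```python
import numpy as np

def adapt_document(doc_vec, query_vec, judged_relevant, eta=0.1):
    """Move a document vector a small step toward a query that matched it
    successfully (judged relevant), or away from one it mismatched."""
    doc_vec = np.asarray(doc_vec, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    step = eta * (query_vec - doc_vec)
    moved = doc_vec + step if judged_relevant else doc_vec - step
    return np.maximum(moved, 0.0)  # keep term weights nonnegative
```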

4.3 Aggregated Assessment: Search Engine Performance
Underlying assumptions:
– RelFbk(q, di) assessments are independent
– Users' opinions will all agree with a single "omniscient" expert's

4.3.2 Consensual relevance Consensually relevant

4.3.4 Basic Measures Relevant versus Retrieved Sets

Contingency table
– NRel: the number of relevant documents
– NNRel: the number of irrelevant documents
– NDoc: the total number of documents
– NRet: the number of retrieved documents
– NNRet: the number of documents not retrieved

4.3.4 Basic Measures (cont)

4.3.4 Basic Measures (cont)
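The formulas on the two "(cont)" slides are not in the transcript; in the contingency-table notation above they are the usual precision and recall ratios. A minimal sketch (the argument names, including the count of documents that are both retrieved and relevant, are my own):

```python
def precision_recall(n_ret_rel, n_ret, n_rel):
    """Basic measures from the contingency table of section 4.3.4.

    n_ret_rel : documents that are both retrieved and relevant
    n_ret     : NRet, all retrieved documents
    n_rel     : NRel, all relevant documents
    """
    precision = n_ret_rel / n_ret if n_ret else 0.0  # fraction of retrieved docs that are relevant
    recall    = n_ret_rel / n_rel if n_rel else 0.0  # fraction of relevant docs that were retrieved
    return precision, recall
```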

4.3.5 Ordering the Retr Set
Each document is assigned a hitlist rank Rank(di) by descending Match(q, di):
– Rank(di) < Rank(dj) iff Match(q, di) ≥ Match(q, dj)
Coordination level: a document's rank in Retr is determined by the number of keywords it shares with the query
Goal, the Probability Ranking Principle:
– Rank(di) < Rank(dj) iff Pr(Rel(di)) ≥ Pr(Rel(dj))
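A small sketch of the ordering step, using coordination level as one simple Match(q, d) function; the data layout and function names are illustrative assumptions:

```python
def coordination_level(query_terms, doc_terms):
    """Coordination level: number of keywords shared by query and document."""
    return len(set(query_terms) & set(doc_terms))

def rank_retr_set(query_terms, retr_set):
    """Assign hitlist ranks by descending Match(q, d).

    retr_set : dict mapping doc id -> the document's keywords
    Returns doc ids ordered so that rank 1 has the highest match score.
    """
    return sorted(retr_set,
                  key=lambda d: coordination_level(query_terms, retr_set[d]),
                  reverse=True)
```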

A tale of two retrievals: Query 1 vs. Query 2

Recall/precision curve Query1

Recall/precision curve Query1
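The curves on these slides are figures; the points they plot can be computed by walking down the ranked hitlist, as in this sketch (binary relevance assumed):

```python
def recall_precision_points(hitlist, relevant):
    """(recall, precision) pairs after each position of the ranked hitlist.

    hitlist  : doc ids in rank order, best first
    relevant : doc ids judged relevant for this query
    """
    relevant = set(relevant)
    if not relevant:
        return []
    points, hits = [], 0
    for i, doc in enumerate(hitlist, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / i))  # (recall, precision)
    return points
```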

Retrieval envelope

4.3.6 Normalized Recall
ri: the hitlist rank of the i-th relevant document
Compares the actual ranking against the worst and best possible rankings
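The slide's own formula was an image; the standard definition of normalized recall (stated here as an assumption about what the slide showed) compares the actual ranks of the n relevant documents against the best possible ranks 1..n, scaled by the gap between worst and best cases:

```latex
R_{\mathrm{norm}} \;=\; 1 \;-\; \frac{\sum_{i=1}^{n} r_i \;-\; \sum_{i=1}^{n} i}{n\,(N - n)}
```

Here n is the number of relevant documents and N the hitlist length; R_norm is 1 for the best possible ranking and 0 for the worst.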

4.3.8 One-Parameter Criteria
– Combining recall and precision
– Classification accuracy
– Sliding ratio
– Point alienation

Combining recall and precision
– F-measure [Jardine & van Rijsbergen, 1971; Lewis & Gale, 1994]
– Effectiveness E [van Rijsbergen, 1979]: E = 1 - F, with α = 1/(β² + 1); α = 0.5 gives the harmonic mean of precision and recall
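Written out (my reconstruction of the garbled expression, following van Rijsbergen's definitions of F and E in terms of precision P and recall R):

```latex
F_\beta = \frac{(\beta^2 + 1)\,P\,R}{\beta^2 P + R},
\qquad
E = 1 - F = 1 - \frac{1}{\alpha \tfrac{1}{P} + (1-\alpha)\tfrac{1}{R}},
\qquad
\alpha = \frac{1}{\beta^2 + 1}
```

With α = 0.5 (β = 1) this reduces to F1 = 2PR/(P + R), the harmonic mean of precision and recall.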

Classification accuracy: correct identification of both relevant and irrelevant documents
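In the contingency-table notation of 4.3.4 (the names for the two "correct" cells are my own shorthand), this is:

```latex
\mathrm{accuracy} \;=\; \frac{N_{\mathrm{RetRel}} + N_{\mathrm{NRetNRel}}}{N_{\mathrm{Doc}}}
```

where N_RetRel counts documents both retrieved and relevant, and N_NRetNRel those neither retrieved nor relevant.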

Sliding ratio: imagine a nonbinary, metric Rel(di) measure, with Rank1 and Rank2 computed by two separate systems
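One common formulation, assumed here since the slide's own formula is not in the transcript, compares the cumulative relevance gathered by the two rankings at each cutoff:

```python
def sliding_ratio(rank1, rank2, rel, k):
    """Sliding ratio at cutoff k for two rankings of the same collection.

    rank1, rank2 : lists of doc ids, best first, from two systems
    rel          : dict doc id -> nonbinary, metric relevance Rel(di)
    """
    gain1 = sum(rel.get(d, 0.0) for d in rank1[:k])
    gain2 = sum(rel.get(d, 0.0) for d in rank2[:k])
    return gain1 / gain2 if gain2 else float("inf")
```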

Point alienation: developed to measure human preference data, capturing the fundamentally nonmetric nature of RelFbk

4.3.9 Test Corpora
– More data required for a "test corpus"
– Standard test corpora
– TREC: the Text REtrieval Conference
– TREC's refined queries
– TREC constantly expanding, refining tasks

More data required for a "test corpus":
– Documents
– Queries
– Relevance assessments Rel(q, d)
– Perhaps other data too: classification data (Reuters), hypertext graph structure (EB5)

Standard test corpora

TREC constantly expanding, refining tasks:
– Ad hoc query task
– Routing/filtering task
– Interactive task

Other measures: Expected Search Length (ESL)
– Length of the "path" as the user walks down the hitlist
– ESL = number of irrelevant documents seen before each relevant document
– ESL for random retrieval
– ESL reduction factor
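A simplified sketch of ESL (a deterministic version; Cooper's original definition averages over ties in the ranking, which is omitted here):

```python
def expected_search_length(hitlist, relevant, wanted=1):
    """Irrelevant documents a user must pass while walking down the hitlist
    until `wanted` relevant documents have been found."""
    relevant = set(relevant)
    irrelevant_seen = found = 0
    for doc in hitlist:
        if doc in relevant:
            found += 1
            if found == wanted:
                break
        else:
            irrelevant_seen += 1
    # if the hitlist runs out first, this is the total irrelevant count seen
    return irrelevant_seen
```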

4.5 Summary
– Discussed both metric and nonmetric relevance feedback
– The difficulties of getting users to provide relevance judgments for documents in the retrieved set
– Quantified several measures of system performance