LIS618 lecture 11 i/r performance evaluation Thomas Krichel 2011-11-21.

retrieval performance evaluation
"Recall" and "precision" are two classic measures of information retrieval performance for a single query. Both assume that there is an answer set of documents that are relevant to the query. Performance is optimal if
– the database returns all the documents in the answer set
– the database returns only documents in the answer set
Recall is the fraction of the relevant documents that the query result has captured. Precision is the fraction of the retrieved documents that are relevant.
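
As a small illustration (a sketch added to this transcript, not part of the original slides; the function name and sample documents are invented), the two measures for a single query can be computed like this:

```python
# Recall and precision for one query, given a known answer (relevant) set.
def recall_precision(retrieved, relevant):
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# 3 of the 5 relevant documents are retrieved, among 6 results:
print(recall_precision(["d1", "x", "d2", "y", "d3", "z"],
                       {"d1", "d2", "d3", "d4", "d5"}))   # (0.6, 0.5)
```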

recall and precision curves
Assume that all the retrieved documents arrive at once and are examined one by one. During that process the user discovers more and more relevant documents, so recall increases. During the same process, at least eventually, fewer and fewer of the newly examined documents are relevant, so precision (usually) declines. This can be represented as a curve.

Example
Let the answer set be {0,1,2,3,4,5,6,7,8,9}, with non-relevant documents represented by letters. A query returns the following ranked result: 7, a, 3, b, c, 9, n, j, l, 5, r, o, s, e, 4. At the first document, (recall, precision) is (10%, 100%); at the third, (20%, 66%); at the sixth, (30%, 50%); at the tenth, (40%, 40%); and at the last, (50%, 33%).
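
These numbers can be reproduced with a short sketch (purely illustrative; the ranking string mirrors the slide's example):

```python
# Digits are the relevant documents, letters are non-relevant.
relevant = set("0123456789")
results = list("7a3bc9njl5rose4")    # the ranked query result from the slide

hits = 0
for rank, doc in enumerate(results, start=1):
    if doc in relevant:
        hits += 1
        recall = hits / len(relevant)
        precision = hits / rank
        # prints the five (recall, precision) points given above
        print(f"rank {rank:2d}: recall {recall:.0%}, precision {precision:.1%}")
```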

recall/precision curves
Such a curve can be formed for each query. An average curve over several queries can be calculated at each recall level. Recall and precision can also be combined into single-valued summaries, as shown on the next slides.
– average precision at seen relevant documents
– R-precision
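
The slides do not spell out the averaging procedure; a common convention (assumed here) is to interpolate precision at the 11 standard recall levels before averaging across queries:

```python
# Hedged sketch: interpolated precision at the 11 standard recall levels.
# 'points' is the list of (recall, precision) pairs for one query, as above.
def interpolated_precision(points, levels=tuple(i / 10 for i in range(11))):
    curve = []
    for level in levels:
        candidates = [p for r, p in points if r >= level]
        curve.append(max(candidates) if candidates else 0.0)
    return curve

# The average curve over several queries is then the per-level mean
# of the individual interpolated curves.
```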

R-precision
This is a fairly ad-hoc measure. Let R be the size of the answer set. Take the first R results of the query, count the relevant documents among them, and divide by R. In our example, R = 10 and 4 of the first 10 results are relevant, so the R-precision is 40%. An average can be calculated over a number of queries.
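
A minimal sketch of R-precision for the same example (again purely illustrative):

```python
# R-precision: precision after the first R results, R = size of the answer set.
def r_precision(results, relevant):
    relevant = set(relevant)
    r = len(relevant)
    return sum(1 for doc in results[:r] if doc in relevant) / r

print(r_precision(list("7a3bc9njl5rose4"), set("0123456789")))   # 0.4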

average precision at seen relevant documents
To find it, sum the precision values at each point where the user discovers a new relevant document, and divide by the number of relevant documents seen. In our example, it is (100% + 66% + 50% + 40% + 33%) / 5 = 57.8%. This measure favors retrieval methods that bring relevant documents to the top.
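
The same calculation in code (a sketch; note that the slide's 57.8% comes from rounding each precision value before averaging, the unrounded figure is 58%):

```python
# Average precision at seen relevant documents: sum the precision at the
# rank of each relevant document found, divide by the number found.
def average_precision_at_seen(results, relevant):
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(results, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

print(average_precision_at_seen(list("7a3bc9njl5rose4"),
                                set("0123456789")))   # 0.58
```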

critique of recall & precision
Recall requires knowing all the relevant documents in the collection, which has to be judged by experts, and this is very difficult to estimate in a large collection. Both measures focus on a single query; no serious user works like this. There are other measures, but they are more for an advanced course in IR.

Please shut down the computers when you are done. Thank you for your attention!