Information Retrieval Quality of a Search Engine.

Similar presentations
Evaluation, Rong Jin. Evaluation is key to building effective and efficient search engines; measurement is usually carried out in controlled experiments.

Exercising these ideas  You have a description of each item in a small collection. (30 web sites)  Assume we are looking for information about boxers,
Information Retrieval IR 7. Recap of the last lecture Vector space scoring Efficiency considerations Nearest neighbors and approximations.
Precision and Recall.
ADFOCS 2004 Prabhakar Raghavan Lecture 3. Zones A zone is an identified region within a doc E.g., Title, Abstract, Bibliography Generally culled from.
Evaluating Search Engine
Information Retrieval Ling573 NLP Systems and Applications April 26, 2011.
Evaluation.  Allan, Ballesteros, Croft, and/or Turtle Types of Evaluation Might evaluate several aspects Evaluation generally comparative –System A vs.
Introduction to Information Retrieval and Web Search Lecture 8: Evaluation and Result Summaries.
Modern Information Retrieval
Scoring and Ranking 198:541. Scoring Thus far, our queries have all been Boolean Docs either match or not Good for expert users with precise understanding.
Introduction to Information Retrieval Introduction to Information Retrieval Hinrich Schütze and Christina Lioma Lecture 8: Evaluation & Result Summaries.
INFO 624 Week 3 Retrieval System Evaluation
CS276A Information Retrieval Lecture 8. Recap of the last lecture Vector space scoring Efficiency considerations Nearest neighbors and approximations.
Reference Collections: Task Characteristics. TREC Collection Text REtrieval Conference (TREC) –sponsored by NIST and DARPA (1992-?) Comparing approaches.
Retrieval Evaluation: Precision and Recall. Introduction Evaluation of implementations in computer science often is in terms of time and space complexity.
Evaluating the Performance of IR Systems
Information Retrieval using the Boolean Model. Query Which plays of Shakespeare contain the words Brutus AND Caesar but NOT Calpurnia? Could grep all.
Algoritmi per IR Ranking. The big fight: find the best ranking...
Retrieval Evaluation. Introduction Evaluation of implementations in computer science often is in terms of time and space complexity. With large document.
ISP 433/633 Week 6 IR Evaluation. Why Evaluate? Determine if the system is desirable Make comparative assessments.
The Relevance Model  A distribution over terms, given information need I, (Lavrenko and Croft 2001). For term r, P(I) can be dropped w/o affecting the.
CS276 Information Retrieval and Web Search
LIS618 lecture 11 i/r performance evaluation Thomas Krichel
Evaluation of Image Retrieval Results Relevant: images which meet user’s information need Irrelevant: images which don’t meet user’s information need Query:
Search and Retrieval: Relevance and Evaluation Prof. Marti Hearst SIMS 202, Lecture 20.
Retrieval Evaluation Hongning Wang
Gertjan van Noord, 2014, Zoekmachines (Search Engines), Lecture 5: Evaluation.
Evaluation David Kauchak cs458 Fall 2012 adapted from:
Evaluation David Kauchak cs160 Fall 2009 adapted from:
Introduction to Information Retrieval Introduction to Information Retrieval CS276 Information Retrieval and Web Search Christopher Manning and Prabhakar.
IR Evaluation Evaluate what? –user satisfaction on specific task –speed –presentation (interface) issue –etc. My focus today: –comparative performance.
Evaluating IR systems/search engines. Measures for a search engine How fast does it index –Number of documents/hour –(Average document size) How fast.
Quality of a search engine Paolo Ferragina Dipartimento di Informatica Università di Pisa Reading 8.
CSCI 5417 Information Retrieval Systems Jim Martin Lecture 7 9/13/2011.
Information Retrieval Lecture 7. Recap of the last lecture Vector space scoring Efficiency considerations Nearest neighbors and approximations.
Evaluating What’s Been Learned. Cross-Validation Foundation is a simple idea – “ holdout ” – holds out a certain amount for testing and uses rest for.
Web search engines Paolo Ferragina Dipartimento di Informatica Università di Pisa.
WIRED Week 3 Syllabus Update (next week) Readings Overview - Quick Review of Last Week’s IR Models (if time) - Evaluating IR Systems - Understanding Queries.
Evaluation of (Search) Results How do we know if our results are any good? Evaluating a search engine  Benchmarks  Precision and recall Results summaries:
Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.
Lecture 3: Retrieval Evaluation Maya Ramanath. Benchmarking IR Systems Result Quality Data Collection – Ex: Archives of the NYTimes Query set – Provided.
Performance Measurement. 2 Testing Environment.
Performance Measures. Why Conduct Performance Evaluation? Evaluation is the key to building effective & efficient IR (information retrieval) systems.
What Does the User Really Want ? Relevance, Precision and Recall.
Chapter 3: Retrieval Evaluation. Dr. Almetwally Mostafa, 1/2/2016.
Evaluation. The major goal of IR is to search document relevant to a user query. The evaluation of the performance of IR systems relies on the notion.
Information Retrieval Lecture 3 Introduction to Information Retrieval (Manning et al. 2007) Chapter 8 For the MSc Computer Science Programme Dell Zhang.
Introduction to Information Retrieval Introduction to Information Retrieval Lecture 10 Evaluation.
CS276 Information Retrieval and Web Search Lecture 8: Evaluation.
INFORMATION RETRIEVAL: MEASUREMENT OF RELEVANCE EFFECTIVENESS. Adrienn Skrop.
Introduction to Information Retrieval Introduction to Information Retrieval Information Retrieval and Web Search Lecture 8: Evaluation.
Sampath Jayarathna Cal Poly Pomona
Evaluation of Information Retrieval Systems
Top-K documents Exact retrieval
Quality of a search engine
7CCSMWAL Algorithmic Issues in the WWW
Lecture 10 Evaluation.
Evaluation.
Information Retrieval, Search Engines and Libraries
IR Theory: Evaluation Methods
Lecture 6 Evaluation.
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Lecture 8: Evaluation Hankz Hankui Zhuo
Precision and Recall Reminder:
Presentation transcript:

Information Retrieval Quality of a Search Engine

Is it good?
- How fast does it index: number of documents/hour (average document size)
- How fast does it search: latency as a function of index size
- Expressiveness of the query language

Measures for a search engine
- All of the preceding criteria are measurable
- The key measure: user happiness … useless answers won’t make a user happy

Happiness: elusive to measure
The most common approach is to measure the relevance of the search results. How do we measure it? It requires 3 elements:
1. A benchmark document collection
2. A benchmark suite of queries
3. A binary assessment of either Relevant or Irrelevant for each query-doc pair

Evaluating an IR system
Standard benchmarks:
- TREC: the National Institute of Standards and Technology (NIST) has run a large IR test bed for many years
- Other doc collections, marked by human experts: for each query and for each doc, Relevant or Irrelevant
On the Web everything is more complicated, since we cannot mark the entire corpus!
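For concreteness, here is a minimal Python sketch of how such a benchmark could be held in memory: a set of queries plus binary Relevant/Irrelevant judgments per query-doc pair. All IDs, query strings, and the dictionary layout are invented for illustration; this is not a real TREC file format.

```python
# Minimal sketch of a benchmark: queries plus binary relevance judgments.
# All IDs and query strings below are invented for illustration.
queries = {
    "q1": "information retrieval evaluation",
    "q2": "precision recall tradeoff",
}

# qrels[query_id] = set of doc IDs judged Relevant; every other doc is Irrelevant.
qrels = {
    "q1": {"d2", "d4", "d7"},
    "q2": {"d1", "d9"},
}

def is_relevant(query_id: str, doc_id: str) -> bool:
    """Binary assessment for a query-doc pair."""
    return doc_id in qrels.get(query_id, set())

print(is_relevant("q1", "d4"))  # True
print(is_relevant("q1", "d3"))  # False
```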

General scenario: [Venn diagram: the Retrieved and Relevant document sets within the whole collection]

Precision vs. Recall
- Precision: % of retrieved docs that are relevant (issue: how much “junk” is among what was found)
- Recall: % of relevant docs that are retrieved (issue: how much of the available “info” was found)
[Venn diagram: Relevant and Retrieved sets within the collection]

How to compute them
Precision: fraction of retrieved docs that are relevant
Recall: fraction of relevant docs that are retrieved

                 Relevant               Not Relevant
Retrieved        tp (true positive)     fp (false positive)
Not Retrieved    fn (false negative)    tn (true negative)

Precision P = tp / (tp + fp)
Recall    R = tp / (tp + fn)
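As a sanity check on the formulas above, here is a small Python sketch (the doc-ID sets are invented for illustration) that derives tp, fp and fn from the Retrieved and Relevant sets and computes P and R.

```python
# Sketch: precision and recall from the Retrieved and Relevant doc-ID sets.
# The document IDs are invented for illustration.
def precision_recall(retrieved: set, relevant: set) -> tuple:
    tp = len(retrieved & relevant)   # relevant docs that were retrieved
    fp = len(retrieved - relevant)   # retrieved but not relevant ("junk")
    fn = len(relevant - retrieved)   # relevant but missed
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7"}
P, R = precision_recall(retrieved, relevant)
print(f"P = {P:.2f}, R = {R:.2f}")   # P = 0.50, R = 0.67
```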

Some considerations
- You can get high recall (but low precision) by retrieving all docs for all queries!
- Recall is a non-decreasing function of the number of docs retrieved
- Precision usually decreases as more docs are retrieved (see the sketch below)
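To make the monotonicity claim concrete, the sketch below walks down a ranked result list and reports precision and recall at every cutoff k; the relevance labels and the total number of relevant docs are invented for illustration.

```python
# Sketch: precision and recall at each cutoff k of a ranked result list.
# 1 = relevant, 0 = not relevant; labels and totals are invented.
ranked_relevance = [1, 0, 1, 1, 0, 0, 1, 0]
total_relevant = 5   # assume 5 relevant docs exist in the whole collection

hits = 0
for k, rel in enumerate(ranked_relevance, start=1):
    hits += rel
    print(f"k={k}: P@k={hits / k:.2f}  R@k={hits / total_relevant:.2f}")
# Recall never decreases as k grows; precision fluctuates and tends to fall.
```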

Precision vs. Recall: four example configurations of the Retrieved set against the Relevant set [Venn diagrams]
- Highest precision, very low recall
- Lowest precision and recall
- Low precision and very high recall
- Very high precision and recall

Precision-Recall curve
We measure precision at various levels of recall. Note: it is an AVERAGE over many queries.
[Plot: precision on the y-axis vs. recall on the x-axis, one point per recall level]

A common picture
[Plot: a typical precision-recall curve, precision on the y-axis vs. recall on the x-axis]

Interpolated precision
If you can increase precision by increasing recall, then you should get to count that: the interpolated precision at recall level r is taken to be the maximum precision observed at any recall level r' ≥ r.
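A minimal sketch of that interpolation rule, assuming the observed (recall, precision) points for one query are already available (the points below are invented for illustration):

```python
# Sketch: interpolated precision p_interp(r) = max precision at any recall >= r.
# The observed (recall, precision) points are invented for illustration.
points = [(0.2, 1.00), (0.4, 0.67), (0.6, 0.60), (0.8, 0.50), (1.0, 0.45)]

def interpolated_precision(r: float, points) -> float:
    candidates = [p for rec, p in points if rec >= r]
    return max(candidates) if candidates else 0.0

print(interpolated_precision(0.5, points))  # 0.6  (best precision at any recall >= 0.5)
print(interpolated_precision(0.3, points))  # 0.67 (best precision at any recall >= 0.3)
```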

Other measures
- Precision at a fixed number of retrieved results (e.g., the top 10): most appropriate for web search
- 11-point interpolated average precision: the standard measure for TREC; take the interpolated precision at 11 recall levels, from 0% to 100% in steps of 10%, and average them
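A self-contained sketch of both measures for a single query follows; the ranked relevance labels and the number of relevant docs are invented, and a real TREC-style evaluation would additionally average the 11-point figure over many queries.

```python
# Sketch: precision at 10 and 11-point interpolated average precision
# for one query. Relevance labels (1/0) and totals are invented.
ranked_relevance = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1]
total_relevant = 6   # assumed number of relevant docs for this query

# Precision at k = 10: fraction of the top 10 results that are relevant.
p_at_10 = sum(ranked_relevance[:10]) / 10

# Observed (recall, precision) points down the ranking.
points, hits = [], 0
for i, rel in enumerate(ranked_relevance, start=1):
    hits += rel
    points.append((hits / total_relevant, hits / i))

# Interpolated precision at recall r = max precision at any recall >= r.
def interp(r):
    vals = [p for rec, p in points if rec >= r]
    return max(vals) if vals else 0.0

# Average the interpolated precision at recall 0.0, 0.1, ..., 1.0.
eleven_point_ap = sum(interp(level / 10) for level in range(11)) / 11
print(f"P@10 = {p_at_10:.2f}, 11-point interpolated AP = {eleven_point_ap:.2f}")
```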

F measure
Combined measure (weighted harmonic mean):
F = (β² + 1)·P·R / (β²·P + R), i.e., 1/F = α·(1/P) + (1 − α)·(1/R) with β² = (1 − α)/α
People usually use the balanced F1 measure, i.e., β = 1 (equivalently α = ½), thus 1/F = ½·(1/P + 1/R), i.e., F1 = 2·P·R / (P + R).
Use this if you need to optimize a single measure that balances precision and recall.
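A minimal sketch of the weighted F measure and its balanced F1 special case (the P and R values are arbitrary example numbers):

```python
# Sketch: weighted F measure F = (beta^2 + 1)*P*R / (beta^2*P + R); beta = 1 gives F1.
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

P, R = 0.50, 0.67                                   # arbitrary example values
print(f"F1    = {f_measure(P, R):.2f}")             # balanced: beta = 1
print(f"F_0.5 = {f_measure(P, R, beta=0.5):.2f}")   # beta < 1 weights precision more
```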