CS 430: Information Discovery

CS 430: Information Discovery Lecture 8 Evaluation of Retrieval Effectiveness 1

Course administration: Assignment 1 -- Report
The report is 33% of the grade. In particular, it should:
• Describe the data structures used for the index file, postings, and document file, with a brief explanation of why those structures were chosen.
• Explain the mathematical expressions for the term weightings and how they were calculated.
If your report did not include this information, you may resubmit it before Thursday 5 p.m. Instructions will be posted on the Web site.

Course administration
Discussion Class 4 -- Check the Web site:
(a) It is not necessary to study the entire paper in detail.
(b) The PDF version of the file is damaged; use the PostScript version.
Discussion Class 3 -- Because of the fire in Upson, this class will not count towards the final grade.

Retrieval Effectiveness
Designing an information retrieval system involves many decisions:
• Manual or automatic indexing?
• Natural language or controlled vocabulary?
• Which stoplists?
• Which stemming methods?
• What query syntax?
• etc.
How do we know which of these choices is most effective? Is everything a matter of judgment?

Studies of Retrieval Effectiveness
• The Cranfield Experiments, Cyril W. Cleverdon, Cranfield College of Aeronautics, 1957-1968
• SMART System, Gerald Salton, Cornell University, 1964-1988
• TREC, Donna Harman, National Institute of Standards and Technology (NIST), 1992-

Cranfield Experiments (Example)
Comparative efficiency of four indexing systems: Universal Decimal Classification, alphabetical subject index, a special facet classification, and the Uniterm system of co-ordinate indexing.
Four indexes were prepared manually for each document, in three batches of 6,000 documents -- a total of 18,000 documents, each indexed four times. The documents were reports and papers in aeronautics.
Indexes for testing were prepared on index cards and other cards, with very careful control of indexing procedures.

Cranfield Experiments (continued)
Searching:
• 1,200 test questions, each satisfied by at least one document
• Reviewed by an expert panel
• Searches carried out by 3 expert librarians
• Two rounds of searching to develop the testing methodology
• Subsidiary experiments at English Electric Whetstone Laboratory and Western Reserve University

The Cranfield Data
The Cranfield data was made widely available and used by other researchers:
• Salton used the Cranfield data with the SMART system (a) to study the relationship between recall and precision, and (b) to compare automatic indexing with human indexing.
• Spärck Jones and van Rijsbergen used the Cranfield data for experiments in relevance weighting, clustering, the definition of test corpora, etc.

Cranfield Experiments -- Measures of Effectiveness for Matching Methods
Cleverdon's work was applied to matching methods. He made extensive use of recall and precision, based on the concept of relevance.
[Scatter plot of precision (%) against recall (%); each x represents one search. The graph illustrates the trade-off between precision and recall.]

Typical precision-recall graph for different queries
[Precision-recall curves for two queries, with precision and recall each running from 0 to 1.0: one curve is labeled "Narrow, specific query" (higher precision at each recall level), the other "Broad, general query" (lower precision).]

Some Cranfield Results
• The various manual indexing systems showed similar retrieval effectiveness.
• Retrieval using automatic indexing can be at least as effective as manual indexing with controlled vocabularies
-> original results from the Cranfield + SMART experiments (published in 1967)
-> considered counter-intuitive at the time
-> other results since then have supported this conclusion

Relevance
Recall and precision depend on the concept of relevance.
-> Is relevance a context- and task-independent property of documents?
"Relevance is the correspondence in context between an information requirement statement (a query) and an article (a document), that is, the extent to which the article covers the material that is appropriate to the requirement statement."
F. W. Lancaster, 1979

Relevance as a set comparison
D = the set of all documents in the collection
A = the set of documents that satisfy some user-based criterion (the relevant documents)
B = the set of documents identified (retrieved) by the search system

Measures based on relevance

recall = retrieved relevant / relevant = |A ∩ B| / |A|

precision = retrieved relevant / retrieved = |A ∩ B| / |B|

fallout = retrieved not-relevant / not-relevant = |B - (A ∩ B)| / |D - A|
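These measures can be computed directly from sets. The following is a minimal sketch in Python (not part of the original lecture; the document identifiers and counts are made up for illustration), following the definitions of D, A, and B above.

def recall(relevant, retrieved):
    """recall = |A ∩ B| / |A| : fraction of the relevant documents that were retrieved."""
    return len(relevant & retrieved) / len(relevant)

def precision(relevant, retrieved):
    """precision = |A ∩ B| / |B| : fraction of the retrieved documents that are relevant."""
    return len(relevant & retrieved) / len(retrieved)

def fallout(relevant, retrieved, all_docs):
    """fallout = |B - (A ∩ B)| / |D - A| : fraction of the non-relevant documents retrieved."""
    return len(retrieved - relevant) / len(all_docs - relevant)

# Hypothetical example: 200 documents, 5 relevant, 4 retrieved.
D = {f"d{i}" for i in range(1, 201)}
A = {"d3", "d7", "d15", "d42", "d99"}
B = {"d3", "d7", "d20", "d21"}

print(recall(A, B))        # 0.4
print(precision(A, B))     # 0.5
print(fallout(A, B, D))    # 2/195, approximately 0.0103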

Relevance
• Recall and precision values are for a specific set of documents and type of queries (e.g., subject-heading queries, title queries, paragraphs), and a specific information task.
• Relevance is subjective, but experimental evidence suggests that, for textual documents, different experts have similar judgments about relevance.
• Experiments have tried asking users to give a numeric relevance level or to rank the relevant documents, but the results have been less consistent.
• Tests and judgments of relevance must use realistic queries.

Ranked retrieval: recall and precision after retrieval of n documents
(SMART system using Cranfield data; 200 documents in aeronautics, of which 5 are relevant)

 n   relevant   recall   precision
 1   yes        0.2      1.0
 2   yes        0.4      1.0
 3   no         0.4      0.67
 4   yes        0.6      0.75
 5   no         0.6      0.60
 6   yes        0.8      0.67
 7   no         0.8      0.57
 8   no         0.8      0.50
 9   no         0.8      0.44
10   no         0.8      0.40
11   no         0.8      0.36
12   no         0.8      0.33
13   yes        1.0      0.38
14   no         1.0      0.36
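The recall and precision columns can be reproduced with a single pass down the ranked list, counting the relevant documents seen so far. A minimal Python sketch (not from the SMART system itself; the relevance judgments are copied from the table above):

# Relevance judgments for ranks 1..14, from the table above (5 relevant in total).
judgments = [True, True, False, True, False, True, False,
             False, False, False, False, False, True, False]
total_relevant = 5

relevant_so_far = 0
for n, is_relevant in enumerate(judgments, start=1):
    if is_relevant:
        relevant_so_far += 1
    r = relevant_so_far / total_relevant   # recall after n documents
    p = relevant_so_far / n                # precision after n documents
    flag = "yes" if is_relevant else "no"
    print(f"{n:2d}  {flag:3s}  recall={r:.2f}  precision={p:.2f}")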

Precision-recall graph
Note: Some authors plot recall against precision.
[Plot of precision against recall for the ranked list above; points are labeled with the rank n at which they occur (1, 2, 3, 4, 5, 6, 12, 13, ..., 200). Both axes run from 0 to 1.0.]

11 Point Precision (Recall Cutoff)
p(r) is the precision at the point where recall first reaches r.
Define 11 standard recall levels r0, r1, ..., r10, where rn = n/10 (i.e., recall = 0.0, 0.1, ..., 1.0).
Note: if p(rn) is not an exact data point, use interpolation.
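A common convention for the interpolation (assumed here, since the slide does not spell it out) is to take the interpolated precision at recall level r to be the maximum precision observed at any recall greater than or equal to r. A minimal Python sketch:

def eleven_point_precision(points):
    """points: list of (recall, precision) pairs observed as relevant documents are retrieved."""
    levels = [i / 10 for i in range(11)]        # recall levels 0.0, 0.1, ..., 1.0
    result = []
    for r in levels:
        # Interpolated precision: best precision at any observed recall level >= r.
        candidates = [p for (rec, p) in points if rec >= r]
        result.append((r, max(candidates) if candidates else 0.0))
    return result

# Actual data points from the ranked-retrieval example (5 relevant documents):
points = [(0.2, 1.0), (0.4, 1.0), (0.6, 0.75), (0.8, 0.67), (1.0, 0.38)]
for r, p in eleven_point_precision(points):
    print(f"recall {r:.1f}  precision {p:.2f}")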

Recall cutoff graph: choice of interpolation points
[The same precision-recall plot as on the previous slide, with the recall cutoff graph drawn as a blue line through the interpolation points.]

Example: SMART System on Cranfield Data

Recall   Precision
0.0      1.0  *
0.1      1.0  *
0.2      1.0
0.3      1.0  *
0.4      1.0
0.5      0.75 *
0.6      0.75
0.7      0.67 *
0.8      0.67
0.9      0.38 *
1.0      0.38

Values marked * are obtained by interpolation; the others are actual data points.
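Applying the interpolation sketch above to the five actual data points -- (0.2, 1.0), (0.4, 1.0), (0.6, 0.75), (0.8, 0.67), (1.0, 0.38) -- reproduces this table: those five values appear unchanged, and the six remaining recall levels (0.0, 0.1, 0.3, 0.5, 0.7, 0.9) are filled in by interpolation.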