CS 430: Information Discovery, Lecture 21: Interactive Retrieval


Slide 1: CS 430: Information Discovery, Lecture 21: Interactive Retrieval

Slide 2: Course Administration

Wireless laptop experiment. During the semester, we have been logging the URLs used via the nomad proxy server. Working with the HCI Group, we would like to analyze these URLs to study students' patterns of use of online information. The analysis will be completely anonymous, but it requires your consent. If you have not signed a consent form, we have forms here for your signature. If you do not sign a consent form, the data will be discarded without being examined.

Slide 3: The Human in the Loop

[Diagram: the user interacts with the system in two ways: searching the index, which returns hits, and browsing the repository, which returns objects.]

Slide 4: Query Refinement

[Flowchart: the user formulates a query and searches; the system displays the number of hits; the user then either displays the retrieved information or reformulates the query (for example, if there are no hits); after reviewing the results, the user decides on the next step: stop, pose a new query, or reformulate the current query.]
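The loop in the flowchart is simple enough to sketch in code. Below is a minimal, illustrative Python version; the `search` function and the dictionary-based index are hypothetical stand-ins for a real retrieval engine, not part of the lecture:

```python
# Sketch of the interactive query-refinement loop: search, show the number
# of hits, display results, let the user reformulate or stop.

def search(index, query):
    """Return ids of documents whose text contains every query term (naive)."""
    return [doc_id for doc_id, text in index.items()
            if all(term in text for term in query.split())]

def interactive_retrieval(index):
    query = input("Query: ")
    while True:
        hits = search(index, query)
        print(f"{len(hits)} hits")            # display number of hits
        for doc_id in hits[:10]:              # display retrieved information
            print(" ", doc_id)
        # decide next step: blank input stops, anything else reformulates
        next_query = input("Reformulated query (blank to stop): ")
        if not next_query:
            return hits
        query = next_query
```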

Slide 5: Reformulation of Query

Manual:
- Add or remove search terms
- Change Boolean operators
- Change wild cards

Automatic:
- Remove search terms
- Change the weighting of search terms
- Add new search terms

Slide 6: Query Reformulation: Vocabulary Tools

Feedback:
- Information about stop lists, stemming, etc.
- Number of hits on each term or phrase

Suggestions:
- Thesaurus
- Browsable lists of terms in the inverted index
- Controlled vocabulary

Slide 7: Query Reformulation: Document Tools

Feedback to the user consists of document excerpts or surrogates. This:
- Shows the user how the system has interpreted the query
- Is effective at suggesting how to restrict a search, since it shows examples of false hits
- Is less good at suggesting how to expand a search, since there are no examples of missed items

Slide 8: Example: TileBars

The figure represents a set of hits from a text search. Each large rectangle represents a document or a section of text, and each row represents a search term or subquery. The density of each small square indicates the frequency with which a term appears in a section of the document. (Hearst, 1995)
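The data behind a TileBars display is just a term-by-section frequency grid. A minimal Python sketch follows; the fixed-size word windows used as "sections" are an assumption for illustration (Hearst's system segmented documents by topic shifts, not by length):

```python
# grid[row][col] = frequency of terms[row] in section col of the document.
# Darker TileBars squares correspond to larger counts.

def tilebar_grid(document, terms, section_size=100):
    words = document.lower().split()
    sections = [words[i:i + section_size]
                for i in range(0, len(words), section_size)]
    return [[section.count(term.lower()) for section in sections]
            for term in terms]

terms = ["index", "query"]
doc = "users query the index and the index returns hits " * 20
for term, row in zip(terms, tilebar_grid(doc, terms)):
    print(f"{term:>6}:", " ".join(str(c) for c in row))
```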

Slide 9: Document Vectors as Points on a Surface

Normalize all document vectors to length 1. The ends of the vectors then all lie on a surface with unit radius. For similar documents, we can treat parts of this surface as a flat region, so similar documents are represented as points that are close together on this surface. (From Lecture 9)
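To make this concrete, here is a small sketch (plain Python, with made-up term-weight vectors) showing that once vectors are normalized to unit length, their dot product is exactly the cosine similarity, so geometric closeness on the unit sphere corresponds to similarity:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

d1 = normalize([3.0, 1.0, 0.0])   # term-weight vectors for two documents
d2 = normalize([2.0, 1.0, 0.5])
print(dot(d1, d2))                # cosine similarity; near 1 for similar docs
```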

Slide 10: Theoretically Best Query

[Figure: a scatter of documents around an optimal query point. Legend: x = non-relevant documents, o = relevant documents. The optimal query lies near the cluster of relevant documents and away from the non-relevant ones.]

Slide 11: Theoretically Best Query

For a specific query Q, let:
- D_R be the set of all relevant documents
- D_N-R be the set of all non-relevant documents
- sim(Q, D_R) be the mean similarity between query Q and the documents in D_R
- sim(Q, D_N-R) be the mean similarity between query Q and the documents in D_N-R

The theoretically best query would maximize:

F = sim(Q, D_R) - sim(Q, D_N-R)
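If D_R and D_N-R were known, F could be computed directly for any candidate query vector. A minimal sketch under that assumption, with vectors as plain Python lists and the dot product serving as the similarity measure:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_similarity(query, documents):
    return sum(dot(query, d) for d in documents) / len(documents)

def F(query, relevant, non_relevant):
    """F = sim(Q, D_R) - sim(Q, D_N-R); larger is better."""
    return mean_similarity(query, relevant) - mean_similarity(query, non_relevant)

print(F([0.8, 0.2],
        relevant=[[1.0, 0.0], [0.9, 0.1]],
        non_relevant=[[0.0, 1.0]]))
```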

Slide 12: Estimating the Best Query

In practice, D_R and D_N-R are not known; finding them is the objective. However, the results of an initial query can be used to estimate sim(Q, D_R) and sim(Q, D_N-R).

Slide 13: Relevance Feedback (Concept)

[Figure: hits from the original search, with x = documents identified as non-relevant and o = documents identified as relevant. The reformulated query moves from the original query point toward the relevant documents. (From Lecture 9)]

Slide 14: Rocchio's Modified Query

Modified query vector =
  original query vector
  + mean of the relevant documents found by the original query
  - mean of the non-relevant documents found by the original query

Slide 15: Query Modification

Q_1 = Q_0 + (1/n_1) * sum_{i=1..n_1} R_i - (1/n_2) * sum_{i=1..n_2} S_i

where:
- Q_0 = vector for the initial query
- Q_1 = vector for the modified query
- R_i = vector for relevant document i
- S_i = vector for non-relevant document i
- n_1 = number of relevant documents
- n_2 = number of non-relevant documents

(Rocchio, 1971)
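A minimal Python sketch of this update, assuming query and document vectors are equal-length lists of term weights and that the user has already judged the initial results:

```python
def vector_mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

def rocchio(q0, relevant, non_relevant):
    """Q_1 = Q_0 + mean(relevant) - mean(non-relevant)."""
    r_mean = vector_mean(relevant)
    s_mean = vector_mean(non_relevant)
    return [q + r - s for q, r, s in zip(q0, r_mean, s_mean)]

q1 = rocchio([1.0, 0.0, 0.5],
             relevant=[[0.9, 0.1, 0.4], [0.8, 0.0, 0.6]],
             non_relevant=[[0.1, 0.9, 0.0]])
print(q1)
```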

Slide 16: Difficulties with Relevance Feedback

[Figure: the same scatter of documents, with x = non-relevant documents and o = relevant documents, showing the optimal query, the original query, and the reformulated query. Hits from the initial query are contained in the gray shaded area, so the reformulated query can only move toward the relevant documents that the initial query happened to retrieve, which may leave it far from the optimal query.]

Slide 17: Effectiveness of Relevance Feedback

Relevance feedback works best when:
- The relevant documents are tightly clustered (similarities among them are large)
- Similarities between relevant and non-relevant documents are small

Slide 18: Positive and Negative Feedback

Q_1 = alpha * Q_0 + (beta/n_1) * sum_{i=1..n_1} R_i - (gamma/n_2) * sum_{i=1..n_2} S_i

alpha, beta, and gamma are weights that adjust the relative importance of the three vectors:
- If gamma = 0, the weights provide positive feedback only, emphasizing the relevant documents in the initial set.
- If beta = 0, the weights provide negative feedback only, reducing the emphasis on the non-relevant documents in the initial set.
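Extending the earlier sketch with the three weights is a one-line change; the default values below are illustrative choices, not values from the lecture. Setting alpha = beta = gamma = 1 recovers the update on Slide 15, and gamma = 0 gives positive-only feedback:

```python
def rocchio_weighted(q0, relevant, non_relevant,
                     alpha=1.0, beta=0.75, gamma=0.15):
    """Q_1 = alpha*Q_0 + beta*mean(relevant) - gamma*mean(non-relevant)."""
    def mean(vectors):
        return [sum(c) / len(vectors) for c in zip(*vectors)]
    r, s = mean(relevant), mean(non_relevant)
    return [alpha * q + beta * ri - gamma * si
            for q, ri, si in zip(q0, r, s)]
```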

Slide 19: When to Use Relevance Feedback

Relevance feedback is most important when the user wishes to increase recall, i.e., when it is important to find all relevant documents. Under these circumstances, users can be expected to put effort into searching:
- Formulate queries thoughtfully, with many terms
- Review results carefully to provide feedback
- Iterate several times
- Combine automatic query enhancement with the study of thesauruses and other manual enhancements