1 CS 430: Information Discovery Lecture 21 Interactive Retrieval.

2 Course Administration. Wireless laptop experiment: During the semester, we have been logging URLs used via the nomad proxy server. Working with the HCI Group, we would like to analyze these URLs to study students' patterns of use of online information. The analysis will be completely anonymous. This requires your consent. If you have not signed a consent form, we have forms here for your signature. If you do not sign a consent form, the data will be discarded without being looked at.

3 The Human in the Loop. [Figure: the user searches the index, which returns hits, and browses the repository, which returns objects.]

4 Query Refinement. [Flowchart: query formulation and search; display number of hits; reformulate query or display; display retrieved information; decide next step. Branch labels: no hits, new query, reformulate query.]

5 Reformulation of Query. Manual: add or remove search terms; change Boolean operators; change wild cards. Automatic: remove search terms; change weighting of search terms; add new search terms.

6 Query Reformulation: Vocabulary Tools. Feedback: information about stop lists, stemming, etc.; numbers of hits on each term or phrase. Suggestions: thesaurus; browse lists of terms in the inverted index; controlled vocabulary.

7 Query Reformulation: Document Tools. Feedback to the user consists of document excerpts or surrogates. It shows the user how the system has interpreted the query. Effective at suggesting how to restrict a search: shows examples of false hits. Less good at suggesting how to expand a search: no examples of missed items.

8 Example: Tilebars. The figure represents a set of hits from a text search. Each large rectangle represents a document or section of text. Each row represents a search term or subquery. The density of each small square indicates the frequency with which a term appears in a section of a document. (Hearst 1995)

9 Document Vectors as Points on a Surface. Normalize all document vectors to be of length 1. Then the ends of the vectors all lie on a surface with unit radius. For similar documents, we can represent parts of this surface as a flat region. Similar documents are represented as points that are close together on this surface. (From Lecture 9)
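The normalization step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not course code: the helper names `normalize` and `dot` and the three example vectors are assumptions made here. Once vectors have length 1, their dot product is the cosine of the angle between them, so points that are close together on the unit surface have a dot product near 1.

```python
import math

def normalize(vec):
    """Scale a vector to length 1 (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product; for unit vectors this equals the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical term-weight vectors for three documents.
doc_a = normalize([3.0, 4.0, 0.0])
doc_b = normalize([6.0, 8.0, 0.0])   # same direction as doc_a
doc_c = normalize([0.0, 0.0, 5.0])   # shares no terms with doc_a

sim_ab = dot(doc_a, doc_b)  # same point on the unit surface
sim_ac = dot(doc_a, doc_c)  # orthogonal, far apart on the surface
```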

10 Theoretically Best Query. [Figure: relevant documents (o) and non-relevant documents (x) shown as points on the surface, with the optimal query marked.]

11 Theoretically Best Query. For a specific query, $Q$, let:
$D_R$ be the set of all relevant documents
$D_{N-R}$ be the set of all non-relevant documents
$\mathrm{sim}(Q, D_R)$ be the mean similarity between query $Q$ and documents in $D_R$
$\mathrm{sim}(Q, D_{N-R})$ be the mean similarity between query $Q$ and documents in $D_{N-R}$
The theoretically best query would maximize:
$F = \mathrm{sim}(Q, D_R) - \mathrm{sim}(Q, D_{N-R})$
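The quantity F defined on this slide can be computed directly when both document sets are known. The sketch below assumes cosine similarity as the similarity measure (the slide does not name one), and the function names and toy vectors are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def mean_similarity(query, docs):
    """Mean similarity between the query and a set of documents."""
    if not docs:
        return 0.0
    return sum(cosine(query, d) for d in docs) / len(docs)

def separation(query, relevant, non_relevant):
    """F = sim(Q, D_R) - sim(Q, D_N-R); larger is better."""
    return mean_similarity(query, relevant) - mean_similarity(query, non_relevant)

# Hypothetical two-term collection.
relevant = [[1.0, 0.0], [0.8, 0.6]]
non_relevant = [[0.0, 1.0]]

f_good = separation([1.0, 0.0], relevant, non_relevant)  # query near the relevant set
f_bad = separation([0.0, 1.0], relevant, non_relevant)   # query near the non-relevant set
```

A query pointing toward the relevant documents gets a higher F than one pointing toward the non-relevant document, which is the ordering the slide's criterion is meant to produce.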

12 Estimating the Best Query. In practice, $D_R$ and $D_{N-R}$ are not known. (The objective is to find them.) However, the results of an initial query can be used to estimate $\mathrm{sim}(Q, D_R)$ and $\mathrm{sim}(Q, D_{N-R})$.

13 Relevance Feedback (concept). [Figure: hits from the original search, with documents identified as non-relevant (x) and documents identified as relevant (o); the original query and the reformulated query are marked.] (From Lecture 9)

14 Rocchio's Modified Query. Modified query vector = original query vector + mean of relevant documents found by original query - mean of non-relevant documents found by original query

15 Query Modification
$Q_1 = Q_0 + \frac{1}{n_1} \sum_{i=1}^{n_1} R_i - \frac{1}{n_2} \sum_{i=1}^{n_2} S_i$
$Q_0$ = vector for the initial query
$Q_1$ = vector for the modified query
$R_i$ = vector for relevant document $i$
$S_i$ = vector for non-relevant document $i$
$n_1$ = number of relevant documents
$n_2$ = number of non-relevant documents
(Rocchio 1971)
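The query modification on this slide is the initial query plus the mean of the relevant vectors minus the mean of the non-relevant vectors. The following is a minimal sketch under that reading; the function names and the three-term example vectors are assumptions made for illustration.

```python
def mean_vector(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def rocchio(q0, relevant, non_relevant):
    """Q1 = Q0 + mean(relevant) - mean(non_relevant)."""
    dim = len(q0)
    r_mean = mean_vector(relevant, dim)
    s_mean = mean_vector(non_relevant, dim)
    return [q + r - s for q, r, s in zip(q0, r_mean, s_mean)]

# Hypothetical three-term vocabulary.
q0 = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]   # n1 = 2
non_relevant = [[0.0, 0.0, 2.0]]                # n2 = 1

q1 = rocchio(q0, relevant, non_relevant)
# mean(relevant) = [1.0, 0.5, 0.0], mean(non_relevant) = [0.0, 0.0, 2.0]
# so q1 = [2.0, 0.5, -2.0]
```

The modified query gains weight on terms that occur in the relevant hits and loses weight on the term that occurs only in the non-relevant hit.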

16 Difficulties with Relevance Feedback. [Figure: relevant documents (o) and non-relevant documents (x), with the optimal query, the original query, and the reformulated query marked. Hits from the initial query are contained in the gray shaded area.]

17 Effectiveness of Relevance Feedback. Best when: relevant documents are tightly clustered (similarities are large); similarities between relevant and non-relevant documents are small.

18 Positive and Negative Feedback
$Q_1 = \alpha Q_0 + \beta \frac{1}{n_1} \sum_{i=1}^{n_1} R_i - \gamma \frac{1}{n_2} \sum_{i=1}^{n_2} S_i$
$\alpha$, $\beta$ and $\gamma$ are weights that adjust the importance of the three vectors.
If $\gamma = 0$, the weights provide positive feedback, by emphasizing the relevant documents in the initial set.
If $\beta = 0$, the weights provide negative feedback, by reducing the emphasis on the non-relevant documents in the initial set.
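The weighted form can be sketched by adding three scalar weights to the query modification. In this sketch the parameter names `alpha`, `beta` and `gamma` label the weights on the original query, the relevant mean, and the non-relevant mean respectively; the function name and the example values are assumptions for illustration, not values recommended by the lecture.

```python
def mean_vector(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def weighted_rocchio(q0, relevant, non_relevant, alpha=1.0, beta=1.0, gamma=1.0):
    """Q1 = alpha*Q0 + beta*mean(relevant) - gamma*mean(non_relevant)."""
    dim = len(q0)
    r_mean = mean_vector(relevant, dim)
    s_mean = mean_vector(non_relevant, dim)
    return [alpha * q + beta * r - gamma * s
            for q, r, s in zip(q0, r_mean, s_mean)]

q0 = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
non_relevant = [[0.0, 0.0, 2.0]]

# Positive feedback only: the non-relevant term is switched off.
q_pos = weighted_rocchio(q0, relevant, non_relevant, alpha=1.0, beta=0.5, gamma=0.0)

# Negative feedback only: the relevant term is switched off.
q_neg = weighted_rocchio(q0, relevant, non_relevant, alpha=1.0, beta=0.0, gamma=0.5)
```

With all three weights left at 1.0 the function reduces to the unweighted query modification given earlier in the lecture.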

19 When to Use Relevance Feedback. Relevance feedback is most important when the user wishes to increase recall, i.e., it is important to find all relevant documents. Under these circumstances, users can be expected to put effort into searching: formulate queries thoughtfully with many terms; review results carefully to provide feedback; iterate several times; combine automatic query enhancement with studies of thesauruses and other manual enhancements.
