
1 CS 430 / INFO 430 Information Retrieval. Lecture 12: Query Refinement and Relevance Feedback

2 Course Administration

Assignment 2: There was an error on Slide 24 of Lecture 11, Comparing a Query and a Document. A revised slide has been posted on the web site.

In calculating similarities, remember that the cosine of the angle between two vectors x and y is:

cos(x, y) = (x . y) / (|x| |y|)

Do not forget the scaling factors, |x| and |y|.
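As a worked illustration (a minimal sketch, not part of the original slides; the term-weight vectors below are made up), this computation in Python with numpy:

import numpy as np

def cosine_similarity(x, y):
    # cos(x, y) = (x . y) / (|x| |y|); the norms are the scaling factors.
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

x = np.array([1.0, 2.0, 0.0, 1.0])  # query vector (illustrative)
y = np.array([2.0, 1.0, 1.0, 0.0])  # document vector (illustrative)
print(cosine_similarity(x, y))      # 0.666..., i.e. 4 / (sqrt(6) * sqrt(6))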

3 Query Refinement

[Flowchart: Query formulation leads to Search, then to Display number of hits. If there are no hits, the user formulates a new query and searches again; otherwise the user may reformulate the query or go on to Display retrieved information, and from there reformulate again or EXIT.]

4 Reformulation of Query

Manual:
- Add or remove search terms
- Change Boolean operators
- Change wild cards

Automatic:
- Remove search terms
- Change weighting of search terms
- Add new search terms

5 Query Reformulation: Vocabulary Tools

Feedback:
- Information about stop lists, stemming, etc.
- Numbers of hits on each term or phrase

Suggestions:
- Thesaurus
- Browse lists of terms in the inverted index
- Controlled vocabulary

6 Query Reformulation: Document Tools

Feedback to the user consists of document excerpts or surrogates, which show how the system has interpreted the query.

- Effective at suggesting how to restrict a search: shows examples of false hits.
- Less effective at suggesting how to expand a search: shows no examples of missed items.

7 Relevance Feedback: Document Vectors as Points on a Surface

- Normalize all document vectors to be of length 1.
- Then the ends of the vectors all lie on a surface with unit radius.
- For similar documents, we can represent parts of this surface as a flat region.
- Similar documents are represented as points that are close together on this surface.

(From Lecture 3)

8 Results of a Search

[Figure: documents found by the search (x) plotted in the vector space around the query point; the hits from the search lie in a shaded region surrounding the query.]

9 Relevance Feedback (Concept)

[Figure: the hits from the original search, with x marking documents identified as non-relevant and o marking documents identified as relevant; the reformulated query moves away from the original query, toward the relevant documents.]

10 Theoretically Best Query

[Figure: relevant documents (o) and non-relevant documents (x) scattered in the vector space; the optimal query lies at the center of the cluster of relevant documents, far from the non-relevant ones.]

11 Theoretically Best Query

For a specific query Q, let:
- D_R be the set of all relevant documents
- D_NR be the set of all non-relevant documents
- sim(Q, D_R) be the mean similarity between query Q and documents in D_R
- sim(Q, D_NR) be the mean similarity between query Q and documents in D_NR

The theoretically best query would maximize:

F = sim(Q, D_R) - sim(Q, D_NR)
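A minimal sketch (illustrative, not from the slides) of evaluating F for a candidate query, reusing the cosine_similarity function sketched above; the relevant and non-relevant document vectors are assumed to be given:

import numpy as np

def objective_F(q, relevant_docs, nonrelevant_docs):
    # F = mean sim(Q, D_R) - mean sim(Q, D_NR)
    sim_r = np.mean([cosine_similarity(q, d) for d in relevant_docs])
    sim_nr = np.mean([cosine_similarity(q, d) for d in nonrelevant_docs])
    return sim_r - sim_nr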

12 Estimating the Best Query

In practice, D_R and D_NR are not known. (The objective is to find them.) However, the results of an initial query can be used to estimate sim(Q, D_R) and sim(Q, D_NR).

13 Rocchio's Modified Query

Modified query vector =
  original query vector
  + mean of the relevant documents found by the original query
  - mean of the non-relevant documents found by the original query

14 Query Modification (Rocchio 1971)

Q_1 = Q_0 + (1/n_1) Σ_{i=1}^{n_1} R_i - (1/n_2) Σ_{i=1}^{n_2} S_i

where:
- Q_0 = vector for the initial query
- Q_1 = vector for the modified query
- R_i = vector for relevant document i
- S_i = vector for non-relevant document i
- n_1 = number of relevant documents
- n_2 = number of non-relevant documents
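A minimal numpy sketch of this update (illustrative; the example vectors are made up):

import numpy as np

def rocchio(q0, relevant, nonrelevant):
    # Q1 = Q0 + mean(relevant vectors) - mean(non-relevant vectors)
    q1 = np.asarray(q0, dtype=float)
    if relevant:
        q1 = q1 + np.mean(relevant, axis=0)
    if nonrelevant:
        q1 = q1 - np.mean(nonrelevant, axis=0)
    return q1

q0 = [1.0, 0.0, 1.0]
relevant = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
nonrelevant = [[0.0, 0.0, 1.0]]
print(rocchio(q0, relevant, nonrelevant))  # [2.0, 0.5, 0.0]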

15 Difficulties with Relevance Feedback

[Figure: the same document space, showing the original query, the reformulated query, and the optimal query, with relevant documents (o) and non-relevant documents (x); the hits from the initial query are contained in a gray shaded area that covers only part of the relevant documents.]

16 Difficulties with Relevance Feedback

[Figure: the same document space, with the original query, the reformulated query, and the optimal results set marked among the relevant (o) and non-relevant (x) documents.]

What region provides the optimal results set?

17 Effectiveness of Relevance Feedback

Relevance feedback works best when:
- the relevant documents are tightly clustered (similarities among them are large), and
- the similarities between relevant and non-relevant documents are small.

18 When to Use Relevance Feedback

Relevance feedback is most important when the user wishes to increase recall, i.e., when it is important to find all relevant documents. Under these circumstances, users can be expected to put effort into searching:
- formulate queries thoughtfully, with many terms
- review results carefully to provide feedback
- iterate several times
- combine automatic query enhancement with studies of thesauruses and other manual enhancements

19 Adjusting Parameters 1: Relevance Feedback

Q_1 = α Q_0 + (β/n_1) Σ_{i=1}^{n_1} R_i - (γ/n_2) Σ_{i=1}^{n_2} S_i

α, β, and γ are weights that adjust the importance of the three vectors.

If γ = 0, the weights provide positive feedback, by emphasizing the relevant documents in the initial set.

If β = 0, the weights provide negative feedback, by reducing the emphasis on the non-relevant documents in the initial set.
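A parameterized version of the earlier Rocchio sketch (the default weight values below are arbitrary illustrative choices, not from the slides):

import numpy as np

def rocchio_weighted(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    # Q1 = alpha*Q0 + (beta/n1)*sum(R_i) - (gamma/n2)*sum(S_i)
    q1 = alpha * np.asarray(q0, dtype=float)
    if relevant:
        q1 = q1 + beta * np.mean(relevant, axis=0)
    if nonrelevant:
        q1 = q1 - gamma * np.mean(nonrelevant, axis=0)
    return q1

# gamma=0 gives pure positive feedback; beta=0 gives pure negative feedback.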

20 Adjusting Parameters 2: Filtering Incoming Messages

D_1, D_2, D_3, ... is a stream of incoming documents that are to be divided into two sets:
- R: documents judged relevant to an information need
- S: documents judged not relevant to the information need

A query is defined as a vector in the term vector space:

Q = (w_1, w_2, ..., w_n)

where w_i is the weight given to term i. D_j is assigned to R if similarity(Q, D_j) exceeds a chosen threshold.

What is the optimal query, i.e., the optimal values of the w_i?
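A minimal sketch of such a filter (illustrative; the threshold name t and its value are assumptions, and cosine_similarity is reused from the earlier sketch):

def filter_stream(q, doc_stream, t=0.5):
    # Assign each incoming document D_j to R if similarity(Q, D_j) > t, else to S.
    R, S = [], []
    for d in doc_stream:
        (R if cosine_similarity(q, d) > t else S).append(d)
    return R, S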

21 Seeking Optimal Parameters

Theoretical approach:
- develop a theoretical model
- derive parameters
- test with users

Heuristic approach:
- develop a heuristic
- vary parameters
- test with users

Machine learning approach

22 Seeking Optimal Parameters using Machine Learning

General:
- Input: training examples and a design space
- Training: automatically find the solution in the design space that works well on the training data
- Prediction: predict well on new examples

Example (text retrieval):
- Input: queries with relevance judgments; the parameters of the retrieval function
- Training: find parameters so that many relevant documents are ranked highly
- Prediction: rank relevant documents high also for new queries

(Joachims)

23 Machine Learning: Tasks and Applications

- Text Routing. Help-Desk Support: Who is an appropriate expert for a particular problem?
- Information Filtering. Information Agents: Which news articles are interesting to a particular person?
- Relevance Feedback. Information Retrieval: What are other documents relevant for a particular query?
- Text Categorization. Knowledge Management: Organizing a document database by semantic categories.

(Joachims)

24 Learning to Rank

Assume:
- a distribution of queries, P(Q)
- a distribution of target rankings for each query, P(R | Q)

Given:
- a collection D of documents
- an independent, identically distributed training sample (q_i, r_i)

Design:
- a set of ranking functions F
- a loss function l(r_a, r_b)
- a learning algorithm

Goal: find f ∈ F that minimizes ∫ l(f(q), r) dP(q, r)

(Joachims)

25 A Loss Function for Rankings

For two orderings r_a and r_b, a pair is:
- concordant, if r_a and r_b agree in their ordering (P = number of concordant pairs)
- discordant, if r_a and r_b disagree in their ordering (Q = number of discordant pairs)

Loss function: l(r_a, r_b) = Q

Example:
r_a = (a, c, d, b, e, f, g, h)
r_b = (a, b, c, d, e, f, g, h)
The discordant pairs are (c, b) and (d, b), so l(r_a, r_b) = 2.

(Joachims)
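A small sketch (not from the slides) that counts discordant pairs and reproduces the example above:

def discordant_pairs(ra, rb):
    # Count item pairs that the two rankings order differently.
    pos = {item: i for i, item in enumerate(rb)}
    count = 0
    for i in range(len(ra)):
        for j in range(i + 1, len(ra)):
            # (ra[i], ra[j]) appear in this order in ra; discordant if rb reverses them.
            if pos[ra[i]] > pos[ra[j]]:
                count += 1
    return count

ra = list("acdbefgh")
rb = list("abcdefgh")
print(discordant_pairs(ra, rb))  # 2, from the pairs (c, b) and (d, b)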

26 Machine Learning: Algorithms

The choice of algorithm is a subject of active research, covered in several courses, notably CS 478 and CS/INFO 630. Some effective methods include:
- Naive Bayes
- Rocchio Algorithm
- C4.5 Decision Tree
- k-Nearest Neighbors
- Support Vector Machine

27 Relevance Feedback: Clickthrough Data

Relevance feedback methods have suffered from the unwillingness of users to provide feedback. Joachims and others have developed methods that use clickthrough data from online searches.

Concept: Suppose that a query delivers a set of hits to a user. If the user skips a link a and clicks on a link b that is ranked lower, then the user's preference reflects rank(b) < rank(a).

28 Clickthrough Example

Ranking presented to the user:
1. Kernel Machines http://svm.first.gmd.de/
2. Support Vector Machine http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine http://ais.gmd.de/~thorsten/svm light/
4. An Introduction to Support Vector Machines http://www.support-vector.net/
5. Support Vector Machine and Kernel... References http://svm.research.bell-labs.com/SVMrefs.html

The user clicks on links 1, 3, and 4. Because link 2 was skipped while the lower-ranked links 3 and 4 were clicked, the inferred preferences are (3 < 2) and (4 < 2).

(Joachims)
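A minimal sketch of extracting such preference pairs (an illustrative assumption about the extraction rule, following the skip-above idea on the previous slide):

def preference_pairs(clicked_ranks):
    # For each clicked result, prefer it over every higher-ranked result
    # that the user skipped: rank(clicked) < rank(skipped).
    clicked = set(clicked_ranks)
    pairs = []
    for b in sorted(clicked):
        for a in range(1, b):
            if a not in clicked:
                pairs.append((b, a))
    return pairs

print(preference_pairs([1, 3, 4]))  # [(3, 2), (4, 2)]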

