Relevance Feedback. The user tells the system whether returned/disseminated documents are relevant to the query/information need or not. Feedback is usually positive, sometimes negative, always incomplete. Hypothesis: relevant docs should be more like each other than like non-relevant docs.
Relevance Feedback: Purpose. Augment keyword retrieval through query reformulation: give users the opportunity to refine their query; tailored to the individual; exemplar based (a different type of information from the query). Iterative, subjective improvement. Evaluation!
Relevance Feedback: Early Usage by Rocchio. Modify the original keyword query: strengthen terms in relevant docs; weaken terms in non-relevant docs; weight the modification of the original query by the amount of feedback.
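The reweighting described on this slide can be sketched in a few lines. This is only an illustration, not the original formulation: documents and queries are assumed to be plain term-to-weight dictionaries, the function name `rocchio` is hypothetical, and the defaults for `alpha`, `beta` and `gamma` are commonly quoted textbook values rather than values given in the slides.

```python
from collections import defaultdict


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query vector as a dict of term -> weight.

    alpha, beta, gamma are assumed defaults, not values from the slides.
    """
    new_query = defaultdict(float)
    # Keep the original query, scaled by alpha.
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Strengthen terms found in relevant docs (averaged over the feedback set).
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)
    # Weaken terms found in non-relevant docs.
    for doc in non_relevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(non_relevant)
    # Assumption: terms whose weight falls to zero or below are dropped.
    return {t: w for t, w in new_query.items() if w > 0}


query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0},
            {"jaguar": 1.0, "cat": 1.0, "jungle": 1.0}]
non_relevant = [{"jaguar": 1.0, "car": 1.0}]
new_q = rocchio(query, relevant, non_relevant)
```

In this toy run the query gains "cat" and "jungle" (query expansion), "jaguar" is reweighted, and "car" never enters the query because its weight is negative.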
Relevance Feedback: Early Results. Evaluation: how much feedback is needed? How did recall/precision change? Conclusion: improved recall and precision over even 1 iteration and return of up to 20 non-relevant docs. Promising technique.
Query Reformulation. The user does not know enough about the document set to construct an optimal query initially. Querying is an iterative learning process that repeats two steps: 1. expand the original query with new terms (query expansion); 2. assign weights to the query terms (term reweighting).
Query Reformulation Approaches. 1. Relevance feedback based: vector model (Rocchio …); probabilistic model (Robertson & Sparck Jones, Croft …). 2. Cluster based: (a) local analysis, which derives information from the retrieved document set; (b) global analysis, which derives information from the corpus.
Vector Based Reformulation. Rocchio (~1965), with adjustable weights. Ide Dec Hi (~1968): counts only the most similar non-relevant document.
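The formulas for this slide did not survive in the transcript. The forms usually quoted in textbooks are given here as a reference sketch. The notation is an assumption: q is the original query vector, q_m the modified query, D_r and D_n the sets of judged relevant and non-relevant documents, d_max the highest-ranked non-relevant document, and alpha, beta, gamma the adjustable weights.

```latex
\begin{align*}
\text{Rocchio:} \quad & \vec{q}_m = \alpha\,\vec{q}
  + \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j
  - \frac{\gamma}{|D_n|} \sum_{\vec{d}_j \in D_n} \vec{d}_j \\
\text{Ide Dec Hi:} \quad & \vec{q}_m = \alpha\,\vec{q}
  + \beta \sum_{\vec{d}_j \in D_r} \vec{d}_j
  - \gamma\,\vec{d}_{\max}
\end{align*}
```

The difference is in the negative term: Rocchio subtracts the average of all judged non-relevant documents, while Ide Dec Hi subtracts only the single top-ranked one and does not normalize the positive sum.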
Probabilistic Reformulation. Recall from earlier: we still need to estimate probabilities. Do so using relevance feedback!
Estimating Probabilities by Accumulating Statistics. D_r is the set of relevant docs; D_{r,i} is the set of relevant docs containing term k_i; n_i is the number of docs in the corpus containing term k_i.
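A minimal sketch of how these counts are typically turned into probability estimates. The corpus size `n_docs` (usually written N) is not defined on the slide and is an assumption here, as is the 0.5 smoothing adjustment, which is a common way to avoid zero or one estimates when the feedback set is small. The function name is hypothetical.

```python
def estimate_probabilities(n_rel, n_rel_with_term, n_docs, n_docs_with_term):
    """Estimate P(k_i | relevant) and P(k_i | non-relevant) from feedback counts.

    n_rel            = |D_r|
    n_rel_with_term  = |D_{r,i}|
    n_docs           = N, total docs in the corpus (assumed symbol)
    n_docs_with_term = n_i
    """
    # Fraction of judged-relevant docs that contain the term (smoothed).
    p_rel = (n_rel_with_term + 0.5) / (n_rel + 1)
    # Docs not judged relevant are treated as non-relevant (smoothed).
    p_nonrel = (n_docs_with_term - n_rel_with_term + 0.5) / (n_docs - n_rel + 1)
    return p_rel, p_nonrel


# Toy counts: 4 relevant docs, 3 of them contain the term;
# 1000 docs in the corpus, 50 of them contain the term.
p_rel, p_nonrel = estimate_probabilities(4, 3, 1000, 50)
```

With these toy counts the term is far more likely in relevant documents (0.7) than in the rest of the corpus (just under 0.05).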
Computing Similarity (Term Reweighting). Assume: term independence and binary document indexing. Cons: no term weighting, no query expansion, ignores previous weights.
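Under the two assumptions on this slide, the usual probabilistic ranking sums a log-odds weight over the terms shared by query and document. The sketch below assumes that standard form; the function names and the `probs` mapping (term to a pair of estimated probabilities) are hypothetical.

```python
import math


def term_weight(p_rel, p_nonrel):
    """Log-odds weight of one term, given P(k_i|R) and P(k_i|non-R)."""
    return math.log(p_rel / (1 - p_rel)) + math.log((1 - p_nonrel) / p_nonrel)


def similarity(query_terms, doc_terms, probs):
    """Sum term weights over terms present in both query and document.

    Binary indexing: a term either occurs or not; frequency is ignored.
    """
    shared = set(query_terms) & set(doc_terms)
    return sum(term_weight(*probs[t]) for t in shared)


probs = {"cat": (0.7, 0.05), "jungle": (0.5, 0.5)}
score = similarity({"cat", "jungle"}, {"cat", "jungle", "river"}, probs)
neutral = term_weight(0.5, 0.5)
```

A term that is equally likely in relevant and non-relevant documents ("jungle" here) contributes nothing, so the score is carried entirely by "cat". The sketch also makes the listed cons visible: only the original query terms are scored, and within-document frequency plays no role.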
Croft Extensions. Include within-document frequency weights; initial search variant. The last term is the normalized within-document frequency. C and K are adjustable parameters.
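The formula for this slide is missing from the transcript. The sketch below assumes the form commonly attributed to Croft's initial search variant: the term factor is (C + idf) multiplied by a normalized within-document frequency K + (1 - K) * f / max_f. Both the exact form and the function names are assumptions and should be checked against the original source.

```python
def normalized_tf(freq, max_freq, K):
    """Normalized within-document frequency, assumed form K + (1 - K) * f / max_f."""
    return K + (1 - K) * freq / max_freq


def initial_search_factor(idf, freq, max_freq, C, K):
    """Term factor for the initial search, assumed form (C + idf) * normalized tf."""
    return (C + idf) * normalized_tf(freq, max_freq, K)


# Toy values: the term occurs 2 times, the most frequent term in the doc 4 times.
ntf = normalized_tf(2, 4, K=0.5)
factor = initial_search_factor(idf=2.0, freq=2, max_freq=4, C=1.0, K=0.5)
```

With K near 1 the frequency term flattens toward a constant, which under this assumed form makes within-document frequency matter less; with K near 0 the raw normalized frequency dominates.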
Query Reformulation: Summary so far… Relevance feedback can produce dramatic improvements. However, the evaluation must make sure that previously judged documents do not count toward the measured improvement, and the techniques have limitations. The next round of improvements requires clustering…
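One common way to keep previously judged documents out of the measured improvement is to evaluate on the residual collection, that is, to drop every document the user has already judged before computing precision. A minimal sketch, with hypothetical function and variable names and made-up document ids:

```python
def residual_precision_at_k(ranking, relevant, judged, k):
    """Precision at k after removing already-judged docs from the ranking."""
    residual = [doc for doc in ranking if doc not in judged]
    top = residual[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


# Ranking produced by the reformulated query (toy data).
ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
relevant = {"d1", "d2", "d5"}
judged = {"d1", "d2", "d3"}  # docs the user gave feedback on

naive = residual_precision_at_k(ranking, relevant, set(), 3)
residual = residual_precision_at_k(ranking, relevant, judged, 3)
```

In this toy case the naive figure looks strong only because the reformulated query pulls the already-judged relevant documents to the top; on the residual collection the precision is half as large.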
Croft Feedback Searches. Use probability updates as in Robertson.
Assumptions. 1. Initial query was a good approximation (polysemy? synonyms? slang? concept drift?). 2. Ideal query is approximated by shared terms in relevant documents.