Less is More: Probabilistic Models for Retrieving Fewer Relevant Documents. Harr Chen, David R. Karger, MIT CSAIL. ACM SIGIR 2006, August 9, 2006.

Similar presentations
A Support Vector Method for Optimizing Average Precision

ACM SIGIR 2009 Workshop on Redundancy, Diversity, and Interdependent Document Relevance, July 23, 2009, Boston, MA 1 Modeling Diversity in Information.
Less is More: Probabilistic Models for Retrieving Fewer Relevant Documents. Harr Chen and David R. Karger, MIT CSAIL, SIGIR 2006. 4/30/2007.
Diversity Maximization Under Matroid Constraints Date : 2013/11/06 Source : KDD’13 Authors : Zeinab Abbassi, Vahab S. Mirrokni, Mayur Thakur Advisor :
Diversified Retrieval as Structured Prediction Redundancy, Diversity, and Interdependent Document Relevance (IDR ’09) SIGIR 2009 Workshop Yisong Yue Cornell.
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Search Engines Information Retrieval in Practice All slides ©Addison Wesley, 2008.
Information Retrieval Models: Probabilistic Models
1 Learning User Interaction Models for Predicting Web Search Result Preferences Eugene Agichtein Eric Brill Susan Dumais Robert Ragno Microsoft Research.
Language Model based Information Retrieval: University of Saarland 1 A Hidden Markov Model Information Retrieval System Mahboob Alam Khalid.
Empirical Development of an Exponential Probabilistic Model Using Textual Analysis to Build a Better Model Jaime Teevan & David R. Karger CSAIL (LCS+AI),
Evaluating Search Engine
Incorporating Language Modeling into the Inference Network Retrieval Framework Don Metzler.
Introduction to Information Retrieval Introduction to Information Retrieval Hinrich Schütze and Christina Lioma Lecture 11: Probabilistic Information Retrieval.
1 Ranked Queries over sources with Boolean Query Interfaces without Ranking Support Vagelis Hristidis, Florida International University Yuheng Hu, Arizona.
Retrieval Evaluation. Introduction Evaluation of implementations in computer science often is in terms of time and space complexity. With large document.
Affinity Rank Yi Liu, Benyu Zhang, Zheng Chen MSRA.
Language Modeling Frameworks for Information Retrieval John Lafferty School of Computer Science Carnegie Mellon University.
The Relevance Model  A distribution over terms, given information need I, (Lavrenko and Croft 2001). For term r, P(I) can be dropped w/o affecting the.
APPLICATIONS OF DATA MINING IN INFORMATION RETRIEVAL.
Minimal Test Collections for Retrieval Evaluation B. Carterette, J. Allan, R. Sitaraman University of Massachusetts Amherst SIGIR2006.
Philosophy of IR Evaluation Ellen Voorhees. NIST Evaluation: How well does system meet information need? System evaluation: how good are document rankings?
IR Evaluation Evaluate what? –user satisfaction on specific task –speed –presentation (interface) issue –etc. My focus today: –comparative performance.
A Comparative Study of Search Result Diversification Methods Wei Zheng and Hui Fang University of Delaware, Newark DE 19716, USA
1 Formal Models for Expert Finding on DBLP Bibliography Data Presented by: Hongbo Deng Co-worked with: Irwin King and Michael R. Lyu Department of Computer.
Bayesian Sets Zoubin Ghahramani and Katherine A. Heller NIPS 2005 Presented by Qi An Mar. 17 th, 2006.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Evaluating Search Engines in chapter 8 of the book Search Engines Information Retrieval in Practice Hongfei Yan.
Language Models Hongning Wang. Two-stage smoothing [Zhai & Lafferty 02], Stage 1 (Dirichlet prior, explains unseen words): P(w|d) = (c(w,d) + μ·p(w|C)) / (|d| + μ).
A Markov Random Field Model for Term Dependencies Donald Metzler W. Bruce Croft Present by Chia-Hao Lee.
« Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee » Proceedings of the 30th annual international ACM SIGIR, Amsterdam 2007) A.
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Presented by Chen Yi-Ting.
April 14, 2003Hang Cui, Ji-Rong Wen and Tat- Seng Chua 1 Hierarchical Indexing and Flexible Element Retrieval for Structured Document Hang Cui School of.
IR System Evaluation Farhad Oroumchian. IR System Evaluation System-centered strategy –Given documents, queries, and relevance judgments –Try several.
Relevance Feedback Hongning Wang What we have learned so far Information Retrieval User results Query Rep Doc Rep (Index) Ranker.
Web Image Retrieval Re-Ranking with Relevance Model Wei-Hao Lin, Rong Jin, Alexander Hauptmann Language Technologies Institute School of Computer Science.
Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval Ben Carterette and Praveen Chandar Dept. of Computer and Information Science.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.
Positional Relevance Model for Pseudo–Relevance Feedback Yuanhua Lv & ChengXiang Zhai Department of Computer Science, UIUC Presented by Bo Man 2014/11/18.
Diversifying Search Results Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong Search Labs, Microsoft Research WSDM, February 10, 2009 TexPoint.
Threshold Setting and Performance Monitoring for Novel Text Mining Wenyin Tang and Flora S. Tsai School of Electrical and Electronic Engineering Nanyang.
Relevance-Based Language Models Victor Lavrenko and W.Bruce Croft Department of Computer Science University of Massachusetts, Amherst, MA SIGIR 2001.
Comparing Document Segmentation for Passage Retrieval in Question Answering Jorg Tiedemann University of Groningen presented by: Moy’awiah Al-Shannaq
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
Dependence Language Model for Information Retrieval Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu, Guihong Cao, Dependence Language Model for Information Retrieval,
Active Feedback in Ad Hoc IR Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
Modern Retrieval Evaluations Hongning Wang
Carnegie Mellon School of Computer Science Language Technologies Institute CMU Team-1 in TDT 2004 Workshop 1 CMU TEAM-A in TDT 2004 Topic Tracking Yiming.
PERSONALIZED DIVERSIFICATION OF SEARCH RESULTS Date: 2013/04/15 Author: David Vallet, Pablo Castells Source: SIGIR’12 Advisor: Dr.Jia-ling, Koh Speaker:
Relevance Feedback Hongning Wang
Evaluation. The major goal of IR is to search document relevant to a user query. The evaluation of the performance of IR systems relies on the notion.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Introduction to Information Retrieval Introduction to Information Retrieval Lecture Probabilistic Information Retrieval.
Personalizing Web Search Jaime Teevan, MIT with Susan T. Dumais and Eric Horvitz, MSR.
Text Information Management ChengXiang Zhai, Tao Tao, Xuehua Shen, Hui Fang, Azadeh Shakery, Jing Jiang.
Search Engines Information Retrieval in Practice All slides ©Addison Wesley, 2008 Annotations by Michael L. Nelson.
INFORMATION RETRIEVAL MEASUREMENT OF RELEVANCE EFFECTIVENESS 1Adrienn Skrop.
Seesaw Personalized Web Search Jaime Teevan, MIT with Susan T. Dumais and Eric Horvitz, MSR.
LEARNING IN A PAIRWISE TERM-TERM PROXIMITY FRAMEWORK FOR INFORMATION RETRIEVAL Ronan Cummins, Colm O’Riordan (SIGIR’09) Speaker : Yi-Ling Tai Date : 2010/03/15.
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Microsoft Research Cambridge,
A Formal Study of Information Retrieval Heuristics
Evaluation of IR Systems
Evaluation.
IR Theory: Evaluation Methods
John Lafferty, Chengxiang Zhai School of Computer Science
Relevance and Reinforcement in Interactive Browsing
Presentation transcript:

Less is More: Probabilistic Models for Retrieving Fewer Relevant Documents. Harr Chen, David R. Karger. MIT CSAIL. ACM SIGIR 2006, August 9, 2006

Slide 2: Outline
Motivation; Expected Metric Principle; Metrics; Bayesian Retrieval; Objectives; Heuristics; Experimental Results; Related Work; Future Work and Conclusions

Slide 3: Motivation
In IR, we have formal models and formal metrics.
Models provide a framework for retrieval
– E.g.: probabilistic models
Metrics provide a rigorous evaluation mechanism
– E.g.: precision and recall
The probability ranking principle (PRP) is provably optimal for precision/recall
– Ranking by probability of relevance
But other metrics capture other notions of result-set quality, and PRP isn't necessarily optimal.

Slide 4: Example: Diversity
A user may be satisfied with one relevant result
– Navigational queries, question answering
In this case, we want to "hedge our bets" by retrieving for diversity in the result set
– Better to satisfy different users with different interpretations than one user many times over
Reciprocal rank / search length metrics capture this notion; PRP is suboptimal.

Slide 5: IR System Design
Metrics define a preference ordering on result sets
– Metric[Result set 1] > Metric[Result set 2] means result set 1 is preferred to result set 2
Traditional approach: try out heuristics that we believe will improve relevance performance
– Heuristics not directly motivated by the metric
– E.g. synonym expansion, pseudorelevance feedback
Observation: given a model, we can try to directly optimize for some metric.

Slide 6: Expected Metric Principle (EMP)
Knowing which metric to use tells us what to maximize: the expected value of the metric for each result set, given a model.
Figure: a corpus of Document 1, Document 2, Document 3 yields candidate result sets (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2); calculate E[Metric] for each using the model and return the set with the maximum score.
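A brute-force rendering of this picture as code, under an independence assumption and with hypothetical names (the paper does not enumerate result sets in practice; later slides introduce a greedy heuristic precisely because this is intractable):

```python
from itertools import permutations

def emp_retrieve(docs, p_rel, expected_metric, n=2):
    """Brute-force EMP: return the ordered result set of size n that
    maximizes the expected metric, given per-document relevance
    probabilities p_rel (a dict: doc -> P(relevant)).
    Exponential in n; illustration only."""
    best_set, best_score = None, float("-inf")
    for result_set in permutations(docs, n):
        score = expected_metric(result_set, p_rel)
        if score > best_score:
            best_set, best_score = result_set, score
    return best_set

def expected_one_call(result_set, p_rel):
    """Expected 1-call = P(at least one result is relevant), assuming
    independent relevance given the model (an assumption of this sketch)."""
    p_all_irrelevant = 1.0
    for d in result_set:
        p_all_irrelevant *= 1.0 - p_rel[d]
    return 1.0 - p_all_irrelevant
```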

Slide 7: Our Contributions
Primary: EMP, the metric as retrieval goal
– Metric designed to measure retrieval quality; metrics we consider: k-call at n, search length, reciprocal rank, instance recall
– Build a probabilistic model
– Retrieve to maximize an objective: the expected value of the metric, with expectations calculated according to our probabilistic model
– Use computational heuristics to make the optimization problem tractable
Secondary: retrieving for diversity (a special case)
– A natural side effect of optimizing for certain metrics

Slide 8: Detour: What is a Heuristic?
Ad hoc approach:
– Use heuristics that are believed to be correlated with good performance
– Heuristics used to improve relevance
– Heuristics (probably) make the system slower
– Infinite number of possibilities, no formalism
– Model and heuristics intertwined
Our approach:
– Build a model that directly optimizes for good performance
– Heuristics used to improve efficiency
– Heuristics (probably) make the optimization worse
– Well-known space of optimization techniques
– Clean separation between model and heuristics

Slide 9: Our Contributions
Primary: EMP, the metric as retrieval goal
– Metric designed to measure retrieval quality; metrics we consider: k-call at n, search length, reciprocal rank, instance recall
– Build a probabilistic model
– Retrieve to maximize an objective: the expected value of the metric, with expectations calculated according to our probabilistic model
– Use computational heuristics to make the optimization problem tractable
Secondary: retrieving for diversity (a special case)
– A natural side effect of optimizing for certain metrics

Slide 10: Search Length / Reciprocal Rank
(Mean) search length (MSL): the number of irrelevant results before the first relevant one.
(Mean) reciprocal rank (MRR): one over the rank of the first relevant result.
Example from the slide's figure: search length = 2, reciprocal rank = 1/3.
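A small sketch of these two measures for a single query, assuming the result list is given as 0/1 relevance judgments (the helper names are mine, not from the paper):

```python
def search_length(rels):
    """Number of irrelevant documents seen before the first relevant one."""
    count = 0
    for r in rels:
        if r:
            return count
        count += 1
    return len(rels)  # no relevant document retrieved

def reciprocal_rank(rels):
    """1 / rank of the first relevant document, 0 if none is relevant."""
    for rank, r in enumerate(rels, start=1):
        if r:
            return 1.0 / rank
    return 0.0

# The slide's example: the first relevant document appears at rank 3.
assert search_length([0, 0, 1, 0]) == 2
assert reciprocal_rank([0, 0, 1, 0]) == 1 / 3
```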

Slide 11: Instance Recall
Each topic has multiple instances (subtopics, aspects).
Instance recall is the fraction of instances covered (in union) by the first n results.
Example from the slide's figure: instance recall at 5 = 0.75.
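A sketch of instance recall for one topic, assuming each retrieved document comes annotated with the set of instances it covers (the data layout is illustrative):

```python
def instance_recall(result_instances, total_instances, n):
    """Fraction of a topic's instances covered by the union of the
    instance sets of the top n results."""
    covered = set()
    for doc_instances in result_instances[:n]:
        covered |= set(doc_instances)
    return len(covered) / total_instances

# E.g., covering 3 of 4 instances in the top 5 results gives 0.75,
# matching the slide's example.
```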

Slide 12: k-call at n
Binary metric: 1 if the top n results contain at least k relevant documents, 0 otherwise.
1-call is (1 – %no)
– See the TREC robust track
The slide's figure shows three example result sets with k-call at 5 values of 1, 1, and 0.
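k-call at n is a binary, set-level metric, so it is easy to write down; a sketch over the same 0/1 relevance list representation as above:

```python
def k_call_at_n(rels, k, n):
    """1 if at least k of the top n results are relevant, else 0."""
    return 1 if sum(rels[:n]) >= k else 0

# 1-call at n asks only whether anything relevant was returned at all,
# which is why per-topic it equals 1 minus the robust track's %no.
```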

Slide 13: Motivation for k-call
1-call: want one relevant document
– Many queries are satisfied with one relevant result
– Needing only one relevant document leaves more room to explore, which promotes result-set diversity
n-call: want all n results to be relevant
– "Perfect precision"
– Home in on one interpretation and stick to it!
Intermediate k
– Risk/reward tradeoff
Plus, k-call is easily modeled in our framework
– Binary variable

Slide 14: Our Contributions
Primary: EMP, the metric as retrieval goal
– Metric designed to measure retrieval quality; metrics we consider: k-call at n, search length, reciprocal rank, instance recall
– Build a probabilistic model
– Retrieve to maximize an objective: the expected value of the metric, with expectations calculated according to our probabilistic model
– Use computational heuristics to make the optimization problem tractable
Secondary: retrieving for diversity (a special case)
– A natural side effect of optimizing for certain metrics

Slide 15: Bayesian Retrieval Model
There exist distributions that generate relevant documents and irrelevant documents.
PRP: rank by the probability of relevance given the document.
Remaining modeling questions: the form of the relevant/irrelevant distributions and the parameters of those distributions.
In this paper, we assume multinomial models and choose parameters by maximum a posteriori (MAP) estimation
– The prior is the background corpus word distribution
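A minimal sketch in the spirit of this slide, not the paper's exact estimator: a multinomial term model whose MAP estimate pulls observed counts toward the background corpus distribution (a Dirichlet-style prior). The function names and the concentration parameter mu are my own illustrative choices.

```python
import math

def map_multinomial(observed_counts, corpus_probs, mu=1000.0):
    """MAP estimate of a multinomial over terms: observed term counts are
    smoothed toward the background corpus distribution, which acts as the
    prior (mu is an illustrative concentration parameter)."""
    total = sum(observed_counts.values())
    return {t: (observed_counts.get(t, 0) + mu * p) / (total + mu)
            for t, p in corpus_probs.items()}

def log_likelihood(doc_terms, term_probs):
    """Log probability of a document's term sequence under a multinomial."""
    return sum(math.log(term_probs.get(t, 1e-12)) for t in doc_terms)

# PRP-style use: estimate a "relevant" model (e.g., from the query plus any
# feedback documents) and an "irrelevant" model (e.g., from the corpus),
# then rank documents by log P(d | relevant) - log P(d | irrelevant).
```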

Slide 16: Our Contributions
Primary: EMP, the metric as retrieval goal
– Metric designed to measure retrieval quality; metrics we consider: k-call at n, search length, reciprocal rank, instance recall
– Build a probabilistic model
– Retrieve to maximize an objective: the expected value of the metric, with expectations calculated according to our probabilistic model
– Use computational heuristics to make the optimization problem tractable
Secondary: retrieving for diversity (a special case)
– A natural side effect of optimizing for certain metrics

Slide 17: Objective
Probability Ranking Principle (PRP): maximize the probability of relevance at each step in the ranking.
Expected Metric Principle (EMP): maximize the expected metric for the complete result set.
In particular, for k-call, maximize the probability that at least k of the top n results are relevant (see the reconstructed formulas below).
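The formulas on this slide were images and are not in the transcript; the following LaTeX is a plausible reconstruction from the definitions on the earlier slides (the notation r_i for the relevance of the document at rank i and q for the query is mine), not a verbatim copy of the slide.

```latex
% PRP: at each rank i, pick the document with the highest probability of relevance
\text{PRP:}\quad d_i = \arg\max_{d}\; \Pr(r_i = 1 \mid d, q)

% EMP: pick the whole result set with the highest expected metric under the model
\text{EMP:}\quad (d_1,\dots,d_n) = \arg\max_{d_1,\dots,d_n}\; \mathbb{E}\!\left[\mathrm{Metric}(d_1,\dots,d_n)\mid q\right]

% k-call is binary, so its expectation is a probability
\arg\max_{d_1,\dots,d_n}\; \Pr\!\Big(\textstyle\sum_{i=1}^{n} r_i \ge k \;\Big|\; d_1,\dots,d_n,\, q\Big)
```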

Slide 18: Our Contributions
Primary: EMP, the metric as retrieval goal
– Metric designed to measure retrieval quality; metrics we consider: k-call at n, search length, reciprocal rank, instance recall
– Build a probabilistic model
– Retrieve to maximize an objective: the expected value of the metric, with expectations calculated according to our probabilistic model
– Use computational heuristics to make the optimization problem tractable
Secondary: retrieving for diversity (a special case)
– A natural side effect of optimizing for certain metrics

Slide 19: Optimization of the Objective
Exact optimization of the objective is usually NP-hard
– E.g.: exact optimization for k-call is reducible to the NP-hard maximum graph clique problem
Approximation heuristic: greedy algorithm
– Select documents successively in rank order
– Hold previous documents fixed, optimize the objective at each rank
Figure: choose d_1 to maximize E[metric | d]

Slide 20: Optimization of the Objective
Exact optimization of the objective is usually NP-hard
– E.g.: exact optimization for k-call is reducible to the NP-hard maximum graph clique problem
Approximation heuristic: greedy algorithm
– Select documents successively in rank order
– Hold previous documents fixed, optimize the objective at each rank
Figure: with d_1 fixed, choose d_2 to maximize E[metric | d, d_1]

Slide 21: Optimization of the Objective
Exact optimization of the objective is usually NP-hard
– E.g.: exact optimization for k-call is reducible to the NP-hard maximum graph clique problem
Approximation heuristic: greedy algorithm
– Select documents successively in rank order
– Hold previous documents fixed, optimize the objective at each rank
Figure: with d_1 and d_2 fixed, choose d_3 to maximize E[metric | d, d_1, d_2]

Slide 22: Greedy on 1-call and n-call
1-greedy
– The greedy algorithm reduces to ranking each successive document assuming all previous documents are irrelevant
– The algorithm has "discovered" incremental negative pseudorelevance feedback (a sketch follows below)
n-greedy: assume all previous documents are relevant
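A sketch of the 1-greedy idea in code, under the simplifying assumption that the model exposes a scoring function which treats the already-chosen documents as (pseudo-)irrelevant; the function and parameter names are hypothetical, not from the paper.

```python
def greedy_one_call(candidates, score_given_negatives, n=10):
    """Greedy approximation to maximizing expected 1-call at n.

    score_given_negatives(d, chosen) should return the model's estimate of
    P(d relevant | query), re-estimated with the already-chosen documents
    treated as irrelevant -- the incremental negative pseudorelevance
    feedback the slide describes."""
    chosen = []
    remaining = set(candidates)
    for _ in range(min(n, len(candidates))):
        best = max(remaining, key=lambda d: score_given_negatives(d, chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```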

Slide 23: Greedy on Other Metrics
Greedy with precision/recall reduces to the PRP!
Greedy on k-call for general k (k-greedy)
– More complicated...
Greedy with MSL, MRR, and instance recall works out to the 1-greedy algorithm
– Intuition: to make the first relevant document appear earlier, we want to hedge our bets as to the query interpretation (i.e., diversify)

Slide 24: Experiments Overview
The experiments verify that optimizing for a metric improves performance on that metric
– They do not tell us which metrics to use
Looked at ad hoc diversity examples and TREC topics/queries.
Tuned weights on a separate development set.
Tested on:
– Standard ad hoc (robust track) topics
– Topics with multiple annotators
– Topics with multiple instances

Slide 25: Diversity on Google Results
Task: reranking the top 1,000 Google results.
When optimizing 1-call, our algorithm finds more diverse results than the PRP or the original Google ranking.

Slide 26: Experiments: Robust Track
TREC 2003 and 2004 robust tracks
– 249 topics
– 528,000 documents
The 1-call and 10-call results are statistically significant.
Figure: results chart comparing PRP, 1-greedy, and 10-greedy.

Slide 27: Experiments: Instance Retrieval
TREC-6, 7, 8 interactive tracks
– 20 topics
– 210,000 documents
– 7 to 56 instances per topic
PRP baseline: instance recall at 10 = (value missing from the transcript)
Greedy 1-call: instance recall at 10 = 0.315

Slide 28: Experiments: Multi-annotator
TREC-4 and TREC-6 ad hoc retrieval
– Independent annotators assessed the same topics
– TREC-4: 49 topics, 568,000 documents, 3 annotators
– TREC-6: 50 topics, 556,000 documents, 2 annotators
More annotators means more of them are satisfied using 1-greedy.
Results table (per-annotator 1-call and totals; only the TREC-6 totals are legible): TREC-6 PRP total = 1.280, TREC-6 1-greedy total = 1.620; the third-annotator column is N/A for TREC-6.

Slide 29: Related Work
Fits in the risk minimization framework (the objective as a negative loss function).
Other approaches optimize for metrics directly, using training data.
Pseudorelevance feedback, subtopic retrieval, maximal marginal relevance, clustering.
See the paper for references.

Slide 30: Future Work
General k-call (k = 2, etc.)
– Determining whether this is what users want
A better underlying probabilistic model
– Our contribution is in the ranking objective, not the model, so the model can be arbitrarily sophisticated
Better optimization techniques
– E.g., local search would differentiate the algorithms for MRR and 1-call
Other metrics
– Preliminary work on mean average precision and recall; (perhaps) surprisingly, these metrics are not optimized by the PRP!

Slide 31: Conclusions
EMP: the metric can motivate the model; choosing and believing in a metric already gives us a reasonable objective, E[metric].
EMP can potentially be applied on top of a variety of different underlying probabilistic models.
Diversity is one practical example of a natural side effect of using EMP with the right metric.

Slide 32: Acknowledgments
Harr Chen is supported by the Office of Naval Research through a National Defense Science and Engineering Graduate Fellowship.
Jaime Teevan, Susan Dumais, and the anonymous reviewers provided constructive feedback.
ChengXiang Zhai, William Cohen, and Ellen Voorhees provided code and data.