A Framework for Result Diversification


A Framework for Result Diversification
Sreenivas Gollapudi, Search Labs, Microsoft Research
Joint work with Aneesh Sharma (Stanford), Samuel Ieong, Alan Halverson, and Rakesh Agrawal (Microsoft Research)

Ambiguous queries wine 2009

Definition of Diversification
Intuitive definition: represent a variety of relevant meanings for a given query
Mathematical definitions:
  Minimizing query abandonment
  Want to represent different user categories
  Trade-off between relevance and novelty

Research on diversification
Query and document similarities:
  Maximal Marginal Relevance [CG98]
  Personalized re-ranking of results [RD06]
  Probability Ranking Principle not optimal [CK06]
Query abandonment:
  Topical diversification [Z+05, AGHI09]; needs topical (categorical) information
Loss minimization framework [Z02, ZL06]:
  "Diminishing returns" for docs with the same intent is a specific loss function [AGHI09]

The framework
  Express diversity requirements in terms of desired properties
  Define objectives that satisfy these properties
  Develop efficient algorithms
  Metrics and evaluation methodologies

Axiomatic approach
Inspired by similar approaches for:
  Recommendation systems [Andersen et al '08]
  Ranking [Altman, Tennenholtz '07]
  Clustering [Kleinberg '02]
Map the space of functions – a "basis vector"

Diversification Setup (1/2)
Input:
  Candidate documents U = {u1, u2, …, un} for query q
  Relevance function wq(ui)
  Distance function dq(ui, uj) (symmetric, not necessarily a metric)
  Size k of the output result set
[Figure: candidate documents u1–u6 drawn as nodes, annotated with a relevance value wq(u5) and a pairwise distance dq(u2, u4)]

Diversification Setup (2/2)
Output:
  Diversified set S* of documents with |S*| = k
  Diversification function f : S × wq × dq → R+
  S* = argmax|S|=k f(S)
Example (from the slide figure): for k = 3 over {u1, …, u6}, S* = {u1, u2, u6}
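A minimal sketch of this setup in Python; the type aliases, helper name, and brute-force search are illustrative assumptions, not the talk's implementation:

    from itertools import combinations
    from typing import Callable, FrozenSet, Iterable

    Doc = str
    Relevance = Callable[[Doc], float]        # w_q(u): query-dependent relevance
    Distance = Callable[[Doc, Doc], float]    # d_q(u, v): symmetric pairwise distance

    def best_subset(candidates: Iterable[Doc], k: int,
                    f: Callable[[FrozenSet[Doc]], float]) -> FrozenSet[Doc]:
        # Brute-force S* = argmax over |S| = k of f(S); exponential in n, for illustration only.
        return max((frozenset(S) for S in combinations(candidates, k)), key=f)

In practice the objectives defined later in the deck are optimized with greedy approximations rather than enumeration.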

Axioms
  Scale-invariance
  Consistency
  Richness
  Strength of Relevance
  Strength of Diversity
  Stability
  Two technical properties

Scale Invariance Axiom
If w΄(ui) = α · w(ui) and d΄(ui, uj) = α · d(ui, uj), then
  S* = argmaxS f(S, w(·), d(·, ·)) = argmaxS f(S, w΄(·), d΄(·, ·))
No built-in scale for f!

Consistency Axiom
Let S* = argmaxS f(S, w(·), d(·, ·)). If relevance and distances involving S* are increased, i.e.
  w΄(ui) = w(ui) + ai for ui ∈ S*
  d΄(ui, uj) = d(ui, uj) + bij for ui and/or uj ∈ S*
with ai, bij ≥ 0, then S* should remain the optimal set.
Increasing relevance/diversity doesn't hurt!

Stability Axiom
S*(k) = argmaxS f(S, w(·), d(·, ·), k)
S*(k) ⊆ S*(k+1) for all k
The output set shouldn't oscillate (change arbitrarily) with its size

Impossibility result
Axioms: scale-invariance, consistency, richness, strength of relevance/diversity, stability, two technical properties
Theorem: no function f can satisfy all the axioms simultaneously
Proof via a constructive argument

Axiomatic characterization – Summary
  Baseline for what is possible
  Mathematical criteria for choosing f
  Modular approach: f is independent of the specific wq(·) and dq(·, ·)!

A Framework for Diversification
  Express diversity requirements in terms of desired properties
  Define objectives that satisfy these properties
  Develop efficient algorithms
  Metrics and evaluation methodologies

Recall – Diversification Framework
Input: U = {u1, u2, …, un}, k, wq(·) and dq(·, ·)
  Some set of (top) n results
Output: S* = argmaxS f(S, w(·), d(·, ·), k)
  Find the most diverse set of results of size k
Advantages:
  Can integrate f with an existing ranking engine
  Modular, plug-and-play framework

Diversification objectives
Max-sum (avg) objective: [objective shown as an image on the slide]
Violates stability!
Example (from the slide figure): for k = 3, S* = {u1, u2, u6}, but for k = 4, S* = {u1, u3, u5, u6}; the k = 3 set is not contained in the k = 4 set

Diversification objectives
Max-min objective: [objective shown as an image on the slide]
Violates consistency and stability!
Example (from the slide figure): for k = 3, S* changes from {u1, u2, u6} to {u1, u5, u6}
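The two objective formulas above appear only as images in the deck; the sketch below assumes the standard dispersion-style forms they usually take (sum of relevance plus sum of pairwise distances, and minimum relevance plus minimum pairwise distance), with lam as an assumed relevance/diversity trade-off parameter:

    from itertools import combinations

    def max_sum_objective(S, w, d, lam=1.0):
        # Assumed form: (k - 1) * sum of relevance + 2 * lam * sum of pairwise distances.
        k = len(S)
        rel = (k - 1) * sum(w(u) for u in S)
        div = 2 * lam * sum(d(u, v) for u, v in combinations(S, 2))
        return rel + div

    def max_min_objective(S, w, d, lam=1.0):
        # Assumed form: minimum relevance in S + lam * minimum pairwise distance in S.
        return min(w(u) for u in S) + lam * min(d(u, v) for u, v in combinations(S, 2))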

Other Diversification objectives
A taxonomy-based diversification objective
  Uses the analogy of marginal utility to determine whether to include more results from an already covered category
  Violates stability and one of the technical axioms

The Framework
  Express diversity requirements in terms of desired properties
  Define objectives that satisfy these properties
  Develop efficient algorithms
  Metrics and evaluation methodologies

Algorithms for facility dispersion
Recast the diversification objectives as facility dispersion problems:
  Max-sum (MaxSumDispersion): [objective shown as an image]
  Max-min (MaxMinDispersion): [objective shown as an image]
Known approximation algorithms and lower bounds
Lots of other facility dispersion objectives and algorithms
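A sketch of the classic greedy heuristics used for these dispersion problems; the pair-picking and farthest-point strategies are standard approximations, while the exact weighting and tie-breaking here are assumptions:

    def greedy_max_sum(candidates, k, w, d, lam=1.0):
        # Repeatedly pick the remaining pair with the largest combined relevance + distance.
        rest, S = set(candidates), []
        while len(S) + 1 < k and len(rest) >= 2:
            u, v = max(((a, b) for a in rest for b in rest if a != b),
                       key=lambda p: w(p[0]) + w(p[1]) + 2 * lam * d(p[0], p[1]))
            S += [u, v]
            rest -= {u, v}
        if len(S) < k and rest:
            S.append(rest.pop())      # k odd: pad with an arbitrary remaining document
        return S

    def greedy_max_min(candidates, k, w, d, lam=1.0):
        # Start from the most relevant document, then repeatedly add the document
        # that maximizes relevance plus its minimum distance to the chosen set.
        rest = set(candidates)
        S = [max(rest, key=w)]
        rest.discard(S[0])
        while len(S) < k and rest:
            nxt = max(rest, key=lambda u: w(u) + lam * min(d(u, v) for v in S))
            S.append(nxt)
            rest.discard(nxt)
        return S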

Algorithm for categorical diversification
  ∀c ∈ C, U(c | q) ← P(c | q)
  while |S| < k do
    for d ∈ D do
      g(d | q, c) ← Σc U(c | q) · V(d | q, c)
    end for
    d* ← argmaxd g(d | q, c)
    S ← S ∪ {d*}
    ∀c ∈ C, U(c | q) ← (1 − V(d* | q, c)) · U(c | q)
    D ← D \ {d*}
  end while
P(c | q): conditional probability of intent c given query q
g(d | q, c): current probability of d satisfying q, c
The last step of the loop updates the utility of each category after a document is chosen
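A direct Python rendering of the pseudocode above; the dictionary layouts for P and V are assumptions about representation, not the original implementation:

    def diversify_by_intent(D, k, P, V):
        # P[c]    : P(c | q), intent distribution for the query
        # V[d][c] : V(d | q, c), probability that document d satisfies intent c
        U = dict(P)                   # U(c | q), discounted as intents get covered
        remaining, S = set(D), []
        while len(S) < k and remaining:
            # g(d | q, c) = sum over c of U(c | q) * V(d | q, c)
            g = {d: sum(U[c] * V[d].get(c, 0.0) for c in U) for d in remaining}
            best = max(g, key=g.get)
            S.append(best)
            for c in U:               # an intent already served well contributes less next round
                U[c] *= (1.0 - V[best].get(c, 0.0))
            remaining.discard(best)
        return S                      # an ordered list, as the example slide notes

The multiplicative update on U is what produces the "diminishing returns" behavior for documents sharing an intent.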

An Example
Intent distribution: P(R | q) = 0.8, P(B | q) = 0.2
[Figure: a step-by-step run of the greedy algorithm, showing each candidate's V(d | q, c), the scores g(d | q, c), the selected set S, and the utilities U(R | q), U(B | q) being discounted after every pick]
Observations:
  The algorithm actually produces an ordered set of results
  Results are not proportional to the intent distribution
  Results are not chosen according to (raw) quality alone
  Better results ⇒ fewer of them need to be shown

The Framework
  Express diversity requirements in terms of desired properties
  Define objectives that satisfy these properties
  Develop efficient algorithms
  Metrics and evaluation methodologies

Evaluation Methodologies
Approach:
  Represent real queries
  Scale beyond a few user studies
Problem: hard to define ground truth
  Use disambiguated information sources on the web as the ground truth
  Incorporate intent into human judgments; can exploit the user distribution (need to be careful)

Wikipedia Disambiguation Pages
  Query = Wikipedia disambiguation page title
  Large-scale ground truth set
  Open source
  Growing in size

Metrics Based on Wikipedia Topics
  Novelty: coverage of Wikipedia topics
  Relevance: coverage of top Wikipedia results

The Relevance and Distance Functions
Relevance function: 1/position
  Can use the search engine score
  Maybe use query category information
Distance function: compute TF-IDF distances
  Jaccard similarity score for two documents A and B: |A ∩ B| / |A ∪ B| (formula shown as an image on the slide)
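A small sketch of distance functions along these lines; jaccard_similarity follows the standard set definition, while the TF-IDF distance shown here is one common choice and an assumption, since the deck does not pin down the exact variant:

    import math

    def jaccard_similarity(A: set, B: set) -> float:
        # |A intersect B| / |A union B| over the two documents' term sets.
        return len(A & B) / len(A | B) if (A or B) else 0.0

    def tfidf_cosine_distance(vec_a: dict, vec_b: dict) -> float:
        # 1 - cosine similarity between two sparse TF-IDF vectors (assumed variant).
        dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
        na = math.sqrt(sum(w * w for w in vec_a.values()))
        nb = math.sqrt(sum(w * w for w in vec_b.values()))
        return 1.0 - (dot / (na * nb) if na and nb else 0.0)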

Evaluating Novelty
Topics/categories = list of disambiguation topics
Given a set Sk of results:
  For each result, compute a distribution over topics (using our d(·, ·))
  Sum confidence over all topics
  Threshold to get the number of topics represented
Example (query "jaguar"):
  jaguar.com: Jaguar cat (0.1), Jaguar car (0.9)
  wikipedia.org/jaguar: Jaguar cat (0.8), Jaguar car (0.2)
  Category confidence: Jaguar cat 0.1 + 0.8 = 0.9, Jaguar car 0.9 + 0.2 = 1.1
  With threshold = 1.0: Jaguar cat 0 (not covered), Jaguar car 1 (covered)
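A sketch of this novelty computation using the jaguar example from the slide; the function name and data layout are illustrative:

    def topics_covered(topic_scores_per_result, threshold=1.0):
        # Sum each topic's confidence across the result set and keep the topics
        # whose total confidence clears the threshold.
        totals = {}
        for scores in topic_scores_per_result:    # one {topic: confidence} dict per result
            for topic, conf in scores.items():
                totals[topic] = totals.get(topic, 0.0) + conf
        return [t for t, total in totals.items() if total >= threshold]

    results = [{"Jaguar cat": 0.1, "Jaguar car": 0.9},   # jaguar.com
               {"Jaguar cat": 0.8, "Jaguar car": 0.2}]   # wikipedia.org/jaguar
    print(topics_covered(results))                       # -> ['Jaguar car']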

Evaluating Relevance
For each query, get the ranking of a search restricted to Wikipedia pages
  a(i) = position of Wikipedia topic i in this list
  b(i) = position of Wikipedia topic i in the list being tested
Relevance is measured in terms of reciprocal ranks (formula shown as an image on the slide)

Adding Intent to Human Judgments (Generalizing Relevance Metrics)
Take the expectation over the distribution of intents
  Interpretation: how will the average user feel?
Consider NDCG@k
  Classic NDCG@k: [formula shown as an image]
  Intent-aware NDCG-IA depends on the intent distribution and the intent-specific NDCG
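A sketch of NDCG@k and the intent-aware generalization described on the slide, i.e. the expectation of intent-specific NDCG@k under P(c | q); the 2^rel - 1 gain is the common convention and an assumption here:

    import math

    def dcg_at_k(gains, k):
        # Discounted cumulative gain over graded relevance values in rank order.
        return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    def ndcg_at_k(gains, k):
        # Classic NDCG@k: DCG normalized by the DCG of the ideal ordering.
        ideal = dcg_at_k(sorted(gains, reverse=True), k)
        return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

    def ndcg_ia_at_k(gains_by_intent, P, k):
        # NDCG-IA: expectation over intents of the intent-specific NDCG@k.
        return sum(P[c] * ndcg_at_k(gains, k) for c, gains in gains_by_intent.items())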

Evaluation using Mechanical Turk
Created two types of HITs on Mechanical Turk:
  Query classification: workers are asked to choose among three interpretations
  Document rating (under the given interpretation)
Two additional evaluations:
  MT classification + current ratings
  MT classification + MT document ratings

Some Important Questions
When is it right to diversify?
  Users have certain expectations about the workings of a search engine
What is the best way to diversify?
  Evaluate approaches beyond diversifying the retrieved results
Metrics that capture both relevance and diversity
  Some preliminary work suggests that there will be certain trade-offs to make

Questions?

Why frame diversification as set selection?
Otherwise, an explicit user model would need to be encoded in the metric
Selection only needs k (which is 10)
Later, the set can be ranked according to relevance and personalized based on clicks
Alternative to stability: select sets repeatedly (this loses information)
Could refine the selection online, based on user clicks

Novelty Evaluation – Effect of Algorithms

Relevance Evaluation – Effect of Algorithms

Product Evaluation – Anecdotal Result
Results for the query "cd player"
  Relevance: popularity
  Distance: from the product hierarchy

Preliminary Results (100 queries)

Evaluation using Mechanical Turk

Other Measures of Success
Many metrics for relevance:
  Normalized discounted cumulative gain at k (NDCG@k)
  Mean average precision at k (MAP@k)
  Mean reciprocal rank (MRR)
Some metrics for diversity:
  Maximal marginal relevance (MMR) [CG98]
  Nugget-based instantiation of NDCG [C+08]
Want a metric that can take into account both relevance and diversity [JK00]
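For completeness, sketches of the two relevance metrics not shown elsewhere in the deck, MRR and MAP@k; normalizing AP@k by min(R, k) is one common convention and an assumption here:

    def reciprocal_rank(relevant_flags):
        # Contribution of one query to MRR: 1 / rank of the first relevant result.
        for rank, rel in enumerate(relevant_flags, start=1):
            if rel:
                return 1.0 / rank
        return 0.0

    def average_precision_at_k(relevant_flags, k, num_relevant):
        # AP@k for one query: average of precision@i over ranks i with a relevant result.
        hits, total = 0, 0.0
        for rank, rel in enumerate(relevant_flags[:k], start=1):
            if rel:
                hits += 1
                total += hits / rank
        return total / min(num_relevant, k) if num_relevant else 0.0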

Problem Statement: Diversify(k)
Given a query q, a set of documents D, an intent distribution P(c | q), quality estimates V(d | q, c), and an integer k,
find a set of docs S ⊆ D with |S| = k that maximizes P(S | q) [objective shown as an image], interpreted as the probability that the set S is relevant to the query over all possible intents.
Multiple intents; find at least one relevant doc
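The maximized quantity appears only as an image on the slide; consistent with its description (the probability that at least one result satisfies the user's intent, averaged over intents), it can be sketched as:

    def set_relevance(S, P, V):
        # P(S | q) = sum over intents c of P(c | q) * (1 - product over d in S of (1 - V(d | q, c)))
        prob = 0.0
        for c, p_c in P.items():
            miss = 1.0
            for d in S:
                miss *= (1.0 - V[d].get(c, 0.0))   # probability that no doc in S satisfies intent c
            prob += p_c * (1.0 - miss)
        return prob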

Discussion of Objective
Makes explicit use of a taxonomy
  In contrast, similarity-based: [CG98], [CK06], [RKJ08]
Captures both diversification and document relevance
  In contrast, coverage-based: [Z+05], [C+08], [V+08]
Specific form of "loss minimization" [Z02], [ZL06]
  "Diminishing returns" for docs with the same intent
Objective is order-independent
  Assumes that all users read k results
  May want to optimize Σk P(k) · P(S | q) instead