
1
**A Framework for Result Diversification**

Sreenivas Gollapudi, Search Labs, Microsoft Research

Joint work with Aneesh Sharma (Stanford), Samuel Ieong, Alan Halverson, and Rakesh Agrawal (Microsoft Research)

2
**Ambiguous queries**

Example query: wine 2009

3
**Definition of Diversification**

- Intuitive definition
  - Represent a variety of relevant meanings for a given query
- Mathematical definitions:
  - Minimizing query abandonment
  - Want to represent different user categories
  - Trade-off between relevance and novelty

4
**Research on diversification**

- Query and document similarities
  - Maximal Marginal Relevance [CG98]
  - Personalized re-ranking of results [RD06]
- Probability Ranking Principle not optimal [CK06]
  - Query abandonment
- Topical diversification [Z+05, AGHI09]
  - Needs topical (categorical) information
- Loss minimization framework [Z02, ZL06]
  - “Diminishing returns” for docs w/ the same intent is a specific loss function [AGHI09]

5
**The framework**

- Express diversity requirements in terms of desired properties
- Define objectives that satisfy these properties
- Develop efficient algorithms
- Metrics and evaluation methodologies

6
**Axiomatic approach**

- Inspired by similar approaches for
  - Recommendation systems [Andersen et al ’08]
  - Ranking [Altman, Tennenholtz ’07]
  - Clustering [Kleinberg ’02]
- Map the space of functions: a “basis vector”

7
**Diversification Setup (1/2)**

- Input:
  - Candidate documents: U = {u1, u2, …, un}, query q
  - Relevance function: wq(ui)
  - Distance function: dq(ui, uj) (symmetric, non-metric)
  - Size k of output result set

[Figure: candidate documents u1 to u6, with the relevance wq(u5) and the distance dq(u2, u4) labeled]

8
**Diversification Setup (2/2)**

- Output
  - Diversified set S* of documents (|S*| = k)
  - Diversification function: f : S × wq × dq → R+
  - S* = argmax f(S) over all S with |S| = k

[Figure: candidate documents u1 to u6; for k = 3, S* = {u1, u2, u6}]
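The definition S* = argmax f(S) over |S| = k can be read literally as an exhaustive search. The sketch below is only an illustration under assumptions: the helper name `diversify_brute_force`, the toy relevance scores, the toy distance, and the placeholder objective are all made up here and are not taken from the talk.

```python
from itertools import combinations

def diversify_brute_force(docs, k, f):
    """Return the size-k subset of docs that maximizes f (exhaustive search)."""
    return max(combinations(docs, k), key=f)

# Toy data, invented for illustration only.
relevance = {"u1": 0.9, "u2": 0.8, "u3": 0.3, "u4": 0.2}

def distance(a, b):
    # Hypothetical symmetric distance: u1 and u2 are near-duplicates.
    return 0.1 if {a, b} == {"u1", "u2"} else 1.0

def toy_objective(S):
    # Placeholder f: total relevance plus total pairwise distance.
    rel = sum(relevance[u] for u in S)
    div = sum(distance(a, b) for a, b in combinations(S, 2))
    return rel + div

best = diversify_brute_force(list(relevance), 2, toy_objective)
print(best)
```

Exhaustive search is exponential in k, so it is only useful for checking small cases by hand; the two most relevant documents lose here because they are near-duplicates.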

9
**Axioms**

- Scale-invariance
- Consistency
- Richness
- Strength
  - Relevance
  - Diversity
- Stability
- Two technical properties

10
**Scale Invariance Axiom**

S* = argmaxS f(S, w(·), d(·, ·)) = argmaxS f(S, w′(·), d′(·, ·)), where

- w′(ui) = α · w(ui)
- d′(ui, uj) = α · d(ui, uj)

[Figure: the selected set S*(3)]

No built-in scale for f!
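A quick numeric check can make the axiom concrete. The sketch below assumes a simple objective (sum of relevance plus sum of pairwise distance), which is an illustration chosen here rather than an objective from the talk; the data and the name `argmax_set` are likewise hypothetical.

```python
from itertools import combinations

def argmax_set(docs, k, w, d):
    """Exhaustive argmax of an assumed objective: total relevance + total pairwise distance."""
    def f(S):
        rel = sum(w[u] for u in S)
        div = sum(d[frozenset(p)] for p in combinations(S, 2))
        return rel + div
    return set(max(combinations(docs, k), key=f))

docs = ["u1", "u2", "u3", "u4"]
w = {"u1": 0.9, "u2": 0.8, "u3": 0.3, "u4": 0.2}
d = {frozenset(p): 1.0 for p in combinations(docs, 2)}
d[frozenset(("u1", "u2"))] = 0.1  # near-duplicates

alpha = 4.0  # a power of two, so scaling introduces no rounding noise
w_scaled = {u: alpha * x for u, x in w.items()}
d_scaled = {p: alpha * x for p, x in d.items()}

same = argmax_set(docs, 2, w, d) == argmax_set(docs, 2, w_scaled, d_scaled)
print(same)
```

Scaling both w and d by the same α multiplies every candidate's score by α under this assumed objective, so the ranking of candidate sets, and hence the argmax, is unchanged.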

11
**Consistency Axiom**

S* = argmaxS f(S, w(·), d(·, ·))

- w′(ui) = w(ui) + ai for ui ∈ S*
- d′(ui, uj) = d(ui, uj) + bi for ui and/or uj ∈ S*

[Figure: the selected set S*(3)]

Increasing relevance/diversity doesn’t hurt!

12
**Stability Axiom**

S*(k) = argmaxS f(S, w(·), d(·, ·), k)

- S*(k) ⊆ S*(k+1) for all k
- Output set shouldn’t oscillate (change arbitrarily) with size

[Figure: S*(3) and S*(4)]
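The nesting condition is easy to test once the optimal sets for several sizes are known. The following is a minimal sketch; the function name `is_stable` and both example families of sets are hypothetical.

```python
def is_stable(optimal_sets):
    """optimal_sets maps k -> S*(k).

    Returns True if S*(k) is a subset of S*(k+1) for every consecutive
    pair of sizes present in the mapping.
    """
    return all(
        optimal_sets[k] <= optimal_sets[k + 1]
        for k in optimal_sets
        if k + 1 in optimal_sets
    )

# Hypothetical outputs of two different diversification functions.
nested = {2: {"u1", "u6"}, 3: {"u1", "u2", "u6"}, 4: {"u1", "u2", "u5", "u6"}}
oscillating = {3: {"u1", "u2", "u6"}, 4: {"u1", "u3", "u5", "u6"}}

print(is_stable(nested), is_stable(oscillating))
```

In the second family, u2 is selected at k = 3 but dropped at k = 4, which is exactly the oscillation the axiom rules out.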

13
**Impossibility result**

- Axioms: Scale-invariance, Consistency, Richness, Strength of Relevance/Diversity, Stability, two technical properties
- Theorem: No function f can satisfy all the axioms simultaneously.
- Proof via constructive argument

14
**Axiomatic characterization – Summary**

- Baseline for what is possible
- Mathematical criteria for choosing f
- Modular approach: f is independent of specific wq(·) and dq(·, ·)!

15
**A Framework for Diversification**

- Express diversity requirements in terms of desired properties
- Define objectives that satisfy these properties
- Develop efficient algorithms
- Metrics and evaluation methodologies

16
**Recall – Diversification Framework**

- Input: U = {u1, u2, …, un}, k, wq(·) and dq(·, ·)
  - Some set of (top) n results
- Output: S* = argmaxS f(S, w(·), d(·, ·), k)
  - Find the most diverse set of results of size k
- Advantages:
  - Can integrate f with existing ranking engine
  - Modular, plug-and-play framework

17
**Diversification objectives**

- Max-sum (avg) objective
  - Violates stability!

[Figure: candidate documents u1 to u6; for k = 3, S* = {u1, u2, u6}; for k = 4, S* = {u1, u3, u5, u6}]
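The slide's formula is not reproduced in this transcript. As a hedged sketch, suppose the max-sum objective has the form f(S) = (k − 1) Σ w(u) + 2λ Σ d(u, v), with the first sum over documents in S and the second over pairs in S; this form, the weight λ, and all data below are assumptions made for illustration. Under that assumption, a small hand-built instance shows how the optimal set can change completely between k = 2 and k = 3.

```python
from itertools import combinations

def max_sum(S, w, d, lam=1.0):
    """Assumed max-sum form: (k-1) * total relevance + 2 * lam * total pairwise distance."""
    k = len(S)
    rel = sum(w[u] for u in S)
    div = sum(d(a, b) for a, b in combinations(S, 2))
    return (k - 1) * rel + 2 * lam * div

docs = ["x", "y", "p", "q", "r"]
w = {u: 1.0 for u in docs}  # equal relevance, so only distance matters

def d(a, b):
    # Hypothetical symmetric, non-metric distances:
    # one very distant pair (x, y) and one well-spread triple (p, q, r).
    if {a, b} == {"x", "y"}:
        return 10.0
    if a in "pqr" and b in "pqr":
        return 7.0
    return 1.0

def best(k):
    return set(max(combinations(docs, k), key=lambda S: max_sum(S, w, d)))

print(best(2), best(3))
```

Here the best pair is {x, y}, but the best triple is {p, q, r}, so S*(2) is not contained in S*(3).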

18
**Diversification objectives**

- Max-min objective
  - Violates consistency and stability!

[Figure: candidate documents u1 to u6; for k = 3, S* = {u1, u2, u6}; a second panel shows S* = {u1, u5, u6}]
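The formula for this objective is also missing from the transcript. One plausible reading, assumed here purely for illustration, is f(S) = min w(u) + λ min d(u, v): the least relevant document in S plus λ times the distance of the closest pair. The data and the weight λ below are hypothetical.

```python
from itertools import combinations

def max_min(S, w, d, lam=1.0):
    """Assumed max-min form: worst relevance in S + lam * distance of the closest pair.

    Requires at least two documents in S so that a closest pair exists.
    """
    worst_relevance = min(w[u] for u in S)
    closest_pair = min(d[frozenset(p)] for p in combinations(S, 2))
    return worst_relevance + lam * closest_pair

w = {"u1": 0.9, "u2": 0.8, "u3": 0.5}
d = {
    frozenset(("u1", "u2")): 0.1,  # near-duplicates
    frozenset(("u1", "u3")): 1.0,
    frozenset(("u2", "u3")): 0.6,
}

best_pair = max(combinations(w, 2), key=lambda S: max_min(S, w, d))
print(best_pair, max_min(best_pair, w, d))
```

Because only the worst document and the closest pair count, this objective ignores everything else in the set, which is one intuition for why it behaves differently from max-sum.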

19
**Other Diversification objectives**

- A taxonomy-based diversification objective
  - Uses the analogy of marginal utility to determine whether to include more results from an already covered category
  - Violates stability and one of the technical axioms

20
**The Framework**

- Express diversity requirements in terms of desired properties
- Define objectives that satisfy these properties
- Develop efficient algorithms
- Metrics and evaluation methodologies

21
**Algorithms for facility dispersion**

- Recast as facility dispersion
  - Max-sum (MaxSumDispersion)
  - Max-min (MaxMinDispersion)
- Known approximation algorithms
- Lower bounds
- Lots of other facility dispersion objectives and algorithms
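The slide does not say which approximation algorithms are meant. As one example of the genre, the sketch below shows a common greedy heuristic for max-min dispersion: seed with the farthest pair, then repeatedly add the point whose nearest chosen neighbour is farthest away. The function name and the toy points are hypothetical, and how relevance would be folded into the distance is not covered here.

```python
from itertools import combinations

def greedy_max_min(points, k, dist):
    """Greedy heuristic for max-min dispersion (assumes 2 <= k <= len(points))."""
    # Seed with the two points that are farthest apart.
    chosen = list(max(combinations(points, 2), key=lambda pair: dist(*pair)))
    while len(chosen) < k:
        remaining = [p for p in points if p not in chosen]
        # Add the point that keeps the closest-pair distance as large as possible.
        chosen.append(max(remaining, key=lambda p: min(dist(p, c) for c in chosen)))
    return chosen

points = [0, 1, 5, 9, 10]  # toy points on a line
picked = greedy_max_min(points, 3, lambda a, b: abs(a - b))
print(picked)
```

On this toy input the heuristic first takes the endpoints 0 and 10, then the midpoint 5, since 1 and 9 each sit next to an already chosen point.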

22
**Algorithm for categorical diversification**

    S ← ∅
    ∀c ∈ C, U(c | q) ← P(c | q)
    while |S| < k do
        for d ∈ D do
            g(d | q, c) ← Σc U(c | q) V(d | q, c)
        end for
        d∗ ← argmax g(d | q, c)
        S ← S ∪ {d∗}
        ∀c ∈ C, U(c | q) ← (1 − V(d∗ | q, c)) U(c | q)    (update the utility of a category)
        D ← D \ {d∗}
    end while

- P(c | q): conditional probability of intent c given query q
- g(d | q, c): current probability of d satisfying q, c
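The greedy loop on this slide can be sketched in Python as follows. The function name, the dictionary layout, and the document labels d1 to d5 are hypothetical; the intent probabilities and quality values are the ones that appear on the example slide that follows, and the first maximum wins ties, which is an arbitrary choice made here.

```python
def greedy_categorical(docs, intent_prob, quality, k):
    """Greedy categorical diversification, following the slide's pseudocode.

    intent_prob: {category: P(c|q)}
    quality:     {doc: {category: V(d|q,c)}}; categories not listed count as 0
    Returns the selected documents in the order they were picked.
    """
    utility = dict(intent_prob)           # U(c|q) <- P(c|q)
    remaining = list(docs)
    selected = []
    while len(selected) < k and remaining:
        def gain(d):
            # g(d|q,c) = sum over categories of U(c|q) * V(d|q,c)
            return sum(utility[c] * v for c, v in quality[d].items())
        best = max(remaining, key=gain)   # first maximum wins ties
        selected.append(best)
        for c, v in quality[best].items():
            utility[c] *= 1.0 - v         # discount intents the pick already serves
        remaining.remove(best)
    return selected

intent_prob = {"R": 0.8, "B": 0.2}
quality = {
    "d1": {"R": 0.9},
    "d2": {"R": 0.5},
    "d3": {"R": 0.4},
    "d4": {"B": 0.4},
    "d5": {"B": 0.4},
}
order = greedy_categorical(list(quality), intent_prob, quality, k=3)
print(order)
```

After d1 is picked, the utility of intent R drops from 0.8 to 0.08, so the next two picks come from the less likely intent B even though d2 has higher raw quality.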

23
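**An Example**

Intent distribution: P(R | q) = 0.8, P(B | q) = 0.2.

| Doc | Intent | V(d \| q, c) | g, round 1 | g, round 2 | g, round 3 |
|---|---|---|---|---|---|
| 1 | R | 0.9 | 0.9 × 0.8 = 0.72 (selected) | | |
| 2 | R | 0.5 | 0.5 × 0.8 = 0.40 | 0.5 × 0.08 = 0.04 | 0.5 × 0.08 = 0.04 |
| 3 | R | 0.4 | 0.4 × 0.8 = 0.32 | 0.4 × 0.08 = 0.03 | 0.4 × 0.08 = 0.03 |
| 4 | B | 0.4 | 0.4 × 0.2 = 0.08 | 0.4 × 0.2 = 0.08 (selected) | |
| 5 | B | 0.4 | 0.4 × 0.2 = 0.08 | 0.4 × 0.2 = 0.08 | 0.4 × 0.12 = 0.05 (selected) |

Utility updates: U(R | q) goes from 0.8 to 0.08 after round 1; U(B | q) goes from 0.2 to 0.12 after round 2 and to 0.07 after round 3.

- Actually produces an ordered set of results
- Results not proportional to intent distribution
- Results not according to (raw) quality
- Better results ⇒ fewer need to be shown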

24
**The Framework**

- Express diversity requirements in terms of desired properties
- Define objectives that satisfy these properties
- Develop efficient algorithms
- Metrics and evaluation methodologies

25
**Evaluation Methodologies**

- Approach
  - Represent real queries
  - Scale beyond a few user studies
- Problem: Hard to define ground truth
- Use disambiguated information sources on the web as the ground truth
- Incorporate intent into human judgments
  - Can exploit the user distribution (need to be careful)

26
**Wikipedia Disambiguation Pages**

- Query = Wikipedia disambiguation page title
- Large-scale ground truth set
- Open source
- Growing in size

27
**Metrics Based on Wikipedia Topics**

- Novelty: coverage of Wikipedia topics
- Relevance: coverage of top Wikipedia results

28
**The Relevance and Distance Functions**

- Relevance function: 1/position
  - Can use the search engine score
  - Maybe use query category information
- Distance function:
  - Compute TF-IDF distances
  - Jaccard similarity score for two documents A and B: J(A, B) = |A ∩ B| / |A ∪ B|
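The two functions named on this slide are simple enough to sketch directly. The code below assumes documents are represented as sets of terms and that a distance is obtained as 1 minus the Jaccard similarity; both are assumptions made here, as are the function names.

```python
def reciprocal_position_relevance(ranked_docs):
    """Relevance 1/position, with positions starting at 1."""
    return {doc: 1.0 / pos for pos, doc in enumerate(ranked_docs, start=1)}

def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| over two term collections; 1.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def jaccard_distance(a, b):
    # Assumption: convert the similarity into a distance as 1 - similarity.
    return 1.0 - jaccard_similarity(a, b)

w = reciprocal_position_relevance(["page_a", "page_b", "page_c", "page_d"])
sim = jaccard_similarity("red wine 2009".split(), "white wine 2009".split())
print(w["page_d"], sim)
```

In the usage lines, the two toy documents share two of four distinct terms, so their similarity is 0.5.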

29
**Evaluating Novelty**

- Topics/categories = list of disambiguation topics
- Given a set Sk of results:
  - For each result, compute a distribution over topics (using our d(·, ·))
  - Sum confidence over all topics
  - Threshold to get # topics represented

Example:

- jaguar.com: Jaguar cat (0.1), Jaguar car (0.9)
- wikipedia.org/jaguar: Jaguar cat (0.8), Jaguar car (0.2)
- Category confidence, summed over both results: Jaguar cat: 0.1 + 0.8 = 0.9, Jaguar car: 0.9 + 0.2 = 1.1
- Threshold = 1.0: Jaguar cat: 0, Jaguar car: 1
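The sum-then-threshold step can be sketched in a few lines. The function name is hypothetical, and treating a total exactly equal to the threshold as "represented" is an assumption; the input numbers are the jaguar example from this slide.

```python
def topics_represented(result_topic_confidence, threshold=1.0):
    """Sum per-topic confidence over results, then threshold each total.

    Returns (totals, covered), where covered[topic] is 1 if the summed
    confidence reaches the threshold and 0 otherwise.
    """
    totals = {}
    for confidences in result_topic_confidence.values():
        for topic, conf in confidences.items():
            totals[topic] = totals.get(topic, 0.0) + conf
    covered = {topic: int(total >= threshold) for topic, total in totals.items()}
    return totals, covered

results = {
    "jaguar.com": {"Jaguar cat": 0.1, "Jaguar car": 0.9},
    "wikipedia.org/jaguar": {"Jaguar cat": 0.8, "Jaguar car": 0.2},
}
totals, covered = topics_represented(results, threshold=1.0)
print(covered, sum(covered.values()))
```

With these two results only one topic clears the threshold, so the novelty count for this result set is 1.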

30
**Evaluating Relevance**

- Query: get ranking of search restricted to Wikipedia pages
- a(i) = position of Wikipedia topic i in this list
- b(i) = position of Wikipedia topic i in list being tested
- Relevance is measured in terms of reciprocal ranks

31
**Adding Intent to Human Judgments (Generalizing Relevance Metrics)**

- Take expectation over distribution of intents
  - Interpretation: how will the average user feel?
- Consider NDCG: the classic form is generalized to NDCG-IA, which depends on the intent distribution and the intent-specific NDCG
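Taking the expectation over intents can be written as NDCG-IA = Σc P(c | q) · NDCG(ranking | c). The sketch below follows that reading. The particular DCG form (gain divided by log2(rank + 1)), the choice to compute the ideal ordering from the same judged list, and all names and data are assumptions made for illustration.

```python
import math

def dcg(gains, k):
    # Assumed DCG form: gain / log2(rank + 1), ranks starting at 1.
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))

def ndcg(gains, k):
    # Ideal ordering is taken from the same judged list (a simplification).
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0

def ndcg_ia(intent_prob, gains_by_intent, k):
    """Expected NDCG over intents: sum_c P(c|q) * NDCG(ranking | c)."""
    return sum(p * ndcg(gains_by_intent[c], k) for c, p in intent_prob.items())

intent_prob = {"R": 0.8, "B": 0.2}
# Hypothetical per-intent judgments of the same ranked list of three results.
gains_by_intent = {"R": [1, 1, 0], "B": [0, 0, 1]}
score = ndcg_ia(intent_prob, gains_by_intent, k=3)
print(round(score, 3))
```

In this toy case the ranking is ideal for intent R (NDCG 1.0) and poor for intent B (NDCG 0.5), giving 0.8 × 1.0 + 0.2 × 0.5 = 0.9.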

32
**Evaluation using Mechanical Turk**

- Created two types of HITs on Mechanical Turk
  - Query classification: workers are asked to choose among three interpretations
  - Document rating (under the given interpretation)
- Two additional evaluations
  - MT classification + current ratings
  - MT classification + MT document ratings

33
**Some Important Questions**

- When is it right to diversify?
  - Users have certain expectations about the workings of a search engine
- What is the best way to diversify?
  - Evaluate approaches beyond diversifying the retrieved results
- Metrics that capture both relevance and diversity
  - Some preliminary work suggests that there will be certain trade-offs to make

34
Questions?

35
**Why frame diversification as set selection?**

- Otherwise, need to encode explicit user model in the metric
  - Selection only needs k (which is 10)
- Later, can rank set according to relevance
  - Personalize based on clicks
- Alternative to stability:
  - Select sets repeatedly (this loses information)
  - Could refine selection online, based on user clicks

36
**Novelty Evaluation – Effect of Algorithms**

37
**Relevance Evaluation – Effect of Algorithms**

38
**Product Evaluation – Anecdotal Result**

- Results for query "cd player"
- Relevance: popularity
- Distance: from product hierarchy

39
**Preliminary Results (100 queries)**

40
**Evaluation using Mechanical Turk**

41
**Other Measures of Success**

- Many metrics for relevance
  - Normalized discounted cumulative gain (NDCG) at k
  - Mean average precision at k
  - Mean reciprocal rank (MRR)
- Some metrics for diversity
  - Maximal marginal relevance (MMR) [CG98]
  - Nugget-based instantiation of NDCG [C+08]
- Want a metric that can take into account both relevance and diversity [JK00]

42
**Problem Statement**

- Diversify(k)
  - Given a query q, a set of documents D, distribution P(c | q), quality estimates V(d | q, c), and integer k
  - Find a set of docs S ⊆ D with |S| = k that maximizes P(S | q), interpreted as the probability that the set S is relevant to the query over all possible intentions
- Multiple intents
- Find at least one relevant doc
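The objective's formula is not reproduced in this transcript. A form consistent with "multiple intents" and "find at least one relevant doc" is P(S | q) = Σc P(c | q) · (1 − Πd∈S (1 − V(d | q, c))); treat this as an assumption. The sketch below evaluates it on hypothetical documents d1 to d5, using the intent probabilities and quality values from the earlier example slide.

```python
def prob_set_relevant(selected, intent_prob, quality):
    """Assumed objective: sum_c P(c|q) * (1 - prod_{d in S} (1 - V(d|q,c)))."""
    total = 0.0
    for c, p in intent_prob.items():
        miss = 1.0  # probability that no selected doc satisfies intent c
        for d in selected:
            miss *= 1.0 - quality[d].get(c, 0.0)
        total += p * (1.0 - miss)
    return total

intent_prob = {"R": 0.8, "B": 0.2}
quality = {
    "d1": {"R": 0.9},
    "d2": {"R": 0.5},
    "d3": {"R": 0.4},
    "d4": {"B": 0.4},
    "d5": {"B": 0.4},
}

diversified = prob_set_relevant(["d1", "d4", "d5"], intent_prob, quality)
top_quality = prob_set_relevant(["d1", "d2", "d3"], intent_prob, quality)
print(round(diversified, 3), round(top_quality, 3))
```

Under this assumed objective the diversified set scores about 0.848 against about 0.776 for the three highest-quality documents, because the latter leaves intent B entirely unserved.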

43
**Discussion of Objective**

- Makes explicit use of taxonomy
  - In contrast, similarity-based: [CG98], [CK06], [RKJ08]
- Captures both diversification and doc relevance
  - In contrast, coverage-based: [Z+05], [C+08], [V+08]
- Specific form of “loss minimization” [Z02], [ZL06]
- “Diminishing returns” for docs w/ the same intent
- Objective is order-independent
  - Assumes that all users read k results
  - May want to optimize Σk P(k) P(S | q)
