Personalized Search Result Diversification via Structured Learning

1 Personalized Search Result Diversification via Structured Learning
Shangsong Liang, Zhaochun Ren, Maarten de Rijke (University of Amsterdam). Presented by Yu Hu.

2 Tackling Ambiguous Queries
Personalization approach: tailor the results to the specific interests of the user.
  Drawbacks: inaccurate user profiles, and queries unrelated to the personalized information.
Diversification approach: maximize the probability of showing an interpretation relevant to the user.
  Drawback: outliers.

3 Diversification. Example query: "Queen"; the slide shows diversified results.

4 Personalization. Example query: "Queen"; the slide shows a personalized ordering of results based on the user profile.

5 Overview of PSVMdiv
Given a user and a query, predict a diverse set of documents (a rough sketch of this prediction step follows below).
Formulate a discriminant based on maximizing search result diversification.
Perform training using the structured support vector machine framework.
Model user interest with an LDA-style topic model: infer a per-document, per-user multinomial distribution over topics and determine whether a document caters to a specific user.
During training, use features extracted from three sources (detailed on the Feature Space slide).
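As a rough illustration of the prediction step only: the weight vector w, the joint feature map psi, and the greedy approximation of the subset search are assumptions in the spirit of structured diversification methods, not the paper's exact inference procedure.

```python
# Hypothetical sketch of the prediction step in a structured diversification
# model: greedily grow a document subset y so as to maximize a linear
# discriminant w . psi(x, u, y). Both `w` and `psi` are assumed placeholders.

def dot(w, features):
    """Inner product between the weight vector and a feature vector."""
    return sum(wi * fi for wi, fi in zip(w, features))

def greedy_predict(x_docs, u_docs, w, psi, k=10):
    """Select up to k documents from x_docs for a user profile u_docs."""
    selected, remaining = [], list(x_docs)
    while remaining and len(selected) < k:
        # Add the document whose inclusion yields the highest discriminant score.
        best = max(remaining, key=lambda d: dot(w, psi(x_docs, u_docs, selected + [d])))
        selected.append(best)
        remaining.remove(best)
    return selected
```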

6 The Learning Problem
Given a user and a set of documents, select a subset of documents that maximizes search result diversification for the user.
y: a candidate subset of documents (the prediction)
x: a set of documents
u: the documents user u is interested in
Loss function: penalizes predicted subsets that poorly serve the user (an example form is sketched below).
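One common choice in this family of methods is the weighted subtopic loss of SVMdiv; whether PSVMdiv uses exactly this form is not shown here, so the following is only a sketch:

```latex
% Example loss in the style of SVM-div's weighted subtopic loss: the weighted
% fraction of ground-truth subtopics (or interest documents) not covered by
% the predicted subset \hat{y}. The weights w_t and the coverage indicator
% are illustrative notation.
\Delta(y, \hat{y}) \;=\; 1 \;-\;
  \frac{\sum_{t} w_t \,\mathbb{1}\big[\, t \text{ is covered by } \hat{y} \,\big]}
       {\sum_{t} w_t}
```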

7 The Learning Problem
Learn a hypothesis function h that predicts a y given x and u.
Labeled training data is assumed to be available.
Find a function h such that the empirical risk is minimized.
Let a discriminant F compute how well a predicted y fits x and u; the hypothesis predicts the y that maximizes F.
Each (x, u, y) is described through a feature vector Ψ(x, u, y).
The discriminant function is assumed to be linear in the feature space (see the sketch below).
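In the usual structured SVM notation (a sketch; w denotes the learned weight vector, Ψ the joint feature map, and Δ the loss from the previous slide), these statements read:

```latex
% Linear discriminant over the joint feature vector \Psi(x, u, y)
F(x, u, y; w) \;=\; w^{\top} \Psi(x, u, y)

% The hypothesis predicts the y that maximizes F
h(x, u; w) \;=\; \operatorname*{arg\,max}_{y \in \mathcal{Y}} \; w^{\top} \Psi(x, u, y)

% Empirical risk over n labeled training triples (x^{(i)}, u^{(i)}, y^{(i)})
R^{\Delta}_{S}(h) \;=\; \frac{1}{n} \sum_{i=1}^{n}
  \Delta\!\left(y^{(i)},\, h(x^{(i)}, u^{(i)})\right)
```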

8 Standard SVMs and Additional Constraints
The optimization problem for standard SVMs (a structural-SVM form is sketched below).
Additional constraints:
  one for diversity;
  one for consistency with the user's interests.
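For reference, a sketch of the underlying structural SVM optimization with margin rescaling (C, the slack variables ξ_i, and the margin-rescaling form are standard notation; the paper's extra diversity and user-interest constraints would be layered on top and are not reproduced here):

```latex
% Structural SVM with margin rescaling and slack variables \xi_i
\min_{w,\; \xi \ge 0} \quad \frac{1}{2}\lVert w \rVert^{2} \;+\; \frac{C}{n} \sum_{i=1}^{n} \xi_i

\text{s.t.} \quad
w^{\top} \Psi\big(x^{(i)}, u^{(i)}, y^{(i)}\big)
\;\ge\;
w^{\top} \Psi\big(x^{(i)}, u^{(i)}, y\big) \;+\; \Delta\big(y^{(i)}, y\big) \;-\; \xi_i
\qquad \forall i,\; \forall y \ne y^{(i)}
```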

9 User Interest Topic Model
To capture per-user and per-document distributions over topics

10 Latent Dirichlet Allocation
α: the Dirichlet prior on the per-document topic distributions
β: the Dirichlet prior on the per-topic word distributions
θ_i: the topic distribution for document i
φ_k: the word distribution for topic k
z_ij: the topic for the j-th word in document i
w_ij: the specific word
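As a concrete illustration of these quantities, a minimal sketch with plain LDA in scikit-learn (this is standard LDA, not the paper's user-interest extension; the toy corpus and hyperparameter values are assumptions):

```python
# Minimal LDA sketch: recover per-document topic distributions theta_i.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "queen rock band live concert tour",
    "queen elizabeth royal monarchy palace",
    "chess queen gambit opening strategy",
]

# Bag-of-words counts w_ij per document
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# alpha = doc_topic_prior, beta = topic_word_prior (symmetric Dirichlet priors)
lda = LatentDirichletAllocation(
    n_components=3,          # number of topics K
    doc_topic_prior=0.1,     # alpha
    topic_word_prior=0.01,   # beta
    random_state=0,
)
theta = lda.fit_transform(X)  # per-document topic distributions theta_i (rows sum to 1)
print(theta)
```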

11 Feature Space
Three types of features:
1. Features extracted directly from tokens' statistical information in the documents: similarity scores between a document x ∈ y and the set of documents u that the user is interested in, using cosine, Euclidean, and KL-divergence metrics.
2. Features generated from the proposed user-interest LDA-style topic model: similarity scores between a document x ∈ y and the set u, computed from the document's multinomial distribution over topics and the user's multinomial distribution over topics produced by the User Interest Topic Model, again with cosine, Euclidean, and KL-divergence metrics (a sketch of such similarity features follows below).
3. Features utilized by unsupervised personalized diversification algorithms: the main probabilities used in state-of-the-art unsupervised personalized diversification methods, e.g. p(d|q), the probability that d is relevant to q, and p(c|d), the probability that d belongs to a category c.
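A minimal sketch of the topic-based similarity features; the helper function and its smoothing constant are assumptions for illustration, not code from the paper:

```python
# Cosine similarity, Euclidean distance, and KL divergence between a
# document's topic distribution and a user's topic distribution.
import numpy as np

def topic_similarity_features(doc_topics: np.ndarray, user_topics: np.ndarray) -> dict:
    """Both inputs are multinomial distributions over the same K topics."""
    eps = 1e-12  # smoothing to keep the KL divergence finite
    p = doc_topics + eps
    q = user_topics + eps
    p, q = p / p.sum(), q / q.sum()

    cosine = float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))
    euclidean = float(np.linalg.norm(p - q))
    kl = float(np.sum(p * np.log(p / q)))
    return {"cosine": cosine, "euclidean": euclidean, "kl": kl}

# Example: 4-topic distributions for one document and one user profile
print(topic_similarity_features(np.array([0.7, 0.1, 0.1, 0.1]),
                                np.array([0.4, 0.3, 0.2, 0.1])))
```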

12 Dataset
A publicly available personalized diversification dataset:
Contains private evaluation information from 35 users on 180 search queries.
Queries are ambiguous and no more than two keywords long.
751 subtopics for the queries, with most queries having more than 2 subtopics.
Over 3,800 relevance judgments, covering at least the top 5 results for each query.
Each relevance judgment includes 3 main assessments:
  a 4-grade assessment of how relevant the result is to the user's interests (user relevance);
  a 4-grade assessment of how relevant the result is to the evaluated query (topic relevance);
  a 2-grade assessment of whether a subtopic is related to the evaluated query.

13 Baselines
PSVMdiv is compared to 11 baselines:
Traditional: BM25
Plain diversification: IA-select, xQuAD
Plain personalization: PersBM25
Two-step (first diversification, then personalization): xQuAD-BM25
Personalized diversification: PIA-select, PIA-select-BM25, PxQuAD, PxQuAD-BM25
Supervised diversification: SVMdiv, SVMrank

14 Results & Analysis: Supervised vs. Unsupervised

15 Results & Analysis: Effect of the UIT Model

16 Results & Analysis: Effects of Constraints

17 Query-Level Analysis

18 Conclusion
Pro: the User Interest Topic Model.
Con: evaluated on a single, small dataset.

19 Thank you! Questions?

