Learning to Rank: From Pairwise Approach to Listwise Approach
Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li
Published in ICML 2007
Presenter: Davidson
Date: 2009/12/09
Contents
- Introduction
- Pairwise approach
- Listwise approach
- Probability models: permutation probability, top k probability
- Learning method: ListNet
- Experiments
- Conclusions and future work
Introduction
- Learning to rank: ranking objects for a given query
- Applications: document retrieval, expert finding, anti web spam, product rating, etc.
- Learning to rank methods: pointwise approach, pairwise approach, listwise approach
Pairwise approach (1/2)
- Training samples: document pairs
- Learning task: classification of object pairs into 2 categories (correctly ranked or incorrectly ranked)
- Methods: RankSVM (Herbrich et al., 1999), RankBoost (Freund et al., 1998), RankNet (Burges et al., 2005)
Pairwise approach (2/2)
- Advantages:
  - Existing classification methods can be applied directly
  - Training instances of document pairs are easy to obtain, e.g. from users' click-through data (Joachims, 2002)
- Problems:
  - The learning objective is to minimize errors in classifying document pairs, not errors in ranking documents
  - The assumption that document pairs are generated i.i.d. is too strong
  - The number of document pairs varies largely from query to query, resulting in models biased towards queries with more document pairs
Listwise approach (1/2)
- Training samples: document lists
- Listwise loss function: represents the difference between the ranking list output by the ranking model and the ground truth ranking list
- Probabilistic methods + cross entropy: permutation probability, top k probability
- Model: neural network
- Optimization algorithm: gradient descent
Listwise approach (2/2)
[Figure: the listwise framework — queries, their associated documents with relevance scores and feature vectors, the model-generated score lists, and the listwise loss function comparing the two lists]
Probability models
- Map a list of scores to a probability distribution: permutation probability, top k probability
- Take any metric between probability distributions as a loss function, e.g. cross entropy
Permutation probability (1/6)
- n objects are to be ranked
- A permutation π = a ranking order of the n objects, π = (π(1), π(2), ..., π(n))
- Ω_n = the set of all n! possible permutations of the n objects
- A list of scores s = (s_1, s_2, ..., s_n), where s_j is the score of object j
Permutation probability (2/6)
- The permutation probability is defined as:
    P_s(π) = ∏_{j=1}^{n} φ(s_{π(j)}) / Σ_{k=j}^{n} φ(s_{π(k)})
  where φ is an increasing and strictly positive function, and s_{π(j)} is the score of the object at position j of permutation π
- For example, for n = 3:
    P_s(π) = φ(s_{π(1)}) / (φ(s_{π(1)}) + φ(s_{π(2)}) + φ(s_{π(3)})) · φ(s_{π(2)}) / (φ(s_{π(2)}) + φ(s_{π(3)})) · φ(s_{π(3)}) / φ(s_{π(3)})
Permutation probability (3/6)
- The permutation probabilities form a probability distribution over Ω_n: P_s(π) > 0 and Σ_{π ∈ Ω_n} P_s(π) = 1
- A permutation that places objects with larger scores in front has higher probability
- If s_1 > s_2 > ... > s_n, then π = (1, 2, ..., n) has the highest probability and π = (n, n-1, ..., 1) has the lowest probability
Permutation probability (4/6)
- Example: 3 objects with scores s = (3, 5, 10), using the linear function φ(s) = s

  Permutation | Probability (%)
  (3, 2, 1)   |  34.72
  (2, 3, 1)   |  21.37
  (3, 1, 2)   |  20.83
  (1, 3, 2)   |  11.11
  (2, 1, 3)   |   6.41
  (1, 2, 3)   |   5.56
  Sum         | 100.00
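The table above can be reproduced with a short script. A minimal sketch, assuming the linear function φ(s) = s (which matches the numbers shown; the function name `permutation_probability` is ours):

```python
from itertools import permutations

def permutation_probability(scores, pi, phi=lambda s: s):
    """P_s(pi) = prod_j phi(s_pi(j)) / sum_{k>=j} phi(s_pi(k)).

    `pi` lists object indices (1-based) from the first ranked position
    to the last; `phi` defaults to the linear function.
    """
    vals = [phi(scores[j - 1]) for j in pi]
    p = 1.0
    for j in range(len(vals)):
        p *= vals[j] / sum(vals[j:])
    return p

scores = [3, 5, 10]  # scores of objects 1, 2, 3
for pi in sorted(permutations([1, 2, 3]),
                 key=lambda pi: -permutation_probability(scores, pi)):
    print(pi, round(100 * permutation_probability(scores, pi), 2))
```

The printed probabilities match the table row for row, and they sum to 100%.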
Permutation probability (5/6)
- For a linear function φ(s) = α·s (α > 0), the permutation probability is scale invariant: P_{λs}(π) = P_s(π) for any λ > 0
- For an exponential function φ(s) = exp(η·s) (η > 0), the permutation probability is translation invariant: P_{s+λ}(π) = P_s(π) for any real λ
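Both invariance properties can be checked numerically. A small sketch (the helper `perm_prob` and the example scores are ours; indices here are 0-based):

```python
import math
from itertools import permutations

def perm_prob(scores, pi, phi):
    """Permutation probability P_s(pi) for a given function phi."""
    vals = [phi(scores[j]) for j in pi]
    p = 1.0
    for j in range(len(vals)):
        p *= vals[j] / sum(vals[j:])
    return p

s = [3.0, 5.0, 10.0]
pi = (2, 1, 0)  # object 3 first, then 2, then 1 (0-based indices)

linear = lambda x: x
exp = math.exp

# Scale invariance of the linear function: multiplying all scores
# by a positive constant leaves the probability unchanged.
print(perm_prob(s, pi, linear), perm_prob([2 * x for x in s], pi, linear))

# Translation invariance of the exponential function: adding a constant
# to all scores leaves the probability unchanged.
print(perm_prob(s, pi, exp), perm_prob([x + 7 for x in s], pi, exp))
```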
Permutation probability (6/6)
- However, there are n! permutations, so computing the full distribution is intractable for large n
- Consider the top k probability instead!
Top k probability (1/4)
- The probability of k objects (out of n objects) being ranked in the top k positions
- The top k subgroup G_k(j_1, ..., j_k) is defined as the set containing all permutations in which the top k objects are exactly j_1, ..., j_k, in that order
- The collection of all top k subgroups has only n!/(n-k)! elements, far fewer than the n! permutations
- E.g. for 5 objects, the top 2 subgroup G_2(1, 3) includes: {(1,3,2,4,5), (1,3,2,5,4), (1,3,4,2,5), (1,3,4,5,2), (1,3,5,2,4), (1,3,5,4,2)}
Top k probability (2/4)
- The top k probability of objects j_1, ..., j_k is defined as the sum of the permutation probabilities in the subgroup:
    P_s(G_k(j_1, ..., j_k)) = Σ_{π ∈ G_k(j_1, ..., j_k)} P_s(π)
- For example (5 objects):
    P_s(G_2(1, 3)) = P_s(1,3,2,4,5) + P_s(1,3,2,5,4) + P_s(1,3,4,2,5) + P_s(1,3,4,5,2) + P_s(1,3,5,2,4) + P_s(1,3,5,4,2)
- Do we still need to compute n! permutations?
Top k probability (3/4)
- The top k probability can be computed in closed form, without enumerating permutations:
    P_s(G_k(j_1, ..., j_k)) = ∏_{t=1}^{k} φ(s_{j_t}) / Σ_{l=t}^{n} φ(s_{j_l})
  where s_{j_t} is the score of object j_t (ranked at position t), and j_{k+1}, ..., j_n are the remaining objects in any order (only their sum appears in the denominators)
- For example, for G_2(1, 3) with 5 objects (1, 3, x, x, x):
    P_s(G_2(1, 3)) = φ(s_1) / (φ(s_1) + φ(s_2) + φ(s_3) + φ(s_4) + φ(s_5)) · φ(s_3) / (φ(s_2) + φ(s_3) + φ(s_4) + φ(s_5))
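The closed form can be verified against the brute-force sum over permutations. A sketch with the linear function φ(s) = s and made-up scores (function names and the example are ours; indices are 0-based):

```python
from itertools import permutations

def perm_prob(scores, pi):
    """Permutation probability with the linear function phi(s) = s."""
    vals = [scores[j] for j in pi]
    p = 1.0
    for t in range(len(vals)):
        p *= vals[t] / sum(vals[t:])
    return p

def top_k_prob_closed_form(scores, top):
    """Product of the first k factors only; the remaining objects
    contribute to the denominators in any order."""
    rest = [j for j in range(len(scores)) if j not in top]
    order = list(top) + rest
    p = 1.0
    for t in range(len(top)):
        p *= scores[order[t]] / sum(scores[j] for j in order[t:])
    return p

def top_k_prob_brute_force(scores, top):
    """Sum of P_s(pi) over every permutation starting with `top`."""
    rest = [j for j in range(len(scores)) if j not in top]
    return sum(perm_prob(scores, list(top) + list(tail))
               for tail in permutations(rest))

s = [3.0, 5.0, 10.0, 2.0, 7.0]
print(top_k_prob_closed_form(s, (0, 2)))   # P(G_2(1, 3)) in 1-based terms
print(top_k_prob_brute_force(s, (0, 2)))   # same value, summing 3! = 6 terms
```

The two functions agree because the trailing factors of each permutation form a permutation probability over the remaining objects, and those sum to 1 within each subgroup.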
Top k probability (4/4)
- The top k probabilities form a probability distribution over the collection of top k subgroups
- A top k subgroup that places objects with larger scores in front has a higher top k probability
- The top k probability is scale or translation invariant with a carefully chosen function φ (linear or exponential, as before)
Listwise loss function
- Cross entropy between the top k distributions of two lists of scores:
    L(y^(i), z^(i)) = - Σ_{g ∈ G_k} P_{y^(i)}(g) · log P_{z^(i)}(g)
  where i denotes the query, y^(i) denotes the ground truth list of scores, z^(i) denotes the model-generated list of scores, and G_k ranges over the collection of top k subgroups
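With k = 1 and the exponential function, the top 1 distribution is simply the softmax of the scores, and the listwise loss reduces to a softmax cross entropy. A minimal sketch (the score lists are made-up examples):

```python
import math

def top1_distribution(scores):
    """Top 1 probabilities with the exponential function, i.e. softmax."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_loss(y, z):
    """Cross entropy between the top 1 distributions of y and z."""
    py, pz = top1_distribution(y), top1_distribution(z)
    return -sum(p * math.log(q) for p, q in zip(py, pz))

y = [3.0, 1.0, 0.0]       # ground truth scores
z_good = [2.9, 1.2, 0.1]  # model scores close to the truth
z_bad = [0.0, 1.0, 3.0]   # model scores in reversed order
print(listwise_loss(y, z_good) < listwise_loss(y, z_bad))  # True
```

A model whose score list induces a distribution close to the ground truth distribution incurs a smaller loss, which is exactly what the ranking model is trained to achieve.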
Learning method: ListNet (1/2)
- A learning to rank method that optimizes the listwise loss function based on the top k probability, with a neural network as the model and gradient descent as the optimization algorithm
- f_ω denotes the ranking function based on the neural network model with parameter ω
- For a given feature vector x_j^(i), the ranking function outputs a score z_j^(i) = f_ω(x_j^(i))
- Score list: z^(i)(f_ω) = (f_ω(x_1^(i)), ..., f_ω(x_{n^(i)}^(i)))
Learning method: ListNet (2/2)
- Learning algorithm of ListNet:
    Input: training data {(x^(1), y^(1)), ..., (x^(m), y^(m))}
    Parameters: number of iterations T, learning rate η
    Initialize parameter ω
    for t = 1 to T do
      for i = 1 to m do
        Input x^(i) of query i to the neural network and compute score list z^(i)(f_ω) with current ω
        Compute gradient Δω = ∂L(y^(i), z^(i)(f_ω)) / ∂ω
        Update ω = ω − η · Δω
      end for
    end for
    Output neural network model ω
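The training loop above can be sketched with a linear scoring function standing in for the paper's neural network, and the top 1 (softmax) version of the loss. The function names, synthetic data, and hyperparameters below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()

def listnet_train(X_queries, y_queries, n_iter=200, lr=0.1):
    """Gradient descent on the top 1 listwise loss with a linear
    scoring function z_j = w . x_j (a one-layer network)."""
    dim = X_queries[0].shape[1]
    w = np.zeros(dim)
    for _ in range(n_iter):
        for X, y in zip(X_queries, y_queries):
            z = X @ w
            # Gradient of -sum_j P_y(j) log P_z(j) w.r.t. w is
            # X^T (softmax(z) - softmax(y)).
            grad = X.T @ (softmax(z) - softmax(y))
            w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = [rng.normal(size=(4, 3)) for _ in range(20)]  # 20 queries, 4 docs each
w_true = np.array([1.0, -2.0, 0.5])
y = [x @ w_true for x in X]                       # synthetic ground truth
w = listnet_train(X, y)
print(np.round(w, 2))
```

After training, the learned weights rank documents consistently with the ground-truth scoring direction, and the listwise loss is lower than at initialization.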
Experiments
- ListNet compared with 3 pairwise methods: RankNet, RankSVM, RankBoost
- 3 datasets: TREC, OHSUMED, CSearch
TREC dataset
- Web pages from the .gov domain crawled in 2002
- 1,053,110 pages, 11,164,829 hyperlinks
- 50 queries with binary relevance judgments (relevant or irrelevant)
- 20 features extracted from each query-document pair (e.g. content features and hyperlink features)
OHSUMED dataset
- A collection of documents and queries on medicine
- 348,566 documents, 106 queries, 16,140 query-document pairs
- Relevance judgments: definitely relevant, possibly relevant, not relevant
- 30 features extracted for each query-document pair
CSearch dataset
- A dataset from a commercial search engine
- About 25,000 queries, with about 1,000 documents associated with each query
- About 600 features in total, including query-dependent and query-independent features
- 5 levels of relevance judgment, from 4 (perfect match) to 0 (bad match)
Ranking performance measure (1/2)
- Normalized Discounted Cumulative Gain (NDCG) at position k:
    NDCG@k = N_k · Σ_{j=1}^{k} (2^{r(j)} − 1) / log(1 + j)
  where r(j) is the relevance grade of the document at position j, and N_k is a normalization constant chosen so that a perfect ranking gives NDCG@k = 1
- Can be used with more than 2 levels of relevance judgment
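The measure can be sketched as follows, assuming the base-2 logarithm (the base only rescales DCG and its ideal value by the same constant, so the normalized value is unaffected):

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for a ranked list of relevance grades.

    DCG@k = sum_{j=1..k} (2^r(j) - 1) / log2(1 + j); the ideal DCG
    (grades sorted in decreasing order) is used for normalization.
    """
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(1 + j)
                   for j, r in enumerate(rels[:k], start=1))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([2, 2, 1, 0], 4))  # perfect ordering -> 1.0
print(ndcg_at_k([0, 1, 2, 2], 4))  # reversed ordering -> below 1.0
```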
Ranking performance measure (2/2)
- Mean Average Precision (MAP). Average precision for one query:
    AP = Σ_{j=1}^{n} P(j) · rel(j) / (number of relevant documents)
  where P(j) is the precision at position j, and rel(j) = 1 if the document at position j is relevant, 0 otherwise
- MAP = average of AP over all queries
- Can only be used with binary relevance judgments
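A minimal sketch of AP and MAP; here AP is normalized by the number of relevant documents appearing in the ranked list, a common simplification when all relevant documents are retrieved:

```python
def average_precision(relevances):
    """AP for a ranked list of binary relevance labels (1 = relevant)."""
    hits, precision_sum = 0, 0.0
    for j, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            precision_sum += hits / j  # precision at position j
    return precision_sum / hits if hits else 0.0

def mean_average_precision(ranked_lists):
    """MAP = average of AP over all queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

print(average_precision([1, 0, 1, 0]))         # (1/1 + 2/3) / 2 = 0.8333...
print(mean_average_precision([[1, 0], [0, 1]]))
```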
Experimental results (1/4)
[Figure: ranking accuracies in terms of NDCG@k on TREC]
Experimental results (2/4)
[Figure: ranking accuracies in terms of NDCG@k on OHSUMED]
Experimental results (3/4)
[Figure: ranking accuracies in terms of NDCG@k on CSearch]
Experimental results (4/4)
[Table: ranking accuracies in terms of MAP]
Discussions (1/2)
- For the pairwise approach, the number of document pairs varies largely from query to query
[Figure: distribution of the number of document pairs per query in OHSUMED]
Discussions (2/2)
- The pairwise approach employs a "pairwise" loss function, which is not well suited to performance measures such as NDCG and MAP
- The listwise approach better represents the performance measures
- Verification: observe the relationship between loss and NDCG at each training iteration
Pairwise loss vs. NDCG in RankNet
[Figure: pairwise loss and NDCG plotted against training iteration for RankNet]
Listwise loss vs. NDCG in ListNet
[Figure: listwise loss and NDCG plotted against training iteration for ListNet]
Conclusions and future work
- Conclusions:
  - A listwise approach to learning to rank
  - Permutation probability and top k probability
  - Cross entropy as the loss function
  - Neural network as the model and gradient descent as the optimization algorithm
- Future work:
  - Use other metrics as loss functions
  - Use other models
  - Investigate the relationship between listwise loss functions and performance measures