
1 Learning to Rank: From Pairwise Approach to Listwise Approach
Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li
Presenter: Davidson
Date: 2009/12/09
Published in ICML 2007

2 Contents
- Introduction
- Pairwise approach
- Listwise approach
- Probability models: permutation probability, top k probability
- Learning method: ListNet
- Experiments
- Conclusions and future work

3 Introduction
- Learning to rank: ranking objects for given queries, e.g. document retrieval, expert finding, anti web spam, and product rating
- Learning to rank methods: pointwise approach, pairwise approach, listwise approach

4 Pairwise approach (1/2)
- Training samples: document pairs
- Learning task: classification of object pairs into 2 categories (correctly ranked or incorrectly ranked)
- Methods: RankSVM (Herbrich et al., 1999), RankBoost (Freund et al., 1998), RankNet (Burges et al., 2005)

5 Pairwise approach (2/2)
- Advantages:
  - Existing classification methods can be applied directly
  - Training instances of document pairs are easy to obtain, e.g. from click-through data (Joachims, 2002)
- Problems:
  - The learning objective is to minimize errors in classifying document pairs, not errors in ranking documents
  - The assumption that document pairs are generated i.i.d. is too strong
  - The number of document pairs varies largely from query to query, resulting in models biased towards queries with more document pairs (see the sketch after this slide)
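To make the last problem concrete, here is a minimal sketch (with hypothetical relevance labels, not data from the paper) of how document pairs are typically formed from graded judgments, showing how the pair count, and hence a query's weight in a pairwise loss, grows with the number of judged documents:

```python
from itertools import combinations

def make_pairs(labels):
    """Build (i, j) training pairs where document i should rank above document j."""
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] > labels[j]:
            pairs.append((i, j))
        elif labels[j] > labels[i]:
            pairs.append((j, i))
    return pairs

# Hypothetical queries: one with few judged documents, one with many.
query_a = [2, 0, 1]                    # 3 judged documents
query_b = [2, 1, 0, 0, 1, 2, 0, 1, 0]  # 9 judged documents

print(len(make_pairs(query_a)))  # 3 pairs
print(len(make_pairs(query_b)))  # 26 pairs: query B dominates a pairwise loss
```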

6 Listwise approach (1/2)
- Training samples: document lists
- Listwise loss function: represents the difference between the ranking list output by the ranking model and the ground truth ranking list
  - Probabilistic methods + cross entropy: permutation probability, top k probability
- Model: neural network
- Optimization algorithm: gradient descent

7 Listwise approach (2/2)
- [Figure: listwise framework: queries, documents, relevance scores, feature vectors, model-generated scores, listwise loss function]

8 Probability models
- Map a list of scores to a probability distribution: permutation probability, top k probability
- Take any metric between probability distributions as the loss function, e.g. cross entropy

9 Permutation probability (1/6)
- n objects are to be ranked
- A permutation π = ⟨π(1), π(2), ..., π(n)⟩ is a ranking order of the n objects, where π(j) is the object at position j
- Ω_n is the set of all possible permutations of the n objects
- s = (s_1, s_2, ..., s_n) is a list of scores, where s_j is the score of object j

10 Permutation probability (2/6)
- The permutation probability of π given the score list s is defined as
  $P_s(\pi) = \prod_{j=1}^{n} \frac{\phi(s_{\pi(j)})}{\sum_{k=j}^{n} \phi(s_{\pi(k)})}$
  where φ is an increasing and strictly positive function and s_{π(j)} is the score of the object at position j of permutation π
- For example, with 3 objects:
  $P_s(\langle 1,2,3\rangle) = \frac{\phi(s_1)}{\phi(s_1)+\phi(s_2)+\phi(s_3)} \cdot \frac{\phi(s_2)}{\phi(s_2)+\phi(s_3)} \cdot \frac{\phi(s_3)}{\phi(s_3)}$

11 Permutation probability (3/6)
- The permutation probabilities form a probability distribution over Ω_n: P_s(π) > 0 for every π and Σ_{π ∈ Ω_n} P_s(π) = 1
- A permutation that ranks objects with larger scores in the front has a higher probability
- If s_1 > s_2 > ... > s_n, then ⟨1, 2, ..., n⟩ has the highest probability and ⟨n, n-1, ..., 1⟩ has the lowest probability

12 Permutation probability (4/6)
- Example: 3 objects with scores 3, 5, 10 (objects 1, 2, 3 respectively)

  Permutation    Probability (%)
  (3, 2, 1)      34.72
  (2, 3, 1)      21.37
  (3, 1, 2)      20.83
  (1, 3, 2)      11.11
  (2, 1, 3)       6.41
  (1, 2, 3)       5.56
  Sum           100.00
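A minimal Python sketch of the permutation probability above, using the identity function φ(s) = s, which reproduces the percentages in the table:

```python
from itertools import permutations

def permutation_probability(scores, perm, phi=lambda s: s):
    """P_s(pi) = prod_j phi(s_pi(j)) / sum_{k >= j} phi(s_pi(k))."""
    vals = [phi(scores[i]) for i in perm]
    p = 1.0
    for j in range(len(vals)):
        p *= vals[j] / sum(vals[j:])
    return p

scores = [3, 5, 10]  # scores of objects 1, 2, 3
for perm in permutations(range(3)):
    print(tuple(i + 1 for i in perm),
          round(100 * permutation_probability(scores, perm), 2))
# (3, 2, 1) gets 34.72 and (1, 2, 3) gets 5.56; the six values sum to 100
```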

13 Permutation probability (5/6)
- With a linear function φ(s) = αs (α > 0), the permutation probability is scale invariant: multiplying all scores by the same positive constant does not change P_s(π)
- With an exponential function φ(s) = exp(αs) (α > 0), the permutation probability is translation invariant: adding the same constant to all scores does not change P_s(π) (checked numerically below)
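A quick numerical check of both properties, reusing the permutation_probability helper from the sketch above (the specific scores and permutation are arbitrary):

```python
import math

scores = [3.0, 5.0, 10.0]
perm = (2, 0, 1)

# Scale invariance with a linear phi: multiplying all scores by a positive
# constant leaves the permutation probability unchanged.
p_lin = permutation_probability(scores, perm, phi=lambda s: s)
p_lin_scaled = permutation_probability([2.5 * s for s in scores], perm, phi=lambda s: s)
assert abs(p_lin - p_lin_scaled) < 1e-12

# Translation invariance with an exponential phi: adding a constant to all
# scores leaves the permutation probability unchanged.
p_exp = permutation_probability(scores, perm, phi=math.exp)
p_exp_shifted = permutation_probability([s + 7.0 for s in scores], perm, phi=math.exp)
assert abs(p_exp - p_exp_shifted) < 1e-12
```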

14 Permutation probability (6/6)
- However, the number of possible permutations is n!, so computing the full distribution is of order O(n!)
- The computation is intractable for large n
- Consider the top k probability instead

15 Top k probability (1/4)
- The probability of k objects (out of n objects) being ranked in the top k positions
- The top k subgroup G_k(j_1, ..., j_k) is defined as the set containing all permutations whose top k objects are exactly j_1, ..., j_k, in that order
- G_k is the collection of all top k subgroups; it has only n!/(n-k)! elements, which is much smaller than n!
- E.g. for 5 objects, the top 2 subgroup G_2(1, 3) includes: {(1,3,2,4,5), (1,3,2,5,4), (1,3,4,2,5), (1,3,4,5,2), (1,3,5,2,4), (1,3,5,4,2)}

16 Top k probability (2/4)
- The top k probability of objects j_1, ..., j_k is defined as the sum of the permutation probabilities of the permutations in the subgroup:
  $P_s(\mathcal{G}_k(j_1, \dots, j_k)) = \sum_{\pi \in \mathcal{G}_k(j_1, \dots, j_k)} P_s(\pi)$
- For example (5 objects): P_s(G_2(1, 3)) = P_s(1,3,2,4,5) + P_s(1,3,2,5,4) + P_s(1,3,4,2,5) + P_s(1,3,4,5,2) + P_s(1,3,5,2,4) + P_s(1,3,5,4,2)
- Does this still require computing n! permutations?

17 Top k probability (3/4)
- The top k probability can be computed efficiently as
  $P_s(\mathcal{G}_k(j_1, \dots, j_k)) = \prod_{t=1}^{k} \frac{\phi(s_{j_t})}{\sum_{l=t}^{n} \phi(s_{j_l})}$
  where s_{j_t} is the score of object j_t (ranked at position t) and each denominator sums over the objects not yet placed at positions 1, ..., t-1
- For example, for (1, 3, x, x, x):
  P_s(G_2(1, 3)) = φ(s_1) / (φ(s_1) + φ(s_2) + φ(s_3) + φ(s_4) + φ(s_5)) · φ(s_3) / (φ(s_2) + φ(s_3) + φ(s_4) + φ(s_5))
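A sketch comparing the brute-force definition (summing permutation probabilities over the subgroup) with the product formula above; the scores are hypothetical, and the two computations should agree while the product formula only needs k factors:

```python
from itertools import permutations

def top_k_probability(scores, top, phi=lambda s: s):
    """Product formula: at each position, phi(score) over the sum for the
    objects not yet placed."""
    remaining = list(range(len(scores)))
    p = 1.0
    for obj in top:
        p *= phi(scores[obj]) / sum(phi(scores[i]) for i in remaining)
        remaining.remove(obj)
    return p

def top_k_probability_bruteforce(scores, top, phi=lambda s: s):
    """Sum of the permutation probabilities of all permutations starting with `top`."""
    rest = [i for i in range(len(scores)) if i not in top]
    total = 0.0
    for tail in permutations(rest):
        vals = [phi(scores[i]) for i in tuple(top) + tail]
        p = 1.0
        for j in range(len(vals)):
            p *= vals[j] / sum(vals[j:])
        total += p
    return total

scores = [0.4, 1.2, 0.7, 2.1, 0.3]  # hypothetical scores for objects 1..5
print(top_k_probability(scores, (0, 2)))             # top 2 subgroup (1, 3)
print(top_k_probability_bruteforce(scores, (0, 2)))  # same value
```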

18 Top k probability (4/4)
- The top k probabilities form a probability distribution over the collection G_k
- A top k subgroup that places objects with larger scores in the front has a higher top k probability
- The top k probability is scale invariant or translation invariant with a carefully chosen function φ (linear or exponential, as before)

19 Listwise loss function
- Cross entropy between the top k distributions of two lists of scores:
  $L(y^{(i)}, z^{(i)}) = -\sum_{g \in \mathcal{G}_k} P_{y^{(i)}}(g) \log P_{z^{(i)}}(g)$
  where q^(i) denotes the i-th query, y^(i) denotes the ground truth list of scores, and z^(i) denotes the model-generated list of scores
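A minimal sketch of this loss for k = 1, in which case the top 1 probability of each object is just φ(s_j) normalized over the list (a softmax when φ = exp); the ground truth and model scores below are hypothetical:

```python
import math

def top_one_distribution(scores, phi=math.exp):
    """Top 1 probability of each object: phi(s_j) / sum_k phi(s_k)."""
    vals = [phi(s) for s in scores]
    total = sum(vals)
    return [v / total for v in vals]

def listwise_loss(y_scores, z_scores):
    """Cross entropy between the top 1 distributions of the two score lists."""
    p_y = top_one_distribution(y_scores)
    p_z = top_one_distribution(z_scores)
    return -sum(py * math.log(pz) for py, pz in zip(p_y, p_z))

y = [2.0, 0.0, 1.0]                       # ground truth scores for one query
print(listwise_loss(y, [1.8, 0.1, 0.9]))  # smaller loss: similar ranking
print(listwise_loss(y, [0.0, 2.0, 1.0]))  # larger loss: ranking disagrees
```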

20 Learning method: ListNet (1/2)
- A learning to rank method that optimizes the listwise loss function based on the top k probability, with a neural network as the model and gradient descent as the optimization algorithm
- f_ω denotes the ranking function based on the neural network model ω
  - For a given feature vector x_j^(i), the ranking function outputs a score f_ω(x_j^(i))
  - Score list: z^(i)(f_ω) = (f_ω(x_1^(i)), ..., f_ω(x_n^(i)))

21 Learning method: ListNet (2/2)
- Learning algorithm of ListNet:
  Input: training data {(x^(i), y^(i))}, i = 1, ..., m
  Parameters: number of iterations T and learning rate η
  Initialize parameter ω
  for t = 1 to T do
    for i = 1 to m do
      Input the feature vectors x^(i) of query q^(i) to the neural network and compute the score list z^(i)(f_ω) with the current ω
      Compute the gradient Δω = ∂L(y^(i), z^(i)(f_ω)) / ∂ω
      Update ω ← ω - η · Δω
    end for
  end for
  Output the neural network model ω
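A compact sketch of this loop for k = 1, using a linear scoring function in place of the neural network to keep the example short (an assumption made here, not the paper's setup); for the top 1 cross entropy, the gradient with respect to the scores is P_z - P_y, which the code pushes through the linear model analytically:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_listnet_linear(queries, n_features, epochs=100, lr=0.1):
    """queries: list of (feature_vectors, ground_truth_scores), one entry per query.
    Returns weights w for the linear scoring function f(x) = w . x."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for xs, y in queries:
            z = [sum(wd * xd for wd, xd in zip(w, x)) for x in xs]
            p_y, p_z = softmax(y), softmax(z)
            # d(loss)/dw = sum_j (P_z(j) - P_y(j)) * x_j for a linear model.
            grad = [0.0] * n_features
            for j, x in enumerate(xs):
                for d in range(n_features):
                    grad[d] += (p_z[j] - p_y[j]) * x[d]
            w = [wd - lr * gd for wd, gd in zip(w, grad)]
    return w

# Tiny hypothetical training set: 2 queries, 3 documents each, 2 features.
queries = [
    ([[1.0, 0.2], [0.3, 0.9], [0.1, 0.1]], [2.0, 1.0, 0.0]),
    ([[0.8, 0.1], [0.2, 0.7], [0.0, 0.3]], [2.0, 0.0, 1.0]),
]
print(train_listnet_linear(queries, n_features=2))
```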

22 Experiments
- ListNet is compared with 3 pairwise methods: RankNet, RankSVM, RankBoost
- 3 datasets: TREC, OHSUMED, CSearch

23 TREC dataset
- Web pages crawled from the .gov domain in 2002
- 1,053,110 pages and 11,164,829 hyperlinks
- 50 queries
- Binary relevance judgments (relevant or irrelevant)
- 20 features extracted from each query-document pair (e.g. content features and hyperlink features)

24 OHSUMED dataset
- A collection of documents and queries on medicine
- 348,566 documents, 106 queries, and 16,140 query-document pairs
- Relevance judgments: definitely relevant, possibly relevant, or not relevant
- 30 features extracted for each query-document pair

25 CSearch dataset
- A dataset from a commercial search engine
- About 25,000 queries, with about 1,000 documents associated with each query
- About 600 features in total, including query-dependent and query-independent features
- 5 levels of relevance judgment, from 4 (perfect match) to 0 (bad match)

26 Ranking performance measure (1/2)
- Normalized Discounted Cumulative Gain (NDCG) at position k:
  $\mathrm{NDCG}@k = N_k \sum_{j=1}^{k} \frac{2^{r(j)} - 1}{\log(1 + j)}$
  where r(j) is the relevance grade of the document ranked at position j and N_k is a normalization constant chosen so that a perfect ranking has NDCG@k = 1
- Can be used with more than 2 levels of relevance judgment
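A sketch of this measure with the common choice of a base-2 logarithm for the discount (an assumption here, since the slide does not fix the base); the relevance grades are hypothetical:

```python
import math

def dcg_at_k(relevances, k):
    """DCG@k with gain 2^r - 1 and discount log2(1 + j), positions j = 1, 2, ..."""
    return sum((2 ** r - 1) / math.log2(1 + j)
               for j, r in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the ranked list divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

ranked = [2, 1, 0, 2, 0, 1]   # relevance grades in model-ranked order (0-2 scale)
print(ndcg_at_k(ranked, 3))   # about 0.67: one grade-2 document is ranked too low
```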

27 Ranking performance measure (2/2)
- Mean Average Precision (MAP): for a single query,
  $AP = \frac{\sum_{j=1}^{n} P(j) \cdot rel(j)}{\text{number of relevant documents}}$
  where P(j) is the precision at position j and rel(j) ∈ {0, 1} indicates whether the document at position j is relevant
- MAP = average of AP over all queries
- Can only be used with binary relevance judgments
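A short sketch of AP and MAP from binary relevance labels given in ranked order; the labels are hypothetical:

```python
def average_precision(relevant):
    """AP for one query; `relevant` is a list of 0/1 labels in ranked order."""
    hits, precisions = 0, []
    for j, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            precisions.append(hits / j)  # precision at each relevant position
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(queries):
    """MAP: the average of AP over all queries."""
    return sum(average_precision(q) for q in queries) / len(queries)

print(mean_average_precision([[1, 0, 1, 0, 0], [0, 1, 1, 0, 1]]))  # about 0.71
```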

28 Experimental results (1/4)
- Ranking accuracies in terms of NDCG on TREC [figure: NDCG at top k positions]

29 Experimental results (2/4)
- Ranking accuracies in terms of NDCG on OHSUMED [figure: NDCG at top k positions]

30 Experimental results (3/4)
- Ranking accuracies in terms of NDCG on CSearch [figure: NDCG at top k positions]

31 Experimental results (4/4)
- Ranking accuracies in terms of MAP

32 Discussions (1/2)
- For the pairwise approach, the number of document pairs varies largely from query to query
- [Figure: distribution of the number of document pairs per query in OHSUMED]

33 Discussions (2/2)
- The pairwise approach employs a "pairwise" loss function, which is not well suited to the performance measures NDCG and MAP
- The listwise approach better represents the performance measures
- Verification: observe the relationship between the loss and NDCG at each training iteration

34 Pairwise loss vs. NDCG in RankNet
- [Figure: pairwise loss and NDCG plotted against training iteration]

35 Listwise loss vs. NDCG in ListNet
- [Figure: listwise loss and NDCG plotted against training iteration]

36 Conclusions and future work
- Conclusions:
  - A listwise approach for learning to rank
  - Permutation probability and top k probability for mapping score lists to probability distributions
  - Cross entropy as the listwise loss function
  - Neural network as the model and gradient descent as the optimization algorithm
- Future work:
  - Use other metrics for the loss function
  - Use other models
  - Investigate the relationship between listwise loss functions and performance measures

