Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date: 2009/12/09 Published in ICML 2007

2 Contents  Introduction  Pairwise approach  Listwise approach  Probability models Permutation probability Top k probability  Learning method: ListNet  Experiments  Conclusions and future work

3 Introduction  Learning to rank: ranking objects for given queries Document retrieval, expert finding, anti web spam, product ratings, etc.  Learning to rank methods: Pointwise approach Pairwise approach Listwise approach

4 Pairwise approach (1/2)  Training samples: document pairs  Learning task: classification of object pairs into 2 categories (correctly ranked or incorrectly ranked)  Methods: RankSVM (Herbrich et al., 1999) RankBoost (Freund et al., 1998) RankNet (Burges et al., 2005)

5 Pairwise approach (2/2)  Advantages: Ease of applying existing classification methods Ease of obtaining training instances of document pairs  E.g. click-through data from users (Joachims, 2002)  Problems … The learning objective is to minimize errors in classifying document pairs, not to minimize errors in ranking documents. The assumption of i.i.d. generated document pairs is too strong. The number of document pairs varies largely from query to query, resulting in models biased towards queries with more document pairs.

6 Listwise approach (1/2)  Training samples: document lists  Listwise loss function Represents the difference between the ranking list output by the ranking model and the ground truth ranking list Probabilistic methods + cross-entropy  Permutation probability  Top k probability  Ranking model: neural network  Optimization algorithm: gradient descent

7 Listwise approach (2/2)  Listwise framework: [Diagram: queries with their documents, relevance scores, and feature vectors; the model produces a score for each document, and the listwise loss function compares the model-generated score list with the relevance scores]

8 Probability models  Map a list of scores to a probability distribution Permutation probability Top k probability  Take any metric between probability distributions as a loss function Cross-entropy

9 Permutation probability (1/6)  $n$ objects $\{1, 2, \ldots, n\}$ are to be ranked  A permutation $\pi = \langle \pi(1), \pi(2), \ldots, \pi(n) \rangle$ = a ranking order of the $n$ objects, where $\pi(j)$ denotes the object at position $j$  $\Omega_n$ = the set of all possible permutations of the $n$ objects  A list of scores $s = (s_1, s_2, \ldots, s_n)$, where $s_j$ is the score of object $j$

10 Permutation probability (2/6)  Permutation probability is defined as: $P_s(\pi) = \prod_{j=1}^{n} \frac{\phi(s_{\pi(j)})}{\sum_{k=j}^{n} \phi(s_{\pi(k)})}$ where $\phi$ = an increasing and strictly positive function, $s_{\pi(j)}$ = the score of the object at position $j$ of permutation $\pi$  For example, for 3 objects: $P_s(\langle 1,2,3 \rangle) = \frac{\phi(s_1)}{\phi(s_1)+\phi(s_2)+\phi(s_3)} \cdot \frac{\phi(s_2)}{\phi(s_2)+\phi(s_3)} \cdot \frac{\phi(s_3)}{\phi(s_3)}$

11 Permutation probability (3/6)  The permutation probabilities form a probability distribution over $\Omega_n$: $P_s(\pi) > 0$ and $\sum_{\pi \in \Omega_n} P_s(\pi) = 1$  The permutation with larger scores in the front has higher probability  If $s_1 > s_2 > \cdots > s_n$, then $\langle 1, 2, \ldots, n \rangle$ has the highest probability and $\langle n, n-1, \ldots, 1 \rangle$ has the lowest probability

12 Permutation probability (4/6)  Example: 3 objects with scores 3, 5, 10. All $3! = 6$ permutations are listed with their probabilities, which sum to 100.00%; the permutation $\langle 3, 2, 1 \rangle$, which puts the highest-scored object first, has the highest probability, and $\langle 1, 2, 3 \rangle$ the lowest.
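
Below is a minimal sketch (not the authors' code) that enumerates the $3! = 6$ permutations for this example and prints each probability; the linear choice $\phi(s) = s$ is an assumption of the sketch, since the slide does not state which $\phi$ was used for its table.

```python
# Sketch: permutation probabilities for the slide-12 example (scores 3, 5, 10),
# assuming the linear function phi(s) = s.
from itertools import permutations

def permutation_probability(scores, pi, phi=lambda s: s):
    """P_s(pi) = prod_j phi(s_{pi(j)}) / sum_{k >= j} phi(s_{pi(k)})."""
    prob = 1.0
    for j in range(len(pi)):
        prob *= phi(scores[pi[j]]) / sum(phi(scores[obj]) for obj in pi[j:])
    return prob

scores = {1: 3, 2: 5, 3: 10}   # the 3 objects of the example
dist = {pi: permutation_probability(scores, pi) for pi in permutations(scores)}

for pi, p in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(pi, f"{100 * p:.2f}%")              # <3, 2, 1> comes out on top
print("sum =", round(sum(dist.values()), 6))  # the six probabilities sum to 1
```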

13 Permutation probability (5/6)  For a linear function $\phi(s) = \alpha s$ ($\alpha > 0$), the permutation probability is scale invariant: $P_{\lambda s}(\pi) = P_s(\pi)$ for any $\lambda > 0$  For an exponential function $\phi(s) = \exp(s)$, the permutation probability is translation invariant: $P_{s + \delta}(\pi) = P_s(\pi)$ for any constant $\delta$
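
A small numerical check of these two invariance claims, sketched under the stated choices of $\phi$ (the scale factor 2 and shift 7 are arbitrary test values):

```python
# Sketch: scale invariance with a linear phi and translation invariance with an
# exponential phi, verified numerically on a 3-object score list.
import math
from itertools import permutations

def perm_prob(scores, pi, phi):
    p = 1.0
    for j in range(len(pi)):
        p *= phi(scores[pi[j]]) / sum(phi(scores[k]) for k in pi[j:])
    return p

scores = [3.0, 5.0, 10.0]
for pi in permutations(range(len(scores))):
    linear, exp = (lambda s: s), math.exp
    # Scaling every score by 2 leaves the probability unchanged for linear phi.
    assert abs(perm_prob(scores, pi, linear) -
               perm_prob([2 * s for s in scores], pi, linear)) < 1e-9
    # Shifting every score by 7 leaves the probability unchanged for exp phi.
    assert abs(perm_prob(scores, pi, exp) -
               perm_prob([s + 7 for s in scores], pi, exp)) < 1e-9
print("invariance checks passed")
```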

14 Permutation probability (6/6)  However …  The number of permutations is $n!$, so computing the full distribution is of order $O(n!)$ The computation is intractable for large $n$  Consider the top k probability!

15 Top k probability (1/4)  The probability of $k$ objects $(j_1, j_2, \ldots, j_k)$ (out of the $n$ objects) being ranked on the top $k$ positions  The top k subgroup $\mathcal{G}_k(j_1, \ldots, j_k)$ is defined as the set containing all the permutations in which the top $k$ objects are exactly $j_1, \ldots, j_k$ (in this order)  $\mathcal{G}_k$ = the collection of all the top k subgroups; it now has only $n!/(n-k)!$ elements $\ll n!$  E.g. for 5 objects, the top 2 subgroup $\mathcal{G}_2(1, 3)$ includes: {(1,3,2,4,5), (1,3,2,5,4), (1,3,4,2,5), (1,3,4,5,2), (1,3,5,2,4), (1,3,5,4,2)}

16 Top k probability (2/4)  The top k probability of objects $(j_1, \ldots, j_k)$ is defined as: $P_s(\mathcal{G}_k(j_1, \ldots, j_k)) = \sum_{\pi \in \mathcal{G}_k(j_1, \ldots, j_k)} P_s(\pi)$  For example (5 objects): $P_s(\mathcal{G}_2(1, 3))$ is the sum of the probabilities of the 6 permutations listed on the previous slide  Still need to compute $n!$ permutations?

17 Top k probability (3/4)  The top k probability can be computed as follows: $P_s(\mathcal{G}_k(j_1, \ldots, j_k)) = \prod_{t=1}^{k} \frac{\phi(s_{j_t})}{\sum_{l=t}^{n} \phi(s_{j_l})}$ where $s_{j_t}$ = the score of object $j_t$ (ranked at position $t$) and $j_{k+1}, \ldots, j_n$ denote the remaining objects  For example (1,3,x,x,x): $P_s(\mathcal{G}_2(1, 3)) = \frac{\phi(s_1)}{\phi(s_1)+\phi(s_2)+\phi(s_3)+\phi(s_4)+\phi(s_5)} \cdot \frac{\phi(s_3)}{\phi(s_2)+\phi(s_3)+\phi(s_4)+\phi(s_5)}$
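
A sketch (with invented scores) that checks the closed form above against the brute-force definition from the previous slide, i.e. summing $P_s(\pi)$ over the whole top 2 subgroup:

```python
# Sketch: top-k probability computed two ways -- brute force over the subgroup
# and the k-factor closed form -- assuming the linear phi(s) = s.
from itertools import permutations

def perm_prob(scores, pi, phi=lambda s: s):
    p = 1.0
    for j in range(len(pi)):
        p *= phi(scores[pi[j]]) / sum(phi(scores[k]) for k in pi[j:])
    return p

def top_k_prob_bruteforce(scores, top, phi=lambda s: s):
    """Sum P_s(pi) over all permutations whose first k positions are exactly `top`."""
    n = len(scores)
    return sum(perm_prob(scores, pi, phi)
               for pi in permutations(range(n)) if pi[:len(top)] == tuple(top))

def top_k_prob(scores, top, phi=lambda s: s):
    """Closed form: product over the k top objects; denominator over objects not yet placed."""
    rest = [i for i in range(len(scores)) if i not in top]
    prob = 1.0
    for t, j in enumerate(top):
        prob *= phi(scores[j]) / sum(phi(scores[i]) for i in list(top[t:]) + rest)
    return prob

scores = [0.3, 1.2, 0.5, 2.0, 0.9]            # 5 hypothetical document scores
print(top_k_prob_bruteforce(scores, (0, 2)))  # sums (5-2)! = 6 permutations
print(top_k_prob(scores, (0, 2)))             # same value from only 2 factors
```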

18 Top k probability (4/4)  Top k probabilities form a probability distribution over the collection $\mathcal{G}_k$  The top k subgroup with larger scores in the front has higher top k probability  Top k probability is scale or translation invariant with a carefully designed $\phi$ (linear or exponential, as before)

19 Listwise loss function  Cross-entropy between the top k distributions of two lists of scores: $L(y^{(i)}, z^{(i)}) = -\sum_{g \in \mathcal{G}_k} P_{y^{(i)}}(g) \log P_{z^{(i)}}(g)$ where $i$ denotes the query, $y^{(i)}$ denotes the ground truth list of scores, and $z^{(i)}$ denotes the model-generated list of scores
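
A minimal sketch of this loss for one query; it uses the top 1 special case ($k = 1$) with $\phi = \exp$, so each top 1 distribution is simply a softmax over the score list (both choices are assumptions of this sketch, though the paper's experiments also use $k = 1$):

```python
# Sketch: listwise cross-entropy loss between the top-1 distributions induced
# by the ground-truth scores y and the model-generated scores z.
import math

def top1_distribution(scores):
    """Top-1 probabilities with phi = exp, i.e. a softmax over the scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_loss(y, z):
    """L(y, z) = - sum_j P_y(j) * log P_z(j)."""
    return -sum(py * math.log(pz)
                for py, pz in zip(top1_distribution(y), top1_distribution(z)))

y = [2.0, 0.0, 1.0]        # ground-truth relevance scores for one query
z_good = [1.8, 0.1, 0.9]   # model scores that preserve the correct order
z_bad = [0.1, 1.9, 0.8]    # model scores that invert the order
print(listwise_loss(y, z_good), listwise_loss(y, z_bad))  # inverted order costs more
```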

20 Learning method: ListNet (1/2)  A learning to rank method for optimizing the listwise loss function based on top k probability, with a neural network as the model and gradient descent as the optimization algorithm  $f_\omega$ denotes the ranking function based on the neural network model $\omega$ For a given feature vector $x_j^{(i)}$, the ranking function gives a score $z_j^{(i)} = f_\omega(x_j^{(i)})$ Score list: $z^{(i)}(f_\omega) = (f_\omega(x_1^{(i)}), \ldots, f_\omega(x_{n^{(i)}}^{(i)}))$

21 Learning method: ListNet (2/2)  Learning algorithm of ListNet:
Input: training data $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$
Parameters: number of iterations $T$ and learning rate $\eta$
Initialize parameter $\omega$
for $t = 1$ to $T$ do
  for $i = 1$ to $m$ do
    Input feature vectors $x^{(i)}$ of query $i$ to the neural network and compute score list $z^{(i)}(f_\omega)$ with current $\omega$
    Compute gradient $\Delta\omega = \partial L(y^{(i)}, z^{(i)}(f_\omega)) / \partial\omega$
    Update $\omega \leftarrow \omega - \eta \cdot \Delta\omega$
  end for
end for
Output neural network model $\omega$
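
The following NumPy sketch mirrors this loop. The slide's model $f_\omega$ is a neural network; this sketch substitutes the simplest scorer, a single linear layer, and uses the top 1 cross-entropy loss with $\phi = \exp$, whose gradient with respect to the score list is $\mathrm{softmax}(z) - \mathrm{softmax}(y)$. The toy training data is invented for illustration.

```python
# Sketch: ListNet-style per-query gradient descent with a linear scorer and the
# top-1 cross-entropy loss.
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

rng = np.random.default_rng(0)
dim = 5
true_w = rng.normal(size=dim)

# Toy training data: one (feature matrix, ground-truth score list) pair per query.
queries = []
for _ in range(20):
    X = rng.normal(size=(8, dim))                 # 8 documents, 5 features each
    y = X @ true_w + 0.1 * rng.normal(size=8)     # graded relevance scores
    queries.append((X, y))

omega = np.zeros(dim)           # initialize parameter omega
eta, T = 0.05, 100              # learning rate and number of iterations
for t in range(T):
    for X, y in queries:                          # one update per query
        z = X @ omega                             # score list under current omega
        grad = X.T @ (softmax(z) - softmax(y))    # gradient of the top-1 loss
        omega -= eta * grad                       # gradient-descent update

X_new = rng.normal(size=(8, dim))
print("predicted order:     ", np.argsort(-(X_new @ omega)))
print("order by true scores:", np.argsort(-(X_new @ true_w)))
```

With a deeper network the only change is how $z = f_\omega(x)$ and the backpropagated gradient are computed; the per-query update structure stays the same.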

22 Experiments  ListNet compared with 3 pairwise methods: RankNet RankSVM RankBoost  3 datasets TREC OHSUMED CSearch

23 TREC dataset .gov domain web pages in 2002  1,053,110 pages, 11,164,829 hyperlinks  50 queries  Binary relevance judgment (relevant or irrelevant)  20 features extracted from each query-document pair (e.g. content features and hyperlink features)

24 OHSUMED dataset  A collection of documents and queries on medicine  348,566 documents, 106 queries, 16,140 query-document pairs  Relevance judgment: definitely relevant, possibly relevant, not relevant  30 features extracted for each query-document pair

25 CSearch dataset  A dataset from a commercial search engine  About 25,000 queries, with about 1,000 documents associated with each query  About 600 features in total, including query-dependent and query-independent features  5 levels of relevance judgment: from 4 (perfect match) to 0 (bad match)

26 Ranking performance measure (1/2)  Normalized Discounted Cumulative Gain (NDCG): $\mathrm{NDCG}@n = Z_n \sum_{j=1}^{n} \frac{2^{r(j)} - 1}{\log(1 + j)}$ where $r(j)$ is the relevance rating of the document at position $j$ and $Z_n$ is a normalization constant chosen so that a perfect ranking has NDCG equal to 1  Can be used with more than 2 levels of relevance score
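
A sketch of NDCG@k; the gain $2^{r(j)} - 1$ and discount $1 / \log_2(1 + j)$ used here are common choices and are assumptions of this sketch, since the exact constants on the original slide are not recoverable.

```python
# Sketch: NDCG@k for a list of graded relevance judgments in model-ranked order.
import math

def dcg_at_k(ratings, k):
    return sum((2 ** r - 1) / math.log2(1 + j)
               for j, r in enumerate(ratings[:k], start=1))

def ndcg_at_k(ratings, k):
    """The normalizer Z_k is the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(ratings, reverse=True), k)
    return dcg_at_k(ratings, k) / ideal if ideal > 0 else 0.0

ranked_relevance = [2, 0, 1, 2, 0]   # hypothetical graded judgments, in model order
print(ndcg_at_k(ranked_relevance, k=3))
```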

27 Ranking performance measure (2/2)  Mean Average Precision (MAP): $\mathrm{AP} = \frac{\sum_{j} P(j) \cdot \mathrm{rel}(j)}{\text{number of relevant documents}}$ where $P(j)$ is the precision at position $j$ and $\mathrm{rel}(j) \in \{0, 1\}$ indicates whether the document at position $j$ is relevant  MAP = average of AP over all queries  Can only use binary relevance judgments
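
A sketch of AP and MAP with binary judgments; the two example queries are invented, and this sketch divides AP by the number of relevant documents appearing in the ranked list (an assumption when relevant documents may be missing from it).

```python
# Sketch: average precision per query and MAP over a set of queries.
def average_precision(relevant_flags):
    """relevant_flags[j] = 1 if the document at rank j+1 is relevant, else 0."""
    hits, precision_sum = 0, 0.0
    for j, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            precision_sum += hits / j      # precision at each relevant position
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries):
    return sum(average_precision(q) for q in queries) / len(queries)

print(mean_average_precision([[1, 0, 1, 0, 0], [0, 1, 1]]))   # toy example
```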

28 Experimental results (1/4)  Ranking accuracies in terms of NDCG on TREC [Figure: NDCG (y-axis) against top k position (x-axis) for the compared methods]

29 Experimental results (2/4)  Ranking accuracies in terms of NDCG on OHSUMED [Figure: NDCG (y-axis) against top k position (x-axis) for the compared methods]

30 Experimental results (3/4)  Ranking accuracies in terms of NDCG on CSearch [Figure: NDCG (y-axis) against top k position (x-axis) for the compared methods]

31 Experimental results (4/4)  Ranking accuracies in terms of MAP

32 Discussions (1/2)  For the pairwise approach, the number of document pairs varies largely from query to query  [Figure: distribution of the number of document pairs per query in OHSUMED]

33 Discussions (2/2)  The pairwise approach employs a “pairwise” loss function, which is not well suited to NDCG and MAP as performance measures  The listwise approach better represents the performance measures  Verification? Observe the relationship between loss and NDCG at each iteration

34 Pairwise loss vs. NDCG in RankNet [Figure: loss and NDCG plotted against training iteration]

35 Listwise loss vs. NDCG in ListNet [Figure: loss and NDCG plotted against training iteration]

36 Conclusions and future work  Conclusions A listwise approach for learning to rank Permutation probability and top k probability Cross-entropy as the loss function A neural network as the model and gradient descent as the optimization algorithm  Future work Use other metrics for the loss function Use other models Investigate the relationship between listwise loss functions and performance measures