Learning to Rank: A Machine Learning Approach to Static Ranking
049011 - Algorithms for Large Data Sets, Student Symposium
Speaker: Li-Tal Mashiach
References
Learning to Rank Using Gradient Descent, Burges et al., ICML 2005
Beyond PageRank: Machine Learning for Static Ranking, Richardson et al., WWW 2006
Today's Topics
Motivation and introduction
RankNet
fRank
Discussion
Future work suggestion: Predict Popularity Rank (PP-Rank)
Motivation
The Web is growing exponentially in size
The number of incorrect, spamming, and malicious sites is also growing
Having a good static ranking is therefore crucially important
Recent work has shown that PageRank may not perform any better than other simple measures on certain tasks
Motivation – Cont.
A combination of many features is more accurate than any single feature
PageRank is only a link-structure feature
With a machine learning approach, it is harder for malicious users to manipulate the ranking
Introduction
Neural networks
Training
Cost function
Gradient descent
Neural Networks
Like the brain, a neural network is a massively parallel collection of small, simple processing units, where the interconnections form a large part of the network's intelligence.
Training a Neural Network
The task is similar to teaching a student:
First, show him some examples
After that, ask him to solve some problems
Finally, correct him, and start the whole process again
Hopefully, he will get it right after a couple of rounds
Training a Neural Network – Cont.
Cost function – the error function to minimize, e.g. sum of squared errors or cross entropy
Gradient descent – take the derivative of the cost function with respect to the network parameters, then change those parameters in a gradient-related direction (sketched below)
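To make the gradient-descent step concrete, here is a minimal sketch (not from the original slides) that fits a tiny linear model by repeatedly stepping against the gradient of a squared-error cost; the data, learning rate, and model are all illustrative assumptions.

```python
import numpy as np

# Toy data: 3 features per "page" and target scores (all illustrative).
X = np.array([[1.0, 0.2, 0.5],
              [0.3, 0.8, 0.1],
              [0.6, 0.4, 0.9]])
y = np.array([1.0, 0.0, 0.5])

w = np.zeros(3)   # model parameters
lr = 0.1          # learning rate (step size)

for step in range(1000):
    pred = X @ w                              # model output
    grad = 2 * X.T @ (pred - y) / len(y)      # derivative of the squared-error cost w.r.t. w
    w -= lr * grad                            # move the parameters in a gradient-related direction

print("learned weights:", w)
```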
Static Ranking as a Classification Problem
x_i represents the set of features of Web page i
y_i is its rank
The classification problem: learn the function that maps each page's features to its rank
But all we really care about is the order of the pages (illustrated below)
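The last point is the key motivation for RankNet: two models that assign very different values can still be equally good for ranking if they order the pages the same way. A tiny illustrative check (all numbers hypothetical):

```python
import numpy as np

scores_a = np.array([0.9, 0.1, 0.5])      # hypothetical model A outputs
scores_b = np.array([100.0, 3.0, 42.0])   # hypothetical model B outputs (very different values)

# Sorting by negated scores gives the ranking, best page first.
print(np.argsort(-scores_a))   # [0 2 1]
print(np.argsort(-scores_b))   # [0 2 1] -- same ordering, so equally useful for ranking
```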
RankNet
Optimizes the order of objects, rather than the values assigned to them
RankNet is given:
A collection of pairs of items (i, j)
Target probabilities that Web page i is to be ranked higher than page j
RankNet learns the order of the items
Uses a probabilistic cost function (cross entropy) for training (sketched below)
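Following Burges et al. (ICML 2005), the pairwise cost models the probability that i is ranked above j as a sigmoid of the score difference and applies cross entropy against the target probability. A minimal sketch; the scores and target probability below are illustrative values, not from the paper.

```python
import numpy as np

def ranknet_pair_cost(o_i, o_j, target_p):
    """Cross-entropy cost for one pair, given model scores o_i, o_j and
    the target probability that item i should be ranked above item j."""
    p_ij = 1.0 / (1.0 + np.exp(-(o_i - o_j)))   # modeled P(i ranked above j)
    return -target_p * np.log(p_ij) - (1.0 - target_p) * np.log(1.0 - p_ij)

# Illustrative values: the model scores page i above page j, and the
# human-labeled target says i should indeed be ranked higher.
print(ranknet_pair_cost(o_i=2.0, o_j=0.5, target_p=1.0))
```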
fRank
Uses RankNet to learn the static ranking function
Trained according to human judgments
For each query, a rating is assigned manually to a number of results
The rating measures how relevant the result is to the query
fRank – Cont.
Uses a set of features from each page (a feature-vector sketch follows):
PageRank
Popularity – number of visits
Anchor text and inlinks – total amount of text in links, number of unique words, etc.
Page – number of words, frequency of the most common term, etc.
Domain – various averages across all pages in the domain: PageRank, number of outlinks, etc.
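A hypothetical sketch of how such a per-page feature vector might be assembled before being fed to the ranker; the field names and values are assumptions for illustration, not the exact feature definitions from the paper.

```python
# Hypothetical per-page features for an fRank-style ranker (illustrative only).
page_features = {
    "pagerank": 0.00042,            # link-structure score
    "popularity_visits": 1250,      # number of visits
    "anchor_text_words": 310,       # total amount of text in inlink anchors
    "anchor_unique_words": 87,      # number of unique anchor words
    "page_words": 1840,             # number of words on the page
    "top_term_frequency": 0.05,     # frequency of the most common term
    "domain_avg_pagerank": 0.0003,  # average PageRank across the domain
    "domain_avg_outlinks": 14.2,    # average number of outlinks across the domain
}

# Fixed feature order so every page maps to the same vector layout.
feature_vector = [page_features[k] for k in sorted(page_features)]
print(feature_vector)
```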
fRank Results
fRank performs significantly better than PageRank
The Page and Popularity feature sets were the most significant contributors
By collecting more popularity data, fRank performance continues to improve
Discussion
The training for static ranking cannot depend on queries
Is it appropriate to use human judgments for static ranking?
PageRank's advantage: protection from spam
fRank is not useful for directing the crawl
Future Work – PP-Rank
Train the machine to predict the popularity of a Web page
Use popularity data for training:
Number of visits
How long users stay on the page
Whether they left by clicking back
…
The signals should be normalized to the pattern of each user (see the sketch below)
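One way such per-user normalization might look (a hypothetical sketch, not part of the original proposal): scale each user's dwell times by that user's own average before aggregating them per page, so fast and slow readers contribute comparable signals.

```python
from collections import defaultdict

# Hypothetical visit log: (user, page, dwell_seconds). Values are illustrative.
visits = [
    ("alice", "pageA", 120), ("alice", "pageB", 30),
    ("bob",   "pageA", 15),  ("bob",   "pageB", 5),
]

# Average dwell time per user, capturing each user's own browsing pattern.
per_user = defaultdict(list)
for user, _, dwell in visits:
    per_user[user].append(dwell)
user_avg = {u: sum(d) / len(d) for u, d in per_user.items()}

# Normalized popularity signal per page: dwell time relative to the user's average.
page_signal = defaultdict(list)
for user, page, dwell in visits:
    page_signal[page].append(dwell / user_avg[user])

popularity = {page: sum(vals) / len(vals) for page, vals in page_signal.items()}
print(popularity)   # both users spent relatively more time on pageA
```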
PP-Rank – Advantages
Can predict the popularity of pages that were just created (no page points to them yet)
Can serve as a measure for directing the crawler
The rank is determined not by what webmasters find interesting (as in PageRank), but by what users find interesting
Summary
Ranking is the key to a search engine
A learning-based approach to static ranking is a promising new field
RankNet
fRank
PP-Rank
Any questions?