
EigenRank: A Ranking-Oriented Approach to Collaborative Filtering
Nathan N. Liu & Qiang Yang, SIGIR 2008

IDS Lab. Seminar, Spring 2009
강민석 (minsuk@europa.snu.ac.kr)
May 21st, 2009
Center for E-Business Technology, Seoul National University, Seoul, Korea

Contents
- Introduction
- Related Work
- Rating-Oriented Collaborative Filtering
- Ranking-Oriented Collaborative Filtering
- Experiments
- Conclusions

Copyright © 2009 by CEBT

Introduction
- Recommender systems
  - Content-based filtering
    - Analyzes content information associated with items and users, e.g. product descriptions and user profiles
    - Represents users and items by a set of features
  - Collaborative filtering
    - Does NOT require content information about items
    - Assumes that a user is interested in items preferred by other, similar users
[Figure: content-based filtering (shirt features: color (red, blue, black), brand, size) vs. collaborative filtering (users A and B, items 1 to 3)]

Introduction
- Collaborative filtering application scenarios
  - Rating prediction: one individual item at a time, shown with a predicted rating
  - Top-N recommendation: an ordered list of the top-N recommended items
[Figures: rating prediction (MovieLens); top-N list (Amazon)]

Introduction
- Motivation
  - Most CF methods adopt a rating-oriented approach: they predict potential ratings first, then rank items by those predictions
  - Higher accuracy in rating prediction does NOT necessarily lead to better ranking effectiveness
  - Example: the two predictors below have the same error, but "Predicted 2" orders items i and j incorrectly
  - Most existing methods predict ratings without considering the user's preferences over pairs of items

                 Item i   Item j   Abs. error (sum)
  True rating      3        4
  Predicted 1      2        5        2
  Predicted 2      4        3        2
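The example can be checked directly. Below is a minimal Python sketch. It assumes that "error" means the sum of absolute errors over the two items. Both predictors get the same error, yet only the first keeps item j above item i.

```python
# Ratings for items i and j, taken from the example table.
true_rating = {"i": 3, "j": 4}
predicted_1 = {"i": 2, "j": 5}
predicted_2 = {"i": 4, "j": 3}


def total_abs_error(pred, truth):
    """Sum of absolute rating errors over all items."""
    return sum(abs(pred[item] - truth[item]) for item in truth)


def orders_pair_correctly(pred, truth):
    """True if pred orders item i relative to item j the same way truth does."""
    return (pred["i"] < pred["j"]) == (truth["i"] < truth["j"])


for name, pred in [("Predicted 1", predicted_1), ("Predicted 2", predicted_2)]:
    print(name, total_abs_error(pred, true_rating), orders_pair_correctly(pred, true_rating))
# Predicted 1 2 True
# Predicted 2 2 False
```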

Introduction
- Overview: a ranking-oriented approach to CF
  - Directly addresses the item ranking problem, without the intermediate step of rating prediction
- Contributions
  - A similarity measure between two users' rankings: the Kendall rank correlation coefficient
  - Methods for producing item rankings: a greedy order algorithm and a random walk model
[Diagram: rating prediction → rank items]

Contents
- Introduction
- Related Work
  - Neighborhood-based approach
  - Model-based approach
- Rating-Oriented Collaborative Filtering
- Ranking-Oriented Collaborative Filtering
- Experiments
- Conclusions

Neighborhood-based Approach
- User-based model
  - Estimates unknown ratings of a target user from the ratings of neighboring users, using user-user similarity
- Difficulties with the user-based model
  - Raw ratings may contain biases, e.g. some users tend to give high ratings; remedy: use user-specific means
  - User-item rating data is sparse; remedies: dimensionality reduction and data-smoothing methods

  Item    User u   User v
  A         4        2
  B         5        2
  C         5        1
  D         5        4
  E         4        3
  F         5        2
  Mean     4.67     2.33
  Stdev    0.52     1.03
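Here is a small Python sketch of user-specific normalization on the table above. The slide suggests mean-centering. Dividing by the standard deviation as well (z-scores) is a commonly used extra step, and it is included here as an assumption. The function names are illustrative.

```python
from statistics import mean, stdev

# Ratings of users u and v on items A-F, copied from the table.
ratings = {
    "u": {"A": 4, "B": 5, "C": 5, "D": 5, "E": 4, "F": 5},
    "v": {"A": 2, "B": 2, "C": 1, "D": 4, "E": 3, "F": 2},
}


def mean_centered(user_ratings):
    """Subtract the user's own mean rating, removing a per-user rating bias."""
    mu = mean(list(user_ratings.values()))
    return {item: r - mu for item, r in user_ratings.items()}


def z_scores(user_ratings):
    """Mean-center, then divide by the user's sample standard deviation."""
    sigma = stdev(list(user_ratings.values()))
    return {item: d / sigma for item, d in mean_centered(user_ratings).items()}


for user, r in ratings.items():
    values = list(r.values())
    print(user, round(mean(values), 2), round(stdev(values), 2))
# u 4.67 0.52
# v 2.33 1.03
```

After centering, user u's 4 and user v's 2 are both below their owners' means. So the two users agree that items A and E are relatively disliked, even though their raw ratings differ.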

Neighborhood-based Approach
- Item-based model
  - Similar to the user-based model, but uses item-item similarity
  - Less sensitive to the sparsity problem, since there are fewer items than users
  - Higher accuracy while allowing more efficient computation (Sarwar et al., 2001)
[Figure: item-based model (Amazon)]

Model-based Approach
- Model-based approach
  - Uses observed user-item ratings to train a compact model
  - Predicts ratings through the model instead of directly manipulating the rating data
  - Algorithms: clustering methods, aspect models, Bayesian networks
- Learning to rank
  - Ranks items represented in some feature space
  - Methods: learn an item scoring function, or learn a classifier that classifies item pairs

Contents
- Introduction
- Related Work
- Rating-Oriented Collaborative Filtering
  - Similarity measure
  - Rating prediction
- Ranking-Oriented Collaborative Filtering
- Experiments
- Conclusions

Rating-based Similarity Measures
- Pearson correlation coefficient (PCC)
  - Similarity between two users; ratings are normalized by subtracting each user's average
- Vector similarity (VS)
  - Another user-user similarity: views each user as a vector of ratings and takes the cosine of the angle between the two vectors
- Item-item similarity
  - Adjusted cosine similarity is the most effective
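Below are minimal Python sketches of the three measures. They assume ratings are stored as dicts mapping item to rating. For the adjusted cosine, the data is a dict mapping each user to such a dict. The PCC here averages over co-rated items only, which is one of several variants in use.

```python
from math import sqrt


def pearson(ru, rv):
    """PCC between two users over their co-rated items (means over co-rated items)."""
    common = ru.keys() & rv.keys()
    if len(common) < 2:
        return 0.0
    mu = sum(ru[i] for i in common) / len(common)
    mv = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = sqrt(sum((ru[i] - mu) ** 2 for i in common)) * sqrt(sum((rv[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0


def vector_similarity(ru, rv):
    """VS: cosine of the angle between the two users' rating vectors (unrated = 0)."""
    dot = sum(ru[i] * rv[i] for i in ru.keys() & rv.keys())
    norm = sqrt(sum(r * r for r in ru.values())) * sqrt(sum(r * r for r in rv.values()))
    return dot / norm if norm else 0.0


def adjusted_cosine(R, i, j):
    """Item-item similarity with each rating centered on its user's mean rating.

    R: dict user -> {item: rating}; only users who rated both i and j contribute.
    """
    num = sq_i = sq_j = 0.0
    for user_ratings in R.values():
        if i in user_ratings and j in user_ratings:
            mu = sum(user_ratings.values()) / len(user_ratings)
            a, b = user_ratings[i] - mu, user_ratings[j] - mu
            num += a * b
            sq_i += a * a
            sq_j += b * b
    return num / sqrt(sq_i * sq_j) if sq_i and sq_j else 0.0
```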

Rating Prediction
- User-based model
  - Selects the set of k users most similar to the target user
  - Predicts the rating as a similarity-weighted average of their ratings
- Item-based model
  - Similar to the user-based model, but uses the set of k items most similar to item i
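Here is a sketch of the user-based prediction step. The mean-centered weighted average used here is a common formulation and an assumption on our part. The names `predict_user_based` and `user_mean`, the default k=20, and the toy data are hypothetical.

```python
def user_mean(R, user):
    """Average of all ratings given by a user."""
    return sum(R[user].values()) / len(R[user])


def predict_user_based(R, sim, u, i, k=20):
    """Predict r_{u,i} from the k users most similar to u who rated item i.

    Assumed form: u's mean plus the similarity-weighted average of the
    neighbors' mean-centered ratings of i.
    R: user -> {item: rating}; sim(u, v): similarity, e.g. Pearson.
    """
    raters = [v for v in R if v != u and i in R[v]]
    neighbors = sorted(raters, key=lambda v: sim(u, v), reverse=True)[:k]
    norm = sum(abs(sim(u, v)) for v in neighbors)
    if norm == 0:
        return user_mean(R, u)  # no usable neighbors: fall back to u's mean
    offset = sum(sim(u, v) * (R[v][i] - user_mean(R, v)) for v in neighbors) / norm
    return user_mean(R, u) + offset


# Hypothetical toy data: v agrees with u, w disagrees with u.
R = {"u": {"a": 4, "b": 2}, "v": {"a": 5, "b": 3, "c": 5}, "w": {"a": 1, "b": 3, "c": 1}}
similarity = {("u", "v"): 1.0, ("u", "w"): -1.0}
print(predict_user_based(R, lambda a, b: similarity[(a, b)], "u", "c", k=2))
```

Centering on each neighbor's mean is the "user-specific means" remedy from the neighborhood-based slide. A neighbor who rates everything high therefore does not inflate the prediction.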

Contents
- Introduction
- Related Work
- Rating-Oriented Collaborative Filtering
- Ranking-Oriented Collaborative Filtering
  - Similarity measure: Kendall rank correlation coefficient
  - Preference functions: greedy order and random walk model
- Experiments
- Conclusions

Similarity Measure
- Motivation
  - PCC and VS are rating-based measures
  - In a ranking-based approach, similarity is determined by the users' preferences over items
  - E.g. users 1 and 2 below give different rating values, but their preferences are identical
- Kendall rank correlation coefficient (KRCC)

          Item A   Item B   Item C   Ranking
  User 1    2        3        4      C > B > A
  User 2    3        4        5      C > B > A
  (different ratings, same preference)
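Here is a Python sketch of Kendall's coefficient on the two users above. It uses the classic tau-a normalization: concordant minus discordant pairs, divided by the number of pairs. Ties count as neither. The exact normalization used in EigenRank may differ.

```python
from itertools import combinations


def kendall_tau(ru, rv):
    """Kendall rank correlation over the items both users rated.

    Each item pair scores +1 if both users order it the same way (concordant),
    -1 if they order it oppositely (discordant), 0 if either user ties it.
    """
    common = sorted(ru.keys() & rv.keys())
    n = len(common)
    if n < 2:
        return 0.0
    score = 0
    for i, j in combinations(common, 2):
        a = (ru[i] > ru[j]) - (ru[i] < ru[j])  # sign of u's rating difference
        b = (rv[i] > rv[j]) - (rv[i] < rv[j])  # sign of v's rating difference
        score += a * b
    return score / (n * (n - 1) / 2)


user1 = {"A": 2, "B": 3, "C": 4}
user2 = {"A": 3, "B": 4, "C": 5}
print(kendall_tau(user1, user2))  # 1.0
```

Only the order of each pair matters, so shifting all of a user's ratings by a constant leaves the coefficient unchanged.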

Preference Functions
- Modeling a user's preference function Ψ
  - Given two items i and j, which item is preferred, and by how much?
  - $\Psi(i,j) > 0$ means item i is preferred to item j
  - $|\Psi(i,j)|$ indicates the strength of the preference
- Properties
  - Same item: $\Psi(i,i) = 0$
  - Anti-symmetric: $\Psi(i,j) = -\Psi(j,i)$
  - NOT transitive: $\Psi(i,j) > 0$ and $\Psi(j,k) > 0$ do not imply $\Psi(i,k) > 0$

Preference Functions
- Deriving the preference function
  - The key challenge is to infer preferences over item pairs that the target user has NOT rated
  - Uses the same idea as neighborhood-based CF: find the neighbors of the target user who have rated both items
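Here is a sketch of a neighborhood-based preference estimate. It assumes Ψ(i, j) is the similarity-weighted average of the neighbors' rating differences r_{v,i} − r_{v,j}, taken over the neighbors who rated both items. The function name, data layout and toy values are hypothetical.

```python
def preference(R, sim, neighbors, i, j):
    """Estimate Psi_u(i, j) for a target user from neighbors who rated both i and j.

    Assumed form: similarity-weighted average of the neighbors' rating
    differences r_{v,i} - r_{v,j}; positive means i is preferred to j.
    R: user -> {item: rating}; sim: neighbor -> similarity to the target user.
    """
    raters = [v for v in neighbors if i in R[v] and j in R[v]]
    total = sum(sim[v] for v in raters)
    if not raters or total == 0:
        return 0.0  # no evidence: treat the pair as indifferent
    return sum(sim[v] * (R[v][i] - R[v][j]) for v in raters) / total


R = {"v1": {"x": 5, "y": 3}, "v2": {"x": 4, "y": 4}, "v3": {"x": 1}}
sim = {"v1": 0.8, "v2": 0.2, "v3": 0.9}
print(preference(R, sim, ["v1", "v2", "v3"], "x", "y"))  # 1.6
```

Neighbor v3 is skipped because it rated only x. Swapping the arguments flips the sign, which matches the anti-symmetry property.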

Preference Functions
- Producing a ranking
  - Given the preference function, we want a ranking of the items that agrees with the pairwise preferences as much as possible
- Ranking
  - ρ: a ranking of the items in item set I; $\rho(i) > \rho(j)$ means item i is ranked higher than item j
  - Value function: measures how consistent ρ is with the preference function Ψ, $V^{\Psi}(\rho) = \sum_{i,j:\,\rho(i) > \rho(j)} \Psi(i,j)$
  - Goal: find $\rho^* = \arg\max_{\rho} V^{\Psi}(\rho)$
  - Finding the optimal solution is an NP-complete problem, so a greedy algorithm is used instead
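To make the value function concrete, here is a brute-force Python sketch. It scores an ordering by summing Ψ(i, j) over every pair with i ranked above j, then tries all n! orderings. This exhaustive search is what becomes infeasible as n grows. The toy preference function built from item scores is hypothetical.

```python
from itertools import permutations


def ranking_value(order, psi):
    """V(rho): sum of psi(i, j) over every pair where i is ranked above j.

    order: items listed best first; psi(i, j) > 0 means i is preferred to j.
    """
    return sum(psi(order[a], order[b])
               for a in range(len(order))
               for b in range(a + 1, len(order)))


def optimal_ranking(items, psi):
    """Exhaustive search over all n! orderings: exact, but only feasible for tiny n."""
    return list(max(permutations(items), key=lambda order: ranking_value(order, psi)))


# Hypothetical preference function: psi(i, j) = score[i] - score[j].
score = {"a": 3, "b": 1, "c": 2}


def psi(i, j):
    return score[i] - score[j]


print(optimal_ranking(score, psi))  # ['a', 'c', 'b']
```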

Greedy Order Algorithm
- Motivation: find an approximately optimal ranking
- Algorithm
  - Input: item set I and preference function Ψ; output: a ranking
  - Compute a potential value for each item i; it is higher when more items are less preferred than i
  - Pick the item with the highest potential and rank it highest
  - Remove that item, update the remaining potentials, and iterate
- Complexity is O(n²); the value of the resulting ranking is at least half of the optimal value
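The greedy procedure can be sketched in Python as follows. The potential π(i) = Σ_j Ψ(i, j) − Σ_j Ψ(j, i) over the still-unranked items follows the usual greedy-order formulation and is assumed here. Potentials are updated incrementally after each removal, giving O(n²) work. The toy preference function is hypothetical.

```python
def greedy_order(items, psi):
    """Greedy ranking from a pairwise preference function (best item first).

    psi(i, j) > 0 means i is preferred to j. The potential of i is
    sum_j psi(i, j) - sum_j psi(j, i) over the items not yet ranked.
    """
    remaining = list(items)
    potential = {i: sum(psi(i, j) - psi(j, i) for j in remaining if j != i)
                 for i in remaining}
    ranking = []
    while remaining:
        top = max(remaining, key=potential.get)  # highest potential: rank it next
        ranking.append(top)
        remaining.remove(top)
        for i in remaining:  # remove top's contribution from the others' potentials
            potential[i] += psi(top, i) - psi(i, top)
    return ranking


# Hypothetical preference function: psi(i, j) = score[i] - score[j].
score = {"a": 3, "b": 1, "c": 2}


def psi(i, j):
    return score[i] - score[j]


print(greedy_order(["a", "b", "c"], psi))  # ['a', 'c', 'b']
```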

Random Walk Model for Item Ranking
- Random walk based on user preferences
- Motivation
  - Some users rated i > j and others rated j > k, but only a few rated all three items i, j, k
  - We want to infer the preference between i and k from such implicit relationships
  - Use multi-step random walks (a Markov chain model)
- Google PageRank
  - A random walk on Web pages based on hyperlinks: the surfer follows a randomly picked hyperlink
  - The stationary distribution is used as the PageRank
- Model for item ranking
  - Similarly, there are implicit links between pairs of items: a less preferred item j links to a more preferred item i, weighted by a transition probability
  - The stationary distribution is used to rank the items
Note (Markov chain, Wikipedia): "At each step the system may change its state from the current state to another state according to a probability distribution. The changes of state are called transitions …"
[Analogy: Web page ↔ item; hyperlink ↔ preference]

Random Walk Model for Item Ranking
- Transition probability: the probability of moving from the current item i to another item j
  - Higher for items that are more preferred than i, depending on the user's preference function: $p(j \mid i) = \frac{\exp(\Psi(j,i))}{\sum_{k \in I} \exp(\Psi(k,i))}$
  - Why the exp function? It keeps every transition weight non-negative, even when Ψ is negative
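Here is a Python sketch of the transition matrix built from a preference function. It normalizes exp(Ψ(j, i)) over all items j, including a self-loop for j = i (Ψ(i, i) = 0, weight 1). Whether the self-loop is included is an assumption of this sketch. The toy preference function is hypothetical.

```python
from math import exp


def transition_matrix(items, psi):
    """Row-stochastic matrix P with P[a][b] = p(items[b] | items[a]).

    p(j | i) is proportional to exp(psi(j, i)), so the walk tends to move
    toward items preferred over the current one; exp keeps weights positive.
    """
    P = []
    for i in items:
        weights = [exp(psi(j, i)) for j in items]  # includes j == i (weight exp(0) = 1)
        total = sum(weights)
        P.append([w / total for w in weights])
    return P


# Hypothetical preference function: psi(i, j) = score[i] - score[j].
score = {"a": 3, "b": 1, "c": 2}


def psi(i, j):
    return score[i] - score[j]


P = transition_matrix(["a", "b", "c"], psi)
```

From the least preferred item b, the walk is most likely to move to a, then to c, and least likely to stay at b.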

Random Walk Model for Item Ranking
- Computing the item rankings (analogous to the PageRank algorithm)
  - Matrix notation: P is the transition matrix, with entries $P_{ij} = p(j \mid i)$
  - $\pi_i^{(t)}$: the probability of being at item i after t walking steps, so $\pi^{(t+1)} = P^{\top} \pi^{(t)}$
  - The stationary probabilities satisfy $\pi = P^{\top} \pi$, i.e. π is an eigenvector of $P^{\top}$ with eigenvalue 1; they are computed with the power iteration method
- Does it work?
  - Existence and uniqueness of the stationary distribution are guaranteed if P is irreducible; the entries of P are all non-negative
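Below is a stdlib-only power-iteration sketch for the stationary distribution. It assumes a row-stochastic P, where P[i][j] is the probability of stepping from i to j. P is also assumed irreducible and aperiodic, so that the iteration converges. The tolerance and iteration cap are arbitrary choices.

```python
def stationary_distribution(P, tol=1e-10, max_iter=1000):
    """Power iteration: repeat pi <- P^T pi until the L1 change drops below tol.

    With P row-stochastic, pi_j^{(t+1)} = sum_i pi_i^{(t)} * P[i][j].
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


P = [[0.5, 0.5],
     [0.2, 0.8]]
print(stationary_distribution(P))  # approximately [2/7, 5/7]
```

Items are then ranked by their stationary probability, highest first.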

Random Walk Model for Item Ranking
- Personalization vector (teleport)
  - Used to avoid reducibility of the stochastic matrix (Brin and Page, 1998)
  - Revised transition matrix: $\tilde{P} = \epsilon P + (1 - \epsilon)\, e\, v^{\top}$, where e is the all-ones vector
- PageRank
  - The Web surfer sometimes "teleports" to other pages, according to the probability distribution defined by the personalization vector v
  - ε controls the balance between following hyperlinks (probability ε) and teleporting (probability 1 − ε)
- Our model
  - Defines the personalization vector with a similar idea: teleport to items with high ratings more often; unrated items have equal probabilities
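Here is a sketch of the teleport step. `teleport_matrix` builds εP + (1 − ε)ev^T. `personalization_vector` is hypothetical: it weights each rated item by its rating and gives every unrated item the same weight `unrated_weight`. This is one simple way to satisfy the two rules above, not necessarily the paper's exact definition.

```python
def teleport_matrix(P, v, eps):
    """Revised transition matrix eps * P + (1 - eps) * e v^T.

    With probability eps the walk follows P; otherwise it jumps to item j
    with probability v[j], regardless of the current item.
    """
    n = len(P)
    return [[eps * P[i][j] + (1 - eps) * v[j] for j in range(n)] for i in range(n)]


def personalization_vector(items, user_ratings, unrated_weight=1.0):
    """Hypothetical teleport distribution: rated items are weighted by their
    rating, and all unrated items share the same (assumed) weight."""
    weights = [user_ratings.get(i, unrated_weight) for i in items]
    total = sum(weights)
    return [w / total for w in weights]


v = personalization_vector(["a", "b", "c"], {"a": 5, "b": 1})
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]  # a periodic walk that never settles on its own
P_tilde = teleport_matrix(P, v, eps=0.85)
```

Because every entry of the revised matrix is positive whenever v is, the revised chain is irreducible and aperiodic, even though the cyclic P above is periodic.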

Contents
- Introduction
- Related Work
- Rating-Oriented Collaborative Filtering
- Ranking-Oriented Collaborative Filtering
- Experiments
- Conclusions

Experiments
- Issues
  1. Is the ranking-oriented approach better than the rating-oriented approach?
  2. Which is better: the greedy order algorithm or the random walk model?
  3. Is the ranking-oriented similarity measure (Kendall's) more effective?
[Diagram: similarity (Pearson's / vector similarity vs. Kendall's rank); rating-based (user / item) vs. ranking-based; greedy vs. random walk]

Experiments
- Data sets
  - Two movie rating data sets: EachMovie and Netflix
  - Only users who rated more than 40 different movies: 10,000 for training, 100 for parameter tuning, 500 for testing
- Evaluation protocol
  - For each user in the test set, 50% of the ratings are used for model construction and the other 50% are held out for evaluation

                     EachMovie          Netflix
  # of ratings       2.8 M → ?          100 M → ?
  # of users         72,000 → 10,600    480,000 → 10,600
  # of movies        1,628              18,000 → 2,000
  Rating scale       1 to 6             1 to 5
  Density            6.1 %              6.6 %
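Here is a sketch of the per-user 50/50 split. It assumes the halves are drawn uniformly at random with a fixed seed; the protocol above does not say how they were chosen. The function name is hypothetical.

```python
import random


def split_user_ratings(user_ratings, seed=0):
    """Randomly split one test user's ratings into two halves.

    Returns (given, held_out): 'given' is shown to the model, 'held_out' is
    used only for evaluation. Random assignment and the seed are assumptions.
    """
    items = sorted(user_ratings)  # fixed starting order for reproducibility
    random.Random(seed).shuffle(items)
    half = len(items) // 2
    given = {i: user_ratings[i] for i in items[:half]}
    held_out = {i: user_ratings[i] for i in items[half:]}
    return given, held_out
```

With an odd number of ratings, the extra item goes to the held-out half.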

Evaluation Metric
- Which metric to use?
  - Rating-oriented CF: MAE (mean absolute error) and RMSE (root mean square error), which focus on the difference between the true and the predicted rating
  - Ranking-oriented CF: our emphasis is on improving item rankings
- NDCG (normalized discounted cumulative gain)
  - Evaluated over the top-k items of the ranked list: $NDCG_u@k = Z_u \sum_{p=1}^{k} \frac{2^{r_u^p} - 1}{\log(1 + p)}$, where $r_u^p$ is the rating of the item at position p and $Z_u$ normalizes the ideal ranking to 1
  - The discounting factor $\log(1 + p)$ increases with the position in the ranking
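Here is a Python sketch of NDCG@k with gain 2^r − 1 and discount log(1 + p). It is normalized by the DCG of the ideal ordering of the same held-out items. The natural log is used, since the base cancels in the ratio. The function name is illustrative.

```python
from math import log


def ndcg_at_k(ranked_items, true_ratings, k):
    """NDCG@k with gain 2^r - 1 and discount log(1 + p) at position p = 1..k.

    ranked_items: held-out items in predicted order (best first);
    true_ratings: item -> held-out rating.
    """
    def dcg(order):
        return sum((2 ** true_ratings[item] - 1) / log(1 + p)
                   for p, item in enumerate(order[:k], start=1))

    ideal = dcg(sorted(true_ratings, key=true_ratings.get, reverse=True))
    return dcg(ranked_items) / ideal if ideal > 0 else 0.0


truth = {"a": 5, "b": 3, "c": 1}
print(ndcg_at_k(["a", "b", "c"], truth, 3))  # 1.0
```

The exponential gain rewards placing highly rated items at the top far more than it rewards correctly ordering low-rated ones.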

Impact of Parameters
- Impact of neighborhood size: the size of the neighborhood affects performance
- Result
  - As the neighborhood size increases, NDCG increases up to about 100 neighbors, because more neighbors make the preference function more accurate
  - Beyond 100 neighbors, NDCG starts to decrease, because many non-similar users are included

Impact of Parameters
- Impact of ε: how does the frequency of the "teleport" operation affect performance?
- Result: NDCG increases as ε increases, but ε should not be too large (best around 0.8 to 0.9)

Comparisons with Other Algorithms
- Issues
  1. Is the ranking-oriented approach better than the rating-oriented approach?
  2. Which is better: the greedy order algorithm or the random walk model?
  3. Is the ranking-oriented similarity measure (Kendall's) more effective?
- Comparison: 4 rating-oriented settings and 6 ranking-oriented settings

                            PCC      VS      KRCC
  Rating    User-based      UPCC     UVS     -
            Item-based      IPCC     IVS     -
  Ranking   Greedy order    GOPCC    GOVS    GOKRCC
            Random walk     RWPCC    RWVS    RWKRCC

Comparisons with Other Algorithms
- Results
  - The ranking-oriented approach outperforms the rating-oriented approach, by about 8.8% in NDCG@1
  - The random walk model outperformed all the rating-oriented methods
  - The random walk model is slightly better than the greedy order algorithm
  - The Kendall rank correlation coefficient is more effective for the ranking-oriented approach

Conclusion
- A ranking-oriented framework for CF
  - Item ranking without rating prediction as an intermediate step
  - Extends neighborhood-based CF by identifying preferences
  - Two methods for computing item rankings: the greedy order algorithm and the random walk model
[Diagram: similarity measure (Kendall rank correlation coefficient) → preference function → greedy order / random walk model]

Thank you!
