
Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp. 87-94, 2010.


1 Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp. 87-94, 2010

2 Introduction
 Recommendation Systems
   Suggest items based on user preferences
 Recommendation Approaches:
   Content-based: items are recommended based on a user profile and product information
   Collaborative Filtering: uses similarity to recommend items that were liked by similar users, i.e., recommendation is based on the rating history of the system; unknown ratings are predicted so that users can be given suggestions based on items with a high expected rating

3 Challenges
 Existing approaches are more adequate for static settings
   Incorporating new data into these models is not a trivial task
 Recommendations are based on the best predicted ratings
   However, predicting ratings is computationally very expensive on large datasets
Solution
 Euclidean embedding (EE) method for collaborative filtering
   Users and items are embedded in a unified Euclidean space
   The distance between a user and an item is inversely proportional to the rating

4 Euclidean Embedding (EE)
 Advantages of EE
   More intuitively understandable for humans, allowing useful visualizations
   Allows a very efficient implementation of recommendation queries
   Facilitates online implementation requirements, e.g., mapping new users/items

5 Related Work
 Neighborhood/Memory-based CF Algorithms
   Item-based or user-based
   KNN associates to each user/item its set of nearest neighbors, and predicts a user's rating on an item using the ratings of its NNs
   Utilize the entire DB of user preferences when computing recommendations
 Model-based CF Algorithms
   Matrix Factorization & Non-Negative Matrix Factorization
   Compute a model of the preference data & use it to produce recommendations
   Find patterns based on training on a subset of the DB

6 Collaborative Filtering (CF)
 Given N users and M items
 In a model-based approach for CF
   The model is trained on the known ratings (training set) so that the prediction error is minimized
   Root mean squared error (RMSE) is a popular error function
 The objective function of a model-based CF approach, e.g., matrix factorization, is defined as

   min Σ_{u,i} w_ui (r_ui − r̂_ui)²

   where r_ui is the rating of user u for item i, r̂_ui is the model's prediction of the rating of u for i, and w_ui is 1 if r_ui is known and 0 otherwise
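The masked squared-error objective above can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's code: the rating matrices and the convention "0 = unknown" are assumptions made for the example.

```python
import numpy as np

# Toy observed ratings; 0 marks an unknown entry (an assumed convention here).
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0]])
# Hypothetical model predictions for the same user/item pairs.
R_hat = np.array([[4.5, 2.5, 2.0],
                  [4.0, 3.0, 1.5]])

W = (R > 0).astype(float)   # w_ui: 1 where r_ui is known, 0 otherwise
T = W.sum()                 # T = number of known ratings

# Masked squared error, then RMSE over the known ratings only.
rmse = np.sqrt((W * (R - R_hat) ** 2).sum() / T)
print(round(rmse, 4))       # → 0.433
```

Only the entries with w_ui = 1 contribute to the error, which is exactly what lets the training ignore the (vast) unrated part of the matrix.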

7 CF via Matrix Factorization
 CF via EE is similar to CF via matrix factorization (MF)
 The predicted rating, r̂_ui, via MF is computed as

   r̂_ui = μ + b_u + b_i + p_u q_i'

   where μ is the average of all ratings, b_u is the deviation of user u from the average, b_i is the deviation of item i from the average, and p_u and q_i are the user-factor and item-factor vectors in a D-dimensional space, respectively; p_u q_i' is the dot product of p_u and q_i
 A higher p_u q_i' means u likes i more than average
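The MF prediction on this slide is just a baseline plus a dot product. A minimal sketch, with made-up values for μ, the biases, and the D = 2 factor vectors:

```python
import numpy as np

mu = 3.6                       # global average of all ratings (assumed value)
b_u, b_i = 0.2, -0.1           # user and item deviations from the average
p_u = np.array([0.5, -0.3])    # user-factor vector (D = 2, toy values)
q_i = np.array([0.4, 0.1])     # item-factor vector

# r_hat = mu + b_u + b_i + p_u . q_i
r_hat = mu + b_u + b_i + p_u @ q_i
print(round(r_hat, 3))         # → 3.87
```

Here p_u @ q_i = 0.17 > 0, so the model predicts user u likes item i a bit more than the bias-only baseline of 3.7 would suggest.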

8 CF via Matrix Factorization
 A gradient descent approach is used to solve CF problems with a highly sparse data matrix
 The goal is to minimize the following objective function

   min Σ_{u,i} w_ui [ (r_ui − r̂_ui)² + λ (b_u² + b_i² + ||p_u||² + ||q_i||²) ]

   where the λ term avoids overfitting the magnitude of the parameters, and λ is an algorithmic parameter
 The gradient descent updates for each known rating r_ui are

   e_ui ← r_ui − r̂_ui   (the current error for rating r_ui)
   b_u ← b_u + γ (e_ui − λ b_u)
   b_i ← b_i + γ (e_ui − λ b_i)
   p_u ← p_u + γ (e_ui q_i − λ p_u)
   q_i ← q_i + γ (e_ui p_u − λ q_i)

   where γ is the step size of the algorithm; there are T steps, i.e., the number of known ratings, to go through all ratings in the training dataset

9 CF via Euclidean Embedding
 All items & users are embedded in a unified Euclidean space
 The characteristics of each person/item are defined by its location
 If an item is close to a user in the unified space, its characteristics are attractive for the user
 A user is expected to like an item which is close to him in the space

10 CF via Euclidean Embedding
 The predicted rating, r̂_ui, via EE is computed as

   r̂_ui = μ + b_u + b_i − (x_u − y_i)(x_u − y_i)'

   where x_u and y_i are the point vectors of user u and item i in a D-dimensional Euclidean space, and (x_u − y_i)(x_u − y_i)' is the squared Euclidean distance between them
 The squared Euclidean distance is computationally cheaper than the Euclidean distance (no square root) while the accuracy remains the same
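The EE prediction swaps the dot product of MF for a negated squared distance. A minimal sketch with made-up coordinates:

```python
import numpy as np

mu, b_u, b_i = 3.6, 0.2, -0.1   # baseline terms (assumed toy values)
x_u = np.array([1.0, 0.5])      # user point in a D = 2 Euclidean space
y_i = np.array([0.8, 0.9])      # item point

# Squared Euclidean distance: (x_u - y_i)(x_u - y_i)' -- no sqrt needed,
# which is the computational saving the slide mentions.
d2 = (x_u - y_i) @ (x_u - y_i)
r_hat = mu + b_u + b_i - d2
print(round(r_hat, 3))          # → 3.5
```

The closer the item point sits to the user point, the smaller d2 and the higher the predicted rating, matching the slide's intuition.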

11 CF via Euclidean Embedding
 EE is a supervised learning approach
 The training phase involves finding the location of each item and user so as to minimize a loss function
   EE modifies the previous objective function (on Slide #7), replacing the dot product p_u q_i' with the negated squared distance
 Using gradient descent to minimize the EE objective function, the updates in each step are defined as

   e_ui ← r_ui − r̂_ui
   b_u ← b_u + γ (e_ui − λ b_u)
   b_i ← b_i + γ (e_ui − λ b_i)
   x_u ← x_u + γ (2 e_ui (y_i − x_u) − λ x_u)
   y_i ← y_i + γ (2 e_ui (x_u − y_i) − λ y_i)

   where γ is the step size
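One EE gradient step can be sketched the same way as the MF step; the point updates follow from differentiating the squared error with the EE prediction (a positive error pulls user and item points together). Hyperparameters and starting coordinates are assumed values:

```python
import numpy as np

gamma, lam, mu = 0.01, 0.05, 3.6   # assumed hyperparameters and global mean
b_u, b_i = 0.0, 0.0
x_u = np.array([1.0, 0.5])         # user point
y_i = np.array([0.8, 0.9])         # item point
r_ui = 4.0                         # one known training rating

# Current error under the EE prediction r_hat = mu + b_u + b_i - ||x_u - y_i||^2.
e_ui = r_ui - (mu + b_u + b_i - (x_u - y_i) @ (x_u - y_i))

b_u += gamma * (e_ui - lam * b_u)
b_i += gamma * (e_ui - lam * b_i)
# e_ui > 0 means the user liked the item more than predicted, so the two
# points move toward each other (simultaneous update via tuple assignment).
x_u, y_i = (x_u + gamma * (2 * e_ui * (y_i - x_u) - lam * x_u),
            y_i + gamma * (2 * e_ui * (x_u - y_i) - lam * y_i))
```

After this step the squared distance between x_u and y_i has shrunk, raising the predicted rating toward the observed one.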

12 CF via Euclidean Embedding
 Time Complexity
   Training: gradient descent over the known ratings
   Prediction: O(D), where D is the dimension of the space
   Recommendation: O(K-nearest-neighbor search) = O(N²) if done exhaustively
 Visualization
  1. Implement CF via EE in a high-dimensional space
  2. Select the top K items for an active user
  3. Embed the user, the selected items, and some favorite items in a 2-dimensional space via multi-dimensional scaling (MDS), using distances from the high-dimensional space in step 1

13 CF via Euclidean Embedding
 Example: using a low-dimensional unified user-item space, it is possible to represent items to users via a graphical interface
 [Figure: close items are shown to a user alongside the movies he has already liked, to assist him in selection]

14 CF via Euclidean Embedding
 Fast recommendation generation
   The mapped space allows candidate retrieval via neighborhood search
   The smaller the distance, the more desirable an item will be
 [Figure: the search space for a query user; EE searches only the K nearest neighbors, while MF must explore a large space]

15 CF via Euclidean Embedding
 Incorporating new users and items
   For a new user or item, there are D + 1 unknown values: D for the vector p or q and 1 for the scalar b
 Active learning may be used by a recommender by asking new users to provide their favorite items
   Since the point vectors of the items in the space are known, and a new user is probably very close to his favorite items in the EE space, a user vector x_u can be estimated as

   x_u = (1 / |L_u|) Σ_{i ∈ L_u} y_i

   where L_u is the set of items that the new user u has selected as his favorites and |L_u| is the number of selected items
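Mapping a new user into an already-trained EE space then amounts to averaging the points of the items he named as favorites. A minimal sketch; the item names and coordinates are made up for illustration:

```python
import numpy as np

# Trained item points in a D = 2 EE space (toy values).
Y = {"item_a": np.array([1.0, 0.0]),
     "item_b": np.array([0.0, 1.0]),
     "item_c": np.array([0.5, 0.5])}

favorites = ["item_a", "item_b"]     # items the new user selected as favorites

# Estimate the new user's point as the centroid of his favorite items.
x_u = sum(Y[i] for i in favorites) / len(favorites)
print(x_u)                           # → [0.5 0.5]
```

No retraining is needed: the new user gets D coordinates in one pass over his favorites, which is what makes the online setting cheap.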

16 Experimental Results
 Datasets used
   Netflix dataset: 17,770 movies, ~480,000 users, and ~100,000,000 ratings
     Dimension D = 50, regularization parameter λ = 0.005, and step size γ = 0.005
   MovieLens dataset: 1,682 movies, 943 users, and 100,000 ratings
     Dimension D = 50, regularization parameter λ = 0.03, and step size γ = 0.005

17 Experimental Results
 Learning curve
   Test RMSE of EE & MF in each iteration of the gradient descent algorithm, for five different folds
   MF is more prone to overfitting, since its error increases faster after it passes the optimal point

18 Experimental Results
 Dimension, accuracy, and time
   EE & MF give similar results in 5, 25, and 50 dimensions
 Precision & recall: ratings of 4 & 5 are considered desirable
   EE performs better than MF

19 Experimental Results
 Visualization
   For a typical user, the top n movies are selected based on EE with D = n dimensions
   In the EE picture, items are embedded based on the "taste" of the active user, while in the MDS picture, the embedding is based on the "tastes" of all users

20 Experimental Results
 Generating Fast Recommendations
   Generating new recommendations for a user with EE can be treated as a kNN search problem in a Euclidean space
   The table shows the top-10 recommendations to all users, with D(imension) = 50
   For MF & EE an exhaustive search was applied, whereas for EE-KNN the 100 nearest movies for each user were first selected as candidates
   Search time decreases significantly
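The kNN-style recommendation query described above can be sketched directly: rank items by squared distance to the user's point and keep the closest unseen ones. Sizes, seeds, and the "seen" set are arbitrary toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 5))   # 100 item points in a D = 5 EE space (toy data)
x_u = rng.normal(size=5)        # the active user's point
seen = {3, 17}                  # items the user has already rated (assumed)

# Squared distances to every item: one O(M*D) vectorized pass, no sqrt.
d2 = ((Y - x_u) ** 2).sum(axis=1)

# Closest items first; filter out already-seen items, keep the top 10.
order = np.argsort(d2)
top_k = [i for i in order if i not in seen][:10]
print(top_k)
```

An exhaustive scan like this is already cheap, and a spatial index (e.g. a k-d tree) over the item points can cut the query cost further, which is the source of the speedup the table reports for EE-KNN.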

21 Experimental Results
 New Users
   New users can be quickly mapped into the existing space
   MFa & EEa implement averaging for new users, whereas EEp represents the precision/recall values for the regular setting, when the users are not new

