Published by Trent Verdier. Modified about 1 year ago.

1
CS910: Foundations of Data Analytics
Graham Cormode
Recommender Systems

2
Objectives
To understand the concept of recommendation
To see neighbour-based methods
To see latent factor methods
To see how recommender systems are evaluated

3
Recommendations
A modern problem: a very large number of possible items
– Which item should I try next, based on my preferences?
Arises in many different places:
– Recommendations of content: books, music, movies, videos...
– Recommendations of places to travel, hotels, restaurants
– Recommendations of food to eat, sites to visit
– Recommendations of articles to read: news, research, gossip
Each person has different preferences/interests
– How to elicit and model these preferences?
– How to customize recommendations to a particular individual?

4
Recommendations in the Wild

5
Recommender Systems
Recommender systems: produce tailored recommendations
– Inputs: ratings of items by users
  Possibly also: user profiles, item profiles
– Outputs: for a given user, output a list of recommended items
  Or, for a given (user, item) pair, output a predicted rating
Ratings can be in many forms
– “Star rating” (out of 5)
– Binary rating (thumbs up/thumbs down)
– Likert scale (strongly like, like, neutral, dislike, strongly dislike)
– Comparisons: prefer X to Y
Will use movie recommendation as a running example

6
Ratings Matrix
[Table: ratings of Users 1 to 5 (rows) for Items 1 to 5 (columns). Visible entries: User 2 has a 2 and a “?”, User 3 has a 5, User 5 has a “?” and a 4.]
$n \times m$ matrix of ratings $R$, where $r_{u,i}$ is the rating of user $u$ for item $i$
Typically, the matrix is large and sparse
– Thousands of users ($n$) and thousands of items ($m$)
– Each user has rated only a few items
– Each item is rated by at most a small fraction of users
Goal is to provide predictions $p_{u,i}$ for certain (user, item) pairs
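To make the sparsity point concrete, here is a minimal sketch that stores only the observed ratings rather than a full $n \times m$ array. The names `ratings` and `density` and all sample values are invented for illustration.

```python
# Hypothetical toy data: only observed ratings are stored, keyed by user.
ratings = {
    "user1": {"item1": 4, "item3": 5},
    "user2": {"item2": 2},
    "user3": {"item1": 5, "item4": 3},
}


def density(ratings, n_items):
    """Fraction of the n x m ratings matrix that is actually filled in."""
    observed = sum(len(items) for items in ratings.values())
    return observed / (len(ratings) * n_items)


# 5 observed ratings out of 3 users x 5 items.
print(density(ratings, n_items=5))
```

In a realistic system the density is far lower than in this toy example, which is why a dense array is rarely practical.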

7
Evaluating a recommender system
Evaluation is similar to evaluating classifiers
– Break labeled data into training and test data
– For each test (user, item) pair, predict the user's score for the item
– Measure the difference, and aim to minimize it over $N$ tests
Combine the differences into a single score to compare systems
– Most common: Root-Mean-Square Error (RMSE) between $p_{u,i}$ and $r_{u,i}$
  $\mathrm{RMSE} = \sqrt{\sum_{u,i} (p_{u,i} - r_{u,i})^2 / N}$
– Sometimes also use Mean Absolute Error (MAE)
  $\mathrm{MAE} = \sum_{u,i} |p_{u,i} - r_{u,i}| / N$
– If recommendations are either ‘good’ or ‘bad’, can use precision and recall
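The two error measures above can be sketched in a few lines of Python. The function names and the sample predictions are assumptions made for this example, not part of the course material.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error over N test (user, item) pairs."""
    n = len(actual)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / n)


def mae(predicted, actual):
    """Mean absolute error over N test (user, item) pairs."""
    n = len(actual)
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / n


# Hypothetical predictions and held-out true ratings for four test pairs.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4.0, 4.0, 1.0, 3.0]

print(rmse(predicted, actual), mae(predicted, actual))
```

Because RMSE squares each difference, the single error of 2 stars dominates it, so RMSE comes out larger than MAE on this data.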

8
Initial attempts
Can we use existing methods: classification, regression etc.?
Assume we have features for each user and each item:
– User: demographics, stated preferences
– Item: e.g. genre, director, actors
Can treat as a classification problem: predict a score
– Train a classifier from examples
Limitations of the classifier approach:
– Don’t necessarily have user and item information
– Ignores what we do have: lots of ratings between users and items
  Hard to use as features, unless everyone has rated a fixed set

9
Neighbourhood method
Neighbourhood-based collaborative filtering
– Users “collaborate” to help recommend (filter) items
1. Find a set $K$ of $k$ other users who are similar to the target user $u$
– Possibly assign a weight $w_{u,v}$ based on how similar user $v$ is to $u$
2. Combine the $k$ users’ (weighted) preferences
– Use these to make predictions for $u$
Can use existing methods to measure similarity
– PMCC (Pearson product-moment correlation coefficient) of the two users' ratings, used as $w_{u,v}$
– Cosine similarity of the rating vectors
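As one possible way to compute the similarity weight $w_{u,v}$, the sketch below evaluates the PMCC over the items both users have rated. Restricting to co-rated items and returning 0 when the correlation is undefined are assumptions of this example, and the users and ratings are invented.

```python
import math


def pearson(ratings_u, ratings_v):
    """PMCC between two users, computed over the items both have rated."""
    common = [i for i in ratings_u if i in ratings_v]
    if len(common) < 2:
        return 0.0  # assumption: too little overlap counts as no similarity
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    cov = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0  # assumption: a constant rater gets weight 0
    return cov / math.sqrt(var_u * var_v)


# Hypothetical users: Ann agrees with Joe's ordering, Bob reverses it.
joe = {"A": 5, "B": 3, "C": 1}
ann = {"A": 5, "B": 4, "C": 3, "D": 5}
bob = {"A": 1, "B": 3, "C": 5}

print(pearson(joe, ann), pearson(joe, bob))
```

Ann uses a narrower, higher part of the scale than Joe but ranks the shared items identically, so the correlation still treats her as a close neighbour.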

10
Neighbourhood example (unweighted)
3 users like the same set of movies as Joe (exact match)
– All three like “Saving Private Ryan”, so this is the top recommendation

11
Different Rating Scales
Every user rates slightly differently
– Some consistently rate high, some consistently rate low
Using PMCC avoids this effect when picking neighbours, but predictions still need an adjustment
Make an adjustment when computing a score:
– Predict: $p_{u,i} = \bar{r}_u + \frac{\sum_{v \in K} (r_{v,i} - \bar{r}_v)\, w_{u,v}}{\sum_{v \in K} w_{u,v}}$
– $\bar{r}_u$: average rating for user $u$
– $w_{u,v}$: weight assigned to user $v$ based on their similarity to $u$
  E.g. the correlation coefficient value
– $p_{u,i}$ takes the weighted average of each neighbour $v$'s deviation from $v$’s own average score, and adds it to $u$’s average score
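The adjusted prediction can be sketched as follows. This is an illustrative implementation that assumes positive weights, skips neighbours who have not rated the item, and falls back to the user's own average when no neighbour has rated it; none of these choices are stated on the slide.

```python
def predict_user_based(u_mean, neighbours, item):
    """Mean-centred neighbourhood prediction for one (user, item) pair.

    neighbours: list of (weight w_uv, mean rating of v, ratings dict of v).
    """
    num = 0.0
    den = 0.0
    for w, v_mean, v_ratings in neighbours:
        if item in v_ratings:
            num += w * (v_ratings[item] - v_mean)  # v's deviation from own mean
            den += w
    if den == 0:
        return u_mean  # assumption: no usable neighbour, use u's average
    return u_mean + num / den


# Hypothetical neighbours of a user whose average rating is 3.0.
neighbours = [
    (1.0, 4.0, {"X": 5}),  # rates X one star above their own average
    (0.5, 2.0, {"X": 2}),  # rates X exactly at their own average
]
print(predict_user_based(3.0, neighbours, "X"))
```

The first neighbour's raw rating of 5 contributes only a +1 deviation, so a generous rater does not inflate the prediction.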

12
Item-based Collaborative Filtering
Often there are many more users than items
– E.g. only a few thousand movies available, but millions of users
– Comparing to all users can be slow
Can do neighbourhood-based filtering using items
– Two items are similar if the users rating them are similar
– Compute PMCC between the users rating them both as $w_{i,j}$
– Find the $k$ most similar items $J$
– Compute a simple weighted average: $p_{u,i} = \frac{\sum_{j \in J} r_{u,j}\, w_{i,j}}{\sum_{j \in J} w_{i,j}}$
No adjustment by mean as we assume no bias from items
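A minimal sketch of the item-based weighted average follows. Skipping similar items that the user has not rated, and returning `None` when none are rated, are assumptions of this example; the item names and weights are invented.

```python
def predict_item_based(user_ratings, similar_items):
    """Weighted average of the user's ratings on items similar to the target.

    similar_items: list of (item j, weight w_ij) for the k most similar items.
    """
    num = 0.0
    den = 0.0
    for j, w in similar_items:
        if j in user_ratings:  # assumption: unrated similar items are skipped
            num += user_ratings[j] * w
            den += w
    return num / den if den else None


# Hypothetical user who rated A and B; item C is similar but unrated.
user_ratings = {"A": 4, "B": 2}
similar_items = [("A", 0.9), ("B", 0.1), ("C", 0.5)]
print(predict_item_based(user_ratings, similar_items))
```

The prediction lands close to the rating of item A because A carries most of the similarity weight.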

13
Latent Factor Analysis
We rejected methods based on features of items, since we could not guarantee they would be available
Latent Factor Analysis tries to find “hidden” features from the rating matrix
– Factors might correspond to recognisable features like genre
– Other factors: child-friendly, comedic, light/dark
– More abstract: depth of character, quirkiness
– Could find factors that are hard to interpret

14
Latent Factor Example

15
Matrix Factorization
Model each user and item as a vector of (inferred) factors
– Let $q_i$ be the vector for item $i$, $w_u$ be the vector for user $u$
– The predicted rating $p_{u,i}$ is then the dot product $(w_u \cdot q_i)$
How to learn the factors from the given data?
– Given ratings matrix $R$, try to express $R$ as a product $WQ$
  $W$ is an $n \times f$ matrix of users and their latent factors
  $Q$ is an $f \times m$ matrix of items and their latent factors
– A matrix factorization problem: factor $R$ into $W \times Q$
  Can be solved by Singular Value Decomposition
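To illustrate the shapes involved, the sketch below predicts a rating as the dot product of a row of $W$ and a column of $Q$. The factor values are made up, with $n = 2$ users, $f = 2$ factors and $m = 3$ items.

```python
# Hypothetical factor matrices (values invented for illustration).
W = [[1.0, 0.5],   # user 0's latent factors
     [0.2, 1.5]]   # user 1's latent factors
Q = [[4.0, 1.0, 3.0],   # factor 0 for items 0..2
     [0.5, 3.0, 1.0]]   # factor 1 for items 0..2


def predict(W, Q, u, i):
    """p_{u,i}: dot product of row u of W with column i of Q."""
    return sum(W[u][k] * Q[k][i] for k in range(len(Q)))


print(predict(W, Q, 0, 0), predict(W, Q, 1, 1))
```

User 0 weights factor 0 heavily and item 0 scores highly on that factor, so the predicted rating is high.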

16
Singular Value Decomposition
Given an $m \times n$ matrix $M$, decompose into $M = U \Sigma V^T$, where:
– $U$ is an $m \times m$ matrix with orthogonal columns [left singular vectors]
– $\Sigma$ is a rectangular $m \times n$ diagonal matrix [singular values]
– $V^T$ is an $n \times n$ matrix with orthogonal rows [right singular vectors]
The Singular Value Decomposition is highly structured
– The singular values are the square roots of the eigenvalues of $MM^T$
– The left (right) singular vectors are eigenvectors of $MM^T$ ($M^T M$)
SVD can be used to give approximate representations
– Take the $k$ largest singular values, set the rest to zero
– Picks out the $k$ most important “directions”
– Gives the $k$ latent factors to describe the data
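The truncation step can be written out explicitly. In the sketch below, $\sigma_1 \ge \sigma_2 \ge \dots$ denote the singular values and $u_j$, $v_j$ the corresponding columns of $U$ and $V$; the notation $M_k$ for the truncated matrix is introduced here for illustration. The error identity is the standard Eckart-Young result for the Frobenius norm.

```latex
M_k \;=\; U_k \Sigma_k V_k^T \;=\; \sum_{j=1}^{k} \sigma_j \, u_j v_j^T ,
\qquad
\lVert M - M_k \rVert_F^2 \;=\; \sum_{j > k} \sigma_j^2 .
```

So the approximation error is exactly the energy in the discarded singular values, which is why keeping the $k$ largest gives the best rank-$k$ approximation in this norm.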

17
SVD for recommender systems
Textbook SVD doesn’t work when the matrix has missing values!
– Could try to fill in the missing values somehow, then factor
Instead, set up as an optimization problem:
– Learn length-$k$ vectors $q_i$, $w_u$ to solve the following optimization:
  $\min_{q,w} \sum_{(u,i) \in R} (r_{u,i} - q_i \cdot w_u)^2$
– Minimize the squared error between the predicted and true values
If we had a complete matrix, SVD would solve this problem
– Set $W = U_k \Sigma_k^{1/2}$ and $Q = \Sigma_k^{1/2} V_k^T$
– $U_k$, $V_k$ are the singular vectors corresponding to the $k$ largest singular values
Additional problem: too much freedom (not enough ratings)
– Risk of overfitting the training data, failing to generalize
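A short check of why this choice of $W$ and $Q$ works in the complete-matrix case: splitting the singular values evenly between the two factors reproduces the rank-$k$ truncated SVD, and each entry of the product is the dot product used for prediction.

```latex
W Q \;=\; \bigl(U_k \Sigma_k^{1/2}\bigr)\bigl(\Sigma_k^{1/2} V_k^T\bigr)
     \;=\; U_k \Sigma_k V_k^T ,
\qquad
(WQ)_{u,i} \;=\; w_u \cdot q_i \;=\; p_{u,i} .
```

Here $w_u$ is row $u$ of $W$ and $q_i$ is column $i$ of $Q$. Other splits of $\Sigma_k$ between the two factors would give the same product; the symmetric split is one convenient choice.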

18
Regularization
Regularization is a technique used in many places
– Here, avoid overfitting by penalizing large parameter values
– Achieve this by adding the size of the parameters to the optimization
  $\min_{q,w} \sum_{(u,i) \in R} \left[ (r_{u,i} - q_i \cdot w_u)^2 + \lambda \left( \lVert q_i \rVert_2^2 + \lVert w_u \rVert_2^2 \right) \right]$
  $\lVert x \rVert_2^2$ is the $L_2$ (Euclidean) norm squared: sum of squared values
– Effect is to push the values of $q$ and $w$ towards 0 to reduce complexity
Many different forms of regularization:
– $L_2$ regularization: add terms of the form $\lVert x \rVert_2^2$
– $L_1$ regularization: terms of the form $\lVert x \rVert_1$ (can give sparser solutions)
The form of the regularization should fit the optimization
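The regularized objective can be evaluated directly, which is handy for monitoring training. In this sketch the penalty is applied once per observed rating, which is one reading of the formula; the function name, the data layout (one factor list per user and per item) and the sample values are assumptions.

```python
def regularized_loss(ratings, W, Q, lam):
    """Squared error plus L2 penalty, summed over the observed ratings.

    ratings: list of (u, i, r); W[u] and Q[i] are factor vectors (lists).
    """
    total = 0.0
    for u, i, r in ratings:
        pred = sum(wf * qf for wf, qf in zip(W[u], Q[i]))
        penalty = sum(x * x for x in W[u]) + sum(x * x for x in Q[i])
        total += (r - pred) ** 2 + lam * penalty
    return total


# Hypothetical single rating: prediction is 2.0, so the squared error is 1.0.
ratings = [(0, 0, 3.0)]
W = [[1.0, 1.0]]
Q = [[1.0, 1.0]]
print(regularized_loss(ratings, W, Q, lam=0.0))  # error term only
print(regularized_loss(ratings, W, Q, lam=0.5))  # error plus penalty
```

Raising `lam` makes large factor values more expensive, trading some fit on the training ratings for smaller parameters.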

19
Solving the optimization: Gradient Descent
How to solve $\min_{q,w} \sum_{(u,i) \in R} \left[ (r_{u,i} - q_i \cdot w_u)^2 + \lambda \left( \lVert q_i \rVert_2^2 + \lVert w_u \rVert_2^2 \right) \right]$?
Gradient Descent
– For each training example, find the error of the current prediction: $e_{u,i} = r_{u,i} - q_i \cdot w_u$
– Modify the parameters by taking a step against the gradient of the objective
  $q_i \leftarrow q_i + \gamma (e_{u,i} w_u - \lambda q_i)$ [from the derivative of the objective with respect to $q_i$]
  $w_u \leftarrow w_u + \gamma (e_{u,i} q_i - \lambda w_u)$ [from the derivative with respect to $w_u$]
– $\gamma$ is a parameter that controls the speed of descent
Advantages and disadvantages of gradient descent
– ++ Fairly easy to implement: easy to compute the update at each step
– -- Can be slow: hard to parallelize
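The per-example updates can be sketched as a small stochastic gradient descent loop. This is an illustrative implementation, not the course's reference code: the function names, the random initialisation, the hyperparameter defaults and the toy ratings are all assumptions.

```python
import random


def sgd_factorize(ratings, n_users, n_items, f=2, gamma=0.01, lam=0.05,
                  epochs=200, seed=0):
    """Learn factor vectors W[u], Q[i] from (u, i, r) triples."""
    rng = random.Random(seed)
    # Assumption: small random starting values for every factor.
    W = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the training examples in random order
        for u, i, r in data:
            e = r - sum(wf * qf for wf, qf in zip(W[u], Q[i]))
            for k in range(f):
                w_old, q_old = W[u][k], Q[i][k]
                # Both updates use the values from before this step.
                Q[i][k] = q_old + gamma * (e * w_old - lam * q_old)
                W[u][k] = w_old + gamma * (e * q_old - lam * w_old)
    return W, Q


def squared_error(ratings, W, Q):
    """Sum of squared prediction errors over the given ratings."""
    return sum((r - sum(a * b for a, b in zip(W[u], Q[i]))) ** 2
               for u, i, r in ratings)


# Hypothetical ratings: (user, item, rating) for 3 users and 3 items.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
W, Q = sgd_factorize(ratings, n_users=3, n_items=3)
print(squared_error(ratings, W, Q))
```

Calling `sgd_factorize` with `epochs=0` returns the untrained starting point, which gives a baseline to compare the trained error against.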

20
Solving the optimization: Least Squares
How to solve $\min_{q,w} \sum_{(u,i) \in R} \left[ (r_{u,i} - q_i \cdot w_u)^2 + \lambda \left( \lVert q_i \rVert_2^2 + \lVert w_u \rVert_2^2 \right) \right]$?
Reducing to Least Squares
– Suppose the values of $w_u$ are fixed
– Then the goal is to minimize a quadratic function of the $q_i$
– Solved by techniques from regression: least squares minimization
Alternating least squares
– Pretend the values of $w_u$ are fixed, optimize the values of $q_i$
– Swap: pretend the values of $q_i$ are fixed, optimize the values of $w_u$
– Repeat until convergence
Can be slower than gradient descent on a single machine
– But can parallelize: compute each $q_i$ independently
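To keep the alternating idea visible without a linear-algebra library, the sketch below restricts to a single latent factor ($f = 1$), where each least-squares step has a closed form. The simplification to one factor, the function names, the starting values and the toy data are all assumptions of this example.

```python
def objective(ratings, w, q, lam):
    """Regularized squared error for the one-factor model p = w[u] * q[i]."""
    return sum((r - w[u] * q[i]) ** 2 + lam * (w[u] ** 2 + q[i] ** 2)
               for u, i, r in ratings)


def als_rank1(ratings, n_users, n_items, lam=0.1, sweeps=20):
    """Alternating least squares with one latent factor per user and item."""
    w = [1.0] * n_users  # assumption: start every factor at 1
    q = [1.0] * n_items
    for _ in range(sweeps):
        # Fix w: each q[i] has a closed-form minimizer, independent of the
        # other items (this is the step that could run in parallel).
        for i in range(n_items):
            rows = [(u, r) for u, j, r in ratings if j == i]
            if rows:
                q[i] = (sum(r * w[u] for u, r in rows)
                        / sum(w[u] ** 2 + lam for u, _ in rows))
        # Swap roles: fix q and solve for each w[u].
        for u in range(n_users):
            cols = [(i, r) for v, i, r in ratings if v == u]
            if cols:
                w[u] = (sum(r * q[i] for i, r in cols)
                        / sum(q[i] ** 2 + lam for i, _ in cols))
    return w, q


# Hypothetical ratings: (user, item, rating) for 3 users and 3 items.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
w, q = als_rank1(ratings, n_users=3, n_items=3)
print(objective(ratings, w, q, lam=0.1))
```

Each half-step exactly minimizes the objective over one block of variables, so the objective never increases from one sweep to the next. With more than one factor, each step becomes a small regularized linear system per user or item.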

21
Adding biases
Can generalize matrix factorization to incorporate other factors
– E.g. Fred always rates 1 star less than average
– E.g. Citizen Kane is rated 0.5 higher than other films on average
These are not captured as well by a model of the form $q_i \cdot w_u$
– Explicitly modeling biases (intercepts) can give a better fit
Model with biases: $p_{u,i} = \mu + b_i + b_u + (w_u \cdot q_i)$
– $\mu$: global average rating
– $b_i$: bias for item $i$
– $b_u$: rating bias from user $u$ (similar to neighbourhood method)
Optimize the new error function in the same way:
  $\min_{q,w,b} \sum_{(u,i) \in R} \left[ (r_{u,i} - \mu - b_u - b_i - q_i \cdot w_u)^2 + \lambda \left( \lVert q_i \rVert_2^2 + \lVert w_u \rVert_2^2 + b_u^2 + b_i^2 \right) \right]$
– Can add more biases, e.g. incorporate variation over time
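The Fred and Citizen Kane example can be worked through numerically. The global average of 3.5 stars and the zero factor vectors are assumptions chosen to isolate the bias terms.

```python
def predict_with_biases(mu, b_u, b_i, w_u, q_i):
    """p_{u,i} = mu + b_i + b_u + (w_u . q_i)."""
    return mu + b_i + b_u + sum(a * b for a, b in zip(w_u, q_i))


mu = 3.5        # assumed global average rating
b_fred = -1.0   # Fred rates 1 star less than average
b_kane = 0.5    # Citizen Kane is rated 0.5 higher than average

# With the latent factors set to zero, only the baseline remains.
baseline = predict_with_biases(mu, b_fred, b_kane, [0.0, 0.0], [0.0, 0.0])
print(baseline)
```

The baseline estimate for Fred's rating of Citizen Kane is 3.5 - 1 + 0.5 = 3.0 stars; the dot product term then only has to explain the remaining user-item interaction.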

22
Cold start problem: new items
How to cope when new objects are added to the system?
– New users arrive, new movies are released: “cold start” problem
New item is created: no ratings, so will not be recommended?
– Use attributes of the item (actors, genre) to give some score
– Randomly suggest it to users to get some ratings

23
Cold start problem: new users
New users arrive: we have no idea what they like!
– Recommend globally popular items to them (Harry Potter…)
  May not give much specific information about their tastes
– Encourage new users to rate some items before recommending
– Suggest items that are “divisive”: try to maximize information
Tradeoff: “poor” recommendations may drive users away
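One simple way to make “divisive” concrete is to rank items by the variance of their existing ratings. Using variance as the measure is an assumption of this sketch, not something the slide specifies, and the items and ratings are invented.

```python
from statistics import pvariance


def most_divisive(item_ratings, top=1):
    """Return the items whose existing ratings disagree the most.

    Assumption: disagreement is measured by the population variance.
    """
    ranked = sorted(item_ratings,
                    key=lambda item: pvariance(item_ratings[item]),
                    reverse=True)
    return ranked[:top]


# Hypothetical ratings already collected for three items.
item_ratings = {
    "A": [5, 5, 4, 5],  # almost everyone loves it
    "B": [1, 5, 1, 5],  # people love it or hate it
    "C": [3, 3, 3, 4],  # consistently average
}
print(most_divisive(item_ratings))
```

Asking a new user about item B separates them into one of two camps, whereas their answer for A or C is largely predictable in advance.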

24
Case Study: The Netflix Prize
Netflix ran a competition
– Netflix streams movies over the internet (and rents DVDs by mail)
– Users rate each movie on a 5-star scale
– Netflix makes recommendations of what to watch next
Object of the competition: improve over the current recommendations
– “Cinematch” algorithm: “uses straightforward linear models…”
– Prize: $1M to improve RMSE by 10%
Training data: 100M dated ratings from 480K users to 18K movies
– Can submit ratings of test data at most once per day
– This limits load on the servers and discourages attempts to elicit the true answers

25
The Netflix Prize
https://www.youtube.com/watch?v=ImpV70uLxyw

26
Netflix prize factors
Postscript: Netflix adopted some ideas but not all
– “Explainability” of recommendations is an additional requirement
– Cost of fitting models and making predictions is also important

27
Recommender Systems Summary
Introduced the concept of recommendation
Saw neighbour-based methods
Saw latent factor methods
Understood how recommender systems are evaluated
– Netflix prize as a case study in applied recommender systems
Recommended reading:
– Recommender systems (Encyclopedia of Machine Learning)
– Matrix Factorization Techniques for Recommender Systems, Koren, Bell, Volinsky, IEEE Computer
