Matrix Factorization
Bamshad Mobasher, DePaul University


The $1 Million Question

Ratings Data
[Figure: sparse ratings matrix, 480,000 users × 17,700 movies, with known entries between 1 and 5]

Training Data
• 100 million ratings (the matrix is about 99% sparse)
• Rating = [user, movie-id, time-stamp, rating value]
• Generated by users between Oct 1998 and Dec 2005
• Users randomly chosen among the set with at least 20 ratings
• Small perturbations added to help with anonymity

Ratings Data
[Figure: ratings matrix, 480,000 users × 17,700 movies; the test data set (most recent ratings) is shown as "?" entries to be predicted]

Scoring
• Minimize root mean square error (RMSE)
  • Does not necessarily correlate well with user satisfaction
  • But is a widely used, well-understood quantitative measure
• Mean square error = $\frac{1}{|R|} \sum_{(u,i) \in R} (r_{ui} - \hat{r}_{ui})^2$, where $\hat{r}_{ui}$ is the predicted rating
• RMSE baseline scores on test data
  • 1.054: just predict the mean user rating for each movie
  • 0.953: Netflix's own system (Cinematch) as of 2006
  • 0.941: nearest-neighbor method using correlation
  • 0.857: the 10% reduction required to win $1 million
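To make the scoring rule concrete, here is a minimal sketch of RMSE together with the "predict the mean rating for each movie" baseline. The toy ratings and the helper names (`rmse`, `movie_mean_predictor`) are illustrative assumptions, not part of the competition tooling.

```python
import math
from collections import defaultdict

def rmse(ratings, predict):
    """ratings: list of (user, movie, rating); predict: callable (user, movie) -> float."""
    squared_errors = [(r - predict(u, i)) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def movie_mean_predictor(train):
    """Baseline: predict each movie's mean training rating (hypothetical helper)."""
    per_movie = defaultdict(list)
    for _, i, r in train:
        per_movie[i].append(r)
    means = {i: sum(rs) / len(rs) for i, rs in per_movie.items()}
    return lambda u, i: means[i]

# Hypothetical toy data: (user, movie, rating)
toy_ratings = [("u1", "m1", 4), ("u1", "m2", 2), ("u2", "m1", 5), ("u2", "m2", 3)]
baseline_rmse = rmse(toy_ratings, movie_mean_predictor(toy_ratings))
print(baseline_rmse)
```

In this toy case the score is computed on the same ratings used to fit the means; the competition scores were computed on held-out test ratings.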

Matrix Factorization of Ratings Data
• Based on the idea of Latent Factor Analysis
  • Identify latent (unobserved) factors that "explain" observations in the data
  • In this case, observations are user ratings of movies
• The factors may represent combinations of features or characteristics of movies and users that result in the ratings
• The ratings matrix R (m users × n movies) is approximated by the product of two factor matrices, Q and P, with f latent factors, so that $r_{ui} \approx q_i^T p_u$

Matrix Factorization of Ratings Data
[Figure from Koren, Bell, Volinsky, IEEE Computer, 2009]

Matrix Factorization of Ratings Data
Credit: Alex Lin, Intelligent Mining

Predictions as Filling Missing Data Credit: Alex Lin, Intelligent Mining
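A minimal sketch of what "filling missing data" means once factor vectors are available: every cell of the ratings matrix, observed or not, gets the estimate $q_i^T p_u$. The factor values and the `known` dictionary here are made up for illustration; in practice the factors are learned from the training ratings.

```python
def predict_all(P, Q):
    """Dense matrix of predictions q_i^T p_u for every user u (row) and movie i (column)."""
    return [[sum(p * q for p, q in zip(p_u, q_i)) for q_i in Q] for p_u in P]

# Hypothetical factors with f = 2: two users, three movies
P = [[1.0, 0.5], [0.2, 1.5]]               # one row per user
Q = [[4.0, 0.0], [1.0, 2.0], [0.0, 3.0]]   # one row per movie

R_hat = predict_all(P, Q)

# Known ratings keyed by (user index, movie index); all other cells are missing
known = {(0, 0): 4, (1, 2): 5}
filled = [[known.get((u, i), R_hat[u][i]) for i in range(len(Q))]
          for u in range(len(P))]
print(filled)
```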

Learning Factor Matrices
• Need to learn the feature vectors from training data
  • User feature vector: (a, b, c)
  • Item feature vector: (x, y, z)
• Approach: minimize the errors on known ratings
Credit: Alex Lin, Intelligent Mining

Learning Factor Matrices
• Model: $r_{ui} \approx q_i^T p_u$
• Minimize squared error on the known ratings: $\min_{q,p} \sum_{(u,i) \in R} (r_{ui} - q_i^T p_u)^2$
• Add regularization: $\min_{q,p} \sum_{(u,i) \in R} (r_{ui} - q_i^T p_u)^2 + \lambda (|q_i|^2 + |p_u|^2)$

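Stochastic Gradient Descent (SGD)
• Objective: $\min_{q,p} \sum_{(u,i) \in R} (r_{ui} - q_i^T p_u)^2 + \lambda (|q_i|^2 + |p_u|^2)$, where the first term measures goodness of fit and the second is regularization
• Online ("stochastic") gradient update equations:
  • $\epsilon_{ui} = r_{ui} - q_i^T p_u$
  • $q_i \leftarrow q_i + \gamma (\epsilon_{ui} p_u - \lambda q_i)$
  • $p_u \leftarrow p_u + \gamma (\epsilon_{ui} q_i - \lambda p_u)$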
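The update equations can be sketched in a few lines of plain Python. This is a toy illustration under assumed settings (learning rate `gamma`, regularization `lam`, number of factors `f`, number of epochs, random initialization in [0, 1), and a six-rating data set), not the configuration used by any competition entry.

```python
import math
import random

def sgd_factorize(ratings, n_users, n_items, f=2, gamma=0.02, lam=0.02,
                  epochs=1000, seed=0):
    """Learn user factors P and movie factors Q from (user, movie, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 1.0) for _ in range(f)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 1.0) for _ in range(f)] for _ in range(n_items)]
    order = list(ratings)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(order)  # visit the known ratings in random order
        for u, i, r in order:
            err = r - sum(q * p for q, p in zip(Q[i], P[u]))  # epsilon_ui
            for k in range(f):
                q_k, p_k = Q[i][k], P[u][k]  # use old values for both updates
                Q[i][k] += gamma * (err * p_k - lam * q_k)
                P[u][k] += gamma * (err * q_k - lam * p_k)
    return P, Q

def training_rmse(ratings, P, Q):
    sq = [(r - sum(q * p for q, p in zip(Q[i], P[u]))) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical toy data: (user index, movie index, rating)
toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = sgd_factorize(toy_ratings, n_users=3, n_items=3)
print(training_rmse(toy_ratings, P, Q))
```

Each step uses a single known rating, which is why the method scales to very large, sparse ratings sets: unobserved cells are never touched.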

Components of a Rating Predictor
Components: user bias, movie bias, user-movie interaction
• Baseline predictor (user bias and movie bias)
  • Separates users and movies
  • Often overlooked
  • Benefits from insights into users' behavior
  • Among the main practical contributions of the competition
• User-movie interaction
  • Characterizes the matching between users and movies
  • Attracts most research in the field
  • Benefits from algorithmic and mathematical innovations
Credit: Yehuda Koren, Google, Inc.

Modeling Systematic Biases
$r_{ui} \approx \mu + b_u + b_i + q_i^T p_u$
• $\mu$: overall mean rating
• $b_u$: offset of user u's mean rating from the overall mean
• $b_i$: offset of movie i's mean rating from the overall mean
• $q_i^T p_u$: user-movie interactions
Example:
• Mean rating $\mu$ = 3.7
• You are a critical reviewer: your ratings are 1 lower than the mean, so $b_u = -1$
• Star Wars gets a mean rating 0.5 higher than the average movie, so $b_i = +0.5$
• Predicted rating for you on Star Wars = 3.7 - 1 + 0.5 = 3.2
Credit: Padhraic Smyth, University of California, Irvine
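The Star Wars arithmetic can be reproduced directly, and the biases themselves can be estimated from data. The estimator below is a deliberately simple assumption (unregularized mean offsets, movies first, then users); the slides do not specify how the biases were fitted, and the function names are hypothetical.

```python
from collections import defaultdict

def baseline_prediction(mu, b_u, b_i):
    """Baseline predictor: overall mean plus user offset plus movie offset."""
    return mu + b_u + b_i

def fit_baseline(ratings):
    """Estimate mu, b_u, b_i as plain mean offsets (illustrative, no regularization)."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    by_movie, by_user = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        by_movie[i].append(r - mu)
    b_i = {i: sum(d) / len(d) for i, d in by_movie.items()}
    for u, i, r in ratings:
        by_user[u].append(r - mu - b_i[i])  # what the movie offset leaves unexplained
    b_u = {u: sum(d) / len(d) for u, d in by_user.items()}
    return mu, b_u, b_i

# The example from the slide: critical reviewer, popular movie
star_wars_prediction = baseline_prediction(mu=3.7, b_u=-1.0, b_i=0.5)
print(round(star_wars_prediction, 2))

# Hypothetical toy data: (user, movie, rating)
toy = [("a", "x", 4), ("a", "y", 2), ("b", "x", 5), ("b", "y", 3)]
mu, b_u, b_i = fit_baseline(toy)
print(baseline_prediction(mu, b_u["a"], b_i["x"]))
```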

Objective Function
$\min_{q,p} \sum_{(u,i) \in R} (r_{ui} - (\mu + b_u + b_i + q_i^T p_u))^2 + \lambda (|q_i|^2 + |p_u|^2 + |b_u|^2 + |b_i|^2)$
• First term: goodness of fit
• Second term: regularization; $\lambda$ is typically selected via grid search on a validation set
Credit: Padhraic Smyth, University of California, Irvine
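Differentiating this objective for a single known rating gives per-rating updates in which the biases follow the same pattern as the factor vectors. The sketch below shows one such step, assuming a learning rate `gamma` and a single shared `lam` for all terms, with the constant factor 2 from the derivative absorbed into `gamma`; the function name and the sample values are hypothetical.

```python
def biased_sgd_step(r, mu, b_u, b_i, p_u, q_i, gamma=0.01, lam=0.02):
    """One stochastic gradient step on a single rating for the biased model."""
    prediction = mu + b_u + b_i + sum(q * p for q, p in zip(q_i, p_u))
    err = r - prediction
    new_b_u = b_u + gamma * (err - lam * b_u)
    new_b_i = b_i + gamma * (err - lam * b_i)
    # Both factor updates use the old values of q_i and p_u
    new_q_i = [q + gamma * (err * p - lam * q) for q, p in zip(q_i, p_u)]
    new_p_u = [p + gamma * (err * q - lam * p) for q, p in zip(q_i, p_u)]
    return new_b_u, new_b_i, new_p_u, new_q_i, err

# Hypothetical values: the user rated the movie 5, the model currently predicts about 3.21
b_u1, b_i1, p_u1, q_i1, err_before = biased_sgd_step(
    r=5, mu=3.7, b_u=-1.0, b_i=0.5, p_u=[0.1, 0.2], q_i=[0.3, -0.1])
_, _, _, _, err_after = biased_sgd_step(
    r=5, mu=3.7, b_u=b_u1, b_i=b_i1, p_u=p_u1, q_i=q_i1)
print(err_before, err_after)
```

The overall mean $\mu$ is held fixed here and only the biases and factors move, which is one common choice rather than the only one.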

[Figure from Koren, Bell, Volinsky, IEEE Computer, 2009; annotations: 5%, 8%]

Explanation for increase?

Adding Time Effects
• Static biases: $r_{ui} \approx \mu + b_u + b_i + \text{user-movie interactions}$
• Add time dependence to biases: $r_{ui} \approx \mu + b_u(t) + b_i(t) + \text{user-movie interactions}$
• Time dependence parametrized by linear trends, binning, and other methods
• For details see Y. Koren, Collaborative filtering with temporal dynamics, ACM SIGKDD Conference 2009
Credit: Padhraic Smyth, University of California, Irvine

Adding Time Effects
$r_{ui} \approx \mu + b_u(t) + b_i(t) + q_i^T p_u(t)$
• Add time dependence to user "factor weights"
• Models the fact that a user's interests over "genres" (the q's) may change over time
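As a rough illustration of the two parametrizations named on the slides, the sketch below uses a binned offset for the movie bias and a linear trend for the user bias. The exact functional forms, the bin width, and all numeric values are assumptions made for this example; Koren's SIGKDD 2009 paper gives the forms actually used.

```python
def item_bias_at(t, b_i, bin_offsets, t_start, bin_width):
    """Binning: static movie bias plus an offset for the time bin containing t."""
    index = int((t - t_start) // bin_width)
    index = max(0, min(index, len(bin_offsets) - 1))  # clamp to a valid bin
    return b_i + bin_offsets[index]

def user_bias_at(t, b_u, alpha_u, t_mean_u):
    """Linear trend: static user bias plus a per-user drift around the user's mean rating date."""
    return b_u + alpha_u * (t - t_mean_u)

def predict_at(t, mu, b_u_t, b_i_t, q_i, p_u_t):
    """Time-aware prediction: mu + b_u(t) + b_i(t) + q_i^T p_u(t)."""
    return mu + b_u_t + b_i_t + sum(q * p for q, p in zip(q_i, p_u_t))

# Hypothetical values; t is measured in days since the start of the data
t = 25
b_i_t = item_bias_at(t, b_i=0.5, bin_offsets=[0.0, 0.1, -0.2], t_start=0, bin_width=10)
b_u_t = user_bias_at(t, b_u=-1.0, alpha_u=0.01, t_mean_u=20)
print(predict_at(t, 3.7, b_u_t, b_i_t, q_i=[0.3, -0.1], p_u_t=[0.1, 0.2]))
```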

[Figure from Koren, Bell, Volinsky, IEEE Computer, 2009; annotations: 5%, 8%]

The Kitchen Sink Approach….
• Many options for modeling
  • Variants of the ideas we have seen so far
    • Different numbers of factors
    • Different ways to model time
    • Different ways to handle implicit information
    • ….
  • Other models (not described here)
    • Nearest-neighbor models
    • Restricted Boltzmann machines
• Model averaging was useful….
  • Linear model combining
  • Neural network combining
  • Gradient boosted decision tree combining
  • Note: combining weights learned on validation set ("stacking")
Credit: Padhraic Smyth, University of California, Irvine
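The simplest of these combiners, a linear blend with weights learned on a validation set, can be sketched for two models by solving the 2×2 least-squares normal equations directly. The validation ratings and the two prediction lists are made-up numbers, and the function name is hypothetical; the winning blends combined far more models than this.

```python
import math

def fit_blend_weights(pred_a, pred_b, targets):
    """Least-squares weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - y)^2)."""
    s_aa = sum(a * a for a in pred_a)
    s_bb = sum(b * b for b in pred_b)
    s_ab = sum(a * b for a, b in zip(pred_a, pred_b))
    s_ay = sum(a * y for a, y in zip(pred_a, targets))
    s_by = sum(b * y for b, y in zip(pred_b, targets))
    det = s_aa * s_bb - s_ab * s_ab  # zero if the two models are collinear
    w_a = (s_ay * s_bb - s_ab * s_by) / det
    w_b = (s_aa * s_by - s_ab * s_ay) / det
    return w_a, w_b

def rmse_of(preds, targets):
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets))

# Hypothetical validation-set ratings and two models' predictions for them
val_targets = [3.0, 4.0, 1.0, 5.0]
model_a = [3.0, 4.0, 2.0, 5.0]
model_b = [3.5, 3.0, 1.0, 4.5]

w_a, w_b = fit_blend_weights(model_a, model_b, val_targets)
blended = [w_a * a + w_b * b for a, b in zip(model_a, model_b)]
print(w_a, w_b, rmse_of(blended, val_targets))
```

Because each single model corresponds to one particular weight choice, the fitted blend cannot do worse than either model on the validation set itself; whether it also helps on new data has to be checked on a separate held-out set.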

Other Aspects of Model Building
• Automated parameter tuning
  • Using a validation set and grid search, various parameters such as learning rates, regularization parameters, etc., can be optimized
• Memory requirements
  • Can fit within roughly 1 Gbyte of RAM
• Training time
  • Order of days, but achievable on commodity hardware rather than a supercomputer
  • Some parallelization used
Credit: Padhraic Smyth, University of California, Irvine

Progress Prize 2008
• Sept 2nd: only 3 teams qualify for 1% improvement over previous year
• Oct 2nd: leading team has 9.4% overall improvement
• Progress prize ($50,000) awarded to BellKor team of 3 AT&T researchers (same as before) plus 2 Austrian graduate students, Andreas Toscher and Michael Jahrer
• Key winning strategy: clever "blending" of predictions from models used by both teams
• Speculation that 10% would be attained by mid-2009

The Leading Team for the Final Prize
• BellKorPragmaticChaos
  • BellKor: Yehuda Koren (now Yahoo!), Bob Bell, Chris Volinsky, AT&T
  • BigChaos: Michael Jahrer, Andreas Toscher, 2 grad students from Austria
  • Pragmatic Theory: Martin Chabert, Martin Piotte, 2 engineers from Montreal (Quebec)

June 26th, 2009: after 1000 days & nights…

Million Dollars Awarded, Sept 21st, 2009
