Netflix Prize Solution: A Matrix Factorization Approach


Netflix Prize Solution: A Matrix Factorization Approach. By Atul S. Kulkarni (kulka053@d.umn.edu), graduate student, University of Minnesota Duluth. Topic: a Netflix Prize solution using the matrix factorization method SVD.

Agenda: Problem description. Netflix data. Why is it a tough nut to crack? Overview of methods already applied to this problem. Overview of the paper. Details of the method. How does this method work for the Netflix problem? My implementation. Results. Q and A.

Netflix Prize Problem. Given a set of users with their previous ratings for a set of movies, can we predict the rating they will assign to a movie they have not previously rated? Defined at http://www.netflixprize.com//index. The competition seeks to improve the prediction performance of Cinematch (Netflix's existing movie recommender system) by 10%. How is the performance measured? By Root Mean Square Error (RMSE). The winner gets a prize of 1 million USD. Class exercise: take ratings for a movie (The Dark Knight, Star Wars) from everyone in the class and try to make a prediction based on them.
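As a small illustration of the performance measure, RMSE can be computed as below. This is a minimal sketch; the function name `rmse` and the sample ratings are made up for illustration and are not taken from the Netflix data.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_error_sum = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error_sum / len(actual))

# Hypothetical predictions and true ratings, for illustration only.
example_rmse = rmse([3.5, 4.0, 2.0], [4, 4, 1])
print(round(example_rmse, 4))
```

A lower RMSE means the predicted ratings sit closer to the true ratings; large individual errors are penalised more heavily than small ones because the errors are squared before averaging.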

Problem Description. Recommender systems use the knowledge about the preferences of a group of users for certain items to help predict the interest level of other users from the same community. [1] Collaborative filtering is a widely used method for recommender systems. It tries to find traits of shared interest among users in a group to help predict the likes and dislikes of the other users within the group. [1]

Why is this problem interesting? It is used by almost every recommender system today: Amazon, Yahoo, Google, Netflix, …

Netflix Data. Netflix released data for this competition. It contains nearly 100 million ratings. Number of (anonymous) users = 480,189. Number of movies rated by them = 17,770. Training data is provided per movie. To verify a model without submitting predictions to Netflix, "probe.txt" is provided. To submit predictions for the competition, "qualifying.txt" is used.

Netflix Data in Pictures (three slides of figures; the pictures are taken as is from [5]).

Netflix Data. Data in the training file is per movie. The first line is the movie number followed by a colon; every following line has the form Customer#,Rating,Date of Rating. Example:

    4:
    1065039,3,2005-09-06
    1544320,1,2004-06-28
    410199,5,2004-10-16
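The per-movie layout above can be read with a few lines of Python. This is only a sketch: the function name `parse_movie_file` is hypothetical, and the input is the four-line example from the slide rather than a real training file.

```python
from datetime import date

def parse_movie_file(lines):
    """Parse one per-movie training file into (movie_id, ratings).

    The first line is "<movie id>:"; every later line is
    "<customer id>,<rating>,<YYYY-MM-DD>".
    """
    rows = [line.strip() for line in lines if line.strip()]
    header = rows[0]
    if not header.endswith(":"):
        raise ValueError("expected a '<movie id>:' header line")
    movie_id = int(header[:-1])
    ratings = []
    for row in rows[1:]:
        customer, rating, day = row.split(",")
        ratings.append((int(customer), int(rating), date.fromisoformat(day)))
    return movie_id, ratings

# The example shown on the slide.
sample = ["4:", "1065039,3,2005-09-06", "1544320,1,2004-06-28", "410199,5,2004-10-16"]
movie_id, ratings = parse_movie_file(sample)
print(movie_id, len(ratings))
```

For a real file one would pass an open file object instead of the `sample` list, since iterating over a file yields its lines.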

Netflix Data. Data points in "probe.txt" look like this (have answers): Movie#, then one Customer# per line.

    1:
    30878
    2647871
    1283744

Data in "qualifying.txt" looks like this (no answers): Movie#, then Customer#,DateofRating.

    1:
    1046323,2005-12-19
    1080030,2005-12-23
    1830096,2005-03-14

Hard Nut to Crack? Why is this problem such a difficult one? Total ratings possible = 480,189 (users) * 17,770 (movies) = 8,532,958,530 (8.5 billion). Total available = 100 million. The Users x Movies matrix therefore has about 8.4 billion entries missing. We can treat the problem as a least squares problem by representing it as a system of equations in matrix form.
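The sparsity arithmetic can be checked directly. The sketch below uses the user and movie counts from the slides; `known_ratings` is rounded to 100 million as on the slide, so the density is approximate.

```python
users = 480_189
movies = 17_770
known_ratings = 100_000_000  # "nearly 100 million" on the slides; rounded here

possible = users * movies              # every user rating every movie
missing = possible - known_ratings     # cells of the matrix with no rating
density = known_ratings / possible     # fraction of the matrix that is filled

print(f"possible={possible:,} missing={missing:,} density={density:.2%}")
```

Only a little over 1% of the Users x Movies matrix is filled in, which is why methods that need a complete matrix cannot be applied to it as is.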

Technically tough as well. Huge memory requirements: 4.3 GB if we don't design the data structures carefully; 350 - 700 MB if we go to a bit-level representation in C. High time requirements: training time varies between a few hours and days (15 in my case). Sparse data available for training: because we are using only ~100 million of the possible 8.5 billion ratings, the predictors have some error in their weights (small training data).

Various Methods Employed for the Netflix Prize Problem. Nearest neighbor methods: k-NN with variations (we will not talk a great deal about nearest neighbor methods). Matrix factorization [2]:
1. Probabilistic Latent Semantic Analysis: a probabilistic variant of LSA, a method from NLP that aims to find hidden concepts in a given set of documents.
2. Probabilistic Matrix Factorization: uses a Gaussian model and scales well.
3. Expectation Maximization for Matrix Factorization: tries to find the maximum likelihood for the rating using matrix factorization methods.
4. Singular Value Decomposition.
5. Regularized Matrix Factorization.

The Paper. Title: "Improving regularized singular value decomposition for collaborative filtering", Arkadiusz Paterek, Proceedings of KDD Cup and Workshop, 2007. [3] It uses the algorithm described by Simon Funk (Brandyn Webb) in [4]. The paper revolves around the regularized Singular Value Decomposition (SVD) described in [4] and suggests adding biases to it to improve performance. It also proposes some methods for post-processing the features extracted by the SVD, and it compares various combinations of the suggested methods on the Netflix data.

Singular Value Decomposition. Consider the given problem as a matrix A of Users x Movies, or of Movies x Users. Shown here is the Users x Movies example; the Movies x Users example is its transpose ("." marks a missing rating). What do we do with this representation?

        M1  M2  M3  M4  M5  M6
    U1   2   4   5   5   .   1
    U2   .   .   3   5   1   5
    U3   2   .   4   5   .   5

Singular Value Decomposition. A method of matrix factorization, applicable to rectangular and square matrices alike. It decomposes the matrix into 3 component matrices whose product approximates the original matrix. E.g.:

    D
    $d
    [1] 13.218989  4.887761  1.538870

    U
    $u
               [,1]       [,2]       [,3]
    [1,] -0.5606779  0.8192382 -0.1203705
    [2,] -0.5529369 -0.4786352 -0.6820331
    [3,] -0.6163612 -0.3158436  0.7213472

    V
    $v
                [,1]        [,2]        [,3]
    [1,] -0.17808307  0.20598164  0.78106201
    [2,] -0.16965834  0.67044040 -0.31288023
    [3,] -0.52406769  0.28579770  0.15429276
    [4,] -0.65435261  0.02532797 -0.26336364
    [5,] -0.04182898 -0.09792523 -0.44320373
    [6,] -0.48469427 -0.64511243  0.04951659

Can we recover the original matrix? Yes (well, almost!). Here is how: we multiply the 3 matrices, U * D * V^T, and get A* ≈ A.

                  [,1]          [,2] [,3] [,4]          [,5] [,6]
    [1,]  2.000000e+00  4.000000e+00    5    5 -1.557185e-17    1
    [2,] -8.564655e-16 -1.221706e-15    3    5  1.000000e+00    5
    [3,]  2.000000e+00 -1.231356e-15    4    5  1.757492e-16    5

We can see this is an approximation of the original matrix. Note the very small values that show up in place of the missing values.
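The decomposition and the reconstruction can be reproduced with NumPy. This is a sketch that assumes the missing ratings were entered as zeros, which is what the reconstructed matrix above suggests; the variable names are illustrative, and the signs of the columns of U and V may differ from the output shown on the slides.

```python
import numpy as np

# Users x Movies matrix from the slides, with missing ratings entered as 0
# (an assumption based on the reconstructed matrix).
A = np.array([[2, 4, 5, 5, 0, 1],
              [0, 0, 3, 5, 1, 5],
              [2, 0, 4, 5, 0, 5]], dtype=float)

# Thin SVD: U is 3x3, d holds the 3 singular values, Vt is 3x6.
U, d, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the three factors back together recovers A up to rounding.
A_rebuilt = U @ np.diag(d) @ Vt

# Keeping only the largest singular value gives a rank-1 approximation.
rank1 = d[0] * np.outer(U[:, 0], Vt[0])

print(np.round(d, 6))
print(np.round(A_rebuilt, 6))
```

Dropping the smaller singular values, as in `rank1`, is what turns the exact factorization into a compressed approximation.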

How do we use SVD? We use the 2 matrices U and V to estimate the original matrix A. So what happened to the diagonal matrix D? We train our method on the given training set and learn by rolling the diagonal matrix into the two matrices. We compute $A' = U V^{T}$. The error is taken over every known rating: $e_{ij} = A_{ij} - A'_{ij}$ for all $i, j$ in the training set.

Algorithm variations covered in this paper:
1. Simple predictors: 6 predictors in total; 5 predictors are empirical probabilities for the user in question and the 6th is the mean value of the rating for the movie.
2. Regularized SVD: we try to find the two matrices U and V by iterating over the training set.
3. Improved regularized SVD (with biases): adds 1 variable per movie and 1 per user, called biases, to the prediction and runs the same training algorithm.
4. Post-processing SVD with KNN (SVD_KNN): proposed by an anonymous contestant. Find movie-movie similarity, define the 1 nearest neighbor for this user and assign that rating.
5. Post-processing SVD with kernel ridge regression (SVD_KRR): a complex method that discards all the values of matrix U and defines the prediction using a Gaussian kernel function.
6. K-means clustering: divides the users into K clusters; the rating is the median rating of the cluster.
7. Linear model for each item: another item-item similarity method, where for every item we build a weighted linear model learned using gradient descent.
8. Decreasing the number of parameters: only the movies rated by user i are considered, and a model is fit with weights for those movies for that user. This model has #users * #features parameters.
Of these, what will we cover?

The SVD Algorithm from the paper [3,4,6]

    Initialize 2 arrays, movieFeatures (U) and customerFeatures (V), to a small value, 0.1
    For every feature# in features
        Until the minimum number of iterations is done and RMSE is not improving by more than the minimum improvement
            squareerrsum = 0
            For every data point in the training set    // data point has custID and movieID
                prating = customerFeatures[feature#][custID] * movieFeatures[feature#][movieID]    // predict the rating; in [4] the contributions of the other features are added to this product
                error = originalrating - prating    // find the error
                squareerrsum += error * error    // sum the squared error for RMSE
                cf = customerFeatures[feature#][custID]    // locally copy the current feature value
                mf = movieFeatures[feature#][movieID]    // locally copy the current feature value

(Contd.)

Algorithm contd.

                customerFeatures[feature#][custID] += learningrate * (error * mf - regularizationfactor * cf)    // rolling the error into the feature
                movieFeatures[feature#][movieID] += learningrate * (error * cf - regularizationfactor * mf)    // rolling the error into the feature
            RMSE = sqrt(squareerrsum / total number of data points)    // calculate RMSE

    Now we do the testing
    For every test point with custID and movieID
        predictedrating = 0
        For every feature# in features
            predictedrating += customerFeatures[feature#][custID] * movieFeatures[feature#][movieID]

Caveats: clip the ratings to the range [1, 5], because the predicted rating might go out of bounds. The "regularization factor" is introduced by Brandyn Webb in [4] to reduce overfitting.
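The training and testing loops above can be sketched in Python as follows. This is an illustrative toy version, not the code used for the experiments: the function names, the 0-based ids, the fixed number of epochs per feature (instead of the RMSE-based stopping rule) and the hyperparameter values are all assumptions, and the prediction during training sums the features trained so far plus the current one.

```python
import math
import numpy as np

def train_regularized_svd(ratings, n_users, n_movies, n_features=2,
                          learning_rate=0.01, regularization=0.02,
                          epochs_per_feature=200, init=0.1):
    """ratings: list of (user, movie, rating) triples with 0-based ids."""
    user_f = np.full((n_features, n_users), init)
    movie_f = np.full((n_features, n_movies), init)
    for f in range(n_features):                  # train one feature at a time
        for _ in range(epochs_per_feature):      # fixed epoch count (assumption)
            for u, m, r in ratings:
                # Predict with the features trained so far plus the current one.
                predicted = float(user_f[:f + 1, u] @ movie_f[:f + 1, m])
                error = r - predicted
                uf, mf = user_f[f, u], movie_f[f, m]   # local copies
                user_f[f, u] += learning_rate * (error * mf - regularization * uf)
                movie_f[f, m] += learning_rate * (error * uf - regularization * mf)
    return user_f, movie_f

def predict(user_f, movie_f, u, m):
    """Sum over all features, clipped to the valid rating range [1, 5]."""
    return float(np.clip(user_f[:, u] @ movie_f[:, m], 1.0, 5.0))

def rmse(user_f, movie_f, ratings):
    se = sum((r - predict(user_f, movie_f, u, m)) ** 2 for u, m, r in ratings)
    return math.sqrt(se / len(ratings))

# Toy data: the known entries of the 3 x 6 example matrix from the SVD slides.
ratings = [(0, 0, 2), (0, 1, 4), (0, 2, 5), (0, 3, 5), (0, 5, 1),
           (1, 2, 3), (1, 3, 5), (1, 4, 1), (1, 5, 5),
           (2, 0, 2), (2, 2, 4), (2, 3, 5), (2, 5, 5)]
user_f, movie_f = train_regularized_svd(ratings, n_users=3, n_movies=6)
trained_rmse = rmse(user_f, movie_f, ratings)
print(round(trained_rmse, 4))
```

Only the known ratings are visited, so the missing entries never have to be filled in; this is the main practical difference from running a full SVD on the matrix.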

Variation: Improved Regularized SVD. That was regularized SVD; the improved regularized SVD adds biases. Predict the rating with 2 added biases, Ci per customer and Dj per movie:

    Rating = Ci + Dj + customerFeatures[feature#][i] * movieFeatures[feature#][j]

During training, update the biases as:

    Ci += learningrate * (err - regularization * (Ci + Dj - global_mean))
    Dj += learningrate * (err - regularization * (Ci + Dj - global_mean))

Learningrate = .001, regularization = 0.05, global_mean = 3.6033
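One training step with biases might look like the sketch below. The function name, the starting bias values in the usage lines, and the choice to update all features in the same step (rather than one feature at a time) are assumptions made for illustration; the learning rate, regularization and global mean are the values quoted above.

```python
def sgd_step_with_biases(rating, c_i, d_j, user_vec, movie_vec,
                         learning_rate=0.001, regularization=0.05,
                         global_mean=3.6033):
    """One stochastic gradient step for a single (customer i, movie j) rating."""
    predicted = c_i + d_j + sum(u * m for u, m in zip(user_vec, movie_vec))
    err = rating - predicted
    # Both biases are pulled towards making c_i + d_j match the global mean.
    shrink = regularization * (c_i + d_j - global_mean)
    new_c = c_i + learning_rate * (err - shrink)
    new_d = d_j + learning_rate * (err - shrink)
    new_user = [u + learning_rate * (err * m - regularization * u)
                for u, m in zip(user_vec, movie_vec)]
    new_movie = [m + learning_rate * (err * u - regularization * m)
                 for u, m in zip(user_vec, movie_vec)]
    return new_c, new_d, new_user, new_movie, err

# Hypothetical starting point: biases that sum to roughly the global mean.
new_c, new_d, new_user, new_movie, err = sgd_step_with_biases(
    rating=5, c_i=1.8, d_j=1.8, user_vec=[0.1], movie_vec=[0.1])
print(new_c, new_d, err)
```

The biases absorb the "this customer rates high" and "this movie is rated high" effects, leaving the feature vectors to model the interaction between customer and movie.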

Variation: KNN for Movies. Post-processing with KNN: on the regularized SVD movieFeature matrix we compute the cosine similarity between 2 movie vectors:

$$\mathrm{similarity} = \frac{\mathrm{movieFeature}[\mathrm{movieID1}]^{T}\,\mathrm{movieFeature}[\mathrm{movieID2}]}{\lVert \mathrm{movieFeature}[\mathrm{movieID1}] \rVert \, \lVert \mathrm{movieFeature}[\mathrm{movieID2}] \rVert}$$

Using this similarity measure we build a neighborhood of the 1 nearest movie and use the rating of that nearest movie as the predicted rating.
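The similarity computation can be sketched with NumPy as follows. The function name `nearest_movie`, the layout of `movie_features` (one row per movie) and the toy vectors are assumptions for illustration; the sketch also assumes no movie has an all-zero feature vector.

```python
import numpy as np

def nearest_movie(movie_features, movie_id):
    """Return (index of the most similar other movie, all cosine similarities).

    movie_features: array of shape (n_movies, n_features), non-zero rows.
    """
    norms = np.linalg.norm(movie_features, axis=1)
    sims = movie_features @ movie_features[movie_id] / (norms * norms[movie_id])
    sims[movie_id] = -np.inf  # a movie must not be its own neighbor
    return int(np.argmax(sims)), sims

# Toy feature vectors: movie 1 points almost the same way as movie 0.
features = np.array([[1.0, 0.0],
                     [2.0, 0.1],
                     [0.0, 1.0]])
neighbor, sims = nearest_movie(features, 0)
print(neighbor)
```

To turn this into a rating prediction as described above, the search would be restricted to movies the user has actually rated, and the rating of the nearest one would be returned.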

Experimentation Strategy by the author:
1. Select 1.5% - 15% of probe.txt as the hold-out set or test set.
2. Train all models on the rest of the ratings.
3. All models predict the ratings.
4. Merge the results using linear regression on the test set.
Combining two methods for the initial prediction and then performing linear regression.
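The merging step can be sketched as an ordinary least squares fit over the predictions of the individual models. This is an assumption-laden illustration: the function names and the synthetic data are made up, and an intercept term is included by choice.

```python
import numpy as np

def fit_blend(predictions, targets):
    """Fit linear-regression weights (intercept first) over model predictions.

    predictions: array of shape (n_samples, n_models).
    """
    X = np.column_stack([np.ones(len(targets)), predictions])
    weights, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return weights

def blend(predictions, weights):
    """Combine model predictions with previously fitted weights."""
    X = np.column_stack([np.ones(predictions.shape[0]), predictions])
    return X @ weights

# Synthetic hold-out set: two hypothetical models, targets built from known weights.
rng = np.random.default_rng(0)
holdout_predictions = rng.uniform(1.0, 5.0, size=(50, 2))
holdout_targets = 0.5 + holdout_predictions @ np.array([0.7, 0.3])
weights = fit_blend(holdout_predictions, holdout_targets)
print(np.round(weights, 6))
```

Fitting the weights on a hold-out set rather than on the training ratings keeps the regression from simply rewarding the model that overfits the training data most.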

Results from the Paper [3] (table replicated from the paper as is; empty cells are not present in this transcript)

    Predictor       | Test RMSE with BASIC | Test RMSE with BASIC and RSVD2 | Cumulative Test RMSE
    BASIC           | .9826                | .9039                          |
    RSVD            | .9024                | .9018                          | .9094
    RSVD2           |                      |                                |
    KMEANS          | .9410                | .9029                          | .9010
    SVD_KNN         | .9525                | .9013                          | .8988
    SVD_KRR         | .9006                | .8959                          | .8933
    LM              | .9506                | .8995                          | .8902
    NSVD1           | .9312                | .8986                          | .8887
    NSVD2           | .9590                | .9032                          | .8879
    SVD_KRR * NSVD1 | -                    |                                |
    SVD_KRR * NSVD2 |                      |                                | .8877

With RSVD2 and the BASIC method the author achieved an RMSE of .9039, which is around 4-5% lower than the Cinematch algorithm. Linear regression with all the predictors from the table gives .8877 on the test set and .8911 on the qualifying.txt set (~6% improvement over Netflix). .8874; 7.04% improvement: the solution submitted to the Netflix Prize is the result of merging in proportion 85/15 two linear regressions trained on different training-test partitions: one linear regression with 56 predictors (most of them are different variations of regularized SVD and postprocessing with KNN) and 63 two-way interactions, and the second one with 16 predictors (a subset of the predictors from the first regression) and 5 two-way interactions.

My Experiments. I am trying out the regularized SVD method and the improved regularized SVD method with qualifying.txt and probe.txt. I am also going to implement the first 3 steps of the author's experimentation strategy (in my case I will predict with regularized SVD and improved regularized SVD). If time permits, I might try the SVD KNN method. I am also varying some parameters, such as the learning rate and the number of features, to see their effect on the results. I shall have all my results posted on the web site soon.

Questions?

References
[1] Herlocker, J., Konstan, J., Terveen, L., and Riedl, J. Evaluating Collaborative Filtering Recommender Systems. ACM Transactions on Information Systems 22 (2004), ACM Press, 5-53.
[2] Gábor Takács, István Pilászy, Bottyán Németh, Domonkos Tikk. Scalable Collaborative Filtering Approaches for Large Recommender Systems. JMLR Volume 10: 623-656, 2009.
[3] Arkadiusz Paterek. Improving regularized singular value decomposition for collaborative filtering. Proceedings of KDD Cup and Workshop, 2007.
[4] http://sifter.org/~simon/journal/20061211.html
[5] http://www.igvita.com/2006/10/29/dissecting-the-netflix-dataset/
[6] G. Gorrell and B. Webb. Generalized hebbian algorithm for incremental latent semantic analysis. Proceedings of Interspeech, 2006.

Thanks for your time! Atul S. Kulkarni kulka053@d.umn.edu