Online Learning for Collaborative Filtering


1 Online Learning for Collaborative Filtering
Guang Ling, Haiqin Yang, Irwin King, Michael Lyu. Presented by Guang Ling.

2 Outline
Introduction
PMF and RMF
Online PMF and Online RMF
Experiments and Results
Conclusion and Future Work

3 Introduction
We face an unprecedentedly large amount of choice!
Search vs. recommend.

4 Introduction
Recommender systems emerged.
Content-based filtering: analyze item content.
Collaborative filtering: rating based.

5 Introduction
Collaborative filtering:
Allow users to rate items.
Infer each user's tastes and each item's features from the ratings.
Match users' preferences with items' features.

6 Introduction
Various methods have been developed.
Memory based: user based, item based.
Model based: PMF, RMF, PLSA, PLPA.
So, what is the problem?
Example rating matrix (items I1 to I4, users U1 to U3): U1 = 1, 5, 4, ?; U2 = 2 (other cells blank); U3 = (blank).

7 Introduction
Unrealistic assumptions:
All ratings are available.
There will be no new ratings.
Data sets are small enough to be handled in main memory.
Reality:
Ratings are collected over time.
New ratings are received constantly.
Huge data sets cannot be easily handled.

8 Introduction
We propose online CF algorithms that:
Obviate the need to hold all data.
Make incremental changes based solely on new ratings.
Scale linearly with the number of ratings.
Extra feature: they command an explicit regularization effect.

9 PMF and RMF
Matrix factorization models factor R into U and V (figure labels: no. users, no. items).
Minimize:
Square loss: PMF
Cross entropy: RMF

10 PMF
Conditional distribution over observed ratings:
Spherical Gaussian priors on user and movie feature vectors:
Maximize posterior:

11 PMF
Maximizing the posterior is equivalent to minimizing a loss with two terms: squared loss and regularization.
Gradient descent is used to minimize this loss.
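For reference, the standard PMF formulation that slides 10 and 11 appear to follow can be sketched as below. The symbols ($N$ users, $M$ items, indicator $I_{ij}$, variances $\sigma^2, \sigma_U^2, \sigma_V^2$, weights $\lambda_U, \lambda_V$) are assumptions taken from the usual PMF literature, not from the slides, and some formulations additionally pass $U_i^\top V_j$ through a logistic function.

```latex
% Conditional distribution over observed ratings (I_{ij} = 1 if user i rated item j)
p(R \mid U, V, \sigma^2) = \prod_{i=1}^{N} \prod_{j=1}^{M}
  \left[ \mathcal{N}\!\left(r_{ij} \mid U_i^\top V_j, \sigma^2\right) \right]^{I_{ij}}

% Spherical Gaussian priors on user and item feature vectors
p(U \mid \sigma_U^2) = \prod_{i=1}^{N} \mathcal{N}\!\left(U_i \mid 0, \sigma_U^2 \mathbf{I}\right),
\qquad
p(V \mid \sigma_V^2) = \prod_{j=1}^{M} \mathcal{N}\!\left(V_j \mid 0, \sigma_V^2 \mathbf{I}\right)

% Maximizing the log-posterior is equivalent to minimizing
\mathcal{L} = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} I_{ij}
  \left( r_{ij} - U_i^\top V_j \right)^2
  + \frac{\lambda_U}{2} \lVert U \rVert_F^2
  + \frac{\lambda_V}{2} \lVert V \rVert_F^2,
\qquad
\lambda_U = \frac{\sigma^2}{\sigma_U^2}, \quad \lambda_V = \frac{\sigma^2}{\sigma_V^2}
```

The first sum is the squared loss and the two Frobenius-norm terms are the regularization.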

12 RMF
Top-one probability: the probability that an item i is ranked on top.
Minimize cross entropy: cross entropy measures the divergence between two distributions (un-normalized KL-divergence).
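A minimal sketch of the two ingredients named on this slide, assuming the usual softmax form of top-one probability from the list-wise ranking literature (the slide's own formula is not shown). The function names and the example scores are hypothetical.

```python
import math

def top_one_probability(scores):
    """Probability that each item is ranked on top: softmax over the scores."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p_true, p_pred):
    """Cross entropy between a target distribution and a predicted one."""
    return -sum(p * math.log(q) for p, q in zip(p_true, p_pred) if p > 0)

# Hypothetical example: one user's observed ratings and the model's predicted scores.
ratings = [1.0, 5.0, 4.0]
predictions = [0.5, 2.0, 1.5]

p_ratings = top_one_probability(ratings)
p_predictions = top_one_probability(predictions)
loss = cross_entropy(p_ratings, p_predictions)
```

The cross entropy is smallest when the predicted distribution matches the one induced by the ratings, which is what the ranking-oriented loss rewards.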

13 RMF
The model loss has two terms: cross entropy and regularization.
Gradient descent is used to minimize it.

14 Online PMF
We propose two online algorithms for PMF.
Stochastic gradient descent: adjust the model stochastically for each observation.
Regularized dual averaging: maintain an approximate average gradient and solve an easy optimization problem at each iteration.

15 Stochastic Gradient Descent PMF
Recall the loss function for PMF.
The squared loss can be decomposed into one term per observation triplet (user, item, rating).
Update the model using the gradient of this per-observation loss.
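The per-observation update can be sketched as follows. This assumes the per-triplet loss 0.5 * (r - u·v)^2 + (reg/2) * (|u|^2 + |v|^2) with a single regularization weight; the function name, learning rate, and regularization value are illustrative assumptions rather than the authors' settings.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgd_pmf_update(u, v, rating, lr=0.05, reg=0.01):
    """One stochastic gradient step on a single (user, item, rating) triplet.

    Assumed per-observation loss:
        0.5 * (rating - u.v)^2 + 0.5 * reg * (|u|^2 + |v|^2)
    """
    err = rating - dot(u, v)
    # Both updates use the old u and v, so compute them before reassigning.
    new_u = [ui + lr * (err * vi - reg * ui) for ui, vi in zip(u, v)]
    new_v = [vi + lr * (err * ui - reg * vi) for ui, vi in zip(u, v)]
    return new_u, new_v

# Hypothetical stream: the same rating is observed repeatedly.
u = [0.1, 0.1]
v = [0.1, 0.1]
initial_error = abs(1.0 - dot(u, v))
for _ in range(200):
    u, v = sgd_pmf_update(u, v, rating=1.0)
prediction = dot(u, v)
```

Only the two feature vectors touched by the new rating change, so no earlier ratings need to be kept in memory.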

16 Regularized Dual Averaging PMF
Maintain the approximate average gradient.
Labels in the update equation: previous gradient; gradient due to the new observation; number of items rated by u.

17 Regularized Dual Averaging PMF
Solve the following optimization problem to obtain the new user feature vector and the new item feature vector.
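The two dual-averaging steps (update the running average gradient, then solve a small optimization problem) can be sketched for the user side as below. This assumes the simplest regularizer, (reg/2) * |w|^2, for which the minimizer of avg·w + (reg/2) * |w|^2 has the closed form w = -avg / reg; the exact problem on the slide may differ. All names and numbers are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_loss_gradient(u, v, rating):
    """Gradient of 0.5 * (rating - u.v)^2 with respect to u."""
    err = rating - dot(u, v)
    return [-err * vi for vi in v]

def rda_step(avg_grad, new_grad, t, reg):
    """One dual-averaging step; t counts the items rated by this user so far."""
    # Running average: avg_t = (t - 1) / t * avg_{t-1} + 1 / t * new_grad
    avg = [((t - 1) / t) * g_old + g_new / t
           for g_old, g_new in zip(avg_grad, new_grad)]
    # Closed-form minimizer of avg.w + (reg / 2) * |w|^2 (assumed regularizer)
    w = [-g / reg for g in avg]
    return avg, w

# Hypothetical stream: one user rates two items with fixed item vectors.
user = [0.1, 0.1]
avg_grad = [0.0, 0.0]
stream = [([0.5, 0.2], 1.0), ([0.1, 0.4], 0.5)]
for t, (item, rating) in enumerate(stream, start=1):
    grad = squared_loss_gradient(user, item, rating)
    avg_grad, user = rda_step(avg_grad, grad, t, reg=0.5)
```

The item feature vector would be updated symmetrically, with its own average gradient and its own count of received ratings.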

18 Online RMF
Similar to online PMF, we propose two online algorithms for RMF: stochastic gradient descent and regularized dual averaging.
However, the challenge is that the loss function cannot be easily decomposed per observation.

19 Online RMF
Recall the loss function for RMF.
When a new observation is revealed, the loss changes in two ways: loss due to the new item, and decay of the previous items.

20 Online RMF
We approximate the gradient by combining two terms: the decayed previous gradient and the gradient with respect to the new item.

21 Online RMF
Stochastic Gradient Descent RMF
Dual Averaging RMF

22 Experiments and Results
Online vs. batch algorithms
Performance under different settings
Sensitivity analysis of parameters
Scalability to large datasets

23 Evaluation Metric
Root Mean Square Error (RMSE): the lower the better.
Normalized Discounted Cumulative Gain (NDCG): the higher the better.
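Both metrics can be sketched in a few lines. The NDCG variant here (gain 2^rel - 1 with a log2 position discount) is one common definition and is an assumption; the slides do not say which variant was used. Function names are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between true and predicted ratings."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def dcg(relevances):
    """Discounted cumulative gain of relevances listed in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg_at_k(actual, predicted, k):
    """NDCG@k: rank items by predicted score, score them by true ratings."""
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    ranked = [actual[i] for i in order[:k]]
    ideal = sorted(actual, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

RMSE suits the rating-prediction objective of PMF, while NDCG rewards getting the order of the top items right, which matches the ranking objective of RMF.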

24 Online Vs. Batch algorithms
We conduct experiments on a real-life data set.
MovieLens, a movie rating data set: 6,040 users; 3,900 movies; 1,000,209 ratings; 4.25% of the user-item rating matrix is known.
We simulate three settings:
T1: 10% training, 90% testing
T5: 50% training, 50% testing
T9: 90% training, 10% testing
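The 4.25% density follows directly from the counts on this slide, and the three settings amount to splitting the ratings at different fractions. The sketch below checks the arithmetic and shows one way to produce such splits; the random shuffling, the seed, and the function name are assumptions, since the slides do not say how the splits were drawn.

```python
import random

# Counts quoted on the slide.
n_users = 6040
n_items = 3900
n_ratings = 1000209

# Fraction of the user-item matrix that is observed, as a percentage.
density_percent = round(100 * n_ratings / (n_users * n_items), 2)

def split_ratings(ratings, train_fraction, seed=0):
    """Randomly split rating triplets into training and testing parts."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy data: ten (user, item, rating) triplets.
toy = [(u, i, 3.0) for u in range(2) for i in range(5)]
t1_train, t1_test = split_ratings(toy, 0.1)
t5_train, t5_test = split_ratings(toy, 0.5)
t9_train, t9_test = split_ratings(toy, 0.9)
```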

25 Online Vs. Batch algorithms
PMF results (figure panels: T1, T5, T9).

26 Online Vs. Batch algorithms
RMF results (figure panels: T1, T5, T9).

27 Impact of the regularization parameter in PMF
Observations:
Less training data needs more regularization.
Results are quite sensitive to regularization.
Figure panels: SGD-PMF, DA-PMF.

28 Impact of the regularization parameter in RMF
Observation:
Less training data needs more regularization.
Figure panels: SGD-RMF, RDA-RMF.

29 Impact of learning rate
The learning rate is used in the stochastic gradient descent algorithms only.
Figure panels: SGD-RMF, SGD-PMF.

30 Scalability to large dataset
Yahoo! Music dataset: the largest CF dataset publicly available.
252,800,275 ratings
1,000,990 users
624,961 items
Rating value range: [0, 100]

31 Scalability to large dataset
Experiment environment: Linux workstation (Xeon Dual Core 2.4 GHz, 32 GB RAM).
Batch PMF: 8 hours for 120 iterations.
Online PMF: 10 minutes.
Figure panels: T1, T5.

32 Conclusion and Future Work
We proposed online CF algorithms that:
Perform comparably to, or even better than, the corresponding batch algorithms.
Scale linearly with the number of ratings.
Adjust the model incrementally given a new observation.
Future work:
Theoretical bound for the convergence rate.
Find a better approximation for the average gradient of RMF.

33 Thanks! Questions?

