1 Fast Maximum Margin Matrix Factorization for Collaborative Prediction. Jason Rennie (MIT), Nati Srebro (Univ. of Toronto)

2 Collaborative Prediction. Based on a partially observed ratings matrix (rows = users, columns = movies, entries = ratings 1-5, with many entries missing), predict the unobserved entries: "Will user i like movie j?"
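
For concreteness, one way to hold such data in code is an array plus an observation mask (the numbers below are made up, not the slide's):

```python
import numpy as np

# Hypothetical 4-user x 5-movie rating matrix; 0 marks an unobserved entry.
Y = np.array([[2, 1, 0, 4, 5],
              [0, 4, 1, 3, 5],
              [5, 0, 4, 1, 3],
              [3, 5, 0, 0, 2]])
observed = Y > 0  # True where a rating was actually given

# Collaborative prediction: fill in Y[i, j] wherever observed[i, j] is False.
```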

3 Problems to Address
- Underlying representation of preferences: norm-constrained matrix factorization (MMMF)
- Discrete, ordered labels: threshold-based ordinal regression
- Scaling up MMMF to large problems: factorized objective, gradient descent
- Ratings may not be missing at random

4 Linear Factor Model. Each movie is described by feature values (e.g. comic value, dramatic value, violence); a user's preference weights (e.g. +1.5, -1.9, +2.4) combine those features into a real-valued preference score per movie, and the scores are then mapped to discrete ratings.
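
A minimal sketch of the linear factor model; the feature names follow the slide, but the numbers are illustrative:

```python
import numpy as np

# Each movie is a vector of feature values: [comic value, dramatic value, violence].
movie_features = np.array([[0.9, 0.1, 0.3],   # movie 1
                           [0.2, 0.8, 0.1],   # movie 2
                           [0.1, 0.3, 0.9]])  # movie 3

# One user's preference weights over the same features.
user_weights = np.array([+1.5, -1.9, +2.4])

# Real-valued preference score per movie: a weighted sum of its feature values.
scores = movie_features @ user_weights
print(scores)  # [ 1.88 -0.98  1.74]
```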

5 Ordinal Regression. For a single user: movie feature vectors v1..v10 and the user's preference weight vector w1 give real-valued preference scores (e.g. -2.5, 1.4, -0.9, 5.6, 3.1, ...), which are mapped to the observed ordinal ratings (e.g. 2, 3, 3, 5, 5, ...).
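
A sketch of the threshold step that turns scores into ratings; the thresholds here are hypothetical values chosen so the first few scores reproduce the slide's ratings (in MMMF the thresholds are learned):

```python
import numpy as np

def score_to_rating(score, thresholds):
    """Map a real-valued score to a rating in 1..len(thresholds)+1.

    thresholds must be sorted; the rating is 1 plus the number of
    thresholds the score exceeds.
    """
    return 1 + int(np.searchsorted(thresholds, score))

thresholds = [-4.0, -2.0, 2.0, 3.0]  # hypothetical boundaries 1|2, 2|3, 3|4, 4|5
print([score_to_rating(s, thresholds) for s in [-2.5, 1.4, -0.9, 5.6, 3.1]])
# -> [2, 3, 3, 5, 5]
```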

6 Matrix Factorization. Stacking all users: movie feature vectors v1..v10 and user preference weight vectors w1..w15 together determine a full matrix of preference scores, and hence a full matrix of ratings, one row per user.

7 Matrix Factorization. The preference-score matrix factors as X = UV', where the rows of U are user preference weights and the rows of V are movie feature vectors; thresholding the entries of X yields the rating matrix Y.
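
A quick sketch of the factorization itself (sizes are illustrative):

```python
import numpy as np

n_users, n_movies, k = 15, 10, 3  # illustrative sizes
rng = np.random.default_rng(0)

U = rng.normal(size=(n_users, k))   # one preference-weight vector per user
V = rng.normal(size=(n_movies, k))  # one feature vector per movie

X = U @ V.T                         # preference score for every (user, movie) pair
assert X.shape == (n_users, n_movies)
# Thresholding the entries of X (as in the ordinal-regression slides) gives Y.
```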

8 Ordinal Regression. Each real-valued preference score is converted to a rating in {1, 2, 3, 4, 5} by comparing it against a set of thresholds that divide the real line into five intervals, one per rating.

9 Max-Margin Ordinal Regression. Scores and thresholds are learned with a margin: a score is penalized if it falls within a margin of, or on the wrong side of, the thresholds bordering its true rating [Shashua & Levin, NIPS 2002].
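
A sketch of this style of loss, assuming the common immediate-threshold formulation that only penalizes the two thresholds bordering the true rating (an illustration, not necessarily the authors' exact formulation):

```python
import numpy as np

def hinge(z):
    return np.maximum(0.0, 1.0 - z)

def immediate_threshold_loss(score, rating, thresholds):
    """Hinge penalties against the two thresholds adjacent to the true rating.

    thresholds[r-1] is the boundary between ratings r and r+1; ratings run 1..R.
    """
    loss = 0.0
    if rating > 1:                      # score should sit above the lower boundary
        loss += hinge(score - thresholds[rating - 2])
    if rating <= len(thresholds):       # score should sit below the upper boundary
        loss += hinge(thresholds[rating - 1] - score)
    return loss

thresholds = [-4.0, -2.0, 2.0, 3.0]
print(immediate_threshold_loss(1.4, 3, thresholds))  # 0.4: within the margin of the 3|4 boundary
```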

10 Absolute Difference. Shashua & Levin's loss bounds the misclassification error; for ordinal regression we instead want to minimize the absolute difference between the predicted and true labels.

11 All-Thresholds Loss. Penalize the score against every threshold, not just the two bordering the true rating, so the loss grows with the absolute difference between predicted and true labels [Srebro et al., NIPS 2004] [Chu & Keerthi, ICML 2005].
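
A sketch of the all-thresholds loss, summing a hinge term over every threshold rather than just the adjacent pair (again an illustrative implementation):

```python
import numpy as np

def hinge(z):
    return np.maximum(0.0, 1.0 - z)

def all_thresholds_loss(score, rating, thresholds):
    """Sum a hinge term over every threshold: thresholds below the true rating
    should lie below the score, thresholds at or above it should lie above."""
    loss = 0.0
    for r, theta in enumerate(thresholds, start=1):  # theta separates ratings r and r+1
        if r < rating:
            loss += hinge(score - theta)
        else:
            loss += hinge(theta - score)
    return loss

thresholds = [-4.0, -2.0, 2.0, 3.0]
# The loss grows with how far the predicted rating is from the true one.
print(all_thresholds_loss(5.6, 5, thresholds), all_thresholds_loss(5.6, 2, thresholds))  # 0.0 16.8
```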

12 All-Thresholds Loss. Experiments comparing:
- Least squares regression
- Multi-class classification
- Shashua & Levin's Max-Margin OR
- All-Thresholds OR
All-Thresholds Ordinal Regression achieved both the lowest misclassification error and the lowest absolute difference error [Rennie & Srebro, IJCAI Wkshp 2005].

13 Learning Weights & Features. With ratings only partially observed, the movie feature vectors v1..v10 and the user preference weights w1..w15 must both be learned from the observed entries; together they determine preference scores, and hence predicted ratings, for every user-movie pair.

14 Low Rank Matrix Factorization. X ≈ UV' with rank k. With sum-squared loss and a fully observed Y, the SVD finds the global optimum; with a classification-error loss and a partially observed Y, the problem is non-convex and has no explicit solution.
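
For the fully observed sum-squared case, a truncated SVD gives the best rank-k approximation directly (a quick sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(9, 9))  # an illustrative fully observed matrix
k = 3

Uk, s, Vt = np.linalg.svd(Y, full_matrices=False)
X = (Uk[:, :k] * s[:k]) @ Vt[:k, :]   # best rank-k approximation in squared error

print(np.linalg.matrix_rank(X))       # -> 3
```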

15 V’ UX low norm [Fazel et al., 2001] Norm Constrained Factorization ||X|| tr = min U,V (||U|| Fro 2 + ||V|| Fro 2 )/2 ||U|| Fro 2 = ∑ i,j U ij 2

16 MMMF Objective. Original objective [Srebro et al., NIPS 2004]: min_X ||X||_tr + c·loss(X, Y), with the all-thresholds loss. Factorized objective: min_{U,V} (||U||_Fro^2 + ||V||_Fro^2)/2 + c·loss(UV', Y), again with the all-thresholds loss; here X = UV' is kept low-norm and ||U||_Fro^2 = Σ_ij U_ij^2.
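
A sketch of evaluating the factorized objective on the observed entries; per_entry_loss is a hypothetical parameter name standing in for the (smoothed) all-thresholds loss:

```python
import numpy as np

def factorized_objective(U, V, Y, observed, per_entry_loss, c):
    """(||U||_Fro^2 + ||V||_Fro^2)/2 + c * sum of per-entry losses over observed entries.

    per_entry_loss(score, rating) would be the (smoothed) all-thresholds loss.
    """
    X = U @ V.T
    reg = 0.5 * (np.sum(U**2) + np.sum(V**2))
    data_loss = sum(per_entry_loss(X[i, j], Y[i, j])
                    for i, j in zip(*np.nonzero(observed)))
    return reg + c * data_loss
```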

17 Smooth Hinge. [Figure: function value and gradient of the hinge and the smooth hinge. The smooth hinge replaces the hinge's corner with a quadratic piece so that its gradient is continuous.]
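
A sketch of the smooth hinge and its gradient, assuming the piecewise-quadratic form used in the paper (1/2 - z for z ≤ 0, (1 - z)^2/2 for 0 < z < 1, and 0 for z ≥ 1):

```python
import numpy as np

def smooth_hinge(z):
    """Hinge with its corner replaced by a quadratic piece, so the gradient is continuous."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 0.5 - z,
           np.where(z < 1, 0.5 * (1.0 - z) ** 2, 0.0))

def smooth_hinge_grad(z):
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, -1.0,
           np.where(z < 1, z - 1.0, 0.0))

print(smooth_hinge([-1.0, 0.5, 2.0]))       # [1.5   0.125 0.   ]
print(smooth_hinge_grad([-1.0, 0.5, 2.0]))  # [-1.  -0.5  0. ]
```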

18 Collaborative Prediction Results. Size, sparsity: EachMovie 36656 x 1648, 96% sparse; MovieLens 6040 x 3952, 96% sparse.

Algorithm   EachMovie Weak Error   EachMovie Strong Error   MovieLens Weak Error   MovieLens Strong Error
URP         .4422                  .4557                    .4341                  .4444
Attitude    .4520                  .4550                    .4320                  .4375
MMMF        .4397                  .4341                    .4156                  .4203

URP & Attitude results: [Marlin, 2004]

19 Local Minima? Factorized objective: min_{U,V} (||U||_Fro^2 + ||V||_Fro^2)/2 + c·loss(UV', Y). Does gradient descent on this non-convex factorized objective find the same solution as optimizing X directly with a (convex) SDP?
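
A minimal sketch of the gradient-descent route on the factorized objective. To keep it short this uses a squared-error data term on the observed entries instead of the smoothed all-thresholds loss; everything here (data, sizes, step size) is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, k, c, lr = 30, 20, 5, 1.0, 0.01

# Illustrative data: random ratings 1..5 with ~30% of entries observed.
Y = rng.integers(1, 6, size=(n_users, n_movies)).astype(float)
observed = rng.random((n_users, n_movies)) < 0.3

U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_movies, k))

for step in range(500):
    R = (U @ V.T - Y) * observed   # residuals on observed entries only
    grad_U = U + c * (R @ V)       # gradient of ||U||^2/2 + c/2 * sum of squared residuals
    grad_V = V + c * (R.T @ U)
    U -= lr * grad_U
    V -= lr * grad_V

objective = 0.5 * (np.sum(U**2) + np.sum(V**2)) + 0.5 * c * np.sum(((U @ V.T - Y) * observed) ** 2)
print(objective)
```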

20 Local Minima? [Figure: difference between the gradient-descent solution and the SDP solution, measured both as a matrix difference in X and as a difference in objective value.] Data: 100 x 100 MovieLens, 65% sparse.

21 Summary
- We scaled MMMF to large problems by optimizing the factorized objective.
- Empirical tests indicate that local minima issues are rare or absent.
- Results on large-scale data show substantial improvements over the state of the art.
- D'Aspremont & Srebro: large-scale SDP optimization methods train on 1.5 million binary labels in 20 hours.

