CoBaFi: Collaborative Bayesian Filtering. Alex Beutel. Joint work with Kenton Murray, Christos Faloutsos, Alex Smola. April 9, 2014 – Seoul, South Korea.


1 CoBaFi: Collaborative Bayesian Filtering. Alex Beutel. Joint work with Kenton Murray, Christos Faloutsos, Alex Smola. April 9, 2014 – Seoul, South Korea

2 Online Recommendation [Figure: grid of Users × Movies with sample ratings 5, 3, 5, 5, 2]

3 Online Rating Models

4 Normal Collaborative Filtering: Fit a Gaussian - Minimize the error. Reality: Minimizing error isn't good enough - Understanding the shape matters!

5 Online Rating Models. Our Model. Normal Collaborative Filtering: Fit a Gaussian - Minimize the error.

6 Our Goals and Challenges. Given: A matrix of user ratings. Find: A model that best fits and predicts user preferences. Goals: G1. Fit the recommender distribution; G2. Understand users who rate few items; G3. Detect abnormal spam behavior.

7 Outline: 1. Background 2. Model Formulation 3. Inference 4. Catching Spam 5. Experiments

8 Collaborative Filtering [Background] [Figure: ratings matrix X factored into U and V over Users, Movies, and Genres, with a numeric example]

9 Matrix Factorization [Background] [Figure: X (Users × Movies) factored into U and V through latent Genres]
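As a minimal sketch of the idea on this slide, the snippet below predicts a rating as the dot product of a user's row of U and a movie's row of V over latent genres. The factor values and the helper name `predict_rating` are illustrative assumptions, not taken from the talk.

```python
# Hypothetical toy factors: 2 users x 3 genres and 2 movies x 3 genres.
U = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 2.0]]
V = [[2.0, 1.0, 0.0],
     [0.0, 0.5, 1.5]]


def predict_rating(U, V, i, j):
    """Predicted rating of user i for movie j: dot product of U[i] and V[j]."""
    return sum(u * v for u, v in zip(U[i], V[j]))


# Reconstruct the full ratings matrix, X_hat[i][j] = U[i] . V[j].
X_hat = [[predict_rating(U, V, i, j) for j in range(len(V))]
         for i in range(len(U))]
```

In practice U and V are learned from the observed entries of X; here they are fixed by hand only to show how a prediction is formed.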

10 Bayesian Probabilistic Matrix Factorization (Salakhutdinov & Mnih, ICML 2008) [Background] μ_U ~ …

11 Outline: 1. Background 2. Our Model 3. Inference 4. Catching Spam 5. Experiments

12 Our Model: Use user preferences to predict ratings. Cluster users (& items). Share preferences within clusters.

13 The Recommender Distribution. First introduced by Tan et al., 2013. [Equation with labeled Linear, Quadratic, and Normalization terms] [Figure: vary θ_2 with θ_1 = 0; curves for θ_2 = -1.0 and θ_2 = 0.4]
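The slide's equation did not survive in this transcript, so the following is only a sketch of a distribution with the same ingredients: a linear term, a quadratic term, and a normalization over the discrete rating scale. Centering the ratings at the middle of the scale and the function name `recommender_pmf` are assumptions made for illustration; the exact parameterization in the talk may differ.

```python
import math


def recommender_pmf(theta1, theta2, ratings=(1, 2, 3, 4, 5)):
    """Discrete distribution over ratings with a linear and a quadratic term.

    Assumption: ratings are centered at the middle of the scale before the
    linear (theta1) and quadratic (theta2) terms are applied.
    """
    center = (ratings[0] + ratings[-1]) / 2.0
    weights = [math.exp(theta1 * (r - center) + theta2 * (r - center) ** 2)
               for r in ratings]
    z = sum(weights)  # normalization constant
    return [w / z for w in weights]


# theta1 = 0 and a negative theta2 gives a Gaussian-like bump in the middle.
gaussian_like = recommender_pmf(0.0, -1.0)
# theta1 = 0 and a positive theta2 pushes mass to the extremes (polarized).
polarized = recommender_pmf(0.0, 0.4)
```

Under these assumptions a single pair of parameters can represent both a bell-shaped and a U-shaped rating histogram, which a single Gaussian cannot.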

14 The Recommender Distribution [Figure: user vector u_i made up of Genre Preferences, General Leaning, and How Polarized; example values 0.3, 0.4, 0.3, 0.2, -0.7, 0.4, 0.3, 0.8, 0.4] Goal 1: Fit the recommender distribution

15 Understanding varying preferences [Figure: example ratings 5, 5, 2, 3, 1, 5, 1]

16 Resulting Co-clustering [Figure: co-clustered U and V]

17 Finding User Preferences [Figure: μ_U] Goal 2: Understand users who rate few items

18 Chinese Restaurant Process [Figure: clusters with parameters μ_1, μ_2, μ_3]
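For readers unfamiliar with the Chinese Restaurant Process, here is a short sketch of its generative rule: each new customer joins an existing table with probability proportional to the table's size, or opens a new table with probability proportional to a concentration parameter. The names `crp_assignments` and `alpha` are illustrative choices, not identifiers from the talk.

```python
import random


def crp_assignments(n_customers, alpha, seed=0):
    """Seat customers one at a time according to the CRP.

    Customer n (0-based) joins existing table k with probability
    size_k / (n + alpha) and a new table with probability alpha / (n + alpha).
    """
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for _ in range(n_customers):
        # Unnormalized weights: one per existing table, plus one for a new table.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)   # open a new table (cluster)
        else:
            table_sizes[k] += 1     # join an existing table
        assignments.append(k)
    return assignments, table_sizes


assignments, table_sizes = crp_assignments(100, alpha=1.0)
```

Because large tables attract more customers, the process tends to produce a few large clusters and many small ones without fixing the number of clusters in advance.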

19 Outline: 1. Background 2. Our Model 3. Inference 4. Catching Spam 5. Experiments

20 Gibbs Sampling - Clusters [Details]. Probability of picking a cluster = (Probability of a cluster based on size (CRP)) × (Probability u_i would come from the cluster)
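The product on this slide can be sketched as follows. To keep the example short it assumes a scalar user parameter, a Gaussian likelihood around each cluster mean, and a new-cluster likelihood evaluated at a fixed prior mean rather than integrated over the prior; all of these, along with the name `cluster_posterior`, are simplifying assumptions and not the model from the talk.

```python
import math
import random


def cluster_posterior(u, cluster_means, cluster_sizes, alpha,
                      new_cluster_mean=0.0, sigma=1.0):
    """Normalized probability of each existing cluster plus one new cluster.

    weight(cluster) = CRP prior (size, or alpha for a new cluster)
                      x likelihood of u under that cluster.
    """
    def gauss(x, mean):
        return (math.exp(-0.5 * ((x - mean) / sigma) ** 2)
                / (sigma * math.sqrt(2.0 * math.pi)))

    weights = [size * gauss(u, mean)
               for mean, size in zip(cluster_means, cluster_sizes)]
    weights.append(alpha * gauss(u, new_cluster_mean))  # option: new cluster
    total = sum(weights)
    return [w / total for w in weights]


probs = cluster_posterior(u=2.1, cluster_means=[-2.0, 2.0],
                          cluster_sizes=[10, 5], alpha=1.0)
# One Gibbs step: draw the user's new cluster from these probabilities.
new_cluster = random.Random(0).choices(range(len(probs)), weights=probs)[0]
```

In this toy setting the user at 2.1 lands in the second cluster with high probability even though the first cluster is larger, because the likelihood term dominates the size term.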

21 Sampling user parameters [Details]. Probability of user preferences u_i = (Probability of preferences u_i given cluster parameters) × (Probability of predicting ratings r_{i,j} using new preferences). Recommender distribution is non-conjugate: Can't sample directly!

22 Outline: 1. Background 2. Our Model 3. Inference 4. Catching Spam 5. Experiments

23 Review Spam and Fraud [Figure: accounts giving runs of 1s and 5s. Image from http://sinovera.deviantart.com/art/Cute-Devil-117932337]

24 Clustering Fraudsters [Figure: clusters μ_1, μ_2, μ_3; New Spam Cluster, Previous Real Cluster]

25 Clustering Fraudsters [Figure: clusters μ_1, μ_2, μ_3] Too much spam: get separated into fraud cluster. Trying to hide just means (a) very little spam or (b) camouflage reinforcing realistic reviews.

26 Clustering Fraudsters [Figure: clusters μ_1 to μ_5; Naïve Spammers, Spam + Noise, Hijacked Accounts] Goal 3: Detect abnormal spam behavior

27 Outline: 1. Background 2. Our Model 3. Inference 4. Catching Spam 5. Experiments

28 Does it work? [Figure: Better Fit]

29 Catching Naïve Spammers [Figure: Injection] 83% are clustered together

30 Clustered Hijacked Accounts [Figure: Injection; Clustered hijacked accounts, Clustered attacked movies]

31 Real world clusters

32 Shape of real world data

33 Shape of Netflix reviews
Most Gaussian | Most skewed
The Rookie | The O.C. Season 2
The Fan | Samurai X: Trust and Betrayal
Cadet Kelly | Aqua Teen Hunger Force: Vol. 2
Money Train | Sealab 2001: Season 1
Alice Doesn't Live Here | Aqua Teen Hunger Force: Vol. 2
Sea of Love | Gilmore Girls: Season 3
Boiling Point | Felicity: Season 4
True Believer | The O.C. Season 1
Stakeout | The Shield Season 3
The Package | Queer as Folk Season 4

34 Shape of Amazon Clothing reviews. Most Skewed Reviews: Bra Disc Nipple Covers; Vanity Fair Women's String Bikini Panty; Lee Men's Relaxed Fit Tapered Jean; Carhartt Men's Dungaree Jean; Wrangler Men's Cowboy Cut Slim Fit Jean. Nearly all are heavily polarized!

35 Shape of Amazon Electronics reviews. Most Skewed Reviews: Sony CD-R 50 Pack Spindle; Olympus Stylus Epic Zoom Camera; Sony AC Adapter Laptop Charger; Apricorn Hard Drive Upgrade Kit; Corsair 1GB Desktop Memory. Nearly all are heavily polarized!

36 Shape of BeerAdvocate reviews. Most Gaussian Reviews: Weizenbock (Sierra Nevada); Ovila Abbey Saison (Sierra Nevada); Stoudts Abbey Double Ale; Stoudts Fat Dog Stout; Juniper Black Ale. Nearly all are Gaussian!

37 Hypotheses on shape of data. Hard to evaluate beyond binary. Selection bias: Only committed viewers watch Season 4 of a TV series. Hard to compare value across very different items: Lots of beers and movies to compare; Fewer TV shows; Even fewer jeans or hard drives.

38 Key Points. Modeling: Fit real data with flexible recommender distribution. Prediction: Predict user preferences. Anomaly Detection: When does a user not match the normal model?

39 Questions? Alex Beutel, abeutel@cs.cmu.edu, http://alexbeutel.com

40 Sampling Cluster Parameters [Figure: users u_5, u_6 attached to cluster parameters μ_a] Hyperparameters: μ_α, λ_α, W_α, ν. Priors on μ_α, λ_α, W_α.

41 Gibbs Sampling - Clusters [Details]. (Probability of a cluster (CRP)) × (Probability u_i would be sampled from cluster a)

42 Sampling user parameters [Details]. (Probability of u_i given cluster parameters) × (Probability of predicting ratings r_{i,j}). Recommender distribution is non-conjugate: Can't sample directly! Use a Laplace approximation and perform Metropolis-Hastings Sampling.

43 Sampling user parameters [Details]. Use candidate normal distribution with mean = Mode of p(u_i) and variance = Variance of p(u_i). Sample a candidate. Metropolis-Hastings Sampling: Keep new with probability [Equation]
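The acceptance formula is missing from this transcript, so the sketch below uses the standard independence Metropolis-Hastings rule, min(1, p(new) q(old) / (p(old) q(new))), with a normal candidate distribution built from a Laplace approximation (mode plus curvature at the mode). The one-dimensional target `log_p` is a hypothetical stand-in for p(u_i), and all function names are assumptions made for illustration.

```python
import math
import random


def log_p(x):
    """Hypothetical unnormalized, non-Gaussian log-density standing in for p(u_i)."""
    return -math.cosh(x - 1.0)


def laplace_approximation(x0=0.0, steps=50, h=1e-4):
    """Newton's method for the mode; variance = -1 / (second derivative there)."""
    def second_derivative(x):
        return (log_p(x + h) - 2.0 * log_p(x) + log_p(x - h)) / (h * h)

    x = x0
    for _ in range(steps):
        grad = (log_p(x + h) - log_p(x - h)) / (2.0 * h)
        x -= grad / second_derivative(x)
    return x, -1.0 / second_derivative(x)


def log_q(x, mode, var):
    """Log-density of the normal candidate, up to a constant that cancels."""
    return -0.5 * (x - mode) ** 2 / var


def metropolis_hastings(n_samples, seed=0):
    rng = random.Random(seed)
    mode, var = laplace_approximation()
    x = mode
    samples, accepted = [], 0
    for _ in range(n_samples):
        cand = rng.gauss(mode, math.sqrt(var))  # candidate from the Laplace fit
        log_ratio = ((log_p(cand) - log_p(x))
                     + (log_q(x, mode, var) - log_q(cand, mode, var)))
        # Keep the candidate with probability min(1, exp(log_ratio)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = cand
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


samples, acceptance_rate = metropolis_hastings(5000)
```

When the Laplace fit is close to the true density, most candidates are accepted, which is why the acceptance rate is a useful diagnostic for this kind of sampler.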

44 Sampling Cluster Parameters [Details]. Priors; Users/Items in the cluster

45 Inferring Hyperparameters [Details]. Solved directly – no sampling needed! Prior hidden as additional cluster.

46 Does Metropolis-Hastings work? Have to use non-standard sampling procedure: 99.12% acceptance rate for Amazon Electronics; 77.77% acceptance rate for Netflix 24k.

47 Does it work? Compare on Predictive Probability (PP) to see how well our model fits the data.
Dataset | Uniform | BPMF | CoBaFi (us)
Netflix (24k users) | 1.6904 | 1.2525 | 1.1827
BeerAdvocate | 2.1972 | 1.9855 | 1.6741
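The slide does not define PP. One reading that is consistent with the numbers shown is the average negative natural-log probability a model assigns to held-out ratings, where lower is better. The sketch below computes that quantity; the definition and the name `predictive_probability` are assumptions, not confirmed by the transcript.

```python
import math


def predictive_probability(probs_of_observed):
    """Average negative natural-log probability of the held-out ratings.

    probs_of_observed[k] is the probability the model assigned to the rating
    that was actually observed for test example k. Lower is better.
    """
    return -sum(math.log(p) for p in probs_of_observed) / len(probs_of_observed)


# A uniform model over nine rating levels assigns 1/9 to every observed rating.
uniform_9_levels = predictive_probability([1.0 / 9] * 1000)
```

Under this assumed definition a uniform model over nine rating levels scores ln 9 ≈ 2.1972, which agrees with the BeerAdvocate entry in the Uniform column.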

48 Handling Spammers
Random naïve spammers in Amazon Electronics dataset:
Model | PP Before | PP After
BPMF | 1.7047 | 1.8146
CoBaFi | 1.0549 | 1.7042
Random hijacked accounts in Netflix 24k dataset:
Model | PP Before | PP After
BPMF | 1.2375 | 1.3057
CoBaFi | 0.9670 | 1.2935

49 Clustered Naïve Spammers: 83% are clustered together

50 Clustered Hijacked Accounts: Clustered hijacked accounts; Clustered attacked movies

