1 David Stern, Thore Graepel, Ralf Herbrich. Online Services and Advertising Group, MSR Cambridge.

2 Overview: Motivation. The Matchbox model. Model training. Accuracy. Generating fast recommendations. Compositionality. Applications.

3 Large Scale Personal Recommendations (diagram: a user matched to an item).

4 Collaborative Filtering (diagram: a users-by-items rating matrix with users A-D and items 1-6; most entries are unknown and must be predicted). Metadata?

5 Goals. Large scale personal recommendations: products, services, people. Leverage user and item metadata. Flexible feedback: ratings, clicks. Incremental training.

6

7 Map Sparse Features to 'Trait' Space (diagram: sparse user features such as user ID, gender (male/female), country (UK/USA) and height, and sparse item features such as item ID and movie genre (horror, drama, documentary, comedy), are mapped into a common low-dimensional trait space).

8 Matchbox with Metadata (factor graph: active user features, e.g. ID=234, male, British, are summed through weights u to give the user traits s1 and s2; active item features, e.g. camera, SLR, are summed through weights v to give the item traits t1 and t2; the rating potential r is derived from the product of user and item traits).
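To make slides 7 and 8 concrete, here is a minimal numerical sketch of the trait mapping and the rating potential. The feature indices, dimensions, and point-valued weights are illustrative only; the actual model maintains Gaussian beliefs over the weights rather than point values.

```python
import numpy as np

K = 2                   # number of latent traits, as in the diagram
N_USER_FEATURES = 1000  # hypothetical size of the sparse user feature space
N_ITEM_FEATURES = 1000  # hypothetical size of the sparse item feature space

rng = np.random.default_rng(0)
U = rng.normal(size=(N_USER_FEATURES, K))  # user-feature -> trait weights
V = rng.normal(size=(N_ITEM_FEATURES, K))  # item-feature -> trait weights

# Sparse binary features are just lists of active indices, e.g.
# "ID=234, male, British" and "camera, SLR" (indices illustrative).
user_active = [234, 800, 900]
item_active = [34, 345]

# Each trait is the sum of the weights of the active features.
s = U[user_active].sum(axis=0)  # user trait vector (s1, s2)
t = V[item_active].sum(axis=0)  # item trait vector (t1, t2)

# The rating potential combines the trait vectors multiplicatively.
r = s @ t
print(r)
```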

9

10 Factor Graphs / Trees. Definition: a graphical representation of the product structure of a function (Wiberg, 1996). Nodes: factors and variables. Edges: dependencies of factors on variables. Question: what are the marginals of the function (all but one variable summed out)?

11 Factor Graphs and Bayesian Inference (factor graph over the latent variables s1, s2, t1, t2, an intermediate variable d, and the observation y): Bayes' law; factorising prior; factorising likelihood; sum out latent variables.
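In symbols, the steps named on this slide amount to the following (a generic statement, using the latent variables s1, s2, t1, t2, the intermediate variable d, and the observation y from the diagram):

```latex
\text{Bayes' law:}\quad
  p(s, t \mid y) = \frac{p(y \mid s, t)\, p(s, t)}{p(y)}
\qquad
\text{factorising prior:}\quad
  p(s, t) = p(s_1)\, p(s_2)\, p(t_1)\, p(t_2)

\text{factorising likelihood:}\quad
  p(y \mid s, t) = \int p(y \mid d)\, p(d \mid s, t)\, \mathrm{d}d
\qquad
\text{marginal:}\quad
  p(s_1 \mid y) \propto \int p(y \mid s, t)\, p(s, t)\, \mathrm{d}s_2\, \mathrm{d}t_1\, \mathrm{d}t_2
```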

12 Factor Trees: Separation (factor tree with variables v, w, x, y, z and factors f1(v,w), f2(w,x), f3(x,y), f4(x,z)). Observation: the sum of products becomes the product of sums of all messages from neighbouring factors to the variable!
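For the tree on this slide the observation can be written out directly: fixing x, the distributive law splits the global sum into independent per-branch sums,

```latex
\sum_{v,w,y,z} f_1(v,w)\, f_2(w,x)\, f_3(x,y)\, f_4(x,z)
=
\underbrace{\Big(\sum_{w} f_2(w,x) \sum_{v} f_1(v,w)\Big)}_{m_{f_2 \to x}(x)}
\underbrace{\Big(\sum_{y} f_3(x,y)\Big)}_{m_{f_3 \to x}(x)}
\underbrace{\Big(\sum_{z} f_4(x,z)\Big)}_{m_{f_4 \to x}(x)}
```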

13 Messages: From Factors to Variables (same factor tree). Observation: factors only need to sum out all their local variables!

14 Messages: From Variables to Factors (same factor tree). Observation: variables pass on the product of all incoming messages!

15 The Sum-Product Algorithm. Three update equations (Aji & McEliece, 1997). The update equations can be derived directly from the distributive law. Efficient for messages in the exponential family. Calculates all marginals at the same time.
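In their standard form (see also Kschischang, Frey & Loeliger, 2001), the three update equations are:

```latex
% Factor to variable: sum out all other variables local to the factor
m_{f \to x}(x) = \sum_{\mathbf{x}_f \setminus x} f(\mathbf{x}_f)
                 \prod_{y \in \mathrm{ne}(f) \setminus \{x\}} m_{y \to f}(y)

% Variable to factor: product of all other incoming messages
m_{x \to f}(x) = \prod_{g \in \mathrm{ne}(x) \setminus \{f\}} m_{g \to x}(x)

% Marginal: product of all incoming messages
p(x) \propto \prod_{f \in \mathrm{ne}(x)} m_{f \to x}(x)
```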

16 Approximate Message Passing. Problem: the exact messages from factors to variables may not be closed under products. Solution: approximate the marginal as well as possible in the sense of minimal KL divergence. Expectation Propagation (Minka, 2001): approximate the marginal by moment matching, resulting in the approximate message $\tilde m_{f \to x}(x) = \mathrm{proj}\big[m_{f \to x}(x)\, m_{x \to f}(x)\big] / m_{x \to f}(x)$, where proj denotes moment matching to the chosen exponential family.
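As a concrete instance, here is a sketch of the moment-matching computation for the simplest non-Gaussian factor in this deck, the step function I[x > 0] used later for binary feedback. The truncated-Gaussian moment formulas are standard; the function name is ours.

```python
import numpy as np
from scipy.stats import norm

def moment_match_positive(mu, sigma2):
    """Moments of N(mu, sigma2) truncated to x > 0.

    This is the classic EP/ADF moment-matching step for a
    step-function factor I[x > 0]: the true (non-Gaussian)
    marginal is replaced by the Gaussian with the same mean
    and variance.
    """
    sigma = np.sqrt(sigma2)
    z = mu / sigma
    v = norm.pdf(z) / norm.cdf(z)              # correction ratio
    new_mu = mu + sigma * v                    # matched mean
    new_sigma2 = sigma2 * (1.0 - v * (v + z))  # matched variance
    return new_mu, new_sigma2

# Example: a latent rating with prior N(0.5, 1) observed to be positive.
print(moment_match_positive(0.5, 1.0))
```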

17 Gaussian Message Passing (diagram: products and quotients of Gaussian messages are again Gaussian; non-Gaussian factors are handled by approximation, ≈).
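One common way to implement the '*' and '/' operations this slide depicts is to keep Gaussian messages in natural parameters, where multiplication and division of densities reduce to addition and subtraction. A minimal sketch (our own class, not the deck's implementation):

```python
from dataclasses import dataclass

@dataclass
class Gaussian:
    """Gaussian message in natural parameters: precision tau = 1/sigma^2
    and precision-adjusted mean rho = mu/sigma^2. In this
    parameterisation products and quotients of Gaussian densities are
    closed form, which is what makes Gaussian message passing cheap."""
    rho: float  # precision-adjusted mean
    tau: float  # precision

    def __mul__(self, other):
        return Gaussian(self.rho + other.rho, self.tau + other.tau)

    def __truediv__(self, other):
        return Gaussian(self.rho - other.rho, self.tau - other.tau)

    @property
    def mean(self):
        return self.rho / self.tau

    @property
    def variance(self):
        return 1.0 / self.tau

# Combining two messages about the same variable:
a = Gaussian(rho=0.0, tau=1.0)  # N(0, 1)
b = Gaussian(rho=2.0, tau=2.0)  # N(1, 0.5)
print((a * b).mean, (a * b).variance)
```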

18 Distributed Message Passing (diagram: a non-distributed vs. a distributed message passing schedule).

19 Message Passing for Matchbox (factor graph: messages flow from the weight priors u and v through the trait sums s1, s2, t1, t2 and the product factor towards the rating r).

20 Message Passing for Matchbox (the same factor graph, with messages flowing back from the observed rating to the weights). Message update functions powered by Infer.NET.

21 User/Item Trait Space (scatter plot of users and items in trait space, showing the 'preference cone' for user 145035).

22 Incremental Training with ADF (diagram: the users-by-items rating matrix, users A-D and items 1-6, processed one rating at a time).

23 ADF: Message Passing Iteration 1

24 Message Passing Iteration 2

25 Message Passing Iteration 3

26 Message Passing Iteration 4
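A sketch of the ADF (assumed-density filtering) loop that slides 22-26 animate: each rating is visited exactly once, the current Gaussian beliefs act as the prior, and the moment-matched posterior immediately replaces them. The helper `update_fn` is hypothetical; in the actual system the per-rating update is the Matchbox message passing shown earlier.

```python
def adf_train(ratings, user_beliefs, item_beliefs, update_fn):
    """One ADF sweep over (user_features, item_features, rating) triples.

    update_fn takes the current Gaussian beliefs for the active user
    and item features plus the observed rating, runs message passing
    on the single-rating factor graph, and returns moment-matched
    posteriors. ADF keeps only these posteriors and never revisits a
    rating, which is what makes training incremental.
    """
    for user_feats, item_feats, rating in ratings:
        prior_u = {f: user_beliefs[f] for f in user_feats}
        prior_i = {f: item_beliefs[f] for f in item_feats}
        post_u, post_i = update_fn(prior_u, prior_i, rating)
        user_beliefs.update(post_u)  # posterior becomes the new prior
        item_beliefs.update(post_i)
```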

27

28 Feedback Models (factor graph: the Matchbox core producing the latent rating r, to which different feedback models can be attached).

29 (the same factor graph, highlighting the latent rating r where the feedback models connect).

30 Feedback Models (diagram: an observed rating q, here q = 3, attached to the latent rating r).

31 (diagram: ordinal feedback via thresholds t0, t1, t2, t3; the observed rating q indicates between which pair of thresholds the latent rating r lies).

32 (diagram: binary feedback; the observation q indicates whether r > 0).
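A sketch of the two feedback likelihoods shown on slides 31 and 32, assuming a Gaussian belief N(mu, sigma2) over the latent rating r; the threshold values in the example are illustrative.

```python
import numpy as np
from scipy.stats import norm

def ordinal_probs(mu, sigma2, thresholds):
    """P(q = k) when the observed rating q is determined by which
    interval between consecutive thresholds the latent rating r falls
    into, with r ~ N(mu, sigma2). Appending -inf/+inf closes the ends."""
    t = np.concatenate(([-np.inf], thresholds, [np.inf]))
    cdf = norm.cdf((t - mu) / np.sqrt(sigma2))
    return np.diff(cdf)  # one probability per rating level

def click_prob(mu, sigma2):
    """Binary feedback as on slide 32: P(q = 1) = P(r > 0)."""
    return 1.0 - norm.cdf(-mu / np.sqrt(sigma2))

# Latent rating belief N(1.2, 0.8) against three illustrative thresholds:
print(ordinal_probs(1.2, 0.8, thresholds=[0.0, 1.0, 2.0]))
print(click_prob(1.2, 0.8))
```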

33

34 Performance and Accuracy. MovieLens data: 1 million ratings; 3,900 movies / 6,040 users; user and movie metadata.

35 MovieLens, 1,000,000 ratings: 6,040 users, 3,900 movies. User features: user ID; gender (male, female); age bracket (<18, 18-25, 25-34, 35-44, 45-49, 50-55, >55); job (other, lawyer, academic, programmer, artist, retired, admin, sales, student, scientist, customer service, self-employed, health care, technician, managerial, craftsman, farmer, unemployed, homemaker, writer). Movie features: movie ID; genre (action, horror, adventure, musical, animation, mystery, children's, romance, comedy, thriller, crime, sci-fi, documentary, war, drama, western, fantasy, film noir).

36 MovieLens with the thresholds model (ADF), training time = 1 minute (plot: mean absolute error).

37 MovieLens error with thresholds (plot: mean absolute error).

38

39 Recommendation Speed. Goal: find the N items with the highest predicted rating. Challenge: potentially have to consider all items. Two approaches to make this faster: locality sensitive hashing; KD-trees.

40 Random Projection Hashing. Random projections: generate m random hyperplanes (m random vectors a_i); this yields an m-bit hash whose i-th bit is 1 if a_i . x >= 0. P(all bits match) grows with the cosine similarity of the two vectors (for a single bit, P(match) = 1 - theta/pi, where theta is the angle between them). Store items in buckets indexed by their keys. Given a user trait vector: 1. generate its key, q; 2. search buckets in order of Hamming distance from q until N items are found.
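A self-contained sketch of the scheme described above; for simplicity the query sorts all occupied buckets by Hamming distance rather than enumerating keys incrementally.

```python
import numpy as np
from collections import defaultdict

class RandomProjectionHash:
    """Locality sensitive hashing via m random hyperplanes: bit i of
    the key is 1 iff a_i . x >= 0, so vectors at a small angle tend
    to share keys. Bucket search by Hamming distance as on slide 40."""

    def __init__(self, dim, m, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(m, dim))  # m random vectors a_i
        self.buckets = defaultdict(list)

    def key(self, x):
        return tuple((self.planes @ x >= 0).astype(int))  # m-bit hash

    def add(self, item_id, trait_vector):
        self.buckets[self.key(trait_vector)].append(item_id)

    def query(self, user_traits, n):
        """Scan buckets in order of Hamming distance from the user's
        key until n candidate items have been collected."""
        q = self.key(user_traits)
        keys = sorted(self.buckets,
                      key=lambda k: sum(a != b for a, b in zip(k, q)))
        found = []
        for k in keys:
            found.extend(self.buckets[k])
            if len(found) >= n:
                break
        return found[:n]

# Index 1,000 random item trait vectors and fetch 5 candidates.
rng = np.random.default_rng(1)
lsh = RandomProjectionHash(dim=20, m=8)
for i in range(1000):
    lsh.add(i, rng.normal(size=20))
print(lsh.query(rng.normal(size=20), n=5))
```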

41 Accuracy and Speedup

42

43 Message Passing: Compositionality (factor graph: the user model and item model from before are composed with a context model, contributing features x1-x4, and a feedback model, here q indicating r > 0, all connected through the shared latent rating r).

44

45 Applications: ranking of content on web portals; online advertising (display and paid search); personalised web search; algorithm portfolio management; tweet/news recommendation; friends recommendation on social platforms.

46

47 Conclusions. Collaborative filtering with content information. Users and items compared in the same 'trait space'. Fast training by message passing. Fast recommendations by random projections. Flexible feedback model. Many valuable application scenarios.

