

1 Probabilistic Machine Learning in Computational Advertising. Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela. Online Services and Advertising, Microsoft Research Cambridge, UK. NIPS 2009 – December 2009

2 Outline
 - Online Advertising and Paid Search
 - AdPredictor™: Predicting User Clicks on Ads
 - [Appendix] Model shrinking; parallel training

3 ONLINE ADVERTISING AND PAID SEARCH

4 Advertising Industry Business: Size
[Bar chart: annual advertising expenditure (in billion USD, scale 0 to 600) by medium (Outdoor, Cinema, Radio, TV, Print, Online) for the years 2001 to 2006, with reference lines for the GDP of Denmark (2006) and Microsoft revenue (2008). Data: World Advertising Research Center Report 2007]

5 Advertising Industry Business: Growth
[Line chart: year-on-year growth (scale -20% to +50%) of advertising expenditure by medium (Outdoor, Cinema, Radio, TV, Print, Online) for the years 2001 to 2006. Data: World Advertising Research Center Report 2007]

6 [image-only slide]

7 Importance of accurate probability estimates:
 - Increase user satisfaction by better targeting
 - Fairer charges to advertisers
 - Increase revenue by showing ads with a high click-through rate

Ads are displayed to users ranked by expected bid, and advertisers are charged per click:

  Bid    × pClick  = Expected bid   Charge per click
  $1.00  × 10%     = $0.10          $0.80
  $2.00  × 4%      = $0.08          $1.25
  $0.10  × 50%     = $0.05          $0.05
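The pricing on this slide can be sketched in code: ads are ranked by expected bid (bid × pClick), and each advertiser is charged the minimum price per click that would keep its slot, a generalized-second-price rule. The reserve price for the last slot is a hypothetical parameter chosen here to reproduce the slide's numbers:

```python
# Sketch of expected-bid ranking with generalized-second-price charging.
# The reserve price for the last slot is an illustrative assumption.
def rank_and_charge(ads, reserve=0.05):
    """ads: list of (name, bid, pclick). Returns ranked (name, charge) pairs."""
    ranked = sorted(ads, key=lambda a: a[1] * a[2], reverse=True)
    result = []
    for i, (name, bid, pclick) in enumerate(ranked):
        if i + 1 < len(ranked):
            nxt_bid, nxt_pclick = ranked[i + 1][1], ranked[i + 1][2]
            # minimum price per click that keeps this ad above the next one
            charge = nxt_bid * nxt_pclick / pclick
        else:
            charge = reserve
        result.append((name, round(charge, 2)))
    return result

ads = [("A", 1.00, 0.10), ("B", 2.00, 0.04), ("C", 0.10, 0.50)]
print(rank_and_charge(ads))  # [('A', 0.8), ('B', 1.25), ('C', 0.05)]
```

Note how the low-bid, high-pClick ad C still wins a slot, while B pays more per click than A because its low click probability makes each click worth more.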

8 The Scale of Things
 - Realistic training set for proof of concept: 7,000,000,000 impressions
 - 2 weeks of CPU time during training: 2 wks × 7 days × 86,400 sec/day = 1,209,600 seconds
 - Learning algorithm speed requirement: 5,787 impression updates / sec, i.e. 172.8 μs per impression update
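The speed requirement follows from simple arithmetic, checked here:

```python
# Back-of-the-envelope check of the training-speed requirement
# stated on the slide.
impressions = 7_000_000_000    # training impressions
seconds = 2 * 7 * 86_400       # two weeks of CPU time, in seconds
rate = impressions / seconds   # required impression updates per second
budget_us = 1e6 / rate         # time budget per update, in microseconds

print(round(rate))          # 5787
print(round(budget_us, 1))  # 172.8
```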

9 ADPREDICTOR: Bayesian Linear Probit Regression

10 Impression Level Predictions

11 One Weight per Feature Value
Each feature value receives its own weight, e.g. Client IP (102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86, ...), Match Type (Exact Match, Broad Match), Position (ML-1, SB-1, SB-2). The weights of the active feature values are summed to produce pClick.

12 Click Potential
Linear model: the click potential of an impression is the sum of the click contributions of its features.
[Figure: feature contributions, e.g. PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2, ListingId = 798831, ClientIP = 98.0.101.23, added along the click-potential axis; potentials above 0 favour click, below 0 favour no click]

13 Gaussian Noise
Probit: the click probability is the area under a Gaussian tail as a function of the click potential. Gaussian noise is added to the impression's click potential, so P(click) = P(potential + noise > 0).

14 Probit
Probit: the area under the Gaussian tail as a function of the click potential gives P(click | impression), ranging from 0% to 100%.
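The probit link can be sketched directly; the noise standard deviation β is a model parameter, set to 1 here purely for illustration:

```python
import math

def probit(t):
    """Standard normal CDF Phi(t), via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# P(click | impression) = Phi(potential / beta): the area under the
# Gaussian tail grows from 0% toward 100% with the click potential.
beta = 1.0  # noise standard deviation; illustrative value
for potential in (-2.0, 0.0, 2.0):
    print(round(probit(potential / beta), 3))  # 0.023, then 0.5, then 0.977
```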

15 Modelling Uncertainty
[Figure: a Gaussian belief over each feature's click contribution (PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2, ListingId = 798831, ClientIP = 98.0.101.23) along the click-potential axis]

16 Uncertainty about the Potential
[Figure: summing the Gaussian feature beliefs yields a Gaussian belief over the impression's click potential]

17 Probability of Click
[Figure: integrating the probit over the Gaussian belief on the click potential yields the click probability, between 0% and 100%]

18 Uncertainty: Bayesian Probabilities
Each weight, e.g. for Client IP (102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86), Match Type (Exact Match, Broad Match) and Position (ML-1, SB-1, SB-2), carries a Gaussian belief; summing the active weights yields a distribution p(pClick) rather than a point estimate.
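Because each active weight is a Gaussian, the predictive click probability integrates the probit over the Gaussian sum, which has a closed form. A minimal sketch, assuming a shared noise scale β and hypothetical weight beliefs:

```python
import math

def phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# Each active weight is N(mu, sigma^2); the click potential is their sum,
# so the predictive probability is
#   P(click) = Phi( sum(mu) / sqrt(beta^2 + sum(sigma^2)) )
def p_click(weights, beta=1.0):
    mean = sum(mu for mu, _ in weights)
    var = sum(s2 for _, s2 in weights)
    return phi(mean / math.sqrt(beta * beta + var))

# Same mean potential, but higher weight uncertainty pulls the
# prediction toward 50% (illustrative numbers):
confident = [(0.5, 0.01), (0.7, 0.01)]
uncertain = [(0.5, 1.0), (0.7, 1.0)]
print(round(p_click(confident), 3))
print(round(p_click(uncertain), 3))
```

This is why uncertainty matters at prediction time: a rarely seen Client IP with a wide posterior cannot drag the prediction to an extreme value.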

19 Principled Exploration

20 Training Algorithm in Action
[Factor graph: the active weights w1 and w2 are summed, Gaussian noise produces the latent score z, and z determines the click outcome c; messages flow forward for prediction and backward for training/update. Shown for both a Click and a No Click event]

21 Posterior Updates for the Click Event
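A sketch of what such an update can look like, using the standard truncated-Gaussian moment matching for probit models; the functions v and w below are the usual correction terms N(t)/Φ(t) and v(t)(v(t)+t). Treat this as an illustration of the form of the update, with the exact equations in the AdPredictor paper:

```python
import math

def pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cdf(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def v(t):
    return pdf(t) / cdf(t)

def w(t):
    return v(t) * (v(t) + t)

def update(weights, y, beta=1.0):
    """One online Bayesian update of the active weights for outcome
    y (+1 = click, -1 = no click). weights: list of [mu, sigma2]."""
    total_mean = sum(mu for mu, _ in weights)
    total_var = beta * beta + sum(s2 for _, s2 in weights)
    sigma = math.sqrt(total_var)
    t = y * total_mean / sigma
    for wt in weights:
        mu, s2 = wt
        wt[0] = mu + y * (s2 / sigma) * v(t)          # mean shifts toward outcome
        wt[1] = s2 * (1.0 - (s2 / total_var) * w(t))  # variance always shrinks
    return weights

w_click = update([[0.0, 1.0], [0.0, 1.0]], y=+1)
print(w_click)  # means move up and variances shrink after a click
```

Note the automatic learning rate from the wrap-up slide: weights with larger variance (more uncertainty) move further on each observation, and every update shrinks the variance.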

22 Client IP: Mean & Variance

23 Calibrated Predictions

24 Joint Updates vs. Independent Aggregation (Naïve Bayes)

25 adPredictor Wrap Up
 - Automatic learning rate
 - Calibrated: a 2% prediction means 2% clicks
 - Use of many features, even if correlated
 - Modelling the uncertainty explicitly
 - Natural exploration mode
 - Parallelizable with approximate inference

26 Thank you! thoreg@microsoft.com

27 APPENDIX

28 Dealing with Millions of Variables
Observation 1: Large variable bags follow a power law w.r.t. frequency of items.
Observation 2: Weight posteriors of rare items are close to their prior.
Idea:
 1. Initially, the belief of each new item is compactly represented by one (and the same) prior.
 2. After observing an item for the first time, its posterior is allocated.
 3. At regular intervals, all weight posteriors with a small deviation from the prior are removed.
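Step 3 can be sketched with a KL-divergence criterion, one natural way to measure "small deviation from the prior"; the threshold eps and the feature names below are illustrative assumptions:

```python
import math

def kl_gaussian(mu_q, s2_q, mu_p, s2_p):
    """KL( N(mu_q, s2_q) || N(mu_p, s2_p) ) between two 1-D Gaussians."""
    return 0.5 * (s2_q / s2_p + (mu_q - mu_p) ** 2 / s2_p
                  - 1.0 + math.log(s2_p / s2_q))

def shrink(model, prior=(0.0, 1.0), eps=1e-3):
    """Drop weight posteriors that barely deviate from the shared prior.
    Absent keys implicitly fall back to the prior at prediction time."""
    mu_p, s2_p = prior
    return {f: (mu, s2) for f, (mu, s2) in model.items()
            if kl_gaussian(mu, s2, mu_p, s2_p) >= eps}

# A rarely seen IP stays near the prior and is pruned; a well-trained
# weight survives (hypothetical feature names and values):
model = {"ip:10.0.0.1": (0.001, 0.999), "match:exact": (0.8, 0.3)}
pruned = shrink(model)
print(sorted(pruned))  # ['match:exact']
```

Because rare items follow a power law, pruning near-prior posteriors removes the long tail while changing predictions only marginally.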

29 Naïve Approach – Shared Memory
Does not scale:
 - Constant contention for locks
 - Some features are very frequent
 - Synchronization issues
[Diagram: Training Node 1 (Impression A: MSN, H11, 10.0.0.1, USA, etc.) and Training Node 2 (Impression B: MSN, H11, Canada, 10.0.1.25, etc.) both update a shared model file, causing conflicts]

30 Proposal: Approximate Learning
[Diagram: Training Node 1 (Impression A: MSN, H11, 10.0.0.1, USA, etc.) and Training Node 2 (Impression B: MSN, H11, Canada, 10.0.1.25, etc.) each update a local model; their deltas are merged into the final model file]
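One way the delta merge can be sketched, assuming Gaussian weight posteriors combined additively in natural parameters (precision and precision-weighted mean); the slide does not spell out the actual merge rule, so treat this as an illustration:

```python
def to_natural(mu, s2):
    """Gaussian as natural parameters: (precision, precision-weighted mean)."""
    return (1.0 / s2, mu / s2)

def merge(prior, posteriors):
    """Approximate merge for one weight: add each node's delta from the
    shared prior in natural-parameter space on top of the prior."""
    tau_p, rho_p = to_natural(*prior)
    tau, rho = tau_p, rho_p
    for mu, s2 in posteriors:
        tau_n, rho_n = to_natural(mu, s2)
        tau += tau_n - tau_p
        rho += rho_n - rho_p
    return (rho / tau, 1.0 / tau)  # back to (mean, variance)

prior = (0.0, 1.0)
# Posteriors for the same weight trained independently on two shards
# (illustrative values):
node_a = (0.5, 0.5)
node_b = (0.2, 0.8)
print(merge(prior, [node_a, node_b]))  # variance below either node's
```

Counting each node's evidence as a delta against the prior avoids double-counting the prior itself, and the merged variance shrinks as more shards contribute.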

