

1 Probabilistic Machine Learning in Computational Advertising. Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela. Online Services and Advertising, Microsoft Research Cambridge, UK. NIPS 2009 – December 2009

2 Outline
 - Online Advertising and Paid Search
 - AdPredictor™: Predicting User Clicks on Ads
 - [Appendix] Model shrinking; parallel training

3 ONLINE ADVERTISING AND PAID SEARCH

4 Advertising Industry Business: Size
[Bar chart: annual advertising expenditure (in billion USD, scale 0 to 600) by medium (Outdoor, Cinema, Radio, TV, Print, Online) for the years 2001 to 2006, with reference lines for the GDP of Denmark (2006) and Microsoft revenue (2008). Data: World Advertising Research Center Report 2007]

5 Advertising Industry Business: Growth
[Line chart: year-on-year growth (scale -20% to +50%) of advertising expenditure by medium (Outdoor, Cinema, Radio, TV, Print, Online) for the years 2001 to 2006. Data: World Advertising Research Center Report 2007]

6 [image-only slide]

7 Importance of accurate probability estimates:
 - Increase user satisfaction by better targeting
 - Fairer charges to advertisers
 - Increase revenue by showing ads with a high click-through rate

Ads are displayed to users ranked by expected bid, and advertisers are charged per click:

  Bid    × pClick  = Expected bid   Charge per click
  $1.00  × 10%     = $0.10          $0.80
  $2.00  × 4%      = $0.08          $1.25
  $0.10  × 50%     = $0.05          $0.05
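The pricing on this slide can be sketched in code: ads are ranked by expected bid (bid × pClick), and each advertiser is charged the minimum price per click that would keep its slot, a generalized-second-price rule. The reserve price for the last slot is a hypothetical parameter chosen here to reproduce the slide's numbers:

```python
# Sketch of expected-bid ranking with generalized-second-price charging.
# The reserve price for the last slot is an illustrative assumption.
def rank_and_charge(ads, reserve=0.05):
    """ads: list of (name, bid, pclick). Returns ranked (name, charge) pairs."""
    ranked = sorted(ads, key=lambda a: a[1] * a[2], reverse=True)
    result = []
    for i, (name, bid, pclick) in enumerate(ranked):
        if i + 1 < len(ranked):
            nxt_bid, nxt_pclick = ranked[i + 1][1], ranked[i + 1][2]
            # minimum price per click that keeps this ad above the next one
            charge = nxt_bid * nxt_pclick / pclick
        else:
            charge = reserve
        result.append((name, round(charge, 2)))
    return result

ads = [("A", 1.00, 0.10), ("B", 2.00, 0.04), ("C", 0.10, 0.50)]
print(rank_and_charge(ads))  # [('A', 0.8), ('B', 1.25), ('C', 0.05)]
```

Note how the low-bid, high-pClick ad C still wins a slot, while B pays more per click than A because its low click probability makes each click worth more.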

8 The Scale of Things
 - Realistic training set for proof of concept: 7,000,000,000 impressions
 - 2 weeks of CPU time during training: 2 wks × 7 days × 86,400 sec/day = 1,209,600 seconds
 - Learning algorithm speed requirement: 5,787 impression updates / sec, i.e. 172.8 μs per impression update
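The speed requirement follows from simple arithmetic, checked here:

```python
# Back-of-the-envelope check of the training-speed requirement
# stated on the slide.
impressions = 7_000_000_000    # training impressions
seconds = 2 * 7 * 86_400       # two weeks of CPU time, in seconds
rate = impressions / seconds   # required impression updates per second
budget_us = 1e6 / rate         # time budget per update, in microseconds

print(round(rate))          # 5787
print(round(budget_us, 1))  # 172.8
```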

9 ADPREDICTOR: Bayesian Linear Probit Regression

10 Impression Level Predictions

11 One Weight per Feature Value
Each feature value receives its own weight, e.g. Client IP (102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86, ...), Match Type (Exact Match, Broad Match), Position (ML-1, SB-1, SB-2). The weights of the active feature values are summed to produce pClick.

12 Click Potential
Linear model: the click potential of an impression is the sum of the click contributions of its features.
[Figure: feature contributions, e.g. PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2, ListingId = 798831, ClientIP = 98.0.101.23, added along the click-potential axis; potentials above 0 favour click, below 0 favour no click]

13 Gaussian Noise
Probit: the click probability is the area under a Gaussian tail as a function of the click potential. Gaussian noise is added to the impression's click potential, so P(click) = P(potential + noise > 0).

14 Probit
Probit: the area under the Gaussian tail as a function of the click potential gives P(click | impression), ranging from 0% to 100%.
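The probit link can be sketched directly; the noise standard deviation β is a model parameter, set to 1 here purely for illustration:

```python
import math

def probit(t):
    """Standard normal CDF Phi(t), via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# P(click | impression) = Phi(potential / beta): the area under the
# Gaussian tail grows from 0% toward 100% with the click potential.
beta = 1.0  # noise standard deviation; illustrative value
for potential in (-2.0, 0.0, 2.0):
    print(round(probit(potential / beta), 3))  # 0.023, then 0.5, then 0.977
```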

15 Modelling Uncertainty
[Figure: a Gaussian belief over each feature's click contribution (PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2, ListingId = 798831, ClientIP = 98.0.101.23) along the click-potential axis]

16 Uncertainty about the Potential
[Figure: summing the Gaussian feature beliefs yields a Gaussian belief over the impression's click potential]

17 Probability of Click
[Figure: integrating the probit over the Gaussian belief on the click potential yields the click probability, between 0% and 100%]

18 Uncertainty: Bayesian Probabilities
Each weight, e.g. for Client IP (102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86), Match Type (Exact Match, Broad Match) and Position (ML-1, SB-1, SB-2), carries a Gaussian belief; summing the active weights yields a distribution p(pClick) rather than a point estimate.
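Because each active weight is a Gaussian, the predictive click probability integrates the probit over the Gaussian sum, which has a closed form. A minimal sketch, assuming a shared noise scale β and hypothetical weight beliefs:

```python
import math

def phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# Each active weight is N(mu, sigma^2); the click potential is their sum,
# so the predictive probability is
#   P(click) = Phi( sum(mu) / sqrt(beta^2 + sum(sigma^2)) )
def p_click(weights, beta=1.0):
    mean = sum(mu for mu, _ in weights)
    var = sum(s2 for _, s2 in weights)
    return phi(mean / math.sqrt(beta * beta + var))

# Same mean potential, but higher weight uncertainty pulls the
# prediction toward 50% (illustrative numbers):
confident = [(0.5, 0.01), (0.7, 0.01)]
uncertain = [(0.5, 1.0), (0.7, 1.0)]
print(round(p_click(confident), 3))
print(round(p_click(uncertain), 3))
```

This is why uncertainty matters at prediction time: a rarely seen Client IP with a wide posterior cannot drag the prediction to an extreme value.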

19 Principled Exploration

20 Training Algorithm in Action
[Factor graph: the active weights w1 and w2 are summed, Gaussian noise produces the latent score z, and z determines the click outcome c; messages flow forward for prediction and backward for training/update. Shown for both a Click and a No Click event]

21 Posterior Updates for the Click Event
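A sketch of what such an update can look like, using the standard truncated-Gaussian moment matching for probit models; the functions v and w below are the usual correction terms N(t)/Φ(t) and v(t)(v(t)+t). Treat this as an illustration of the form of the update, with the exact equations in the AdPredictor paper:

```python
import math

def pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cdf(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def v(t):
    return pdf(t) / cdf(t)

def w(t):
    return v(t) * (v(t) + t)

def update(weights, y, beta=1.0):
    """One online Bayesian update of the active weights for outcome
    y (+1 = click, -1 = no click). weights: list of [mu, sigma2]."""
    total_mean = sum(mu for mu, _ in weights)
    total_var = beta * beta + sum(s2 for _, s2 in weights)
    sigma = math.sqrt(total_var)
    t = y * total_mean / sigma
    for wt in weights:
        mu, s2 = wt
        wt[0] = mu + y * (s2 / sigma) * v(t)          # mean shifts toward outcome
        wt[1] = s2 * (1.0 - (s2 / total_var) * w(t))  # variance always shrinks
    return weights

w_click = update([[0.0, 1.0], [0.0, 1.0]], y=+1)
print(w_click)  # means move up and variances shrink after a click
```

Note the automatic learning rate from the wrap-up slide: weights with larger variance (more uncertainty) move further on each observation, and every update shrinks the variance.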

22 Client IP: Mean & Variance

23 Calibrated Predictions

24 Joint Updates vs. Independent Aggregation (Naïve Bayes)

25 adPredictor Wrap Up
 - Automatic learning rate
 - Calibrated: a 2% prediction means 2% clicks
 - Use of many features, even if correlated
 - Modelling the uncertainty explicitly
 - Natural exploration mode
 - Parallelizable with approximate inference

26 Thank you! thoreg@microsoft.com

27 APPENDIX

28 Dealing with Millions of Variables
Observation 1: Large variable bags follow a power law w.r.t. frequency of items.
Observation 2: Weight posteriors of rare items are close to their prior.
Idea:
 1. Initially, the belief of each new item is compactly represented by one (and the same) prior.
 2. After observing an item for the first time, its posterior is allocated.
 3. At regular intervals, all weight posteriors with a small deviation from the prior are removed.
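Step 3 can be sketched with a KL-divergence criterion, one natural way to measure "small deviation from the prior"; the threshold eps and the feature names below are illustrative assumptions:

```python
import math

def kl_gaussian(mu_q, s2_q, mu_p, s2_p):
    """KL( N(mu_q, s2_q) || N(mu_p, s2_p) ) between two 1-D Gaussians."""
    return 0.5 * (s2_q / s2_p + (mu_q - mu_p) ** 2 / s2_p
                  - 1.0 + math.log(s2_p / s2_q))

def shrink(model, prior=(0.0, 1.0), eps=1e-3):
    """Drop weight posteriors that barely deviate from the shared prior.
    Absent keys implicitly fall back to the prior at prediction time."""
    mu_p, s2_p = prior
    return {f: (mu, s2) for f, (mu, s2) in model.items()
            if kl_gaussian(mu, s2, mu_p, s2_p) >= eps}

# A rarely seen IP stays near the prior and is pruned; a well-trained
# weight survives (hypothetical feature names and values):
model = {"ip:10.0.0.1": (0.001, 0.999), "match:exact": (0.8, 0.3)}
pruned = shrink(model)
print(sorted(pruned))  # ['match:exact']
```

Because rare items follow a power law, pruning near-prior posteriors removes the long tail while changing predictions only marginally.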

29 Naïve Approach – Shared Memory
Does not scale:
 - Constant contention for locks
 - Some features are very frequent
 - Synchronization issues
[Diagram: Training Node 1 (Impression A: MSN, H11, 10.0.0.1, USA, etc.) and Training Node 2 (Impression B: MSN, H11, Canada, 10.0.1.25, etc.) both update a shared model file, causing conflicts]

30 Proposal: Approximate Learning
[Diagram: Training Node 1 (Impression A: MSN, H11, 10.0.0.1, USA, etc.) and Training Node 2 (Impression B: MSN, H11, Canada, 10.0.1.25, etc.) each update a local model; their deltas are merged into the final model file]
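One way the delta merge can be sketched, assuming Gaussian weight posteriors combined additively in natural parameters (precision and precision-weighted mean); the slide does not spell out the actual merge rule, so treat this as an illustration:

```python
def to_natural(mu, s2):
    """Gaussian as natural parameters: (precision, precision-weighted mean)."""
    return (1.0 / s2, mu / s2)

def merge(prior, posteriors):
    """Approximate merge for one weight: add each node's delta from the
    shared prior in natural-parameter space on top of the prior."""
    tau_p, rho_p = to_natural(*prior)
    tau, rho = tau_p, rho_p
    for mu, s2 in posteriors:
        tau_n, rho_n = to_natural(mu, s2)
        tau += tau_n - tau_p
        rho += rho_n - rho_p
    return (rho / tau, 1.0 / tau)  # back to (mean, variance)

prior = (0.0, 1.0)
# Posteriors for the same weight trained independently on two shards
# (illustrative values):
node_a = (0.5, 0.5)
node_b = (0.2, 0.8)
print(merge(prior, [node_a, node_b]))  # variance below either node's
```

Counting each node's evidence as a delta against the prior avoids double-counting the prior itself, and the merged variance shrinks as more shards contribute.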

