Probabilistic Methods for Targeted Advertising Max Chickering Microsoft Research.


1 Probabilistic Methods for Targeted Advertising Max Chickering Microsoft Research

2 Outline
- Targeted Mailing: To whom should you send a solicitation?
- Targeted Advertising on the Web: How should you display banner ads to maximize click-through?

3 Targeted Mailing
Given a population of potential customers:
  Person  X_1  X_2  …  X_n
  1       0    0    …  red
  2       0    3.4  …  blue
  …
  m       1    7    …  green
Sending an advertisement costs money:
- Postage
- Possible discount
Which potential customers do you solicit?

4 Motivating Application
Advertisement: MSN subscription
Potential customers: People who registered Windows 95
Known variables: 15 from questionnaire (e.g. gender, RAM size)

5 Naïve Solutions
- Mail to those customers most likely to subscribe to MSN: can waste money by targeting customers who would subscribe anyway
- Mail to everyone: even worse!

6 Response Behaviors
Will the potential customer buy the product?
  Behavior           Mail  Don’t Mail
  Always buyer       Yes   Yes
  Persuadable        Yes   No
  Anti-persuadable   No    Yes
  Never buyer        No    No
We only make money from mailing to the persuadable potential customers.

7 Expected Profit for a Population
Population of N potential customers: N_alw, N_per, N_anti, N_nev
Cost of mailing: c
Solicited and unsolicited revenue: r
Expected profit from mailing: [equation not preserved]
Profit from not mailing: [equation not preserved]

8 Lift in Profit From Mailing
Lift = Profit from mailing - Profit from not mailing
For any set of potential customers, we should only mail if the lift is positive.

9 Learning Expected Lift
S ∈ {s_0, s_1} (did not subscribe, did subscribe)
M ∈ {m_0, m_1} (did not mail, did mail)
Identifiable if S, M known in training data
Lift: -c + [ p(S=s_1 | M=m_1) - p(S=s_1 | M=m_0) ] · r
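
The profit expressions on slides 7 and 8 are missing from this transcript. One reconstruction that is consistent with the response-behavior table on slide 6 and with the lift formula on slide 9 is sketched below. It assumes a single revenue r for both solicited and unsolicited sales and that the entire population of N is mailed; the original slides may have used a different form.

```latex
\begin{align*}
\text{Profit}_{\text{mail}} &= r\,(N_{alw} + N_{per}) - c\,N \\
\text{Profit}_{\text{no mail}} &= r\,(N_{alw} + N_{anti}) \\
\text{Lift} &= \text{Profit}_{\text{mail}} - \text{Profit}_{\text{no mail}}
             = r\,(N_{per} - N_{anti}) - c\,N \\
\frac{\text{Lift}}{N} &= -c + \left[\frac{N_{alw}+N_{per}}{N} - \frac{N_{alw}+N_{anti}}{N}\right] r
  = -c + \left[\,p(S{=}s_1 \mid M{=}m_1) - p(S{=}s_1 \mid M{=}m_0)\,\right] r
\end{align*}
```

Under these assumptions the always-buyers cancel out of the lift, which is why only the persuadable (and anti-persuadable) customers matter.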

10 Controlled Experiment: Identify Profitable Sub-Populations
1. Choose a small sample of the potential customers
2. Randomly divide those customers into a “treatment group” (M = m_1) and a “control group” (M = m_0)
3. Wait a specified period of time, and record S = s_0 or S = s_1 for each

11 Controlled Experiment
  Person  X_1  X_2  …  X_n    M    S
  1       0    0    …  red    m_1  s_0
  2       0    3.4  …  blue   m_0  s_1
  …
  m       1    7    …  green  m_1  s_1
Use machine-learning techniques to identify sub-populations with high positive lift, and then target those customers.
Lift (sub-population corresponding to X_n = blue) = -c + [ p(S=s_1 | M=m_1, X_n=blue) - p(S=s_1 | M=m_0, X_n=blue) ] · r
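
As a rough illustration of the sub-population lift on this slide, the sketch below estimates the two conditional probabilities by simple counting over experiment records and plugs them into the lift formula. The record layout (a dict of attributes plus two booleans) and the function names are hypothetical, chosen only for this example; the talk itself uses decision trees rather than a single-attribute grouping.

```python
from collections import defaultdict

def lift(p_mail, p_no_mail, cost, revenue):
    """Per-person expected lift: -c + [p(S=s1|M=m1) - p(S=s1|M=m0)] * r."""
    return -cost + (p_mail - p_no_mail) * revenue

def lift_by_subpopulation(records, attribute, cost, revenue):
    """Estimate the lift for each value of `attribute` (hypothetical helper).

    records: iterable of (x, mailed, subscribed), where x is a dict of
    attributes and mailed / subscribed are booleans standing for M and S.
    """
    # attribute value -> {mailed: [number of people, number who subscribed]}
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for x, mailed, subscribed in records:
        cell = counts[x[attribute]][mailed]
        cell[0] += 1
        cell[1] += int(subscribed)

    result = {}
    for value, by_m in counts.items():
        (n1, s1), (n0, s0) = by_m[True], by_m[False]
        if n1 == 0 or n0 == 0:
            continue  # lift is not identifiable without both groups
        result[value] = lift(s1 / n1, s0 / n0, cost, revenue)
    return result

# Toy data, invented for illustration only.
records = [
    ({"color": "blue"}, True, True),
    ({"color": "blue"}, True, False),
    ({"color": "blue"}, False, False),
    ({"color": "blue"}, False, False),
    ({"color": "red"}, True, False),
    ({"color": "red"}, False, False),
]
lifts = lift_by_subpopulation(records, "color", cost=0.5, revenue=9)
```

With counting estimates like these, small sub-populations give noisy lifts, which is one reason to let a learned model define the partitions.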

12 Identify Profitable Sub-Populations
Partitions of X define sub-populations, and a statistical model for p(S|M,X) defines the lift.
Approach: Use decision trees
Known distinctions in our data: X = {X_1, …, X_n}, S, M
  X_1 > 10, X_4 ≠ 2:       Lift 1
  X_1 > 10, X_4 = 2:       Lift 2
  X_1 < 10, X_12 = false:  Lift 3
  X_1 < 10, X_12 = true:   Lift 4

13 Probabilistic Decision Trees
p(S | M, X_1, X_2)
p(S | M=m_0, X_1=1, X_2=2)

14 Calculating Lift
[Figure: probabilistic decision tree with splits on X_2 (branches “2” and “1,3”), X_1 (branches “1” and “2”), and M (“mailed” / “not mailed”). Leaf distributions, as (p(S=subscribed), p(S=not subscribed)): (0.6, 0.4), (0.5, 0.5), (0.4, 0.6), (0.2, 0.8), (0.7, 0.3), (0.3, 0.7).]
Potential customer with {X_1=1, X_2=2}. Assume c = 0.50, r = 9.
Lift = -0.5 + (0.4 - 0.2) · 9 = 1.3
Mail to this person!

15 Traditional Learning Algorithm
[Figure: candidate splits on X_1, X_2, …, X_n, each labeled with its score, Score_1(Data), Score_2(Data), …, Score_n(Data); once X_2 is in the tree, candidate splits on X_1, X_3, …, X_n below it are scored in the same way.]

16 Lift-Aware Learning Algorithm
Traditional learning algorithm: identify a tree that represents p(S|M,X) well
Lift-aware: would like the tree to be good at modeling the difference p(S=s_1 | M=m_1, X=x) - p(S=s_1 | M=m_0, X=x)

17 A Heuristic
Only consider decision trees (for S) with the last split on M
[Figure: candidate trees that split on X_1, X_2, …, X_n, each ending in a split on M at every leaf and labeled Score_1(Data), Score_2(Data), …, Score_n(Data).]
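
The heuristic can be pictured as scoring every candidate split on an X variable only after forcing a split on M beneath each resulting branch. The sketch below does this for a single candidate split. The slides do not say which scoring function is used, so plain maximum-likelihood log-likelihood is assumed here, and `leaf_log_likelihood` and `score_split` are hypothetical names.

```python
import math
from collections import defaultdict

def leaf_log_likelihood(records):
    """Log-likelihood of S in one branch after a forced final split on M.

    records: iterable of (mailed, subscribed) booleans.
    """
    # mailed -> [count not subscribed, count subscribed]
    counts = defaultdict(lambda: [0, 0])
    for mailed, subscribed in records:
        counts[mailed][int(subscribed)] += 1

    total = 0.0
    for n0, n1 in counts.values():
        n = n0 + n1
        for k in (n0, n1):
            if k > 0:
                total += k * math.log(k / n)  # ML estimate of p(S | M, branch)
    return total

def score_split(data, attribute):
    """Score a candidate split on `attribute` with M forced as the last split."""
    branches = defaultdict(list)
    for x, mailed, subscribed in data:
        branches[x[attribute]].append((mailed, subscribed))
    return sum(leaf_log_likelihood(b) for b in branches.values())

# Toy data, invented for illustration: mailing only changes behavior for "red".
data = [
    ({"color": "red", "ram": 8}, True, True),
    ({"color": "red", "ram": 16}, False, False),
    ({"color": "blue", "ram": 8}, True, False),
    ({"color": "blue", "ram": 16}, False, False),
]
best = max(["color", "ram"], key=lambda a: score_split(data, a))
```

A raw likelihood score always favors more splits, so a practical learner would pair this with a penalty or a held-out check; that part is left out of the sketch.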

18 Experiment: Real-world Dataset
Product of interest: MSN subscription
Potential customers: Windows 95 registrants
Known variables (X): 15 from questionnaire (e.g. gender, RAM size)
Cost to mail: 42 cents
Subscription revenue: varied from 1 to 15 dollars
Data: sample of ~110,000 potential customers (70% train, 30% test)
Compared our algorithm (FORCE) with unconstrained greedy algorithm (NORMAL) for various revenues

19 Results on Test Data: Per-person improvement over Mail-to-All

20 Conclusions / Future Work
Marginal improvement over the standard decision-tree algorithm: almost every path in the “standard” trees contained a split on M. We expect a larger difference for other domains.
Algorithm works for discounted prices:
Expected profit from mailing: [equation not preserved]
Profit from not mailing: [equation not preserved]

21 Part II: Targeted Advertising on the Web
Given information about a visitor, how do you choose which advertisement to display?

22 Goals of Targeted Advertising
- Maximize $$$
- Maximize clicks
- Brand presence

23 Naïve Targeting Scheme
Step 1: cluster / segment users (Cluster 1, …, Cluster m)
Possible cluster attributes:
- Current page category
- Pages the user has visited on the site
- Known demographics
- Inferred demographics
- Previous advertisement clicks

24 Naïve Targeting Scheme
Step 2: Advertiser books ads into clusters
Step 3: Measure click probabilities
Step 4: Show best ad to each cluster
Problems (inventory management):
- Ad quotas
- Cluster overbooking

25 Advertisement Allocation
          Cluster 1  Cluster 2  …  Cluster m
  Ad 1    x_11       x_12       …  x_1m
  Ad 2    x_21       x_22       …  x_2m
  …
  Ad n    x_n1       x_n2       …  x_nm
x_ij = number of times to show advertisement i to user cluster j

26 Maximize Expected Clicks
          Cluster 1     Cluster 2     …  Cluster m
  Ad 1    p_11 · x_11   p_12 · x_12   …  p_1m · x_1m
  Ad 2    p_21 · x_21   p_22 · x_22   …  p_2m · x_2m
  …
  Ad n    p_n1 · x_n1   p_n2 · x_n2   …  p_nm · x_nm

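27 Inventory-Management Constraints
Ad i: constraint over x_i1, …, x_im (impressions of ad i across all clusters)
Cluster j: constraint over x_1j, …, x_nj (impressions of all ads within cluster j)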

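28 Linear Program
Find the schedule X that maximizes the expected number of clicks (the sum of p_ij · x_ij over all ads i and clusters j), subject to the inventory-management constraints.
Solve using (e.g.) the simplex algorithm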
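
The objective and constraints on this slide are missing from the transcript. A plausible form, pieced together from slides 25 to 27 and from the symbols q_i and c_j that appear on slide 30, is sketched below. Reading q_i as the impression quota booked for ad i and c_j as the impressions available in cluster j is an assumption, as is the direction of each inequality; note that this c_j is unrelated to the mailing cost c in Part I.

```latex
\begin{align*}
\max_{X}\quad & \sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,x_{ij} \\
\text{s.t.}\quad
  & \sum_{j=1}^{m} x_{ij} \ge q_i && i = 1,\dots,n \quad \text{(quota for ad } i\text{)} \\
  & \sum_{i=1}^{n} x_{ij} \le c_j && j = 1,\dots,m \quad \text{(capacity of cluster } j\text{)} \\
  & x_{ij} \ge 0 && \text{for all } i, j
\end{align*}
```

Both the objective and the constraints are linear in the x_ij, which is what makes the simplex algorithm applicable.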

29 A Simple Targeting System
- Estimate probabilities
- Find the optimal schedule
- Serve ads to cluster j via [formula not preserved]
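
The serving rule on this slide is missing from the transcript. One natural choice, assumed here rather than taken from the talk, is to show ad i to a visitor from cluster j with probability proportional to the scheduled count x_ij. The function names are hypothetical.

```python
import random

def serving_distribution(schedule, cluster):
    """Return P(show ad i | cluster), assumed proportional to x_ij.

    schedule[i][j] holds x_ij, the number of times ad i is scheduled
    for cluster j.
    """
    column = [row[cluster] for row in schedule]
    total = sum(column)
    if total == 0:
        raise ValueError("no impressions scheduled for this cluster")
    return [x / total for x in column]

def choose_ad(schedule, cluster, rng=random):
    """Sample the index of the ad to serve to one visitor from `cluster`."""
    weights = serving_distribution(schedule, cluster)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

# Toy schedule, invented for illustration: 2 ads x 2 clusters.
schedule = [[30, 0],
            [10, 50]]
probs = serving_distribution(schedule, 0)
ad_for_cluster_1 = choose_ad(schedule, 1)
```

Sampling keeps the server stateless; over many impressions the realized counts approach the schedule only in expectation, so a production system might track remaining inventory instead.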

30 Sensitivity to Estimates
q_1 = q_2 = c_1 = c_2 = k
Probabilities:
          Cluster 1  Cluster 2
  Ad 1    0.49       0.51
  Ad 2    0.51       0.49
Optimal schedule:
          Cluster 1  Cluster 2
  Ad 1    0          k
  Ad 2    k          0

31 Solution: Buckets
q_1 = q_2 = c_1 = c_2 = k
Probabilities:
          Cluster 1  Cluster 2
  Ad 1    0.5        0.5
  Ad 2    0.5        0.5
Optimal schedule:
          Cluster 1  Cluster 2
  Ad 1    a          b
  Ad 2    c          d
a + b + c + d = 2k
Secondary (linear) optimization: ads are shown as close to uniform as possible across all clusters
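
The two-ad, two-cluster example from slides 30 and 31 is small enough to check by enumeration. The sketch below assumes the quota and capacity constraints hold with equality (each row and column of the schedule sums to k), so the schedule is determined by a single number a = x_11. How the talk forms its buckets is not stated; rounding to the nearest 0.1 is an assumption, and all function names are hypothetical.

```python
def bucket(p, width=0.1):
    """Round a probability estimate to the nearest multiple of `width`."""
    return round(p / width) * width

def expected_clicks(p, a, k):
    """Expected clicks for the schedule x11 = x22 = a, x12 = x21 = k - a."""
    return a * (p[0][0] + p[1][1]) + (k - a) * (p[0][1] + p[1][0])

def best_schedule(p, k):
    """Enumerate a = 0..k; among click-maximizing schedules, pick the one
    closest to uniform (the secondary optimization)."""
    best = max(expected_clicks(p, a, k) for a in range(k + 1))
    ties = [a for a in range(k + 1)
            if abs(expected_clicks(p, a, k) - best) < 1e-9]
    a = min(ties, key=lambda a: abs(a - k / 2))
    return [[a, k - a], [k - a, a]]

# p[i][j] = estimated click probability of ad i in cluster j (slide 30).
p_raw = [[0.49, 0.51],
         [0.51, 0.49]]
p_bucketed = [[bucket(v) for v in row] for row in p_raw]

k = 100
raw_schedule = best_schedule(p_raw, k)            # all-or-nothing
bucketed_schedule = best_schedule(p_bucketed, k)  # evenly spread
```

With the raw estimates a difference of 0.02 pushes every impression to one ad per cluster; after bucketing all schedules tie on expected clicks, and the tie-break spreads the ads evenly.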

32 Passive Experiment: MSNBC (December 1998)
Clusters defined by the current page group (Sports, News, Health, Opinion, …)
Manual approach: advertisers buy impressions on page groups

33 Passive Experiment: MSNBC (December 1998)
~20 clusters, ~500 advertisements, ~1.6 million impressions / day
Data from day 1:
- Estimate p_ij (average ~4K data points per probability)
- Find optimal schedule (less than 1 minute, no buckets)
Data from day 2:
- Re-estimate p_ij
- Evaluate schedule: [formula not preserved]
Result: 20-30% increase over manual schedule

34 Active Experiment on MSNBC (May 1999)
Particular advertiser: 5 ads
Data from weekend 1:
- Estimate p_ij (~15K data points per probability)
- Find optimal schedule (less than 1 second using buckets)
Rearrange advertisements for weekend 2
Data from weekend 2:
- Count the number of clicks and compare to weekend 1

35 Active Experiment Results
[Figure: advertiser vs. control, Weekend 1 (pre target) and Weekend 2 (post target).]
30% increase for the advertiser, negligible increase for others
Predicted a 20% increase on MSNBC

36 Extensions
Problem: Increasing total expected clicks across the site may decrease clicks for a particular advertiser
Solution: Add a (linear) constraint that expected clicks cannot decrease
Passive experiment: MSNBC overall increase still ~20%

37 Extensions
Focus of talk: p_ij = expected #clicks from showing ad i to user j
In general: u_ij = expected utility from showing ad i to user j
Expected utility of X = [formula not preserved]
Alternative u_ij choices:
- Weighted probabilities: w_i p_ij
- Probability of purchase
- Increase in brand awareness
- Expected revenue

38 My Home Page http://research.microsoft.com/~dmax/

40 Results on Test Data: Per-person improvement over Mail-to-All
To evaluate a test case given a model:
- Evaluate the lift given X (ignoring M and S)
- Recommend Mail if and only if Lift > 0
- If the recommendation matches M from the test case, add r to the total revenue. Otherwise, ignore.
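
The evaluation procedure above can be sketched as a loop over test cases that keeps only those where the model's recommendation agrees with what actually happened in the experiment. The slide says only "add r"; the sketch assumes that revenue is credited when the person actually subscribed and that the mailing cost is charged when the person was actually mailed. `evaluate_policy` and `predict_lift` are hypothetical names.

```python
def evaluate_policy(test_cases, predict_lift, cost, revenue):
    """Return (total profit, number of test cases used).

    test_cases: iterable of (x, mailed, subscribed), where x is a dict of
    attributes and mailed / subscribed are booleans standing for M and S.
    predict_lift: function mapping x to the model's estimated lift.
    """
    total, used = 0.0, 0
    for x, mailed, subscribed in test_cases:
        recommend_mail = predict_lift(x) > 0  # decided from X alone
        if recommend_mail != mailed:
            continue  # outcome under the recommended action is unobserved
        used += 1
        if subscribed:
            total += revenue  # assumption: credit r only if S = s1
        if mailed:
            total -= cost     # assumption: charge c only if M = m1
    return total, used

# Toy model and data, invented for illustration only.
toy_model = lambda x: 1.0 if x["color"] == "blue" else -1.0
cases = [
    ({"color": "blue"}, True, True),    # matches: +9 revenue, -0.5 cost
    ({"color": "blue"}, False, True),   # mismatch: ignored
    ({"color": "red"}, False, False),   # matches: contributes 0
    ({"color": "red"}, True, False),    # mismatch: ignored
]
profit, n_used = evaluate_policy(cases, toy_model, cost=0.5, revenue=9)
```

Because M was randomized in the experiment, the matching cases form an unbiased sample of what the policy would have produced, at the price of discarding the mismatched ones.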

