Bandit’s Paradise: The Next Generation of Test-and-Learn Marketing


1 Bandit’s Paradise: The Next Generation of Test-and-Learn Marketing
Professor Peter Fader The Wharton School, University of Pennsylvania Co-director, Wharton Customer Analytics Initiative Joint work with Eric Schwartz and Eric Bradlow

2 STARTING POINT: A/B(/C/D/E…) Testing
Randomly divide customers into two (or more) groups
Expose each group to a different advertisement
Measure the outcomes for each group
Compare outcomes using confidence intervals
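The comparison step can be sketched in code. This is a minimal illustration of a two-group comparison using a normal-approximation confidence interval; the counts are made up, not from the talk:

```python
# Compare conversion rates between two randomized groups with a 95% CI
# on the difference in rates (normal approximation to the binomial).
import math

def diff_conversion_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """95% CI for the difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Illustrative counts: 120/10,000 vs 150/10,000 conversions.
lo, hi = diff_conversion_ci(conv_a=120, n_a=10_000, conv_b=150, n_b=10_000)
# If the interval excludes 0, the groups differ at roughly the 5% level.
```

With these particular counts the interval straddles zero, which is exactly the "are we really sure we can crown the winner?" problem the later slides take up.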

3 Multivariate testing (MVT)
Multivariate testing is simply testing multiple features of your advertising in the same test. You should use a multivariate test when:
You want to know the relative effects of the different features
You think there may be interactions between the features
By making some reasonable assumptions about the interactions, you can work out the effect of each feature with fewer customers and without testing all combinations of features.
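The "reasonable assumptions" idea can be made concrete with a toy simulation: if we assume no interactions, a main-effects-only regression recovers each feature's effect from half of the full factorial design. The features, effect sizes, and data below are invented for illustration:

```python
# Main-effects estimation from a half fraction of a 2^3 factorial.
# Three binary ad features (e.g., headline, image, button), coded +/-1.
import numpy as np

rng = np.random.default_rng(0)

# Half fraction: 4 of the 8 combinations, chosen so the feature
# columns stay orthogonal (here, column 3 = column 1 * column 2).
X = np.array([[-1, -1,  1],
              [ 1, -1, -1],
              [-1,  1, -1],
              [ 1,  1,  1]], dtype=float)

true_effects = np.array([0.5, -0.3, 0.2])   # unknown in a real test
y = X @ true_effects + rng.normal(0, 0.01, size=4)  # simulated outcomes

# Assuming no interactions, least squares on main effects alone
# recovers all three feature effects from only 4 cells, not 8.
est, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The design choice doing the work here is orthogonality: each feature's effect can be read off independently, so the no-interaction assumption halves the number of cells (and customers) needed.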

4 “State of the art” test-and-learn
Adaptive testing
“State of the art” test-and-learn:
Run A/B, A/B/n, or multivariate testing (MVT)
Crown the winner
Earn-and-learn:
Make profit while learning
Continuous improvement
Extensions and natural complications:
Large set of candidate ads to compare
Very rare events (e.g., acquisitions via display ads)
Batches of decisions (e.g., “chunky” allocations)
Different contexts (e.g., websites differ)
I’m going to use the language of online display advertising, with a focus on customer acquisition, to make things tangible. Contexts: you’ve bought the media in advance.

5 Which Ad WILL Bring in the Most Customers?
You might just run an A/B/C test. But are we really sure that we can crown the winner? How (and when) do we know? And there’s clearly an attribute structure here. But this is just a small sample…

6 How should we allocate impressions across many ads (served on many websites) to acquire more customers? This is the real experiment. MVT could take a long, long time…

7 typical analytics for experiments
Snapshot from a well-established online testing service. But it doesn’t directly inform action.

8 Introducing the multi-armed bandit
The multi-armed bandit problem is 50+ years old, and it has even been applied to advertising. But reality is more complicated.

9 Broader class of earn-and-learn problems

10 Field experiment: summary and scope
4 ad concepts and 3 ad sizes

11 This is the real experiment
MVT could take a long, long time…

12 Field experiment: summary and scope
4 ad concepts and 3 ad sizes
80+ media placements (including websites, portals, ad networks, ad exchanges)
500+ million impressions
Conversion rates in line with industry standards (between 1 and 10 out of 1 million impressions)
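Back-of-the-envelope arithmetic (assumed rates, not the experiment's actual results) shows why conversions this rare make a balanced test slow:

```python
# Why "MVT could take a long, long time" at display-ad conversion rates.
rate = 5 / 1_000_000          # assumed mid-range rate: 5 per million
impressions_per_cell = 10_000_000

# Even 10 million impressions in a cell yields only ~50 conversions,
# a thin basis for separating similar ads.
expected_conversions = rate * impressions_per_cell

# A full 4-concepts x 3-sizes design multiplies that cost by 12 cells.
cells = 4 * 3
total_impressions = cells * impressions_per_cell   # 120 million
```

This is the scale argument for adaptive allocation: a static, balanced design spends most of those impressions on ads it already suspects are losers.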

13 Field experiment: Time Line
[Figure: experiment timeline. Initialize, then alternate Estimate (E) and Act (A), realizing Value (V) over the time of the experiment. Schwartz AMA 2012]

14 Managing the Multi-Armed Bandit
Static, balanced design (equal allocation)
Adaptive, “greedy” methods (winner take all)
Adaptive, randomized (smooth allocation)
Agarwal et al. 2008; Auer et al. 2002; Bertsimas and Mersereau 2007; Lai 1987; Rusmevichientong and Tsitsiklis 2010; Scott 2010
Gönül and Shi 1998; Gönül and ter Hofstede 2006; Hauser et al. 2009; Montoya et al. 2011; Simester et al. 2006; Sun et al. 2006
Schwartz AMA 2012

15 Formalizing the optimization problem
Hierarchical attribute-based K-armed bandit with J contexts and batching
Objective function and Bellman equation [equations shown on slide]
But this can’t be solved directly…
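The slide's equations are not reproduced in this transcript. A generic sketch of the objective and Bellman equation for a Bayesian K-armed bandit is below; the notation is assumed, not necessarily the authors' exact formulation:

```latex
% Objective: choose an allocation policy \pi to maximize expected
% cumulative conversions (rewards r_t) over the horizon T:
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{T} r_t \right]

% Bellman equation over belief states s (the posterior over the K
% arms' conversion rates), where s' is the updated belief after
% observing the outcome of playing arm a:
V(s) = \max_{a \in \{1,\dots,K\}} \mathbb{E}\!\left[\, r(a) + V(s') \mid s, a \,\right]
```

The "can't be solved directly" point follows from this form: the belief state grows with K arms, J contexts, and batched observations, so exact dynamic programming is intractable.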

16 How is the optimization problem solved?
For independent actions, no heterogeneity, one-at-a-time decisions … The Gittins Index is the optimal certainty equivalent of an uncertain arm In other words, it perfectly reflects the “exploration bonus” for each arm

17 Managing the Multi-Armed Bandit
Winner-take-all policies: observed mean (“greedy”) versus Gittins Index
[Chart: allocations of impressions across ads A, B, C, D under each policy; conversion rate (scale masked). The Gittins Index adds an “exploration bonus” to the observed mean.]
Why should we give up on A & C? And should we truly/always give up on B?

18 How is the optimization problem solved?
For the standard sequential optimization problem, the Gittins Index is the optimal certainty equivalent of an uncertain arm. In other words, it perfectly reflects the “exploration bonus” for each arm. But that’s not our problem…
Actions are not independent; they are described by attributes
Attribute structure across actions improves learning
Doesn’t account for heterogeneity
Doesn’t say anything about batching

19 Managing the Multi-Armed Bandit
Randomized Probability Matching: allocate resources to each action in proportion to the probability that it is the best action.
[Chart: conversion rate (scale masked) and allocations of impressions across ads A, B, C, D.]
B brought in no conversions in this particular time period, but do we really believe it? No! Let’s focus on the underlying structure which generates the probabilities, not just the observed outcomes.
Berry 2004; Chapelle and Li 2012; Granmo 2010; May et al. 2011; Scott 2010; Thompson 1933.

20 The benefits of Randomized probability matching
Explore/exploit is achieved by sampling from the full distribution for each action
RPM is asymptotically optimal in maximizing cumulative reward (i.e., loss shrinks in log time)
Attribute structure is easy to accommodate in a standard hierarchical logit model of conversions [model equation shown on slide]
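The explore/exploit mechanism can be sketched in a few lines. This is a stripped-down randomized probability matching (Thompson sampling) loop with independent Beta-Bernoulli arms; the real system on these slides uses a hierarchical logit over ad attributes, and the conversion rates below are invented:

```python
# Minimal randomized probability matching: serve the ad whose sampled
# conversion rate (drawn from its posterior) is highest this round.
import random

random.seed(7)

class Arm:
    def __init__(self):
        self.conversions = 0   # observed successes
        self.impressions = 0   # observed trials

    def sample_rate(self):
        # One draw from the Beta(1 + conv, 1 + imp - conv) posterior
        # (uniform prior on the conversion rate).
        return random.betavariate(1 + self.conversions,
                                  1 + self.impressions - self.conversions)

def choose_ad(arms):
    """Pick the arm with the highest posterior draw."""
    draws = [arm.sample_rate() for arm in arms]
    return max(range(len(arms)), key=lambda k: draws[k])

true_rates = [0.005, 0.008, 0.004, 0.020]  # unknown to the algorithm
arms = [Arm() for _ in true_rates]
for _ in range(50_000):
    k = choose_ad(arms)
    arms[k].impressions += 1
    arms[k].conversions += random.random() < true_rates[k]
```

Because each allocation is a random draw from the full posterior, a currently losing ad (like B on the slide) keeps a nonzero share of impressions until the data genuinely rule it out.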

21 Field experiment: Time Line
[Figure: experiment timeline. Initialize, then alternate Estimate (E) and Act (A), realizing Value (V) over the time of the experiment. Schwartz AMA 2012]

22 Field experiment: implementation details
Timing: update every 6 days for 61 days in 2012
RPM allocation probabilities are “rotation weights”
Receive data and upload weights directly to Google DoubleClick DART (Dynamic Advertising Reporting and Targeting)
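Turning posteriors into batch "rotation weights" can be sketched as follows: the RPM weight for an ad is the posterior probability that it has the highest conversion rate, estimated by Monte Carlo. Independent Beta posteriors stand in for the hierarchical model, and the counts are illustrative, not from the experiment:

```python
# Compute batch rotation weights: P(ad k has the highest conversion
# rate), estimated by repeatedly sampling all posteriors jointly.
import random

random.seed(1)

# (conversions, impressions) observed so far for each ad (made up)
observed = [(12, 900_000), (4, 880_000), (9, 910_000), (20, 905_000)]

def rotation_weights(observed, draws=20_000):
    K = len(observed)
    wins = [0] * K
    for _ in range(draws):
        samples = [random.betavariate(1 + c, 1 + n - c)
                   for c, n in observed]
        wins[max(range(K), key=lambda k: samples[k])] += 1
    return [w / draws for w in wins]

weights = rotation_weights(observed)  # sums to 1 across the ads
```

This is what makes RPM fit naturally into a batched ad-serving pipeline: the weights are ordinary serving proportions for the next period, recomputed at each update (here, every 6 days) and uploaded to the ad server.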

23 Field experiment: basic results
But significant heterogeneity across websites. Conversion rate indexed as percent of average:

                  Ad A   Ad B   Ad C   Ad D
Tall (160x600)     117     88     99    176
Square (300x250)   107     72    151    114
Wide (728x90)      115     92     66      —
All Sizes          112     80    100    105

24 Field experiment: Adaptive versus Not Adaptive
Adaptive: experiment with changing weights (RPM)
Not adaptive: static, balanced experiment
Result: an 8% lift. About 2,000 customers acquired overall, so about 150 incremental acquisitions due to RPM

25 Simulation study

26 Total reward distribution

27 Cumulative conversion rate over time

28 Proportion of impressions for “best ad”

29 What should you take away?
View interactive marketing problems as adaptive experiments
Consider the exploration-exploitation tradeoff when facing uncertainty
Use the multi-armed bandit as a logical framework for testing, and randomized probability matching as the way to solve it
Test and learn… profitably! Earn and learn!

30 THANK YOU faderp@wharton.upenn.edu http://www.petefader.com/
@whartonCAI

