1
Multi Armed Bandits

2
Survey

3
Click Here

4
Click-through Rate (Clicks / Impressions) 20%

5
Click Here

6
Click-through Rate 20% ?

7
Click Here
Click-through Rate 20% ?
AB Test
Randomized Controlled Experiment
Show each button to 50% of users

8
AB Test Timeline
Time: Before Test, AB Test, After Test (show winner)
Exploration Phase (Testing)
Exploitation Phase (Show Winner)

9
Click Here Click-through Rate 20% ?

10
Click Here Click-through Rate 20% 30%

11
10,000 impressions/month
Need 4,000 clicks by EOM
30% CTR won't be enough
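A quick check of the arithmetic on this slide, as a sketch (the variable names are illustrative and not from the deck):

```python
impressions_per_month = 10_000
clicks_needed = 4_000
best_known_ctr = 0.30  # click-through rate of the winning button

# Clicks the current winner can be expected to deliver in a month.
expected_clicks = impressions_per_month * best_known_ctr
# Click-through rate that would be needed to reach the target.
required_ctr = clicks_needed / impressions_per_month

print(f"expected clicks at 30% CTR: {expected_clicks:.0f}")
print(f"required CTR: {required_ctr:.0%}")
print(f"shortfall: {clicks_needed - expected_clicks:.0f} clicks")
```

At 30% the button yields about 3,000 clicks, so reaching 4,000 would need a 40% CTR.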

12
Need to keep testing (Exploration)

13

14
Click Here
ABCDEFG... Test
Each variant would be assigned with probability 1/N
N = # of variants

15
Not everyone is a winner

16
Click Here
ABCDEFG... Test
Each variant would be assigned with probability 1/N
N = # of variants

17
Need to keep testing (Exploration)
Need to minimize regret (Exploitation)

18
Multi Armed Bandit Balance of Exploitation & Exploration

19
Bandit Algorithm Balances Exploitation & Exploration
AB Test timeline: Before Test, AB Test, After Test
- Discrete Exploitation & Exploration Phases
Multi Armed Bandit timeline: Before Test, Multi Armed Bandit
- Continuous Exploitation & Exploration
- Bandit Favors Winning Arm

20
Bandit Algorithm Reduces Risk of Testing
AB Test: Best arm exploited with probability 1/N
- More Arms: Less exploitation
Bandit: Best arm exploited with determined probability
- Reduced exposure to suboptimal arms

21
Demo Borrowed from Probabilistic Programming & Bayesian Methods for Hackers

22

23
Split Test vs. Bandit
Winner Breaks Away!
Still sending losers
AB test would have cost 4.3 percentage points

24
How it works: Epsilon Greedy Algorithm
ε = Probability of Exploration
Start of round:
- With probability ε: Exploration (each Click Here arm shown with probability 1 / N, so ε / N overall)
- With probability 1 - ε: Exploitation (show best arm)
Epsilon Greedy with ε = 1 = AB Test
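The round described on this slide can be sketched as follows. This is a minimal illustration, not the presenter's code: the function name, the `clicks`/`impressions` lists, and the rule that an arm with no impressions counts as a 0% rate are all assumptions.

```python
import random

def epsilon_greedy_choice(clicks, impressions, epsilon, rng=random):
    """Return the index of the arm to show this round."""
    n_arms = len(clicks)
    if rng.random() < epsilon:
        # Exploration: every arm is equally likely (probability 1/N).
        return rng.randrange(n_arms)
    # Exploitation: show the arm with the best observed click-through rate.
    rates = [c / i if i > 0 else 0.0 for c, i in zip(clicks, impressions)]
    return max(range(n_arms), key=lambda arm: rates[arm])

# Three buttons with observed rates of 20%, 30% and 10%.
arm = epsilon_greedy_choice([20, 30, 10], [100, 100, 100], epsilon=0.1)
print("show arm", arm)
```

With `epsilon=1` every round explores, so each arm is shown with probability 1/N, which matches the slide's remark that this case is an AB test.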

25
Epsilon Greedy Issues
Constant Epsilon:
- Initially under exploring
- Later over exploring
- Better if probability of exploration decreases with sample size (annealing)
No prior knowledge
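One way to make the exploration probability shrink with sample size is sketched below. The slide does not name a schedule; the formula, the function name, and the `scale` parameter are assumptions chosen only for illustration.

```python
def annealed_epsilon(total_impressions, scale=100.0):
    """Exploration probability that starts at 1 and decays as data accumulates.

    `scale` is a hypothetical tuning knob: the number of impressions after
    which the exploration probability has dropped to 0.5.
    """
    return scale / (scale + total_impressions)

for n in (0, 100, 900, 9_900):
    print(n, round(annealed_epsilon(n), 3))
```

Early rounds explore almost always, and later rounds mostly exploit, which addresses both of the constant-epsilon problems listed on the slide.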

26
Some Alternatives
- Epsilon-First
- Epsilon-Decreasing
- Softmax
- UCB (UCB1, UCB2)
- Bayesian-UCB
- Thompson Sampling (Bayesian Bandits)

27
Bandit Algorithm Comparison Regret:
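Regret is commonly defined as the reward lost by not always showing the best arm. A minimal sketch of that definition, assuming the true click-through rates are known (which is only the case in a simulation); the function name is hypothetical:

```python
def cumulative_regret(true_rates, chosen_arms):
    """Expected clicks lost compared with always showing the best arm."""
    best_rate = max(true_rates)
    return sum(best_rate - true_rates[arm] for arm in chosen_arms)

# Two buttons with true rates of 20% and 30%; the weaker one is shown twice.
print(cumulative_regret([0.20, 0.30], [0, 1, 1, 0]))
```

An algorithm that keeps sending traffic to losing arms accumulates regret faster than one that shifts traffic toward the winner.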

28
Thompson Sampling
Setup: Assign each arm a Beta distribution with parameters (α, β) that track (# Successes, # Failures)
Click Here arms: Beta(α,β), Beta(α,β), Beta(α,β)

29
Thompson Sampling
Setup: Initialize priors with ignorant state of Beta(1,1) (Uniform distribution)
- Or initialize with an informed prior to aid convergence
Click Here arms: Beta(1,1), Beta(1,1), Beta(1,1)

30
Thompson Sampling
For each round:
1: Sample random variable X from each arm's Beta distribution
2: Select the arm with largest X
3: Observe the result of selected arm
4: Update prior Beta distribution for selected arm
Click Here arms: Beta(1,1), Beta(1,1), Beta(1,1)
Result: Success!

31
Thompson Sampling
For each round:
1: Sample random variable X from each arm's Beta distribution
2: Select the arm with largest X
3: Observe the result of selected arm
4: Update prior Beta distribution for selected arm
Click Here arms: Beta(2,1), Beta(1,1), Beta(1,1)
Result: Success!

32
Thompson Sampling
For each round:
1: Sample random variable X from each arm's Beta distribution
2: Select the arm with largest X
3: Observe the result of selected arm
4: Update prior Beta distribution for selected arm
Click Here arms: Beta(2,1), Beta(1,1), Beta(1,1)
Result: Failure!

33
Thompson Sampling
For each round:
1: Sample random variable X from each arm's Beta distribution
2: Select the arm with largest X
3: Observe the result of selected arm
4: Update prior Beta distribution for selected arm
Click Here arms: Beta(2,1), Beta(1,2), Beta(1,1)
Result: Failure!
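The four steps on these slides can be sketched with Python's standard library, which provides `random.betavariate`. The class and method names are illustrative assumptions, not the presenter's implementation.

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms."""

    def __init__(self, n_arms, prior_alpha=1.0, prior_beta=1.0, rng=None):
        # Beta(1,1) is the uniform prior used on the setup slide.
        self.alpha = [prior_alpha] * n_arms
        self.beta = [prior_beta] * n_arms
        self.rng = rng if rng is not None else random.Random()

    def select_arm(self):
        # Steps 1 and 2: sample X from each arm's Beta, pick the largest X.
        samples = [self.rng.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, success):
        # Steps 3 and 4: a success increments alpha, a failure increments beta.
        if success:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

# Replay the two outcomes shown on the slides.
sampler = ThompsonSampler(n_arms=3)
sampler.update(0, success=True)    # first arm becomes Beta(2,1)
sampler.update(1, success=False)   # second arm becomes Beta(1,2)
print(list(zip(sampler.alpha, sampler.beta)))
print("next arm to show:", sampler.select_arm())
```

In a live test, `select_arm` would be called once per impression and `update` once the click (or lack of one) is known.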

34

35
Posterior after 100k pulls (30 arms)

36
Bandits at Meetup

37
Meetup's First Bandit

38
76 Arms
Control: Welcome To Meetup! - 60% Open Rate
Winner: What?
Winner: Hi - 75% Open Rate (+25%)

39
76 Arms
Control: Welcome To Meetup! - 60% Open Rate
Winner: What?
Winner: Hi - 75% Open Rate (+25%)

40
76 Arms
Control: Welcome To Meetup! - 60% Open Rate
Winner: What?
Winner: Hi - 75% Open Rate (+25%)

41
Avoid Linkbaity Subject Lines

42
Coupon
16 Arms
Control: Save 50%, start your Meetup Group - 42% Open Rate
Winner: Here is a coupon - 53% Open Rate (+26%)

43
398 Arms

44

45
210% Click-through Difference:
Best:
Looking to start the perfect Meetup for you? We'll help you find just the right people
Start the perfect Meetup for you! We'll help you find just the right people
Worst:
Launch your own Meetup in January and save 50%
Start the perfect Meetup for you 50% off promotion ends February 1st.

46
Choose the Right Metric of Success
- Success tied to click in last experiment
- Sale end & discount messaging had bad results
- Perhaps people don't know that hosting a Meetup costs $$$?
  - Better to tie success to group creation

47
More Issues
- Open & click delay
- New subject line effect
  - Problem when testing notifications
- Monitor success trends to detect weirdness

48
Seasonality
Thompson Sampling should naturally adapt to seasonal changes
- Learning rate can be added for faster adaptation
Click Here
Winner all other times
Click Here
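The slide does not say how the learning rate is applied. One common interpretation, shown here purely as an assumption, is to discount each arm's accumulated evidence toward the Beta(1,1) prior before adding the newest observation, so old results fade and a seasonal winner can be picked up faster. The function name and `decay` parameter are hypothetical.

```python
def discounted_update(alpha, beta, success, decay=0.99):
    """Shrink past evidence toward Beta(1,1), then record the new outcome.

    decay=1.0 reproduces the ordinary Thompson sampling update; smaller
    values forget older observations more quickly.
    """
    alpha = 1.0 + decay * (alpha - 1.0)
    beta = 1.0 + decay * (beta - 1.0)
    if success:
        alpha += 1.0
    else:
        beta += 1.0
    return alpha, beta

# An arm with 10 past successes sees a failure; half the history is forgotten.
print(discounted_update(11.0, 1.0, success=False, decay=0.5))
```

With `decay` below 1 the effective amount of remembered evidence is capped at roughly 1 / (1 - decay) observations, so the posterior never becomes too confident to change.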

49
Bandit or Split Test?
AB Test good for:
- Biased Tests
- Complicated Tests
Bandit good for:
- Unbiased Tests
- Many Variants
- Time Constraints
- Set It And Forget It

50
Thanks!
