Multi-Armed Bandits (chalpert@meetup.com)

Survey

Click Here

Click Here: Click-through Rate (Clicks / Impressions) = 20%

Click Here vs. Click Here

Click Here vs. Click Here: Click-through Rate 20% vs. ?

AB Test (Randomized Controlled Experiment): show each button to 50% of users. Click-through Rate: 20% vs. ?

AB Test Timeline: Before Test → AB Test (Exploration Phase: testing) → After Test (Exploitation Phase: show winner).

Click Here vs. Click Here: Click-through Rate 20% vs. 30%

10,000 impressions/month; need 4,000 clicks by EOM. A 30% CTR yields only 3,000 clicks, so it won't be enough.

Need to keep testing (Exploration)

Test variants A, B, C, D, E, F, G, ...: each variant would be assigned with probability 1/N (N = # of variants).

Not everyone is a winner

Need to keep testing (Exploration). Need to minimize regret (Exploitation).

Multi-Armed Bandit: Balance of Exploitation & Exploration.

Bandit Algorithm Balances Exploitation & Exploration. An AB test has discrete exploitation & exploration phases over time (Before Test → AB Test → After Test); a Multi-Armed Bandit does continuous exploitation & exploration, and the bandit favors the winning arm.

Bandit Algorithm Reduces Risk of Testing. AB Test: the best arm is exploited with probability 1/N, so more arms means less exploitation. Bandit: the best arm is exploited with a determined probability, reducing exposure to suboptimal arms.

Demo (borrowed from Probabilistic Programming & Bayesian Methods for Hackers).

Split Test: still sending losers. Bandit: the winner breaks away! Here the AB test would have cost 4.3 percentage points.

How it works: the Epsilon-Greedy Algorithm. ε = probability of exploration. At the start of each round, with probability ε explore (each of the N arms is shown with probability ε/N overall), and with probability 1 - ε exploit (show the best arm). Epsilon-Greedy with ε = 1 is an AB test.
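
A minimal Python sketch of that round logic, assuming simple per-arm success/trial counters; the counts in the usage lines are made up for illustration:

    import random

    def choose_arm(epsilon, successes, trials):
        """One epsilon-greedy round: explore with probability epsilon,
        otherwise exploit the arm with the best observed rate."""
        n = len(trials)
        if random.random() < epsilon:
            # Exploration: each arm ends up shown with probability epsilon / N
            return random.randrange(n)
        # Exploitation: show the arm with the highest observed click rate
        rates = [s / t if t else 0.0 for s, t in zip(successes, trials)]
        return max(range(n), key=lambda i: rates[i])

    # Hypothetical usage: three button variants
    successes, trials = [20, 6, 9], [100, 40, 30]
    arm = choose_arm(0.1, successes, trials)

Setting epsilon to 1 makes every round an exploration round, which reproduces the uniform AB test above.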

Epsilon-Greedy Issues. A constant epsilon has no prior knowledge, initially under-explores, and later over-explores; it is better if the probability of exploration decreases with sample size (annealing), as sketched below.
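
One common annealing schedule (chosen here for illustration; the talk doesn't name one) shrinks ε as the round count t grows:

    def annealed_epsilon(t, scale=1000.0):
        # Early rounds (small t) give epsilon near 1: heavy exploration.
        # Later rounds give epsilon near 0: mostly exploitation.
        return scale / (scale + t)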

Some Alternatives: Epsilon-First, Epsilon-Decreasing, Softmax, UCB (UCB1, UCB2), Bayesian-UCB, Thompson Sampling (Bayesian Bandits).

Bandit Algorithm Comparison. Regret:
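
The chart comparing algorithms is not reproduced in the transcript. Such comparisons plot cumulative regret, conventionally defined with μ* as the expected reward of the best arm and a_t as the arm pulled at round t:

    \mathrm{Regret}(T) = T\,\mu^{*} - \sum_{t=1}^{T} \mu_{a_t}

A regret curve that flattens out means the algorithm has essentially stopped pulling suboptimal arms.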

Thompson Sampling Setup: assign each arm its own Beta distribution with parameters (α, β) = (# successes, # failures).

Thompson Sampling Setup: initialize priors with the ignorant state Beta(1,1) (the Uniform distribution), or initialize with an informed prior to aid convergence.
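
A note on why the updates in the next slides are simple counter increments: the Beta prior is conjugate to the Bernoulli click/no-click outcome, so after s successes and f failures an arm's distribution updates as

    \mathrm{Beta}(\alpha, \beta) \;\to\; \mathrm{Beta}(\alpha + s,\; \beta + f)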

Thompson Sampling, for each round: 1) sample a random variable X from each arm's Beta distribution; 2) select the arm with the largest X; 3) observe the result of the selected arm; 4) update the prior Beta distribution for the selected arm. First round: priors Beta(1,1), Beta(1,1), Beta(1,1); samples X = 0.7, 0.2, 0.4, so the first arm is shown. Success!

After that success, the first arm's distribution updates to Beta(2,1); the others remain Beta(1,1).

Next round: samples X = 0.4, 0.8, 0.2 from Beta(2,1), Beta(1,1), Beta(1,1), so the second arm is shown. Failure!

After that failure, the second arm's distribution updates to Beta(1,2); the arms now stand at Beta(2,1), Beta(1,2), Beta(1,1).
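
Putting the four steps together, a self-contained Python sketch; the pull function simulating clicks and the click-through rates at the bottom are hypothetical:

    import random

    def thompson_sampling(n_arms, n_rounds, pull):
        """Thompson Sampling with ignorant Beta(1,1) priors, following
        the four steps above. pull(arm) returns True on a success."""
        alpha = [1] * n_arms  # 1 + observed successes per arm
        beta = [1] * n_arms   # 1 + observed failures per arm
        for _ in range(n_rounds):
            # 1: sample X from each arm's Beta distribution
            x = [random.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
            # 2: select the arm with the largest X
            arm = max(range(n_arms), key=lambda i: x[i])
            # 3: observe the result of the selected arm
            if pull(arm):
                alpha[arm] += 1  # 4: success updates alpha
            else:
                beta[arm] += 1   # 4: failure updates beta
        return alpha, beta

    # Hypothetical usage: three arms with hidden click-through rates
    ctr = [0.20, 0.30, 0.25]
    alpha, beta = thompson_sampling(3, 10000, lambda a: random.random() < ctr[a])

As the counts grow, the losing arms' samples rarely come out on top, so traffic concentrates on the winner automatically.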

Posterior after 100k pulls (30 arms)

Bandits at Meetup

Meetup’s First Bandit

76 Arms. Control: "Welcome To Meetup!" at a 60% Open Rate. Winner: "Hi" at a 75% Open Rate (+25%).

Avoid Linkbaity Subject Lines

Coupon Email, 16 Arms. Control: "Save 50%, start your Meetup Group" – 42% Open Rate. Winner: "Here is a coupon" – 53% Open Rate (+26%).

398 Arms

210% Click-through Difference. Best: "Looking to start the perfect Meetup for you? We'll help you find just the right people. Start the perfect Meetup for you!" Worst: "Launch your own Meetup in January and save 50%. Start the perfect Meetup for you. 50% off promotion ends February 1st."

Choose the Right Metric of Success. Success was tied to a click in the last experiment, and the sale-end & discount messaging had bad results; perhaps people don't know that hosting a Meetup costs $$$? Better to tie success to group creation.

More Issues: email open & click delay; the new-subject-line effect; problems when testing notifications. Monitor success trends to detect weirdness.

Seasonality. Thompson Sampling should naturally adapt to seasonal changes, and a learning rate can be added for faster adaptation, as sketched below. [Chart: one button wins during the season, the other is the winner all other times.]
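
The slides don't spell out how the learning rate works; one plausible version (an assumption here, not the talk's stated method) discounts each arm's counts toward the prior every round, so evidence from old seasons gradually fades:

    def discounted_update(alpha, beta, arm, success, gamma=0.999):
        """Decay all counts toward the Beta(1,1) prior, then record the
        new observation; gamma near 1 means slow forgetting."""
        for i in range(len(alpha)):
            alpha[i] = 1 + gamma * (alpha[i] - 1)
            beta[i] = 1 + gamma * (beta[i] - 1)
        if success:
            alpha[arm] += 1
        else:
            beta[arm] += 1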

Bandit or Split Test? AB Test good for: biased tests, complicated tests. Bandit good for: unbiased tests, many variants, time restraints, set it and forget it.

Thanks! chalpert@meetup.com