1 Math 6330: Statistical Consulting Class 11
Tony Cox
University of Colorado at Denver
Course web site:

2 Course schedule
April 14: Draft of project/term paper due
April 18, 25, May 2: In-class presentations
May 2: Last class
May 4: Final project/paper due by 8:00 PM

3 MAB Thompson sampling (cont.)

4 Thompson sampling and adaptive Bayesian control: Bernoulli trials
Basic idea: Choose each of the k actions according to the probability that it is best.
Estimate this probability via Bayes' rule; the point estimate of each arm's success probability is the mean of its posterior distribution.
Use beta conjugate prior updating for the "Bernoulli bandit" (0-1 reward, fail/succeed).
Sample from the posterior for each arm 1, …, k; choose the one with the highest sampled value. Update and repeat. (See the sketch below.)
S = success, F = failure
Agrawal and Goyal, 2012
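A minimal Python sketch of this scheme, assuming Beta(1, 1) priors and made-up true success probabilities; it illustrates the idea rather than reproducing the paper's implementation.

```python
# Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors.
# The reward probabilities `true_probs` are made up for illustration.
import random

def thompson_bernoulli(true_probs, n_rounds=10_000, seed=0):
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k   # S_i: observed successes for arm i
    failures = [0] * k    # F_i: observed failures for arm i
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior Beta(S_i + 1, F_i + 1)
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])  # play the arm with the highest sample
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures

# Example: three arms; play should concentrate on the 0.7 arm over time.
print(thompson_bernoulli([0.3, 0.5, 0.7]))
```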

5 Thompson sampling: General stochastic (random) rewards
Second idea: Generalize to an arbitrary reward distribution (normalized to the interval [0, 1]) by treating each trial as a "success" with probability equal to its observed reward.
Agrawal and Goyal, 2012
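A hedged sketch of this trick, reusing the Beta success/failure counters from the previous sketch (function and variable names here are illustrative):

```python
import random

def pseudo_bernoulli_update(reward, successes, failures, arm, rng=random):
    """Treat a reward r in [0, 1] as a success with probability r, then do the usual Beta count update."""
    assert 0.0 <= reward <= 1.0, "rewards must be normalized to [0, 1]"
    if rng.random() < reward:
        successes[arm] += 1   # counts as a pseudo-success
    else:
        failures[arm] += 1    # counts as a pseudo-failure
```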

6 Thompson sampling with complex online actions
Main idea: Embed simulation-optimization in the Thompson sampling loop.
Θ = state space; sample states θ from the posterior over Θ.
Y = observation; h = reward; X = random variable depending on θ.
Applications: Job scheduling (assigning jobs to machines); web advertising with reward depending on the set of ads shown.
Updating posteriors can be done efficiently using a sampling-based approach (particle filtering).
Gopalan et al.
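A toy sketch of the particle-filtering posterior update mentioned above, for a single unknown success probability θ; this simplified model is an assumption for illustration, not the setup in Gopalan et al.

```python
# Particle approximation of the posterior over an unknown success probability theta.
# Each 0/1 observation reweights the particles; the Thompson sampling step then
# just draws one particle in proportion to its weight.
import random

rng = random.Random(0)
particles = [rng.random() for _ in range(2000)]        # prior: theta ~ Uniform(0, 1)
weights = [1.0 / len(particles)] * len(particles)

def update(y):
    """Reweight particles by the Bernoulli likelihood of observation y (0 or 1)."""
    global weights, particles
    weights = [w * (p if y == 1 else 1 - p) for w, p in zip(weights, particles)]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resample if the effective sample size gets too small.
    if 1.0 / sum(w * w for w in weights) < len(particles) / 2:
        particles = rng.choices(particles, weights=weights, k=len(particles))
        weights = [1.0 / len(particles)] * len(particles)

def thompson_draw():
    """Posterior draw of theta used by the Thompson sampling step."""
    return rng.choices(particles, weights=weights, k=1)[0]

for y in [1, 0, 1, 1, 1, 0, 1]:   # a made-up observation stream
    update(y)
print(thompson_draw())
```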

7 Comparing methods
In simulation experiments, Thompson sampling works well with batch updating, even with slowly or occasionally changing rewards and other realistic complexities.
It beats UCB1 in many, but not all, comparisons.
It is more practical than UCB1 for batch updating because it keeps experimenting (trying actions with some randomness) between updates.
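For reference, a minimal version of the UCB1 index rule that the slide compares against (variable names are illustrative):

```python
import math

def ucb1_choice(counts, means, t):
    """Pick the arm maximizing mean_i + sqrt(2 ln t / n_i); play any untried arm first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
```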

8 MAB variations
Contextual bandits: See a signal (context) before acting. Constrained contextual bandits: actions are constrained.
Adversarial bandits: Adaptive adversaries (Bubeck and Slivkins, 2012).
Restless bandits: Probabilities change over time.
The Gittins index maximizes expected discounted reward, but is not easy to compute.
Correlated bandits

9 Wrap-up on MAB problems
Adaptive Bayesian learning works well in simple environments, including many of practical interest.
The resulting rules are *much* simpler to implement than previous methods (e.g., Gittins index policies).
Sampling-based approaches (Thompson sampling, particle filtering, etc.) promote computationally practical "online learning".

10 Wrap-up on adaptive learning
No need for a causal model: learn act-consequence probabilities and optimal decision rules directly.
Assumes a stationary (or slowly changing) decision environment, a known choice set, and immediate feedback (reward) following each action.
Works very well when these assumptions are met: low-regret learning is possible.

11 Optimal stopping

12 Optimal stopping decision problems
Suppose that a decision-maker (d.m.) faces a random sequence of opportunities.
How long should the d.m. wait for the best one? When should the d.m. stop and commit to a final choice?
Examples: Selling a house, hiring a new employee, accepting a job offer, replacing a component, shuttering an aging facility, taking a parking spot, etc.
Other optimal stopping problems: Least-cost policies for replacing aging components.

13 Hazard functions: Conditional rate of failure given survival so far
Let T = length of life for a component (or a person, or the time until the first occurrence of an event, etc.).
T is a random variable with cdf F(t) = Pr(T < t) and survival function S(t) = 1 – F(t) = Pr(T > t).
The pdf for T is then f(t) = F′(t) = dF(t)/dt.
The hazard function for T is defined as h(t) = lim_{dt → 0} Pr(t < T < t + dt | T > t)/dt, so h(t) = f(t)/S(t) = f(t)/[1 – F(t)].
Interpretation: "instantaneous failure rate", i.e., h(t)·dt ≈ Pr(failure occurs in the next dt | survival until t).
In discrete time, dt = 1 and no limit is taken.
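A small numerical check of h(t) = f(t)/[1 − F(t)], assuming SciPy is available and using a Weibull lifetime chosen only for illustration:

```python
from scipy.stats import weibull_min

shape, scale = 2.0, 10.0             # shape > 1 gives an increasing hazard
dist = weibull_min(shape, scale=scale)

def hazard(t):
    return dist.pdf(t) / dist.sf(t)  # sf(t) = 1 - F(t) = S(t)

for t in [1.0, 5.0, 10.0, 15.0]:
    print(f"t = {t:5.1f}   h(t) = {hazard(t):.4f}")
```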

14 Using hazard functions to guide decisions
The shape of the hazard function can often guide decisions, e.g.:
If h(t) is increasing, then the optimal time to stop is when h(t) first reaches a certain threshold (see the sketch below).
If h(t) is decreasing, then the best decision is either not to start or else to continue until failure occurs.
Normal distribution hazard function calculator is at
SPRT and other calculators:
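A hedged sketch of the threshold rule for an increasing hazard; the Weibull model and the threshold value are illustrative assumptions (in practice the threshold would be derived from the costs of stopping early versus failing):

```python
from scipy.stats import weibull_min

dist = weibull_min(2.0, scale=10.0)          # increasing hazard (shape > 1)
threshold = 0.15                             # assumed cost-derived threshold

def first_crossing(threshold, t_max=100.0, step=0.01):
    """Return the first age at which h(t) reaches the threshold."""
    t = step
    while t < t_max:
        if dist.pdf(t) / dist.sf(t) >= threshold:
            return t
        t += step
    return None

print(first_crossing(threshold))             # stop/replace at roughly this age
```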

15 Example: optimal age replacement
The lifetime T of a component is a random variable with a known distribution.
Suppose it costs $10 to replace the component before it fails and $50 to replace it if it fails.
When should the component be voluntarily replaced (if it has not yet failed)?
The answer can be calculated by minimizing the expected average cost per cycle (or by equating the marginal benefit of continuing to the marginal cost), but the calculations are detailed and soon get tedious; see the sketch below.
Alternative: Google "optimal replacement age calculator"
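One way to carry out the "expected average cost per cycle" calculation is the standard age-replacement cost rate C(τ) = [c_p·S(τ) + c_f·F(τ)] / ∫₀^τ S(x) dx; the sketch below minimizes it numerically, assuming a Weibull lifetime distribution for illustration.

```python
import numpy as np
from scipy.stats import weibull_min
from scipy.integrate import quad

c_p, c_f = 10.0, 50.0                       # planned vs. failure replacement cost ($)
dist = weibull_min(2.0, scale=10.0)         # assumed lifetime distribution

def cost_rate(tau):
    expected_cycle_length, _ = quad(dist.sf, 0, tau)   # E[min(T, tau)]
    return (c_p * dist.sf(tau) + c_f * dist.cdf(tau)) / expected_cycle_length

taus = np.linspace(0.5, 20, 200)
best = min(taus, key=cost_rate)
print(f"replace at age ~{best:.2f}, cost rate ~{cost_rate(best):.3f} $/unit time")
```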

16 Optimal age replacement calculator

17 Optimal selling of an asset
If offers arrive sequentially from a known distribution and the costs of waiting are known, then an optimal decision boundary (blue in the slide's figure) can be constructed to maximize EMV.
Sell when the red line (the price) first hits the blue decision boundary.
W(t) = price series, S(t) = maximum price so far
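As a simpler, hedged illustration of the same idea, consider the classic special case of i.i.d. offers from a known distribution with a constant waiting cost per period and no discounting; the optimal rule is then a single reservation price r* solving c = E[(X − r*)⁺]. The offer distribution and waiting cost below are assumptions for illustration.

```python
from scipy.stats import norm
from scipy.integrate import quad
from scipy.optimize import brentq

offers = norm(loc=100.0, scale=10.0)   # assumed offer distribution ($)
c = 1.0                                # assumed waiting cost per period ($)

def expected_gain_above(r):
    """E[(X - r)+] = integral_r^inf (x - r) f(x) dx."""
    val, _ = quad(lambda x: (x - r) * offers.pdf(x), r, offers.ppf(0.9999))
    return val

# r* is where the expected gain from one more offer just equals the waiting cost.
r_star = brentq(lambda r: expected_gain_above(r) - c, offers.ppf(0.001), offers.ppf(0.999))
print(f"reservation price ~ {r_star:.2f}   (sell at the first offer >= r*)")
```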

18 Optimal stopping: Variations
Offers arrive sequentially from an unknown distribution: Bayesian updating provides solutions.
Time pressure: Must sell by a deadline, or within a fixed number of offers.
With or without being able to go back to (recall) previous offers.
Sell when the blue line (the price) first hits the green decision boundary in the slide's figure.
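A hedged sketch of the fixed-number-of-offers (deadline) variant without recall and without waiting costs: with n offers left, accept an offer if it beats the continuation value V[n−1], where V[n] = E[max(X, V[n−1])]. The uniform offer distribution is an illustrative assumption.

```python
import random

def continuation_values(n_offers, sample_offer, n_sims=100_000, seed=0):
    rng = random.Random(seed)
    V = [0.0]                                  # V[0]: no offers left, get nothing
    for _ in range(n_offers):
        prev = V[-1]
        # Monte Carlo estimate of E[max(X, V[n-1])]
        V.append(sum(max(sample_offer(rng), prev) for _ in range(n_sims)) / n_sims)
    return V

V = continuation_values(5, lambda rng: rng.random())   # offers ~ Uniform(0, 1)
print([round(v, 3) for v in V])   # with k offers remaining, accept an offer > V[k-1]
```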

19 Wrap-up on optimal stopping and statistical decision theory
Many valuable decision problems can be solved using the philosophy of simulation-optimization: try different decisions, evaluate their probable consequences, and choose the one with the best (EMV- or EU-maximizing) probability distribution of consequences.
Finding a best decision or decision rule can become very technical; use appropriate software or on-line calculators.
For business applications, understanding how to formulate decision problems and solve them with software can create high value in practice.

20 Heuristics and biases

