Math 6330: Statistical Consulting Class 11
Tony Cox, tcoxdenver@aol.com
University of Colorado at Denver
Course web site: http://cox-associates.com/6330/

Course schedule
- April 14: Draft of project/term paper due
- April 18, 25, and May 2: In-class presentations
- May 2: Last class
- May 4: Final project/paper due by 8:00 PM

MAB Thompson sampling (cont.)

Thompson sampling and adaptive Bayesian control: Bernoulli trials
- Basic idea: choose each of the k actions according to the probability that it is best, estimating that probability via Bayes' rule.
- For a "Bernoulli bandit" (0-1 reward, fail/succeed), use beta conjugate-prior updating: with S = number of successes and F = number of failures observed so far on an arm, the posterior for that arm's success probability is Beta(S + 1, F + 1), and the estimated success probability is the mean of this posterior distribution.
- Sample from the posterior for each arm 1, ..., k; choose the arm with the highest sampled value; observe the reward; update and repeat (see the sketch below).
Agrawal and Goyal, 2012: http://jmlr.org/proceedings/papers/v23/agrawal12/agrawal12.pdf
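
A minimal sketch of this loop in Python. The true success probabilities and the horizon are illustrative assumptions, not from the slides; the Beta(1, 1) uniform priors follow Agrawal and Goyal:

    import numpy as np

    rng = np.random.default_rng(0)
    true_p = np.array([0.3, 0.5, 0.7])   # unknown arm success probabilities (illustrative)
    k = len(true_p)
    S = np.zeros(k)                      # successes observed per arm
    F = np.zeros(k)                      # failures observed per arm

    for t in range(10_000):
        theta = rng.beta(S + 1, F + 1)         # one draw from each arm's posterior
        arm = int(np.argmax(theta))            # play the arm with the highest draw
        reward = rng.random() < true_p[arm]    # Bernoulli reward
        S[arm] += reward
        F[arm] += 1 - reward

    print("posterior means:", (S + 1) / (S + F + 2))   # concentrates near true_p
    print("pulls per arm:", S + F)                     # most pulls go to the best arm

Because arms are chosen by sampling, exploration happens automatically: an arm is played exactly as often as the current posterior says it might be best.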

Thompson sampling: General stochastic (random) rewards
Second idea: generalize to an arbitrary reward distribution (normalized to the interval [0, 1]) by treating each trial as a "success" with probability equal to its observed reward.
Agrawal and Goyal, 2012: http://jmlr.org/proceedings/papers/v23/agrawal12/agrawal12.pdf
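
In code, only the posterior update changes; everything else in the Bernoulli sketch above stays the same. A minimal sketch of the modified update (self-contained for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    def update_counts(S, F, arm, reward):
        """Beta-posterior update for a reward in [0, 1], via a Bernoulli
        trial that 'succeeds' with probability equal to the reward."""
        success = rng.random() < reward
        S[arm] += success
        F[arm] += 1 - success

    S, F = np.zeros(3), np.zeros(3)
    update_counts(S, F, arm=1, reward=0.8)   # e.g., a reward of 0.8 from arm 1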

Thompson sampling with complex online actions
- Main idea: embed simulation-optimization in the Thompson sampling loop; with S = state space, sample the states from the posterior.
- Applications: job scheduling (assigning jobs to machines); web advertising with rewards depending on the sets of ads shown.
- Notation: Y = observation; h = reward; X = random variable depending on the unknown state.
- Updating posteriors can be done efficiently using a sampling-based approach (particle filtering), as sketched below.
Gopalan et al., 2014: http://jmlr.org/proceedings/papers/v32/gopalan14.pdf
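
A minimal sketch of the generic reweight-and-resample (bootstrap) particle-filtering update referred to above; this is a generic illustration, not the specific algorithm of Gopalan et al.:

    import numpy as np

    rng = np.random.default_rng(0)

    def particle_update(particles, weights, likelihood, y):
        """Reweight particles by the likelihood of observation y, then resample."""
        weights = weights * likelihood(y, particles)
        weights = weights / weights.sum()
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        return particles[idx], np.full(len(particles), 1.0 / len(particles))

    # Example: particles approximating a posterior over a Bernoulli parameter theta
    particles = rng.random(1000)                       # draws from a uniform prior
    weights = np.full(1000, 1.0 / 1000)
    bern_lik = lambda y, th: th if y == 1 else 1 - th  # Bernoulli likelihood
    particles, weights = particle_update(particles, weights, bern_lik, y=1)
    print(particles.mean())                            # near 2/3, the Beta(2, 1) posterior mean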

Comparing methods
- In simulation experiments, Thompson sampling works well with batch updating, even with slowly or occasionally changing rewards and other realistic complexities.
- It beats UCB1 in many, but not all, comparisons.
- It is more practical than UCB1 for batch updating, because it keeps experimenting (trying actions with some randomness) between updates.
http://engineering.richrelevance.com/recommendations-thompson-sampling/

MAB variations
- Contextual bandits: see a signal before acting
  - Constrained contextual bandits: actions are constrained
- Adversarial bandits: adaptive adversaries (Bubeck and Slivkins, 2012, https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/COLT12_BS.pdf)
- Restless bandits: probabilities change over time
  - The Gittins index maximizes expected discounted reward, but is not easy to compute
- Correlated bandits

Wrap-up on MAB problems
- Adaptive Bayesian learning works well in simple environments, including many of practical interest.
- The resulting rules are *much* simpler to implement than previous methods (e.g., Gittins index policies).
- Sampling-based approaches (Thompson sampling, particle filtering, etc.) promote computationally practical "online learning".

Wrap-up on adaptive learning
- No need for a causal model: act-consequence probabilities and optimal decision rules are learned directly.
- Assumes a stationary (or slowly changing) decision environment, a known choice set, and immediate feedback (reward) following each action.
- Works very well when these assumptions are met: low-regret learning is possible.

Optimal stopping

Optimal stopping decision problems
- Suppose that a decision-maker (d.m.) faces a random sequence of opportunities. How long should she wait for the best one? When should she stop and commit to a final choice?
- Examples: selling a house, hiring a new employee, accepting a job offer, replacing a component, shuttering an aging facility, taking a parking spot, etc.
- Other optimal stopping problems: least-cost policies for replacing aging components.

Hazard functions: Conditional rate of failure given survival so far
- Let T = length of life for a component (or person, or time until first occurrence of an event, etc.). T is a random variable with cdf F(t) = Pr(T ≤ t) and survival function S(t) = 1 − F(t) = Pr(T > t).
- The pdf for T is then f(t) = F′(t) = dF(t)/dt.
- The hazard function for T is defined as
  h(t) = lim_{dt → 0} Pr(t < T ≤ t + dt | T > t)/dt = f(t)/S(t) = f(t)/[1 − F(t)]
- Interpretation: "instantaneous failure rate": h(t)dt ≈ Pr(event occurs in the next dt | survival until t).
- In discrete time, dt = 1 and no limit is taken.
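
These definitions translate directly into code; a short sketch using a Weibull lifetime from scipy.stats (the shape parameter is an illustrative assumption):

    import numpy as np
    from scipy.stats import weibull_min

    life = weibull_min(c=2.0)     # Weibull with shape 2: an increasing hazard (illustrative)
    t = np.linspace(0.1, 3.0, 6)

    f = life.pdf(t)               # f(t)
    S = life.sf(t)                # survival function S(t) = 1 - F(t)
    h = f / S                     # hazard h(t) = f(t) / S(t)
    print(h)                      # for this Weibull, h(t) = 2t, so the hazard rises linearly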

Using hazard functions to guide decisions
The shape of the hazard function can often guide decisions, e.g.:
- If h(t) is increasing, then the optimal time to stop is when h(t) reaches a certain threshold.
- If h(t) is decreasing, then the best decision is either don't start, or else continue until failure occurs.
Calculators:
- Normal distribution hazard function: http://reliabilityanalyticstoolkit.appspot.com/normal_distribution
- SPRT and other calculators: http://reliabilityanalyticstoolkit.appspot.com/
- www.wolfram.com/mathematica/new-in-9/enhanced-probability-and-statistics/define-a-distribution-given-its-hazard-function.html
- https://www.ncss.com/software/ncss/survival-analysis-in-ncss/

Example: optimal age replacement
- The lifetime T of a component is a random variable with a known distribution.
- Suppose it costs $10 to replace the component before it fails and $50 to replace it after it fails. When should the component be voluntarily replaced (if it has not failed yet)?
- The answer can be calculated by minimizing the expected average cost per replacement cycle (or by equating the marginal benefit of continuing to its marginal cost), but the calculations are detailed and soon get tedious, as sketched below.
- Alternative: Google "optimal replacement age calculator".
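
A sketch of the cost-per-cycle calculation in Python: replacing at age tau (cost $10) or at failure (cost $50), minimize the expected cost per unit time, g(tau) = [10·S(tau) + 50·F(tau)] / E[min(T, tau)]. The Weibull lifetime distribution and its parameters are illustrative assumptions; the costs come from the slide:

    import numpy as np
    from scipy.stats import weibull_min
    from scipy.integrate import quad

    life = weibull_min(c=2.0, scale=10.0)   # assumed lifetime distribution (illustrative)
    c_planned, c_failure = 10.0, 50.0       # replacement costs from the slide

    def cost_rate(tau):
        """Expected cost per unit time when replacing at age tau or at failure."""
        expected_cycle_cost = c_planned * life.sf(tau) + c_failure * life.cdf(tau)
        expected_cycle_length, _ = quad(life.sf, 0, tau)   # E[min(T, tau)]
        return expected_cycle_cost / expected_cycle_length

    ages = np.linspace(0.5, 20.0, 200)
    best = ages[np.argmin([cost_rate(a) for a in ages])]
    print(f"approximately optimal replacement age: {best:.2f}")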

Optimal age replacement calculator http://www.reliawiki.org/index.php/Optimum_Replacement_Time_Example

Optimal selling of an asset
- If offers arrive sequentially from a known distribution and the costs of waiting are known, then an optimal decision boundary (blue curve in the linked figure) can be constructed to maximize EMV.
- Sell when the red line first hits the blue decision boundary.
- In the figure, W(t) = price series and S(t) = maximum price so far.
http://file.scirp.org/Html/9-1040163_25151.htm
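
The flavor of the calculation can be seen in the simplest stationary version of the problem: offers X arrive i.i.d. from a known distribution, each period of waiting costs c, and the optimal policy is to accept the first offer above a reservation price r* satisfying E[(X − r*)+] = c. This simplified setup, the normal offer distribution, and the waiting cost below are illustrative assumptions, not the model in the linked paper:

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import brentq
    from scipy.integrate import quad

    offers = norm(loc=100.0, scale=10.0)   # assumed offer distribution (illustrative)
    c = 1.0                                # assumed cost per period of waiting

    def expected_gain(r):
        """E[(X - r)+]: expected improvement over r from seeing one more offer."""
        val, _ = quad(lambda x: (x - r) * offers.pdf(x), r, offers.ppf(0.9999))
        return val

    # Waiting is worthwhile exactly when E[(X - r)+] > c; the boundary is r*
    r_star = brentq(lambda r: expected_gain(r) - c, 80.0, 130.0)
    print(f"accept the first offer above {r_star:.2f}")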

Optimal stopping: Variations
- Offers arrive sequentially from an unknown distribution; Bayesian updating provides solutions.
- Time pressure: must sell by a deadline, or within a fixed number of offers.
- With or without being able to go back to previous offers.
- In the linked figure, sell when the blue line first hits the green decision boundary.
http://file.scirp.org/Html/9-1040163_25151.htm

Wrap-up on optimal stopping and statistical decision theory
- Many valuable decision problems can be solved using the philosophy of simulation-optimization: try different decisions, evaluate their probable consequences, and choose the one with the best (EMV- or EU-maximizing) probability distribution of consequences.
- Finding a best decision or decision rule can become very technical; use appropriate software or on-line calculators.
- For business applications, understanding how to formulate decision problems and solve them with software can create high value in practice.

Heuristics and biases