Sequential Off-line Learning with Knowledge Gradients Peter Frazier Warren Powell Savas Dayanik Department of Operations Research and Financial Engineering.

Presentation transcript:

Sequential Off-line Learning with Knowledge Gradients Peter Frazier Warren Powell Savas Dayanik Department of Operations Research and Financial Engineering Princeton University

2 Overview: Problem Formulation, Knowledge Gradient Policy, Theoretical Results, Numerical Results.

3 Measurement Phase (figure): N opportunities to do experiments on alternatives 1 through M; each experiment yields an observed outcome.

4 Implementation Phase (figure): after the measurements, one of alternatives 1 through M is chosen and its reward is collected.

5 Taxonomy of Sampling Problems

                On-line    Off-line
   Sequential              X
   Batch

(Rows: sampling; columns: reward structure. The X marks this problem: sequential sampling with an off-line reward.)

6 Example 1: One common experimental design is to spread measurements equally across the alternatives. (Figure: quality of each alternative.)

7 Example 1 Round Robin Exploration

8 Example 2: How might we improve round-robin exploration for use with this prior?

9 Largest Variance Exploration

10 Example 3: Exploitation — measure the alternative with the highest current estimate (see the sketch below).
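The three baseline rules from Examples 1-3 can each be written as a one-line decision. A minimal Python sketch; the array names mu and sigma2 (current means and variances of the beliefs) are my own, not from the slides:

```python
import numpy as np

def round_robin(n, M):
    """Round-robin exploration (Example 1): cycle through the M alternatives in order."""
    return n % M

def largest_variance(sigma2):
    """Largest-variance exploration (Example 2): measure the alternative we are least sure about."""
    return int(np.argmax(sigma2))

def exploitation(mu):
    """Exploitation (Example 3): measure the alternative that currently looks best."""
    return int(np.argmax(mu))
```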

11 Model: x^n is the alternative tested at time n. Measuring alternative x^n yields the observation y^(n+1) = Y_(x^n) + ε^(n+1), where Y_x is the unknown true value of alternative x. At time n the beliefs are Y_x ~ Normal(μ^n_x, (σ^n_x)^2), independent across alternatives. The error ε^(n+1) is independent N(0, (σ^ε)^2). At time N, choose an alternative; the objective is to maximize the expected value of the alternative chosen, E[max_x μ^N_x].

12 State Transition: At time n we measure alternative x^n. We update our estimate of Y_(x^n) based on the measurement y^(n+1); the estimates of all other Y_x do not change.
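This is the standard conjugate normal update. A minimal sketch, assuming the independent normal model of the previous slide (function and argument names are mine):

```python
import numpy as np

def update_beliefs(mu, sigma2, x, y, noise_var):
    """Update the independent normal beliefs after observing y for alternative x.
    mu, sigma2 hold the current means and variances; noise_var is (sigma_eps)^2.
    Only the measured alternative's belief changes."""
    mu, sigma2 = mu.copy(), sigma2.copy()
    prec_prior = 1.0 / sigma2[x]      # precision of the current belief about Y_x
    prec_obs = 1.0 / noise_var        # precision of a single noisy measurement
    sigma2[x] = 1.0 / (prec_prior + prec_obs)
    mu[x] = sigma2[x] * (prec_prior * mu[x] + prec_obs * y)
    return mu, sigma2
```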

13 Conditioned on the time-n information and on measuring x, μ^(n+1)_x is a normal random variable with mean μ^n_x and variance (σ̃^n_x)^2 satisfying (σ̃^n_x)^2 = (σ^n_x)^2 − (σ^(n+1)_x)^2, where (σ^n_x)^2 is the uncertainty about Y_x before the measurement, (σ^(n+1)_x)^2 is the uncertainty about Y_x after the measurement, and (σ̃^n_x)^2 is the variance of the change in the best estimate of Y_x due to the measurement. The value of the optimal policy satisfies Bellman's equation.
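The equation itself did not survive the transcript; written out for this problem under the model above, the dynamic program would take the standard form (my notation):

```latex
% Terminal reward: at time N we keep the best current estimate.
\[
  V^N(S^N) \;=\; \max_x \mu^N_x .
\]
% Bellman recursion over the belief state S^n = (\mu^n, \sigma^n):
\[
  V^n(S^n) \;=\; \max_{x}\; \mathbb{E}\!\left[\, V^{n+1}(S^{n+1}) \;\middle|\; S^n,\; x^n = x \,\right],
  \qquad n = 0, \dots, N-1 .
\]
```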

14 Utility of Information: Take the "utility of information" at time n to be the value of the current best estimate, U^n = max_x μ^n_x, and consider the random change in utility, U^(n+1) − U^n, due to a measurement at time n.

15 Knowledge Gradient Definition: The knowledge gradient policy chooses the measurement that maximizes this expected increase in utility, x^n = argmax_x E_n[ max_x' μ^(n+1)_x' − max_x' μ^n_x' | x^n = x ].

16 Knowledge Gradient: Because only the estimate of the measured alternative changes, the expected increase in utility from measuring x reduces to E_n[ max( μ^(n+1)_x, max_{x'≠x} μ^n_x' ) ] − max_x' μ^n_x', which is the expectation of the maximum of a normal random variable and a constant.
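For completeness, the standard identity this relies on, stated for a normal random variable Z ~ N(m, s^2) and a constant c (my notation, not the slides'):

```latex
\[
  \mathbb{E}\bigl[\max(Z, c)\bigr]
  \;=\; c \;+\; (m - c)\,\Phi\!\left(\tfrac{m - c}{s}\right) \;+\; s\,\varphi\!\left(\tfrac{m - c}{s}\right),
\]
\[
  \mathbb{E}\bigl[\max(Z, c)\bigr] \;-\; \max(m, c)
  \;=\; s\, f\!\left(-\tfrac{|m - c|}{s}\right),
  \qquad f(z) = z\,\Phi(z) + \varphi(z).
\]
```

The second form is what leads to the closed-form computation on the next slide.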

17 Knowledge Gradient: The computation becomes ν^(KG,n)_x = σ̃^n_x f(ζ^n_x), with f(z) = z Φ(z) + φ(z) and ζ^n_x = −| μ^n_x − max_{x'≠x} μ^n_x' | / σ̃^n_x, where Φ is the normal cdf and φ is the normal pdf.
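Putting the last three slides together, here is a minimal Python sketch of the rule, assuming the independent normal model of slide 11 and at least two alternatives; scipy's norm supplies Φ and φ, and all function and variable names are my own:

```python
import numpy as np
from scipy.stats import norm

def knowledge_gradient_choice(mu, sigma2, noise_var):
    """Pick the next measurement under the knowledge gradient policy.
    mu, sigma2: current means and variances of the independent normal beliefs;
    noise_var: measurement-noise variance (sigma_eps)^2. Assumes sigma2 > 0."""
    # Variance of the change in the estimate of Y_x from one measurement:
    # sigma_tilde^2 = (sigma^n)^2 - (sigma^{n+1})^2, with the posterior variance
    # (sigma^{n+1})^2 = 1 / (1/(sigma^n)^2 + 1/noise_var).
    sigma_next2 = 1.0 / (1.0 / sigma2 + 1.0 / noise_var)
    sigma_tilde = np.sqrt(sigma2 - sigma_next2)
    kg = np.empty(len(mu))
    for x in range(len(mu)):
        best_other = np.max(np.delete(mu, x))             # best estimate among the other alternatives
        zeta = -abs(mu[x] - best_other) / sigma_tilde[x]  # the zeta^n_x of this slide
        kg[x] = sigma_tilde[x] * (zeta * norm.cdf(zeta) + norm.pdf(zeta))  # sigma_tilde * f(zeta)
    return int(np.argmax(kg))
```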

18 Optimality Results If our measurement budget allows only one measurement (N=1), the knowledge gradient policy is optimal.

19 Optimality Results The knowledge gradient policy is optimal in the limit as the measurement budget N grows to infinity. This is really a convergence result.

20 Optimality Results: The knowledge gradient policy has bounded sub-optimality V^n − V^(KG,n), where V^(KG,n) gives the value of the knowledge gradient policy and V^n the value of the optimal policy.

21 Optimality Results: If there are exactly 2 alternatives (M=2), the knowledge gradient policy is optimal. In this case the optimal policy reduces to measuring the alternative whose estimate has the larger variance.

22 Optimality Results: If there is no measurement noise and the alternatives may be reordered so that their prior parameters satisfy a certain ordering condition, then the knowledge gradient policy is optimal.

23 Numerical Experiments: 100 randomly generated problems with
   M ~ Uniform{1, ..., 100}
   N ~ Uniform{M, 3M, 10M}
   μ^0_x ~ Uniform[-1, 1]
   (σ^0_x)^2 = 1 with probability 0.9, = … with probability 0.1
   σ^ε = 1
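A Python sketch of this problem generator follows. The second variance value is missing from the transcript, so the small_var parameter below is purely a placeholder, as are the function and variable names:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_problem(small_var=1e-3):
    """Draw one test problem following the recipe on this slide.
    small_var stands in for the second variance value (not given in the transcript)."""
    M = int(rng.integers(1, 101))                 # M ~ Uniform{1, ..., 100}
    N = int(M * rng.choice([1, 3, 10]))           # N ~ Uniform{M, 3M, 10M}
    mu0 = rng.uniform(-1.0, 1.0, size=M)          # prior means ~ Uniform[-1, 1]
    sigma2_0 = np.where(rng.random(M) < 0.9, 1.0, small_var)  # variance 1 with prob. 0.9
    noise_var = 1.0                               # (sigma_eps)^2 = 1
    return M, N, mu0, sigma2_0, noise_var
```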

24 Numerical Experiments

25 Interval Estimation: Compare alternatives via a linear combination of mean and standard deviation, μ^n_x + z_(α/2) σ^n_x. The parameter z_(α/2) controls the tradeoff between exploration and exploitation.
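A minimal Python sketch of the interval estimation rule under the same belief arrays as before; the default value of z is only a placeholder, since the slides do not fix it:

```python
import numpy as np

def interval_estimation_choice(mu, sigma2, z=2.0):
    """Interval estimation: measure the alternative with the largest mu + z * sigma.
    z plays the role of z_{alpha/2}; larger z favors exploration, smaller z exploitation."""
    return int(np.argmax(mu + z * np.sqrt(sigma2)))
```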

26 KG / IE Comparison

27 KG / IE Comparison Value of KG – Value of IE

28 IE and “Sticking” Alternative 1 is known perfectly

29 IE and “Sticking”

30 Thank You Any Questions?

31 Numerical Example 1

32 Numerical Example 2

33 Numerical Example 3

34 Knowledge Gradient Example

35 Interval Estimation Example

36 Boltzmann Exploration: Parameterized by a declining sequence of temperatures (T_0, ..., T_(N-1)).
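The slide's formula is not in the transcript; the standard softmax form of Boltzmann exploration, sketched in Python under the same belief arrays (names mine), is:

```python
import numpy as np

rng = np.random.default_rng(0)

def boltzmann_choice(mu, T_n):
    """Boltzmann (softmax) exploration at temperature T_n: measure alternative x with
    probability proportional to exp(mu_x / T_n). As T_n declines over the horizon,
    the rule shifts from exploration toward exploitation."""
    w = np.exp((mu - np.max(mu)) / T_n)   # subtract the max for numerical stability
    return int(rng.choice(len(mu), p=w / w.sum()))
```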