
Sequential Off-line Learning with Knowledge Gradients
Peter Frazier, Warren Powell, Savas Dayanik
Department of Operations Research and Financial Engineering, Princeton University


1 Sequential Off-line Learning with Knowledge Gradients. Peter Frazier, Warren Powell, Savas Dayanik. Department of Operations Research and Financial Engineering, Princeton University.

2 Overview: Problem Formulation; Knowledge Gradient Policy; Theoretical Results; Numerical Results.

3 Measurement Phase. We have N opportunities to do experiments across alternatives 1 through M; each experiment returns a noisy outcome. [Figure: alternatives 1,...,M with sample experimental outcomes +1, +.2, -.2]

4 Implementation Phase. [Figure: alternatives 1,...,M; the chosen alternative yields a reward]

5 Taxonomy of Sampling Problems.

  Reward Structure:  On-line   Off-line
  Sequential                      X
  Batch

This talk addresses the sequential, off-line case (marked X).

6 Example 1. One common experimental design is to spread measurements equally across the alternatives. [Plot: quality vs. alternative]

7 Example 1: Round-Robin Exploration.

8 Example 2. How might we improve round-robin exploration for use with this prior?

9 Largest-Variance Exploration.

10 Example 3: Exploitation, i.e. measuring the alternative whose estimated mean is currently largest.

11 Model. $x^n$ is the alternative tested at time n; we measure the value of alternative $x^n$. At time n, the beliefs about the true values are independent, with $Y_x \sim N(\mu^n_x, (\sigma^n_x)^2)$. The error $\varepsilon^{n+1}$ is independent $N(0, (\sigma_\varepsilon)^2)$. At time N, choose an alternative; the objective is to maximize $\mathbb{E}[\max_x \mu^N_x]$.

12 State Transition. At time n we measure alternative $x^n$ and update our estimate of $Y_{x^n}$ based on the measurement; the estimates of the other $Y_x$ do not change.
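The transition is the standard conjugate-normal Bayesian update. A minimal sketch in Python (the function name `update` and its interface are illustrative, not from the talk):

```python
def update(mu, var, y, noise_var):
    """Conjugate-normal update of the belief (mu, var) about Y_x
    after observing y = Y_x + noise, noise ~ N(0, noise_var).
    Precisions (inverse variances) add, and the posterior mean is a
    precision-weighted average of the prior mean and the observation."""
    new_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    new_mu = new_var * (mu / var + y / noise_var)
    return new_mu, new_var
```

Only the measured alternative's pair (mu, var) changes; the beliefs about all other alternatives are left untouched, as the slide states.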

13 At time n, $\mu^{n+1}_x$ is a normal random variable with mean $\mu^n_x$ and variance $(\tilde\sigma^n_x)^2$ satisfying $(\tilde\sigma^n_x)^2 = (\sigma^n_x)^2 - (\sigma^{n+1}_x)^2$: the uncertainty about $Y_x$ before the measurement, minus the uncertainty about $Y_x$ after the measurement, is the variance of the change in the best estimate of $Y_x$ due to the measurement. The value of the optimal policy satisfies Bellman's equation, $V^n(s^n) = \max_x \mathbb{E}[V^{n+1}(s^{n+1}) \mid s^n, x^n = x]$, with terminal value $V^N(s^N) = \max_x \mu^N_x$.

14 Utility of Information. Consider our "utility of information" $U^n = \max_x \mu^n_x$, and consider the random change in utility $U^{n+1} - U^n$ due to a measurement at time n.

15 Knowledge Gradient Definition. The knowledge gradient policy chooses the measurement that maximizes this expected increase in utility, $x^{KG} = \arg\max_x \mathbb{E}[\,U^{n+1} - U^n \mid s^n, x^n = x\,]$.

16 Knowledge Gradient. We may compute the knowledge gradient policy via $x^{KG} = \arg\max_x \mathbb{E}\big[\max(\mu^{n+1}_x,\, \max_{x' \neq x} \mu^n_{x'}) \mid x^n = x\big] - \max_{x'} \mu^n_{x'}$, which is the expectation of the maximum of a normal and a constant.

17 Knowledge Gradient. The computation becomes $x^{KG} = \arg\max_x \tilde\sigma^n_x \, f(z^n_x)$, where $f(z) = z\,\Phi(z) + \varphi(z)$, $\Phi$ is the normal cdf, $\varphi$ is the normal pdf, and $z^n_x = -\,|\mu^n_x - \max_{x' \neq x} \mu^n_{x'}|\,/\,\tilde\sigma^n_x$.
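This closed form is cheap to evaluate. A sketch in Python, assuming the conjugate-normal belief model of slide 12, under which $\tilde\sigma^n_x = (\sigma^n_x)^2 / \sqrt{(\sigma^n_x)^2 + \sigma_\varepsilon^2}$; the names `f` and `kg_choice` are illustrative:

```python
import math

def f(z):
    """f(z) = z * Phi(z) + phi(z), with Phi/phi the standard normal cdf/pdf."""
    phi = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return z * Phi + phi

def kg_choice(mu, var, noise_var):
    """Choose the next measurement by maximizing sigma_tilde * f(z),
    where z = -|mu_x - max_{x' != x} mu_x'| / sigma_tilde."""
    best_x, best_val = 0, -float("inf")
    for x in range(len(mu)):
        # std of the change in mu_x caused by one noisy measurement of x
        sigma_tilde = var[x] / math.sqrt(var[x] + noise_var)
        best_other = max(mu[i] for i in range(len(mu)) if i != x)
        z = -abs(mu[x] - best_other) / sigma_tilde
        val = sigma_tilde * f(z)
        if val > best_val:
            best_x, best_val = x, val
    return best_x
```

With equal means the rule prefers high-variance alternatives, and with equal variances it prefers alternatives close to the current best, so it trades off exploration and exploitation automatically.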

18 Optimality Results. If our measurement budget allows only one measurement (N=1), the knowledge gradient policy is optimal.

19 Optimality Results. The knowledge gradient policy is optimal in the limit as the measurement budget N grows to infinity; this is really a convergence result.

20 Optimality Results. The knowledge gradient policy's sub-optimality, $V^n - V^{KG,n}$, is bounded, where $V^{KG,n}$ gives the value of the knowledge gradient policy and $V^n$ the value of the optimal policy.

21 Optimality Results. If there are exactly 2 alternatives (M=2), the knowledge gradient policy is optimal. In this case, the optimal policy reduces to measuring the alternative whose estimate is more uncertain.

22 Optimality Results. If there is no measurement noise and the alternatives may be reordered so that a certain monotonicity condition on the priors holds, then the knowledge gradient policy is optimal.

23 Numerical Experiments. 100 randomly generated problems:
  M ~ Uniform{1,...,100}
  N ~ Uniform{M, 3M, 10M}
  $\mu^0_x$ ~ Uniform[-1, 1]
  $(\sigma^0_x)^2$ = 1 with probability 0.9, $10^{-3}$ with probability 0.1
  $\sigma_\varepsilon$ = 1
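The experimental design above can be reproduced directly; a sketch in Python (the function name `random_problem` is illustrative):

```python
import random

def random_problem(rng):
    """Sample one test problem following the talk's design:
    M ~ Uniform{1,...,100}, N ~ Uniform{M, 3M, 10M},
    prior means Uniform[-1, 1], prior variances 1 w.p. 0.9
    and 1e-3 w.p. 0.1, measurement noise variance 1."""
    M = rng.randint(1, 100)
    N = rng.choice([M, 3 * M, 10 * M])
    mu0 = [rng.uniform(-1.0, 1.0) for _ in range(M)]
    var0 = [1.0 if rng.random() < 0.9 else 1e-3 for _ in range(M)]
    noise_var = 1.0
    return M, N, mu0, var0, noise_var
```

Passing an explicit `random.Random` instance keeps each of the 100 generated problems reproducible from its seed.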

24 Numerical Experiments.

25 Interval Estimation. Compare alternatives via a linear combination of mean and standard deviation, measuring $\arg\max_x\, \mu^n_x + z_{\alpha/2}\,\sigma^n_x$. The parameter $z_{\alpha/2}$ controls the tradeoff between exploration and exploitation.
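A minimal sketch of the interval-estimation rule in Python (`ie_choice` is an illustrative name; the argument `z` plays the role of $z_{\alpha/2}$):

```python
def ie_choice(mu, std, z):
    """Interval-estimation rule: measure the alternative with the
    largest upper bound mu_x + z * std_x. Larger z favors exploration
    (high-variance alternatives); smaller z favors exploitation
    (high-mean alternatives)."""
    scores = [m + z * s for m, s in zip(mu, std)]
    return scores.index(max(scores))
```

Unlike the knowledge gradient, the weight z is a tuning parameter that must be chosen by the user rather than derived from the belief state.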

26 KG / IE Comparison.

27 KG / IE Comparison. [Plot: value of KG minus value of IE]

28 IE and “Sticking”. Alternative 1 is known perfectly.

29 IE and “Sticking”.

30 Thank You. Any questions?

31 Numerical Example 1

32 Numerical Example 2

33 Numerical Example 3

34 Knowledge Gradient Example

35 Interval Estimation Example

36 Boltzmann Exploration. Measure alternative x with probability proportional to $\exp(\mu^n_x / T^n)$, parameterized by a declining sequence of temperatures $(T^0, \ldots, T^{N-1})$.
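A sketch of Boltzmann (softmax) sampling over the current estimates in Python; subtracting the maximum before exponentiating is a numerical-stability choice of this sketch, not part of the talk:

```python
import math
import random

def boltzmann_choice(mu, temperature, rng):
    """Sample a measurement with probability proportional to
    exp(mu_x / T). Subtracting the max before exponentiating avoids
    overflow without changing the sampling distribution."""
    m_max = max(mu)
    weights = [math.exp((m - m_max) / temperature) for m in mu]
    total = sum(weights)
    r = rng.random() * total
    for x, w in enumerate(weights):
        r -= w
        if r <= 0.0:
            return x
    return len(mu) - 1  # guard against floating-point leftovers
```

High temperatures sample nearly uniformly (exploration); as the temperature declines toward zero, the rule concentrates on the alternative with the best current estimate (exploitation).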

