
Slide 1: OR II, GSLM 52800


Slide 3: Policy and Action
- policy: the rules specifying what to do in every state
- action: what to do at a given state, as dictated by the policy
Examples:
- policy "replacement only at state 3": do nothing at states 0, 1, and 2; replace at state 3
- policy "overhaul at state 2 and replacement at state 3": do nothing at states 0 and 1; overhaul at state 2; replace at state 3

Slide 4: Expected Reward
- p_ij(k) = the probability of moving from state i to state j when action k is taken
- q_ij(k) = the expected cost incurred when action k is taken in state i and the state changes to j
- C_ik = the expected cost at state i with action k, i.e. C_ik = sum_j p_ij(k) q_ij(k)
[diagram of a transition from state i to state j with probability p_ij(k) not reproduced]

Slide 5: Definition of Variables
For a given policy R:
- g(R) = the long-term average cost per unit time of policy R; the objective is to find the policy that minimizes g(R)
- v_i(R) = the effect on the total expected cost when adopting policy R and starting at state i

Slide 6: Relationship Between g(R) and v_i(R)
Claim: the intuitive idea is exact: over a large number of periods n, the total expected cost of starting at state i under policy R behaves like n g(R) + v_i(R).

Slide 7: Key Result in Policy Improvement
g(R) + v_i(R) = C_{i k_i} + sum_{j=0}^{M} p_ij(k_i) v_j(R),  i = 0, 1, ..., M,
where k_i is the action that policy R takes at state i: M+1 equations in the M+2 unknowns g(R), v_0(R), ..., v_M(R).
- g(R) = the long-term average cost of policy R
- v_i(R) = the effect on the total expected cost when adopting policy R and starting at state i

Slide 8: Idea of Policy Improvement
The equations determine the collection of v_i(R) only up to an additive constant: replacing every v_i(R) by v_i(R) + c leaves them unchanged. The set of equations can therefore be solved by arbitrarily setting v_M(R) = 0.

Slide 9: Idea of Policy Improvement
Given policy R taking action k at state i, suppose there exists a policy R_o taking action k_o at state i such that
C_{i k_o} + sum_j p_ij(k_o) v_j(R) < C_{i k} + sum_j p_ij(k) v_j(R).
Then it can be shown that g(R_o) < g(R).

Slide 10: Policy Improvement Algorithm
1. Value determination: fix policy R; set v_M(R) = 0 and solve
   g(R) + v_i(R) = C_{i k_i} + sum_j p_ij(k_i) v_j(R),  i = 0, 1, ..., M.
2. Policy improvement: for each state i, find the action k that minimizes
   C_ik + sum_j p_ij(k) v_j(R).
3. Form a new policy from the actions found in step 2. Stop if this policy is the same as R; otherwise go back to step 1.
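A minimal Python (NumPy) sketch of these steps, applied to the machine-maintenance example worked out on the later slides. The decision encoding (0 = do nothing, 1 = overhaul, 2 = replace), the use of np.inf to mark decisions not allowed in a state, and the state-0 transition row [0, 7/8, 1/16, 1/16] are assumptions, since the slides' transition matrix did not survive extraction:

```python
import numpy as np

def policy_iteration(P, C, policy):
    """Policy improvement for an average-cost Markov decision process.

    P[k]    : transition matrix under decision k
    C[i, k] : expected cost of decision k in state i (np.inf if not allowed)
    policy  : initial decision index for each state
    """
    M = C.shape[0] - 1                     # highest state; fix v_M(R) = 0
    while True:
        # 1. Value determination: solve g + v_i = C_i + sum_j p_ij v_j
        #    for the unknowns (g, v_0, ..., v_{M-1}), with v_M = 0.
        A = np.zeros((M + 1, M + 1))
        b = np.array([C[i, policy[i]] for i in range(M + 1)])
        for i in range(M + 1):
            A[i, 0] = 1.0                  # coefficient of g
            for j in range(M):
                A[i, 1 + j] = (i == j) - P[policy[i]][i, j]
        x = np.linalg.solve(A, b)
        g, v = x[0], np.append(x[1:], 0.0)
        # 2. Policy improvement: minimize C_ik + sum_j p_ij(k) v_j per state
        new_policy = np.array([
            np.argmin([C[i, k] + P[k][i] @ v for k in range(len(P))])
            for i in range(M + 1)
        ])
        # 3. Stop when the policy no longer changes
        if np.array_equal(new_policy, policy):
            return policy, g, v
        policy = new_policy

# Machine-maintenance example; decisions: 0 = do nothing,
# 1 = overhaul (back to state 1), 2 = replace (back to state 0).
inf = np.inf
P = [np.array([[0, 7/8, 1/16, 1/16],     # do nothing; state-0 row assumed
               [0, 3/4, 1/8,  1/8 ],
               [0, 0,   1/2,  1/2 ],
               [0, 0,   0,    1   ]]),   # never chosen: must replace at 3
     np.tile([0.0, 1.0, 0.0, 0.0], (4, 1)),   # overhaul
     np.tile([1.0, 0.0, 0.0, 0.0], (4, 1))]   # replace
C = np.array([[0,    inf,  inf ],
              [1000, inf,  6000],
              [3000, 4000, 6000],
              [inf,  inf,  6000]], dtype=float)
best, g, v = policy_iteration(P, C, np.array([0, 0, 0, 2]))
print(best, round(g, 2))   # optimal decisions per state, average cost
```

Starting from "replace only at state 3", the sketch reproduces the slides' conclusion: the policy improves to "do nothing at 0 and 1, overhaul at 2, replace at 3" and then stabilizes.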

Slide 11: Idea of Policy Improvement
- It can be proven that g is non-increasing from one iteration to the next.
- If the policy does not change in step 3, R is optimal.
- Since there are finitely many policies, the algorithm stops after a finite number of iterations.

Slide 12: Example
Policy: replacement only at state 3.
[transition probability matrix not reproduced]
Costs (decision 1 = do nothing, decision 3 = replace): C_01 = 0, C_11 = 1000, C_21 = 3000, C_33 = 6000.

Slide 13: Example, Iteration 1: Value Determination
[value-determination equations not reproduced]
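The value-determination step for this policy can be carried out numerically. A minimal NumPy sketch: the transition rows for states 1-3 are taken from the policy-improvement table on slide 16, while the state-0 row [0, 7/8, 1/16, 1/16] is an assumption, since the slide's matrix did not survive extraction:

```python
import numpy as np

# Policy R: do nothing at states 0-2, replace at state 3.
P = np.array([[0.0, 7/8, 1/16, 1/16],    # state-0 row assumed
              [0.0, 3/4, 1/8,  1/8 ],
              [0.0, 0.0, 1/2,  1/2 ],
              [1.0, 0.0, 0.0,  0.0 ]])   # replace: back to state 0
C = np.array([0.0, 1000.0, 3000.0, 6000.0])
M = 3                                     # highest state; set v_M(R) = 0

# Solve g + v_i = C_i + sum_j p_ij v_j for (g, v_0, ..., v_{M-1}).
A = np.zeros((M + 1, M + 1))
for i in range(M + 1):
    A[i, 0] = 1.0                         # coefficient of g
    for j in range(M):
        A[i, 1 + j] = (i == j) - P[i, j]
x = np.linalg.solve(A, C)
g, v = x[0], np.append(x[1:], 0.0)
print(round(g, 2), np.round(v, 1))        # g is about 1923 per period
```

This yields g(R) of roughly $1923 per period, consistent with the E(value) entries in the table on slide 16.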

Slide 14: Example, Iteration 1: Policy Improvement
- Nothing can be done at state 0, and the machine must be replaced at state 3.
- Possible decisions at state 1: decision 1 (do nothing, $1000); decision 3 (replace, $6000).
- Possible decisions at state 2: decision 1 (do nothing, $3000); decision 2 (overhaul, $4000); decision 3 (replace, $6000).

Slide 15: Example, Iteration 1: Policy Improvement
The general expressions: for each state i and each feasible decision k, evaluate C_ik + sum_j p_ij(k) v_j(R) and pick the minimizing k.

Slide 16: Example, Iteration 1: Policy Improvement

State 1:
Decision  C_1k   p_10(k)  p_11(k)  p_12(k)  p_13(k)  E(value)
1         1000   0        3/4      1/8      1/8      1923
3         6000   1        0        0        0        4538

State 2:
Decision  C_2k   p_20(k)  p_21(k)  p_22(k)  p_23(k)  E(value)
1         3000   0        0        1/2      1/2      1923
2         4000   0        1        0        0        -769
3         6000   1        0        0        0        -231

New policy: do nothing at states 0 and 1, overhaul at state 2, and replace at state 3.
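The E(value) column can be reproduced from the iteration-1 solution: for each candidate decision k it appears to equal C_ik + sum_j p_ij(k) v_j(R) - v_i(R). Subtracting v_i shifts every entry for a state by the same constant, so the minimizing decision is unchanged. A small check, reusing g = 25000/13 and the relative values implied by value determination (an assumption reconstructed from those equations, with v_3 = 0):

```python
import numpy as np

# Relative values from iteration 1 (v_3 = 0 by convention).
g = 25000 / 13                           # about 1923.08
v = np.array([6500 - 5.5 * g, 7000 - 5 * g, 6000 - 2 * g, 0.0])

def e_value(i, cost, p_row):
    """C_ik + sum_j p_ij(k) v_j - v_i: the E(value) column of the table."""
    return cost + np.array(p_row) @ v - v[i]

# State 1: do nothing vs. replace
print(int(round(e_value(1, 1000, [0, 3/4, 1/8, 1/8]))))   # 1923
print(int(round(e_value(1, 6000, [1, 0, 0, 0]))))         # 4538
# State 2: do nothing vs. overhaul vs. replace
print(int(round(e_value(2, 3000, [0, 0, 1/2, 1/2]))))     # 1923
print(int(round(e_value(2, 4000, [0, 1, 0, 0]))))         # -769
print(int(round(e_value(2, 6000, [1, 0, 0, 0]))))         # -231
```

Decision 2 at state 2 gives the smallest entry (-769), which is why the new policy overhauls at state 2.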

Slide 17: Example, Iteration 2: Value Determination
It can be shown that there is no further improvement in the policy, so "do nothing at states 0 and 1, overhaul at state 2, and replace at state 3" is an optimal policy.
