# OR II GSLM 52800


## 3 Policy and Action

- policy: the rule specifying what to do in every state
- action: what to do at a given state, as dictated by the policy
- examples:
  - policy "replacement only at state 3": do nothing at states 0, 1, and 2; replace at state 3
  - policy "overhaul at state 2 and replacement at state 3": do nothing at states 0 and 1, overhaul at state 2, and replace at state 3
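In code, these definitions reduce to a mapping: a policy assigns an action to every state, and the action at a state is simply the policy's value there. A minimal sketch of the two example policies (the action labels are illustrative, not from the slides):

```python
# A policy maps every state to an action; the action at a state is what the
# policy dictates there. States are 0..3 as in the slides.
replace_at_3 = {0: "nothing", 1: "nothing", 2: "nothing", 3: "replace"}
overhaul_at_2 = {0: "nothing", 1: "nothing", 2: "overhaul", 3: "replace"}

# Looking up the action the second policy takes at state 2:
print(overhaul_at_2[2])  # overhaul
```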

## 4 Expected Reward

- p_ij(k) = the probability of moving from state i to state j when action k is taken
- q_ij(k) = the expected cost incurred at state i when action k is taken and the state changes to j
- C_ik = the expected cost at state i under action k, so C_ik = Σ_j p_ij(k) q_ij(k)
- (diagram: an arc from state i to state j labeled p_ij(k))
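From the definitions above, C_ik is the probability-weighted average of the transition costs: C_ik = Σ_j p_ij(k) q_ij(k). A short sketch, where the p and q values are made up purely for illustration (they are not the example data from these slides):

```python
# C_ik = sum_j p_ij(k) * q_ij(k): the expected one-step cost of taking
# action k in state i, weighting each successor state j by its probability.

def expected_cost(p_row, q_row):
    """p_row[j] = p_ij(k), q_row[j] = q_ij(k); returns C_ik."""
    return sum(p * q for p, q in zip(p_row, q_row))

p = [0.0, 0.75, 0.125, 0.125]     # hypothetical p_1j(k)
q = [0.0, 800.0, 1600.0, 2400.0]  # hypothetical q_1j(k)
print(expected_cost(p, q))  # 0.75*800 + 0.125*1600 + 0.125*2400 = 1100.0
```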

## 5 Definition of Variables

For a fixed policy R:

- g(R) = the long-run average cost per unit time of policy R
- objective: find the policy that minimizes g(R)
- v_i(R) = the effect on the total expected cost of adopting policy R and starting at state i

## 6 Relationship Between g(R) and v_i(R)

- intuitively, the total expected cost of running policy R for n periods starting from state i behaves like n g(R) + v_i(R) for large n
- Claim: the intuitive idea is exact

## 7 Key Result in Policy Improvement

- g(R) + v_i(R) = C_ik + Σ_j p_ij(k) v_j(R) for i = 0, 1, ..., M, where k is the action taken at state i under policy R
- this gives M+1 equations in the M+2 unknowns g(R), v_0(R), ..., v_M(R)
- g(R) = the long-run average cost of policy R
- v_i(R) = the effect on the total expected cost of adopting policy R and starting at state i

## 8 Idea of Policy Improvement

- the collection of v_i(R) is determined only up to an additive constant: replacing every v_i(R) by v_i(R) + c satisfies the same equations
- the set of equations can therefore be solved by arbitrarily setting v_M(R) = 0

## 9 Idea of Policy Improvement

- given policy R taking action k at state i, suppose there exists a policy R_o, identical except that it takes action k_o at state i, such that C_i,k_o + Σ_j p_ij(k_o) v_j(R) - v_i(R) < g(R)
- then it can be shown that g(R_o) < g(R)

## 10 Policy Improvement

1. Value determination: fix policy R, set v_M(R) = 0, and solve the system g(R) + v_i(R) = C_ik + Σ_j p_ij(k) v_j(R), i = 0, ..., M, for g(R) and the v_i(R)
2. Policy improvement: for each state i, find the action k that minimizes C_ik + Σ_j p_ij(k) v_j(R) - v_i(R)
3. Form a new policy from the actions found in step 2. Stop if this policy is the same as R; otherwise go to step 1
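The three-step loop above can be sketched directly, stdlib only, using exact rational arithmetic so the solved values match the fractions on the later slides. The data at the end is the machine-replacement example used in these slides (decision 1 = do nothing, 2 = overhaul, 3 = replace); note that the state-0 transition row is not shown explicitly in the transcript and is filled in here consistently with the costs computed on the example slides:

```python
from fractions import Fraction as F

def _solve(A, b):
    """Tiny Gaussian elimination over exact rationals."""
    n = len(A)
    M = [[F(x) for x in A[r]] + [F(b[r])] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def policy_iteration(P, C):
    """P[i][k] = transition row of p_ij(k); C[i][k] = C_ik; states 0..M.
    Returns (optimal policy as a dict, long-run average cost g)."""
    states = sorted(P)
    last = states[-1]                     # v_M(R) is pinned to 0
    R = {i: min(C[i]) for i in states}    # arbitrary starting policy
    while True:
        # step 1 -- value determination: g + v_i = C_ik + sum_j p_ij(k) v_j
        A = [[1] + [int(i == j) - P[i][R[i]][j] for j in states if j != last]
             for i in states]
        b = [C[i][R[i]] for i in states]
        sol = _solve(A, b)
        g = sol[0]
        v = dict(zip([j for j in states if j != last], sol[1:]))
        v[last] = F(0)
        # step 2 -- improvement: argmin_k C_ik + sum_j p_ij(k) v_j - v_i
        new = {i: min(C[i], key=lambda k: C[i][k]
                      + sum(P[i][k][j] * v[j] for j in states) - v[i])
               for i in states}
        # step 3 -- stop when the policy no longer changes
        if new == R:
            return R, g
        R = new

P = {0: {1: [0, F(7, 8), F(1, 16), F(1, 16)]},
     1: {1: [0, F(3, 4), F(1, 8), F(1, 8)], 3: [1, 0, 0, 0]},
     2: {1: [0, 0, F(1, 2), F(1, 2)], 2: [0, 1, 0, 0], 3: [1, 0, 0, 0]},
     3: {3: [1, 0, 0, 0]}}
C = {0: {1: 0}, 1: {1: 1000, 3: 6000},
     2: {1: 3000, 2: 4000, 3: 6000}, 3: {3: 6000}}
policy, g = policy_iteration(P, C)
print(policy, g)  # do nothing at 0 and 1, overhaul at 2, replace at 3; g = 5000/3
```

Starting from the policy "replace only at state 3", one improvement pass switches state 2 to overhaul, and the second pass confirms no further change, matching the example worked through on the following slides.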

## 11 Idea of Policy Improvement

- it can be proven that:
  - g is non-increasing from iteration to iteration
  - R is optimal if the improvement step produces no change in policy
  - the algorithm stops after a finite number of iterations

## 12 Example

- policy: replacement only at state 3
- transition probability matrix under this policy (rows are states 0 to 3):

| state | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 7/8 | 1/16 | 1/16 |
| 1 | 0 | 3/4 | 1/8 | 1/8 |
| 2 | 0 | 0 | 1/2 | 1/2 |
| 3 | 1 | 0 | 0 | 0 |

- costs (subscripts: state, decision): C_01 = 0, C_11 = 1000, C_21 = 3000, C_33 = 6000

## 13 Example

- Iteration 1, value determination: with v_3(R) = 0, solve

  g + v_0 = 0 + (7/8)v_1 + (1/16)v_2 + (1/16)v_3
  g + v_1 = 1000 + (3/4)v_1 + (1/8)v_2 + (1/8)v_3
  g + v_2 = 3000 + (1/2)v_2 + (1/2)v_3
  g + v_3 = 6000 + v_0

  giving g = 25000/13 ≈ 1923, v_0 ≈ -4077, v_1 ≈ -2615, v_2 ≈ 2154
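The value-determination system for this policy is linear and can be solved exactly; a sketch with a small stdlib-only elimination routine, where the four equations come from the key-result relation g + v_i = C_ik + Σ_j p_ij(k) v_j with v_3 pinned to 0:

```python
from fractions import Fraction as F

def solve(A, b):
    """Tiny Gaussian elimination over exact rationals."""
    n = len(A)
    M = [[F(x) for x in A[r]] + [F(b[r])] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

# Unknowns ordered (g, v0, v1, v2); v3 is pinned to 0.
#   state 0: g + v0 = 0    + 7/8 v1 + 1/16 v2
#   state 1: g + v1 = 1000 + 3/4 v1 + 1/8  v2
#   state 2: g + v2 = 3000 + 1/2 v2
#   state 3: g      = 6000 + v0
A = [[1,  1, F(-7, 8), F(-1, 16)],
     [1,  0, F(1, 4),  F(-1, 8)],
     [1,  0, 0,        F(1, 2)],
     [1, -1, 0,        0]]
b = [0, 1000, 3000, 6000]
g, v0, v1, v2 = solve(A, b)
print(g, v0, v1, v2)
# g = 25000/13 (about 1923 per period); v0, v1, v2 are about -4077, -2615, 2154
```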

## 14 Example

- Iteration 1, policy improvement
- nothing can be done at state 0, and the machine must be replaced at state 3
- possible decisions at:
  - state 1: decision 1 (do nothing, $1000) or decision 3 (replace, $6000)
  - state 2: decision 1 (do nothing, $3000), decision 2 (overhaul, $4000), or decision 3 (replace, $6000)

## 15 Example

- Iteration 1, policy improvement: the general expression to minimize at each state i is C_ik + Σ_j p_ij(k) v_j(R) - v_i(R), evaluated at the v_j(R) just computed

## 16 Example

State 1:

| Decision | C_1k | p_10(k) | p_11(k) | p_12(k) | p_13(k) | E(value) |
|---|---|---|---|---|---|---|
| 1 | 1000 | 0 | 3/4 | 1/8 | 1/8 | 1923 |
| 3 | 6000 | 1 | 0 | 0 | 0 | 4538 |

State 2:

| Decision | C_2k | p_20(k) | p_21(k) | p_22(k) | p_23(k) | E(value) |
|---|---|---|---|---|---|---|
| 1 | 3000 | 0 | 0 | 1/2 | 1/2 | 1923 |
| 2 | 4000 | 0 | 1 | 0 | 0 | -769 |
| 3 | 6000 | 1 | 0 | 0 | 0 | -231 |

New policy: do nothing at states 0 and 1, overhaul at state 2, and replace at state 3.
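The E(value) column is the test quantity C_ik + Σ_j p_ij(k) v_j(R) - v_i(R) evaluated at the iteration-1 relative values; a sketch that recomputes the five candidate rows exactly:

```python
from fractions import Fraction as F

# Relative values from iteration 1's value determination (v3 pinned to 0):
v = {0: F(-53000, 13), 1: F(-34000, 13), 2: F(28000, 13), 3: F(0)}

def e_value(C_ik, p_row, i):
    """The E(value) column: C_ik + sum_j p_ij(k) v_j - v_i."""
    return C_ik + sum(p_row[j] * v[j] for j in v) - v[i]

candidates = [  # (state, decision, C_ik, transition row)
    (1, 1, 1000, [0, F(3, 4), F(1, 8), F(1, 8)]),
    (1, 3, 6000, [1, 0, 0, 0]),
    (2, 1, 3000, [0, 0, F(1, 2), F(1, 2)]),
    (2, 2, 4000, [0, 1, 0, 0]),
    (2, 3, 6000, [1, 0, 0, 0]),
]
for i, k, cost, p in candidates:
    print(i, k, round(float(e_value(cost, p, i))))
# state 1: 1923 vs 4538 -> keep decision 1
# state 2: 1923, -769, -231 -> switch to decision 2 (overhaul)
```

Note that the current policy's actions score exactly g = 25000/13 ≈ 1923, as they must, so any row below 1923 is a strict improvement.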

## 17 Example

- Iteration 2, value determination for the new policy
- It can be shown that the improvement step then yields no further change in policy, so doing nothing at states 0 and 1, overhauling at state 2, and replacing at state 3 is an optimum policy
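One way to check the stopping claim above: redo value determination for the new policy and rerun the improvement step. The v-values below are computed here from the example data (they are not shown in the transcript), again with v_3 pinned to 0:

```python
from fractions import Fraction as F

# Iteration 2 value determination for (do nothing, do nothing, overhaul, replace):
#   g + v0 = 0    + 7/8 v1 + 1/16 v2
#   g + v1 = 1000 + 3/4 v1 + 1/8  v2
#   g + v2 = 4000 + v1            (overhaul sends state 2 to state 1)
#   g      = 6000 + v0
# Solving by substitution gives:
g = F(5000, 3)
v = {0: F(-13000, 3), 1: F(-3000), 2: F(-2000, 3), 3: F(0)}

def e_value(C_ik, p_row, i):
    """Test quantity C_ik + sum_j p_ij(k) v_j - v_i from the improvement step."""
    return C_ik + sum(p_row[j] * v[j] for j in v) - v[i]

# Every alternative action scores at least the current action's value,
# so the policy is unchanged and the algorithm stops.
state1 = [e_value(1000, [0, F(3, 4), F(1, 8), F(1, 8)], 1),  # do nothing
          e_value(6000, [1, 0, 0, 0], 1)]                    # replace
state2 = [e_value(3000, [0, 0, F(1, 2), F(1, 2)], 2),        # do nothing
          e_value(4000, [0, 1, 0, 0], 2),                    # overhaul
          e_value(6000, [1, 0, 0, 0], 2)]                    # replace
print(min(state1) == state1[0], min(state2) == state2[1])
print(float(g))  # about 1667 per period, down from about 1923
```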
