OR II GSLM 52800


1 OR II GSLM 52800

3 Policy and Action
• policy: the rules that specify what to do in every state
• action: what to do at a given state, as dictated by the policy
• examples
  - policy: replacement only at state 3
    do nothing at states 0, 1, and 2; replace at state 3
  - policy: overhaul at state 2 and replacement at state 3
    do nothing at states 0 and 1; overhaul at state 2; replace at state 3

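4 Expected Reward
• $p_{ij}(k)$ = the probability of moving from state i to state j when action k is taken
• $q_{ij}(k)$ = the expected cost incurred at state i when action k is taken and the state changes to j
• $C_{ik}$ = the expected cost at state i with action k, i.e. $C_{ik} = \sum_{j=0}^{M} q_{ij}(k)\, p_{ij}(k)$
• [diagram: transition from state i to state j with probability $p_{ij}(k)$]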
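As a small illustration of the definition of $C_{ik}$, the sketch below averages the transition-dependent costs $q_{ij}(k)$ over the transition probabilities $p_{ij}(k)$ for one state and one action. The function name `expected_cost` and all numbers are hypothetical and are not taken from the slides.

```python
from fractions import Fraction as F

def expected_cost(q_row, p_row):
    """C_ik = sum over j of q_ij(k) * p_ij(k), for one state i and action k."""
    return sum(q * p for q, p in zip(q_row, p_row))

# Hypothetical numbers for a single state i and action k (not from the slides).
p_row = [F(1, 2), F(1, 4), F(1, 4)]   # p_ij(k); the row sums to 1
q_row = [F(0), F(1000), F(3000)]      # q_ij(k); cost if the next state is j

c_ik = expected_cost(q_row, p_row)    # 0*1/2 + 1000*1/4 + 3000*1/4
```

Using `Fraction` keeps the arithmetic exact, which is convenient when the probabilities are simple ratios.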

5 Definition of Variables
• policy R
• $g(R)$ = the long-term average cost per unit time of policy R
• objective: find the policy that minimizes $g(R)$
• $v_i(R)$ = the effect on the total expected cost of starting at state i when policy R is adopted

6 Relationship Between $g(R)$ and $v_i(R)$
• Claim: the intuitive idea is exact

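7 Key Result in Policy Improvement
• $g(R) + v_i(R) = C_{ik} + \sum_{j=0}^{M} p_{ij}(k)\, v_j(R)$ for $i = 0, 1, \ldots, M$, where k is the action that policy R takes at state i
• M+1 equations, M+2 unknowns
• $g(R)$ = the long-term average cost of policy R
• $v_i(R)$ = the effect on the total expected cost of starting at state i when policy R is adopted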

8 Idea of Policy Improvement
• the equations remain satisfied when the same constant c is added to every $v_i(R)$, i.e. when each $v_i(R)$ is replaced by $v_i(R) + c$
• the set of equations can therefore be solved by arbitrarily setting $v_M(R) = 0$
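The invariance under adding a constant can be checked directly. The short derivation below assumes that the value-determination equations have the standard average-cost form $g(R) + v_i(R) = C_{ik} + \sum_j p_{ij}(k)\, v_j(R)$, and uses only the fact that each row of transition probabilities sums to 1.

```latex
\begin{align*}
C_{ik} + \sum_{j=0}^{M} p_{ij}(k)\,\bigl(v_j(R) + c\bigr)
  &= C_{ik} + \sum_{j=0}^{M} p_{ij}(k)\, v_j(R) + c \sum_{j=0}^{M} p_{ij}(k) \\
  &= g(R) + v_i(R) + c \\
  &= g(R) + \bigl(v_i(R) + c\bigr).
\end{align*}
```

The shifted values satisfy the same equations with the same $g(R)$, so choosing $c = -v_M(R)$ gives the normalization $v_M(R) = 0$ without loss of generality.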

9 Idea of Policy Improvement
• given policy R with action k, suppose that there exists a policy $R^o$ with action $k^o$ such that [inequality not preserved in the transcript]
• then it can be shown that $g(R^o) < g(R)$
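Since the inequality on this slide is missing, the following is only a sketch of one common form of the argument, not a reconstruction of the slide. It assumes the condition is stated through the quantity $C_{ik} + \sum_j p_{ij}(k)\, v_j(R) - v_i(R)$, and that $R^o$ has a unique steady-state distribution $\pi_i(R^o)$ (a symbol introduced here). The notation $k_i^o$ for the action of $R^o$ at state i is also an assumption.

```latex
\begin{align*}
&\text{Assumed condition, for all } i:\quad
  C_{i k_i^o} + \sum_{j=0}^{M} p_{ij}(k_i^o)\, v_j(R) - v_i(R) \le g(R). \\
&\text{Weight by } \pi_i(R^o) \text{ and sum over } i,
  \text{ using } \sum_{i} \pi_i(R^o)\, p_{ij}(k_i^o) = \pi_j(R^o): \\
&\sum_{i} \pi_i(R^o)\, C_{i k_i^o}
  + \sum_{j} \pi_j(R^o)\, v_j(R) - \sum_{i} \pi_i(R^o)\, v_i(R) \le g(R) \\
&\Longrightarrow\quad g(R^o) = \sum_{i} \pi_i(R^o)\, C_{i k_i^o} \le g(R).
\end{align*}
```

Under these assumptions the inequality becomes strict, $g(R^o) < g(R)$, as soon as the assumed condition is strict in at least one state that has positive steady-state probability under $R^o$.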

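10 Policy Improvement
1. Value Determination: fix policy R, set $v_M(R) = 0$, and solve $g(R) + v_i(R) = C_{ik} + \sum_{j=0}^{M} p_{ij}(k)\, v_j(R)$, $i = 0, 1, \ldots, M$, for $g(R)$ and the remaining $v_i(R)$
2. Policy Improvement: for each state i, find the action k that minimizes $C_{ik} + \sum_{j=0}^{M} p_{ij}(k)\, v_j(R) - v_i(R)$
3. Form a new policy from the actions found in step 2. Stop if this policy is the same as R; otherwise go to step 1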
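The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function names, the dictionary layout of `cost` and `prob`, and the exact linear solver are all choices made here. The example data use the costs from the slides, but the transition probabilities are an assumption (the values of the standard textbook machine-maintenance example), because the matrix did not survive in the transcript.

```python
from fractions import Fraction as F

def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]

def value_determination(policy, cost, prob):
    """Step 1: return (g, v) for a fixed policy, with v[M] set to 0."""
    n = len(policy)  # n = M + 1 states; unknowns are g, v_0, ..., v_{M-1}
    a, b = [], []
    for i, k in enumerate(policy):
        # g + v_i - sum_j p_ij(k) v_j = C_ik, with the v_M column dropped
        a.append([F(1)] + [F(int(i == j)) - prob[i][k][j] for j in range(n - 1)])
        b.append(F(cost[i][k]))
    x = solve(a, b)
    return x[0], x[1:] + [F(0)]

def improve(policy, v, cost, prob):
    """Step 2: per state, minimise C_ik + sum_j p_ij(k) v_j - v_i."""
    new_policy = []
    for i, current in enumerate(policy):
        def value(k):
            return cost[i][k] + sum(p * vj for p, vj in zip(prob[i][k], v)) - v[i]
        best = current  # keep the current action on ties
        for k in cost[i]:
            if value(k) < value(best):
                best = k
        new_policy.append(best)
    return new_policy

def policy_improvement(policy, cost, prob):
    """Step 3: repeat steps 1 and 2 until the policy stops changing."""
    while True:
        g, v = value_determination(policy, cost, prob)
        new_policy = improve(policy, v, cost, prob)
        if new_policy == policy:
            return policy, g, v
        policy = new_policy

# Decisions: 1 = do nothing, 2 = overhaul, 3 = replace. Costs are from the slides.
cost = [{1: 0}, {1: 1000, 3: 6000}, {1: 3000, 2: 4000, 3: 6000}, {3: 6000}]
# ASSUMED transition probabilities (standard textbook example, not in the transcript).
prob = [
    {1: [F(0), F(7, 8), F(1, 16), F(1, 16)]},
    {1: [F(0), F(3, 4), F(1, 8), F(1, 8)], 3: [F(1), F(0), F(0), F(0)]},
    {1: [F(0), F(0), F(1, 2), F(1, 2)], 2: [F(0), F(1), F(0), F(0)],
     3: [F(1), F(0), F(0), F(0)]},
    {3: [F(1), F(0), F(0), F(0)]},
]

final_policy, g, v = policy_improvement([1, 1, 1, 3], cost, prob)
```

Starting from "replace only at state 3", the sketch stops at the policy that does nothing at states 0 and 1, overhauls at state 2, and replaces at state 3, with $g = 5000/3$ under the assumed probabilities. Keeping the current action on ties is a design choice that prevents the loop from switching between equally good policies.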

11 Idea of Policy Improvement
• it can be proven that
  - $g(R)$ is non-increasing from one iteration to the next
  - R is optimal (it minimizes g) if an iteration produces no change in policy
  - the algorithm stops after a finite number of iterations

12 Example
• Policy: replacement only at state 3
• transition probability matrix [matrix not preserved in the transcript]
• $C_{01} = 0$, $C_{11} = 1000$, $C_{21} = 3000$, $C_{33} = 6000$

13 Example
• Iteration 1: Value Determination
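The equations for this step did not survive in the transcript. As a sketch, assume the transition probabilities of the standard textbook machine-maintenance example: with "do nothing", state 0 moves to states 1, 2, 3 with probabilities 7/8, 1/16, 1/16; state 1 moves to states 1, 2, 3 with probabilities 3/4, 1/8, 1/8; state 2 moves to states 2, 3 with probabilities 1/2, 1/2; and replacement returns the machine to state 0. With $v_3(R) = 0$, the value-determination system for the policy "replace only at state 3" would then read:

```latex
\begin{align*}
g(R) + v_0(R) &= \tfrac{7}{8}\, v_1(R) + \tfrac{1}{16}\, v_2(R) \\
g(R) + v_1(R) &= 1000 + \tfrac{3}{4}\, v_1(R) + \tfrac{1}{8}\, v_2(R) \\
g(R) + v_2(R) &= 3000 + \tfrac{1}{2}\, v_2(R) \\
g(R) &= 6000 + v_0(R)
\end{align*}
% Solution under the assumed probabilities:
\[
g(R) = \tfrac{25000}{13} \approx 1923, \quad
v_0(R) = -\tfrac{53000}{13} \approx -4077, \quad
v_1(R) = -\tfrac{34000}{13} \approx -2615, \quad
v_2(R) = \tfrac{28000}{13} \approx 2154 .
\]
```

These numbers depend entirely on the assumed probabilities; only the costs 0, 1000, 3000, and 6000 come from the slides.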

14 Example
• Iteration 1: Policy Improvement
• nothing can be done at state 0, and the machine must be replaced at state 3
• possible decisions at
  - state 1: decision 1 (do nothing, $1000), decision 3 (replace, $6000)
  - state 2: decision 1 (do nothing, $3000), decision 2 (overhaul, $4000), decision 3 (replace, $6000)

15 Example
• Iteration 1: Policy Improvement, the general expressions
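The expressions themselves are missing from the transcript. Assuming the standard test quantity $C_{ik} + \sum_j p_{ij}(k)\, v_j(R) - v_i(R)$, which matches the column headings of the tables on the following slide, the expressions to minimize over k for the two states with a choice would be:

```latex
\begin{align*}
\text{state 1:}\quad
  & C_{1k} + p_{10}(k)\, v_0(R) + p_{11}(k)\, v_1(R)
    + p_{12}(k)\, v_2(R) + p_{13}(k)\, v_3(R) - v_1(R) \\
\text{state 2:}\quad
  & C_{2k} + p_{20}(k)\, v_0(R) + p_{21}(k)\, v_1(R)
    + p_{22}(k)\, v_2(R) + p_{23}(k)\, v_3(R) - v_2(R)
\end{align*}
```

Here the $v_j(R)$ are the values obtained in the value-determination step for the current policy, with $v_3(R) = 0$.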

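16 Example
• Iteration 1: Policy Improvement
• table for state 1, with columns: Decision, $C_{1k}$, $p_{10}(k)$, $p_{11}(k)$, $p_{12}(k)$, $p_{13}(k)$, E(value) [entries not preserved in the transcript]
• table for state 2, with columns: Decision, $C_{2k}$, $p_{20}(k)$, $p_{21}(k)$, $p_{22}(k)$, $p_{23}(k)$, E(value) [entries not preserved in the transcript]
• new policy: do nothing at states 0 and 1, overhaul at state 2, and replace at state 3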
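Because the table entries were lost, the sketch below recomputes the E(value) column under stated assumptions. The costs are those listed on the slides. The transition rows and the values $v_i(R)$ are assumptions: they are the probabilities of the standard textbook machine-maintenance example and the solution of the value-determination equations under those probabilities. The name `improvement_value` is hypothetical.

```python
from fractions import Fraction as F

# ASSUMED v_i(R) for the policy "replace only at state 3" (v_3 = 0),
# obtained from value determination with the assumed probabilities.
v = [F(-53000, 13), F(-34000, 13), F(28000, 13), F(0)]

# (state, decision) -> (cost C_ik from the slides, ASSUMED row p_i0(k)..p_i3(k))
options = {
    (1, 1): (1000, [F(0), F(3, 4), F(1, 8), F(1, 8)]),   # do nothing
    (1, 3): (6000, [F(1), F(0), F(0), F(0)]),            # replace
    (2, 1): (3000, [F(0), F(0), F(1, 2), F(1, 2)]),      # do nothing
    (2, 2): (4000, [F(0), F(1), F(0), F(0)]),            # overhaul
    (2, 3): (6000, [F(1), F(0), F(0), F(0)]),            # replace
}

def improvement_value(state, decision):
    """C_ik + sum_j p_ij(k) v_j - v_i for one state and decision."""
    cost, row = options[(state, decision)]
    return cost + sum(p * vj for p, vj in zip(row, v)) - v[state]

values = {key: improvement_value(*key) for key in options}

# For each state, keep the decision with the smallest value.
best = {
    s: min((d for (st, d) in options if st == s), key=lambda d: values[(s, d)])
    for s in (1, 2)
}
```

Under these assumptions the values are about 1923 and 4538 at state 1, and about 1923, -769, and -231 at state 2, so the minimizing decisions are "do nothing" at state 1 and "overhaul" at state 2, in agreement with the new policy stated on the slide.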

17 Example
• Iteration 2: Value Determination
• it can be shown that policy improvement produces no change in the policy, so doing nothing at states 0 and 1, overhauling at state 2, and replacing at state 3 is an optimal policy
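The equations for iteration 2 are not in the transcript. As a sketch under the same assumption as the standard textbook example (do nothing: state 0 to states 1, 2, 3 with probabilities 7/8, 1/16, 1/16 and state 1 to states 1, 2, 3 with probabilities 3/4, 1/8, 1/8; overhaul sends state 2 to state 1; replacement returns to state 0), the system for the new policy with $v_3(R) = 0$ would be:

```latex
\begin{align*}
g(R) + v_0(R) &= \tfrac{7}{8}\, v_1(R) + \tfrac{1}{16}\, v_2(R) \\
g(R) + v_1(R) &= 1000 + \tfrac{3}{4}\, v_1(R) + \tfrac{1}{8}\, v_2(R) \\
g(R) + v_2(R) &= 4000 + v_1(R) \\
g(R) &= 6000 + v_0(R)
\end{align*}
% Solution under the assumed probabilities:
\[
g(R) = \tfrac{5000}{3} \approx 1667, \quad
v_0(R) = -\tfrac{13000}{3} \approx -4333, \quad
v_1(R) = -3000, \quad
v_2(R) = -\tfrac{2000}{3} \approx -667 .
\]
```

With these values the policy-improvement quantities are about 1667 (do nothing) and 4667 (replace) at state 1, and about 3333 (do nothing), 1667 (overhaul), and 2333 (replace) at state 2. The minimizing decisions are the ones already in the policy, which is consistent with the slide's statement that no further improvement is possible.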

