
1 OR II GSLM 52800

3 Policy and Action
policy: the rule specifying what to do in every state
action: what to do in a particular state, as dictated by the policy
examples:
- policy "replacement only at state 3": do nothing at states 0, 1, and 2; replace at state 3
- policy "overhaul at state 2 and replacement at state 3": do nothing at states 0 and 1; overhaul at state 2; replace at state 3

4 Expected Reward
p_ij(k) = the probability of moving from state i to state j when action k is taken
q_ij(k) = the expected cost incurred at state i when action k is taken and the state changes to j
C_ik = the expected cost at state i under action k, i.e., C_ik = Σ_j p_ij(k) q_ij(k)
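How C_ik aggregates the transition-dependent costs can be sketched in a few lines of Python. The q values below are hypothetical numbers chosen for illustration (the example later in the deck specifies only the C_ik):

```python
# C_ik = sum_j p_ij(k) * q_ij(k): expected one-step cost of taking
# action k in state i, averaging the costs of the possible transitions.
p_row = [0, 3/4, 1/8, 1/8]    # p_1j(1): "do nothing" in state 1
q_row = [0, 800, 1400, 1800]  # hypothetical q_1j(1) values (illustration only)

C_11 = sum(p * q for p, q in zip(p_row, q_row))
print(C_11)  # -> 1000.0
```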

5 Definition of Variables
for a policy R:
g(R) = the long-run average cost per unit time of policy R
objective: find the policy that minimizes g(R)
v_i(R) = the effect on the total expected cost of adopting policy R and starting at state i

6 Relationship Between g(R) and v_i(R)
intuitively, the total expected cost of running policy R for n periods starting from state i behaves like n·g(R) + v_i(R) for large n
Claim: the intuitive idea is exact

7 Key Result in Policy Improvement
for a policy R that takes action k at state i:
g(R) + v_i(R) = C_ik + Σ_j p_ij(k) v_j(R), for i = 0, 1, …, M
M+1 equations, M+2 unknowns (g(R) and v_0(R), …, v_M(R))
g(R) = the long-term average cost of policy R
v_i(R) = the effect on the total expected cost when adopting policy R and starting at state i

8 Idea of Policy Improvement
the equations determine the v_i(R) only up to an additive constant: if the v_i(R) solve them, so do the v_i(R) + c
the set of equations can therefore be solved by arbitrarily setting v_M(R) = 0

9 Idea of Policy Improvement
given policy R with action k at each state, suppose there exists a policy R_o with actions k_o such that
C_ik_o + Σ_j p_ij(k_o) v_j(R) ≤ C_ik + Σ_j p_ij(k) v_j(R) at every state i, with strict inequality at some state
then it can be shown that g(R_o) < g(R)

10 Policy Improvement
1. Value Determination: fix policy R; set v_M(R) = 0 and solve
   g(R) + v_i(R) = C_ik + Σ_j p_ij(k) v_j(R), for i = 0, 1, …, M
2. Policy Improvement: for each state i, find the action k that minimizes
   C_ik + Σ_j p_ij(k) v_j(R) − v_i(R)
3. Form a new policy from the actions found in step 2. Stop if this policy is the same as R; otherwise go to step 1.
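The three steps above can be sketched in Python on the deck's machine-maintenance example (states 0–3 describe the machine's condition; decisions 1 = do nothing, 2 = overhaul, 3 = replace). The "do nothing" transition row for state 0, (0, 7/8, 1/16, 1/16), does not appear in the surviving text and is assumed from the standard textbook version of this example; all other data come from the example slides:

```python
# P[(state, decision)] = transition probability row over states 0..3
P = {
    (0, 1): [0, 7/8, 1/16, 1/16],   # assumed: not in the surviving slides
    (1, 1): [0, 3/4, 1/8, 1/8],
    (1, 3): [1, 0, 0, 0],
    (2, 1): [0, 0, 1/2, 1/2],
    (2, 2): [0, 1, 0, 0],
    (2, 3): [1, 0, 0, 0],
    (3, 3): [1, 0, 0, 0],
}
# C[(state, decision)] = expected one-step cost C_ik
C = {(0, 1): 0, (1, 1): 1000, (1, 3): 6000,
     (2, 1): 3000, (2, 2): 4000, (2, 3): 6000, (3, 3): 6000}
M = 3  # highest state index

def solve(A, b):
    """Solve Ax = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def value_determination(policy):
    """Step 1: set v_M = 0 and solve g + v_i = C_ik + sum_j p_ij(k) v_j.
    Unknowns are ordered (g, v_0, ..., v_{M-1})."""
    A = [[1.0] + [(1.0 if j == i else 0.0) - P[(i, k)][j] for j in range(M)]
         for i, k in enumerate(policy)]
    b = [float(C[(i, k)]) for i, k in enumerate(policy)]
    sol = solve(A, b)
    return sol[0], sol[1:] + [0.0]  # g and (v_0, ..., v_M)

def improved_policy(v):
    """Step 2: at each state pick k minimising C_ik + sum_j p_ij(k) v_j - v_i."""
    return [min((k for (s, k) in P if s == i),
                key=lambda k: C[(i, k)]
                + sum(p * vj for p, vj in zip(P[(i, k)], v)) - v[i])
            for i in range(M + 1)]

policy = [1, 1, 1, 3]  # initial policy: replace only at state 3
while True:
    g, v = value_determination(policy)  # step 1
    new_policy = improved_policy(v)     # step 2
    if new_policy == policy:            # step 3: stop when nothing changes
        break
    policy = new_policy

print(policy, round(g, 2))  # -> [1, 1, 2, 3] 1666.67
```

Under these assumptions the algorithm reproduces the deck's run: one improvement step moves from "replace only at state 3" to "overhaul at state 2, replace at state 3", and the second iteration confirms it is optimal.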

11 Idea of Policy Improvement
it can be proven that g is non-increasing from iteration to iteration
R is optimal if the policy does not change between iterations
since there are only finitely many policies, the algorithm stops after a finite number of iterations

12 Example
policy: replacement only at state 3
transition probabilities: doing nothing leaves the machine in the same or a worse state (state 1 row: 0, 3/4, 1/8, 1/8; state 2 row: 0, 0, 1/2, 1/2); replacement returns it to state 0 (row: 1, 0, 0, 0)
costs: C_01 = 0, C_11 = 1000, C_21 = 3000, C_33 = 6000

13 Example
Iteration 1: Value Determination
with v_3(R) = 0, the equations g + v_i = C_ik + Σ_j p_ij(k) v_j for the policy "replacement only at state 3" are (state-0 row taken as 0, 7/8, 1/16, 1/16):
g + v_0 = 0 + (7/8)v_1 + (1/16)v_2
g + v_1 = 1000 + (3/4)v_1 + (1/8)v_2
g + v_2 = 3000 + (1/2)v_2
g + v_3 = 6000 + v_0
solving gives g ≈ 1923, v_0 ≈ −4077, v_1 ≈ −2615, v_2 ≈ 2154
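The iteration-1 system can be solved exactly with rational arithmetic. In this sketch the state-0 transition row (0, 7/8, 1/16, 1/16) is an assumption taken from the standard textbook version of this example; the other rows and costs are as given on the example slides:

```python
from fractions import Fraction as F

# Augmented system for g + v_i = C_i + sum_j p_ij v_j with v_3 = 0;
# unknowns ordered (g, v0, v1, v2); policy = replace only at state 3.
aug = [
    [F(1), F(1),  F(-7, 8), F(-1, 16), F(0)],     # state 0, cost 0 (row assumed)
    [F(1), F(0),  F(1, 4),  F(-1, 8),  F(1000)],  # state 1, cost 1000
    [F(1), F(0),  F(0),     F(1, 2),   F(3000)],  # state 2, cost 3000
    [F(1), F(-1), F(0),     F(0),      F(6000)],  # state 3, cost 6000
]

# Gauss-Jordan elimination; exact because every entry is a Fraction
n = 4
for c in range(n):
    p = next(r for r in range(c, n) if aug[r][c] != 0)
    aug[c], aug[p] = aug[p], aug[c]
    for r in range(n):
        if r != c:
            f = aug[r][c] / aug[c][c]
            aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]

g, v0, v1, v2 = (aug[i][n] / aug[i][i] for i in range(n))
print(g, v0, v1, v2)  # 25000/13, -53000/13, -34000/13, 28000/13
```

These exact values (g = 25000/13 ≈ 1923, v_0 ≈ −4077, v_1 ≈ −2615, v_2 ≈ 2154) are consistent with the E(value) column in the policy-improvement table that follows.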

14 Example
Iteration 1: Policy Improvement
nothing can be done at state 0, and the machine must be replaced at state 3
possible decisions:
state 1: decision 1 (do nothing, $1000), decision 3 (replace, $6000)
state 2: decision 1 (do nothing, $3000), decision 2 (overhaul, $4000), decision 3 (replace, $6000)

15 Example
Iteration 1: Policy Improvement: the general expressions
at each state i, compare the feasible decisions k by the value C_ik + Σ_j p_ij(k) v_j(R) − v_i(R), and choose the minimizing k

16 Example
Iteration 1: Policy Improvement

State 1
Decision  C_1k   p_10(k)  p_11(k)  p_12(k)  p_13(k)  E(value)
1         1000   0        3/4      1/8      1/8      1923
3         6000   1        0        0        0        4538

State 2
Decision  C_2k   p_20(k)  p_21(k)  p_22(k)  p_23(k)  E(value)
1         3000   0        0        1/2      1/2      1923
2         4000   0        1        0        0        −769
3         6000   1        0        0        0        −231

new policy: do nothing at states 0 and 1, overhaul at state 2, and replace at state 3
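The E(value) column is C_ik + Σ_j p_ij(k) v_j − v_i, evaluated at the iteration-1 values from value determination. A quick check, using v = (−53000/13, −34000/13, 28000/13, 0) obtained by solving the value-determination system:

```python
# E(value) for decision k at state i: C_ik + sum_j p_ij(k)*v_j - v_i,
# with v from iteration 1's value determination (v_3 = 0).
v = [-53000/13, -34000/13, 28000/13, 0.0]

def e_value(i, cost, p_row):
    return cost + sum(p * vj for p, vj in zip(p_row, v)) - v[i]

print(round(e_value(1, 1000, [0, 3/4, 1/8, 1/8])))  # 1923
print(round(e_value(1, 6000, [1, 0, 0, 0])))        # 4538
print(round(e_value(2, 4000, [0, 1, 0, 0])))        # -769
```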

17 Example
Iteration 2: Value Determination
it can be shown that there is no further improvement in policy, so doing nothing at states 0 and 1, overhauling at state 2, and replacing at state 3 is an optimal policy
