MIT and James Orlin © Stochastic Dynamic Programming –Review –DP with probabilities
Overview
Objective: illustrate the use of DP with probabilities.
–Seems more complex because the decision at each stage is more complex.
–But the optimal decision at each stage still depends on the previous stages.
Review of DP using stages: Capital Budgeting, again
Investment budget = $14,000
The dynamic programming stages and states
Let f(k, B) be the best NPV limited to stocks 1, 2, …, k only and using a budget of at most B.
Stages: at stage k consider only stocks 1, 2, …, k.
State: B is the budget (in thousands of dollars).
Compute f(1, B) for B = 0 to 14. Then compute f(2, B) for B = 0 to 14. Then compute f(3, B) for B = 0 to 14, etc.
Capital Budgeting: stage 1
Consider stock 1: cost $5, NPV: $16
f(1, B) = 0 for B = 0 to 4
f(1, B) = 16 for B >= 5
Capital Budgeting: stage 2
Stock 1: cost $5, NPV: $16
Consider stock 2: cost $7, NPV: $22
f(2, B) = 0 for B = 0 to 4
f(2, B) = 16 for B = 5, 6
f(2, B) = 22 for B = 7 to 11
f(2, B) = 38 for B = 12 to 14
Capital Budgeting: stage 3, using DP
Consider stock 3: cost $4, NPV: $12
We can compute f(3, B) using f(2, · ) as input. We illustrate on f(3, 9).
–Don’t buy stock 3: f(2, 9) = $22
–Buy stock 3: $12 + f(2, 5) = $12 + $16 = $28
Choose the best decision: f(3, 9) = $28.
On the DP for the Capital Budgeting Problem
Buy stock 3: $12 + $16 = $28. Don’t buy stock 3: $22.
f(3, 9) = max [ 12 + f(2, 5), f(2, 9) ] = max [ 28, 22 ] = 28
f(3, B) = f(2, B) for B = 0, 1, 2, 3
f(3, B) = max [ 12 + f(2, B-4), f(2, B) ] for B = 4 to 14.
In general, f(k, B) can be computed from f(k-1, · ).
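The recursion for f(k, B) can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code. It assumes only the three stocks shown on the slides, with costs and NPVs in thousands of dollars, and it takes stock 2's NPV to be 22, which is inferred from f(2, 7) = 22 and f(2, 12) = 38. The names `STOCKS`, `BUDGET`, and `capital_budgeting` are hypothetical.

```python
# Stocks as (cost, NPV), in thousands of dollars.
# Assumption: stock 2's NPV of 22 is inferred from f(2, 7) = 22 and f(2, 12) = 38.
STOCKS = [(5, 16), (7, 22), (4, 12)]
BUDGET = 14


def capital_budgeting(stocks, budget):
    """Return a table f where f[k][B] is the best NPV using stocks 1..k
    and a budget of at most B."""
    f = [[0] * (budget + 1)]  # stage 0: no stocks considered, NPV is 0
    for cost, npv in stocks:
        prev = f[-1]
        row = []
        for b in range(budget + 1):
            best = prev[b]  # decision: don't buy this stock
            if b >= cost:
                # decision: buy this stock, spend `cost`, keep the best of the rest
                best = max(best, npv + prev[b - cost])
            row.append(best)
        f.append(row)
    return f


f = capital_budgeting(STOCKS, BUDGET)
print("f(3, 9) =", f[3][9])
```

Each stage reads only the previous stage's row, which is the sense in which f(k, B) is computed from f(k-1, · ).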
Decision Diagrams
Diagram: buy stock 3 ($12 + $16 = $28) versus don’t buy stock 3 ($22).
The above diagram is a decision diagram.
–The optimal decision at each stage can be determined from decisions at previous stages.
–We may view the diagram as a “local decision diagram” since it involves only a small part of the overall decision.
–We use an extension of this approach when we deal with dynamic programming under uncertainty.
Dynamic Programming under uncertainty
Next: we will permit uncertainties in our DPs.
–This is usually where DP gets much more powerful as a tool, but also more complex.
–We illustrate with an example in warfare, or gaming if you prefer.
Destroying an enemy target: a bomber example
You are a pilot in enemy territory. Your mission is to destroy an important target. You must get through.
You have four minutes to reach your target, and have just been spotted by radar.
Enemies can launch up to one bomber per minute to prevent you from reaching the target. The probability of them launching a bomber in minute i is q_i for i = 1 to 4.
A bomber example, continued
To protect yourself, you have M missiles. Each has a probability p_j of destroying the bomber.
Whenever you see a bomber, you must decide how many missiles to launch. If you do not destroy the bomber, then you will be destroyed.
Determine a strategy for how many missiles to launch at each time, assuming you see a bomber attacking you.
–Let f(k, m) be the number of missiles to launch assuming that you have k minutes left and have m missiles on hand.
–A strategy is to determine f(k, m) for k = 1 to 4 and m = 1 to M.
Simulating the bomber example
Each person has a die and a page describing the probabilities. Simulate 1 or more instances of the game.
–We will discuss the results.
–Then we will show how to determine an optimal strategy using DP.
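The dice game can also be imitated on a computer. The sketch below is a hypothetical Monte Carlo version, not the classroom handout. It assumes a single launch probability `q` for every minute and a single hit probability `p` for every missile, and the function names (`play_once`, `estimate_survival`, `fire_all`, `fire_one`) are made up for illustration. A strategy is any function of (minutes left, missiles on hand) that returns how many missiles to fire.

```python
import random


def play_once(strategy, minutes, missiles, q, p, rng):
    """Play one game; return True if the pilot reaches the target."""
    m = missiles
    for k in range(minutes, 0, -1):  # k = minutes left
        if rng.random() >= q:
            continue  # no bomber launched this minute
        fire = max(0, min(strategy(k, m), m))  # cannot fire more than on hand
        m -= fire
        # The bomber is destroyed if at least one fired missile hits.
        hits = sum(rng.random() < p for _ in range(fire))
        if hits == 0:
            return False  # bomber not destroyed, so the pilot is destroyed
    return True


def estimate_survival(strategy, minutes, missiles, q, p, trials=20000, seed=0):
    """Estimate the survival probability of a strategy by repeated play."""
    rng = random.Random(seed)
    wins = sum(play_once(strategy, minutes, missiles, q, p, rng)
               for _ in range(trials))
    return wins / trials


def fire_all(k, m):
    """Strategy: fire every remaining missile at the first bomber seen."""
    return m


def fire_one(k, m):
    """Strategy: fire a single missile at each bomber."""
    return 1


print(estimate_survival(fire_all, 1, 4, 2 / 3, 1 / 3))
print(estimate_survival(fire_one, 4, 4, 2 / 3, 1 / 3))
```

Simulation only estimates how good a given strategy is. Finding the best strategy is what the DP is for.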
What is the probability of surviving with 1 minute remaining and 4 missiles left?
There is one minute left. You have 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3.
If a bomber is launched, how many missiles do you fire? What is the probability of survival?
Step 1. Draw the diagram (1 minute left, 4 missiles).
–Bomber launched? If no, you win. If yes, fire 1, 2, 3, or 4 missiles.
–Hit? If yes, you win. If no, you lose.
Firing all missiles is clearly optimal with one minute to go.
Step 2. Fill in probabilities and end-values
The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3. What is the probability of survival?
–Fill in end values (probability of survival): 1 for “You win!”, 0 for “You lose.”
–Fill in probabilities of events: bomber launched 2/3, not launched 1/3.
–Probability of 4 missiles missing is (2/3)^4 = 16/81, so the probability of a hit is 65/81.
Step 3. Compute values at each node.
The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3.
Compute values at each node, moving from right to left (1 minute left, 4 missiles).
Value(H) = 65/81 · 1 + 16/81 · 0 = 65/81
Value(F) = Value(H) = 65/81
Value(B) = 1/3 · 1 + 2/3 · 65/81 = 211/243 ≈ 0.868
Carry out similar calculations for other values at stage 1, that is, one minute remaining
Table (calculations for stage 1): probability of surviving, by number of missiles remaining.
We next do a stage 2 calculation, which will be typical of all other calculations.
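The stage 1 values can be recomputed directly. The sketch below assumes the example's numbers (launch probability 2/3, hit probability 1/3) and that all remaining missiles are fired with one minute to go. The names `Q`, `P`, and `stage1_survival` are hypothetical.

```python
from fractions import Fraction

Q = Fraction(2, 3)  # probability that a bomber is launched
P = Fraction(1, 3)  # probability that a single missile hits


def stage1_survival(m):
    """Survival probability with one minute left and m missiles,
    firing all m missiles if a bomber appears."""
    p_destroy = 1 - (1 - P) ** m  # at least one of the m missiles hits
    return (1 - Q) * 1 + Q * p_destroy


for m in range(5):
    v = stage1_survival(m)
    print(m, v, round(float(v), 4))
```

With 4 missiles this gives 211/243, the value computed by hand for node B.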
Diagram for Determining Number of Missiles to Fire
There are two minutes left. You have 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3.
If a bomber is launched, how many missiles do you fire?
Step 1. Lay out the diagram (2 minutes left, 4 missiles).
–Bomber launched? If yes, fire 1, 2, 3, or 4 missiles.
–Hit? If no, you lose.
Step 2. Fill in end values
2 minutes left. 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3.
Fill in end values: a miss ends in “Lose” with value 0; after a hit, the end value is the stage 1 probability of surviving with the missiles that remain.
Step 3. Fill in probabilities for events
2 minutes left. 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3.
Fill in probabilities:
–Bomber launched: 2/3; not launched: 1/3
–1 missile: hit 1/3, miss 2/3
–2 missiles: hit 5/9, miss 4/9
–3 missiles: hit 19/27, miss 8/27
–4 missiles: hit 65/81, miss 16/81
Step 4. Determine values of nodes and make decisions.
2 minutes left. 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3.
Determine node values. After a hit, the end value is the stage 1 probability of surviving with the remaining missiles: 65/81, 19/27, 5/9, 1/3 for 3, 2, 1, 0 missiles.
Value(H1) = 1/3 · 65/81 + 2/3 · 0 ≈ 0.267
Value(H2) = 5/9 · 19/27 + 4/9 · 0 ≈ 0.3909
Value(H3) = 19/27 · 5/9 + 8/27 · 0 ≈ 0.3909
Value(H4) = 65/81 · 1/3 + 16/81 · 0 ≈ 0.267
Value(F) = max[Value(H1), Value(H2), Value(H3), Value(H4)] ≈ 0.3909
Value(B) = 1/3 · 0.868 + 2/3 · 0.3909 ≈ 0.550
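The stage 2 node values can be checked with exact fractions. This is a small sketch under the example's assumptions (launch probability 2/3, hit probability 1/3, two minutes left, 4 missiles). The helper names `stage1` and `stage2_fire_values` are hypothetical.

```python
from fractions import Fraction

Q = Fraction(2, 3)  # probability that a bomber is launched
P = Fraction(1, 3)  # probability that a single missile hits


def stage1(m):
    """Survival probability with one minute left and m missiles (fire all)."""
    return (1 - Q) + Q * (1 - (1 - P) ** m)


def stage2_fire_values(m):
    """Value of firing j = 1..m missiles at a bomber seen with two minutes left.
    A miss has value 0; a hit leaves m - j missiles for the last minute."""
    return {j: (1 - (1 - P) ** j) * stage1(m - j) for j in range(1, m + 1)}


values = stage2_fire_values(4)               # Value(H1) .. Value(H4)
value_F = max(values.values())               # best firing decision
value_B = (1 - Q) * stage1(4) + Q * value_F  # average over launch / no launch

for j, v in values.items():
    print(j, v, round(float(v), 4))
print("Value(F) =", value_F, " Value(B) =", value_B, round(float(value_B), 3))
```

Firing 2 or 3 missiles ties for the best decision in this instance, and Value(B) comes out to 401/729, about 0.550.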
Node values: again
–1 missile (H1): Value = 1/3 · 65/81 + 2/3 · 0 ≈ 0.267
–2 missiles (H2): Value = 5/9 · 19/27 + 4/9 · 0 ≈ 0.3909
–3 missiles (H3): Value = 19/27 · 5/9 + 8/27 · 0 ≈ 0.3909
–4 missiles (H4): Value = 65/81 · 1/3 + 16/81 · 0 ≈ 0.267
Some comments on DP
Seems complex, but the computations are all very similar.
–easy to program (not so easy in Excel)
–very efficient
Useful in finance
–investments over time
–the outcome of an investment is uncertain
Useful in inventory control
–demands are uncertain
–supplies must be ordered in advance
Probabilities of surviving
Table (bomber spreadsheet): probability of reaching the target, by number of missiles and by minutes remaining.
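A table of this kind can be generated with a short recursion over stages (minutes left) and states (missiles on hand). The sketch below is one possible implementation, not the spreadsheet itself. It assumes the same launch probability `q` in every minute and the same hit probability `p` for every missile, and the name `solve_bomber` is hypothetical. `fire[k][m]` plays the role of the strategy f(k, m).

```python
from fractions import Fraction


def solve_bomber(minutes, missiles, q, p):
    """Return (V, fire). V[k][m] is the best survival probability with k minutes
    left and m missiles; fire[k][m] is how many to fire if a bomber appears."""
    V = [[Fraction(1)] * (missiles + 1)]  # k = 0: the target has been reached
    fire = [[0] * (missiles + 1)]
    for k in range(1, minutes + 1):
        row_v, row_f = [], []
        for m in range(missiles + 1):
            best_val, best_j = Fraction(0), 0  # no missiles fired: destroyed
            for j in range(1, m + 1):
                # hit with j missiles, then continue with m - j missiles
                val = (1 - (1 - p) ** j) * V[k - 1][m - j]
                if val > best_val:
                    best_val, best_j = val, j
            # average over "no bomber" and "bomber launched"
            row_v.append((1 - q) * V[k - 1][m] + q * best_val)
            row_f.append(best_j)
        V.append(row_v)
        fire.append(row_f)
    return V, fire


V, fire = solve_bomber(4, 4, Fraction(2, 3), Fraction(1, 3))
for k in range(1, 5):
    print(k, [round(float(v), 4) for v in V[k]], fire[k])
```

When two firing decisions tie, this version keeps the smaller number of missiles. That is a design choice of the sketch, not something stated on the slides.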
Summary for dynamic programming
Useful in decision making over time
Uses stages, states, optimal value functions
Uses recursion
Can incorporate probabilities
Useful in inventory management, finance, shortest path, and much more