Deterministic Dynamic Programming
Dynamic programming is a widely used mathematical technique for solving problems that can be divided into stages and where decisions are required in each stage. The goal of dynamic programming is to find a combination of decisions that optimizes a certain quantity associated with a system.
Dynamic programming (DP) determines the optimum solution to an n-variable problem by decomposing it into n stages, with each stage constituting a single-variable subproblem.
Recursive Nature of Computations in DP
Computations in DP are done recursively, in the sense that the optimum solution of one subproblem is used as an input to the next subproblem.
By the time the last subproblem is solved, the optimum solution for the entire problem is at hand. The manner in which the recursive computations are carried out depends on how we decompose the original problem. In particular, the subproblems are normally linked by common constraints. As we move from one subproblem to the next, the feasibility of these common constraints must be maintained.
We illustrate with the famous STAGECOACH problem. It concerns a mythical fortune seeker in Missouri who decided to go west to join the gold rush in California during the mid-19th century. The journey would require travelling by stagecoach through different states.
Travelling out west was dangerous at that time, so the stagecoach company offered life insurance to its passengers. Since our fortune seeker was concerned about his safety, he decided that the safest route should be the one with the cheapest total life insurance cost.
Four stages were required to travel from the point of embarkation in state A (Missouri) to his destination in state J (California). The insurance costs between the states are also shown. Thus the problem is to find the cheapest route the fortune seeker should take.
A greedy approach that selects the cheapest leg offered at each successive stage gives the route A → B → F → I → J, with a total cost of 13. However, replacing A → B → F with A → D → F gives another route with a total cost of only 11, so the greedy choice is not optimal. One possible approach is to enumerate all possible routes, of which there are 18, and compute the cost of each. This is the so-called exhaustive enumeration method.
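The comparison between the stage-by-stage cheapest choice and full enumeration can be sketched in a few lines of Python. The cost table below is an assumption: the transcript does not reproduce the figure with the insurance costs, so the values are taken from the usual textbook version of the stagecoach example, and they agree with the totals 13, 11 and the count of 18 routes quoted above. The names `COST`, `STAGES`, `route_cost`, `greedy_route` and `all_routes` are illustrative only.

```python
from itertools import product

# ASSUMED cost table (the transcript's figure is missing): textbook values
# for the stagecoach example, consistent with the totals quoted in the text.
COST = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}
# States that can be chosen as the immediate destination in stages 1..4.
STAGES = [["B", "C", "D"], ["E", "F", "G"], ["H", "I"], ["J"]]


def route_cost(route):
    """Sum the insurance costs along a route such as ['A', 'B', 'F', 'I', 'J']."""
    return sum(COST[a][b] for a, b in zip(route, route[1:]))


def greedy_route(start="A", goal="J"):
    """Always take the cheapest next leg, ignoring what comes afterwards."""
    route = [start]
    while route[-1] != goal:
        here = route[-1]
        route.append(min(COST[here], key=COST[here].get))
    return route


def all_routes(start="A"):
    """Exhaustive enumeration: pick one state per stage, in every combination."""
    return [[start, *choice] for choice in product(*STAGES)]


greedy = greedy_route()
routes = all_routes()
best = min(routes, key=route_cost)
print(" -> ".join(greedy), route_cost(greedy))
print(len(routes), "routes; cheapest costs", route_cost(best))
```

Under these assumed costs the greedy route costs 13 while the cheapest of the 18 enumerated routes costs 11. Enumeration works here only because the network is tiny; the number of routes grows multiplicatively with the number of stages.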
Now let's solve the same problem through dynamic programming. The key terms are:
Stage
State
Decision variable
Optimal policy (optimal solution)
There does not exist a standard mathematical formulation of "the" dynamic programming problem. Rather, dynamic programming is a general type of approach to problem solving, and the particular equations used must be developed to fit each situation.
Dynamic programming starts with a small portion of the original problem and finds the optimal solution for this smaller problem. It then gradually enlarges the problem, finding the current optimal solution from the preceding one, until the original problem is solved in its entirety.
Let the decision variable $x_n$ ($n = 1, 2, 3, 4$) be the immediate destination on stage $n$. The route selected is $A \to x_1 \to x_2 \to x_3 \to x_4$, where $x_4 = J$. Let $f_n(s, x_n)$ be the total cost of the best overall policy for the remaining stages, given that you are in state $s$, ready to start stage $n$, and select $x_n$ as the immediate destination. Given $s$ and $n$, let $x_n^*$ denote any value of $x_n$ (not necessarily unique) that minimizes $f_n(s, x_n)$, and let $f_n^*(s)$ be the corresponding minimum value of $f_n(s, x_n)$.
Thus
$$f_n^*(s) = \min_{x_n} f_n(s, x_n) = f_n(s, x_n^*),$$
where
$$f_n(s, x_n) = \text{immediate cost (stage } n\text{)} + \text{minimum future cost (stages } n+1 \text{ onward)} = C_{s,x_n} + f_{n+1}^*(x_n).$$
The value of $C_{s,x_n}$ is given by the preceding tables for $C_{ij}$ by setting $i = s$ (the current state) and $j = x_n$ (the immediate destination). Because the destination J is reached at the end of stage 4, $f_5^*(J) = 0$. The objective is to find $f_1^*(A)$ and the corresponding route.
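Stage n=4:
s | f_4^*(s) | x_4^*
H | 3 | J
I | 4 | J

Stage n=3:
s | f_3(s, H) | f_3(s, I) | f_3^*(s) | x_3^*
E | 4 | 8 | 4 | H
F | 9 | 7 | 7 | I
G | 6 | 7 | 6 | H

Stage n=2:
s | f_2(s, E) | f_2(s, F) | f_2(s, G) | f_2^*(s) | x_2^*
B | ... | ... | ... | ... | E or F
C | 7 | 9 | 10 | 7 | E
D | 8 | 8 | 11 | 8 | E or F

Stage n=1:
s | f_1(s, B) | f_1(s, C) | f_1(s, D) | f_1^*(s) | x_1^*
A | 13 | 11 | 11 | 11 | C or D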
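The stage-by-stage computation above can be sketched as a backward recursion in Python. This is a minimal illustration, not the presenter's own code. The cost table is an assumption, since the figure with the insurance costs is not reproduced in this transcript; the values are those of the usual textbook version of the example and are consistent with the stage results shown here (for instance, the minimum cost from A is 11 with C or D as the first destination). The names `COST`, `STAGE_STATES`, `solve_backward` and `optimal_routes` are hypothetical.

```python
# ASSUMED cost table (the transcript's figure is missing): textbook values
# for the stagecoach example. COST[s][x] is the cost of travelling s -> x.
COST = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}
# Possible states s at the start of each stage n.
STAGE_STATES = {1: ["A"], 2: ["B", "C", "D"], 3: ["E", "F", "G"], 4: ["H", "I"]}


def solve_backward():
    """Return (f_star, x_star): minimum remaining cost and optimal next states."""
    f_star = {"J": 0}  # boundary condition: nothing left to pay at the destination
    x_star = {}
    for n in (4, 3, 2, 1):  # work backward from the last stage
        for s in STAGE_STATES[n]:
            # f_n(s, x) = immediate cost + minimum future cost from x
            f = {x: c + f_star[x] for x, c in COST[s].items()}
            f_star[s] = min(f.values())
            # keep every minimizer, because ties are possible
            x_star[s] = [x for x, v in f.items() if v == f_star[s]]
    return f_star, x_star


def optimal_routes(x_star, s="A", goal="J"):
    """Expand every tie in x_star into a full optimal route starting at s."""
    if s == goal:
        return [[goal]]
    return [[s, *rest] for x in x_star[s] for rest in optimal_routes(x_star, x, goal)]


f_star, x_star = solve_backward()
print("minimum total cost:", f_star["A"])
for route in optimal_routes(x_star):
    print(" -> ".join(route))
```

With the assumed costs, the recursion evaluates each arc once rather than each route once, and the ties at stages 1 and 2 expand into three optimal routes of cost 11.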
The problem structure is divided into stages.
Each stage has a number of states associated with it.
Making decisions at one stage transforms one state of the current stage into a state in the next stage.
Given the current state, the optimal decision for each of the remaining stages does not depend on the previous states or decisions. This is known as the principle of optimality for dynamic programming.
The principle of optimality allows the problem to be solved stage by stage, recursively.
The problem is divided into smaller subproblems, each of which is represented by a stage. The stages can be defined in many different ways, depending on the context of the problem. If the problem is about the long-term development of a system, then the stages naturally correspond to time periods. If the goal of the problem is to move some objects from one location to another on a map, then partitioning the map into several geographical regions might be the natural division into stages. Generally, if accomplishing a certain task can be considered a multi-step process, then each stage can be defined as a step in the process.
Each stage has a number of states associated with it. Depending on what decisions are made in one stage, the system might end up in different states in the next stage. If a geographical region corresponds to a stage, then the states associated with it could be some particular locations (cities, warehouses, etc.) in that region. In other situations, a state might correspond to the amounts of certain resources that are essential for optimizing the system.
Making decisions at one stage transforms one state of the current stage into a state in the next stage. In a geographical example, it could be a decision to go from one city to another. In resource allocation problems, it might be a decision to create or spend a certain amount of a resource. For example, in the shortest path problem, three different decisions are possible at the state corresponding to Columbus; these decisions correspond to the three arrows going from Columbus to the three states (cities) of the next stage: Kansas City, Omaha, and Dallas.
The goal of the solution procedure is to find an optimal policy for the overall problem, i.e., an optimal policy decision at each stage for each of the possible states. Given the current state, the optimal decision for each of the remaining stages does not depend on the previous states or decisions. This is known as the principle of optimality for dynamic programming. For example, in the geographical setting the principle works as follows: the optimal route from the current city to the final destination does not depend on how we got to that city. A system can be formulated as a dynamic programming problem only if the principle of optimality holds for it.
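One way to see the principle of optimality in code is that the cost-to-go function takes only the current state as its argument, never the path that led there. The sketch below uses a small network whose node names and costs are entirely hypothetical, invented for illustration; `ARCS`, `cost_to_go` and `best_route` are not from the presentation.

```python
from functools import lru_cache

# HYPOTHETICAL network, invented only for this illustration.
# ARCS[state] maps each reachable next state to the cost of that move.
ARCS = {
    "start": {"a": 2, "b": 5},
    "a": {"c": 6, "d": 3},
    "b": {"c": 1, "d": 4},
    "c": {"end": 2},
    "d": {"end": 4},
    "end": {},
}


@lru_cache(maxsize=None)
def cost_to_go(state):
    """Minimum cost from `state` to 'end'; depends on the state alone."""
    if state == "end":
        return 0
    return min(cost + cost_to_go(nxt) for nxt, cost in ARCS[state].items())


def best_route(state):
    """Follow an optimal decision at every state until 'end' is reached."""
    route = [state]
    while route[-1] != "end":
        here = route[-1]
        # choose the move minimizing immediate cost plus cost-to-go
        nxt = min(ARCS[here], key=lambda x: ARCS[here][x] + cost_to_go(x))
        route.append(nxt)
    return route


full = best_route("start")
print(full, cost_to_go("start"))
# The tail of an optimal route is itself optimal from its first state.
print(best_route(full[1]) == full[1:])
```

Because `cost_to_go` is keyed on the state only, its cached value can be reused no matter how that state was reached, which is what makes the stage-by-stage recursion valid.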