# A SIMPLE INTRODUCTION TO DYNAMIC PROGRAMMING

PROBLEM SET-UP

- The problem is arrayed as a set of decisions made over time.
- The system has a discrete state.
- Each decision results in some reward or cost, and moves the system to another state.
- There is usually a finite number of transitions.
- Transitions can be probabilistic, as can the rewards.
- The solution is a decision strategy that maximizes the summed reward (or minimizes the summed cost).

Notation

- N = the finite planning horizon.
- S_n(x) = the cost of optimally operating from n to N, given state x at time n.
- d_n*(x) = the optimal policy at stage n, given state x at time n.
- x(d_n) = the state resulting from taking decision d_n at stage n.
- c(d_n) = the cost of taking decision d_n.

EXAMPLE

You have moved to Singapore, and you need to operate a car for 3 yrs. You plan to sell the car when you leave, and your QOL is not affected by your wheels. Purchase/resale prices and operating costs by car age are:

| car age (yrs) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| sale price | 1000 | 800 | 450 | 150 |
| op cost | 200 | 400 | 600 | |

MAPPING TO THE NOTATION

- State: the age of your car.
- Stage: years you have been in S-pore.
- Policy: the age of the car you buy at the END of the year.

COST EXAMPLE

Suppose you have a 2-yr-old car:

- you operate it for the year (\$600)
- you sell your (now 3-yr-old) car (-\$150)
- you buy a new-to-you 1-yr-old used car (\$800)

TOTAL: \$1250

One-year transition costs (car age at the start of the year vs. age at the end, after any sale and purchase):

| start \ finish | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 400 | 200 | | |
| 1 | 950 | 750 | 400 | |
| 2 | 1450 | 1250 | 900 | 600 |

| car age | "cost" end of yr 3 |
|---|---|
| 0 | -1000 |
| 1 | -800 |
| 2 | -450 |
| 3 | -150 |

CONTINUED COST EXAMPLE

It’s the beginning of yr 2, and you possess a 2-yr-old car. You can...

- operate the car (600 + S_3(3-yr-old car))
- operate the car, sell it, buy a new car (600 - 150 + 1000 + S_3(new))
- operate the car, sell it, buy a 1-yr-old car (600 - 150 + 800 + S_3(1-yr-old car))
- ...

The costs-to-go S_n(x), filled in by backward recursion (the stage-1 values include the price of the car you buy on arrival):

| car age | stage 1 | stage 2 | stage 3 | "cost" end of yr 3 |
|---|---|---|---|---|
| 0 | 1200 | -200 | -600 | -1000 |
| 1 | 1550 | 350 | -50 | -800 |
| 2 | 1700 | 850 | 450 | -450 |
| 3 | | | | -150 |
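The backward recursion behind this table can be sketched in a few lines of code. A minimal Python sketch, assuming the prices and operating costs from the slides (the names `sale`, `op`, and `solve` are mine, not from the deck):

```python
# Car-replacement example: backward dynamic programming.
# sale[a] = resale price of an age-a car; op[a] = one-year operating cost.
sale = {0: 1000, 1: 800, 2: 450, 3: 150}
op = {0: 200, 1: 400, 2: 600}
N = 3  # three years in Singapore

def solve():
    # Terminal values: at the end of year 3 you sell whatever you own.
    S = {N + 1: {a: -sale[a] for a in sale}}
    for n in range(N, 0, -1):
        S[n] = {}
        for a in op:  # age of the car held at the start of year n
            # Operate for the year (the car becomes age a + 1), then either
            # keep it or sell it and buy an age-y replacement.
            options = [(a + 1, op[a])]
            options += [(y, op[a] - sale[a + 1] + sale[y]) for y in op]
            S[n][a] = min(c + S[n + 1][y] for y, c in options if y in S[n + 1])
    return S

S = solve()
# You arrive with no car, so buying an age-a car adds sale[a] up front.
start = {a: sale[a] + S[1][a] for a in op}
print(start)  # -> {0: 1200, 1: 1550, 2: 1700}, the stage-1 column above
```

Buying new on arrival (1200) beats starting with a 1-yr-old (1550) or 2-yr-old (1700) car.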

BELLMAN’S EQUATION

Sometimes it’s easy to get your name on something!
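The equation itself appears to have been an image on the slide. In the notation defined earlier it would read (a reconstruction, not a quote from the deck):

```latex
S_n(x) \;=\; \min_{d_n}\Bigl[\, c(d_n) \;+\; S_{n+1}\bigl(x(d_n)\bigr) \Bigr],
\qquad n = 1, \dots, N,
```

with the boundary values S_{N+1}(x) given by the terminal cost of ending in state x (in the car example, minus the resale price).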

EXEMPLAR

A specialized tool is available during the period 9am, ..., 3pm. Each hour, a bid for the asset is made according to the table below. The asset is busy for 3 hr if the bid is accepted.

| hour | 9 | 10 | 11 | 12 | 1 | 2 | 3 |
|---|---|---|---|---|---|---|---|
| bid | 100 | 150 | 160 | 50 | 175 | 40 | 10 |

[Animated build of a stage-state diagram for the bidding example: nodes are labeled by the hour (9, ..., 3) and by the hours remaining until the asset is free (0, 1, 2, end); arcs carry the bids (100, 150, 160, 50, 175, 40, 10). The backward pass accumulates 10, then 175, then the optimal total 325.]

Note 1: Once the diagram is drawn, the problem can be solved by a shortest- (longest-) path algorithm.

Note 2: Dynamic Programming = Shortest Path.
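As a check on the diagram, the same longest-path computation can be written as a short backward recursion. A Python sketch (the function name `best_revenue` is mine):

```python
# Bidding example as a backward recursion (equivalent to the longest-path
# view of the stage-state diagram).
bids = [100, 150, 160, 50, 175, 40, 10]  # hourly bids, 9am through 3pm

def best_revenue(bids, busy_hours=3):
    n = len(bids)
    # V[t] = best revenue from hour t onward, given the asset is free then.
    V = [0] * (n + busy_hours)
    for t in range(n - 1, -1, -1):
        reject = V[t + 1]
        accept = bids[t] + V[t + busy_hours]  # asset busy for 3 hours
        V[t] = max(reject, accept)
    return V[0]

print(best_revenue(bids))  # -> 325: accept the 150 bid (10am) and the 175 bid (1pm)
```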

PROBABILISTIC TRANSITIONS

1. c(d) is a random variable
2. x(d) is random
3. the “trial” takes place after the decision

EXEMPLAR (Probabilistic)

An “asset” is available during the period 8pm, 9pm, ..., 3am. Each hour, a bid for the asset is made according to the discrete probability density below. The asset is busy for 3 hr if the bid is accepted.

MANY APPROACHES TO FORMULATION

- N = 4am
- S_n(x) = the profit of optimally operating from n to N, given state x at time n.
- d_n*(x) = the optimal policy at stage n, given state x at time n (ACCEPT, REJECT).
- c(d_n) = the profit of taking decision d_n.
- x(d_n) = the proposed bid (3, 6, 9) or the number of hours left in the remaining engagement (1 hr, 2 hr).
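The probability table from the slide is not in this transcript, so any numbers have to be assumed. A hedged Python sketch of the expected-value recursion over the bid values (3, 6, 9), with a HYPOTHETICAL distribution standing in for the missing table:

```python
# Probabilistic bidding sketch: each hour a random bid (3, 6, or 9) arrives;
# accepting ties up the asset for 3 hours. The probabilities below are
# HYPOTHETICAL placeholders, not from the source slides.
bid_dist = [(3, 0.5), (6, 0.3), (9, 0.2)]  # assumed (bid, probability) pairs
N = 8  # decision hours: 8pm, 9pm, ..., 3am

def expected_profit():
    # V[t] = expected profit from hour t onward, asset free at hour t.
    V = [0.0] * (N + 3)
    for t in range(N - 1, -1, -1):
        # The bid is revealed, then we ACCEPT iff that beats REJECT.
        V[t] = sum(p * max(V[t + 1], b + V[t + 3]) for b, p in bid_dist)
    return V[0]

print(expected_profit())
```

With real probabilities substituted for `bid_dist`, this is the recursion the DP Example.xls spreadsheet would carry out.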

RECURSION

State: (time, hours before the asset is available again). See DP Example.xls.

UNLOCKING THE JARGON

- x(d) can be governed by a Markov chain: a different P_{i,j} matrix for each decision d.
- The result is a Markov Decision Process.
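A tiny sketch of what “a different P_{i,j} matrix for each decision” looks like in code. All numbers here are made up purely for illustration, and the function name `backward_induction` is mine:

```python
# Minimal finite-horizon Markov Decision Process sketch.
# P[d][i][j] = probability of moving from state i to state j under decision d;
# r[d][i]    = expected one-step reward for taking d in state i.
# All values are HYPOTHETICAL, chosen only to illustrate the structure.
P = {
    "a": [[0.9, 0.1], [0.4, 0.6]],
    "b": [[0.2, 0.8], [0.5, 0.5]],
}
r = {"a": [1.0, 0.0], "b": [0.0, 2.0]}

def backward_induction(horizon):
    n = len(r["a"])
    V = [0.0] * n  # terminal values
    for _ in range(horizon):  # one Bellman backup per stage
        V = [max(r[d][i] + sum(P[d][i][j] * V[j] for j in range(n))
                 for d in P) for i in range(n)]
    return V

print(backward_induction(1))  # -> [1.0, 2.0]
```

With one stage to go, the best decision is “a” in state 0 and “b” in state 1; longer horizons fold the transition matrices into the values.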
