Published by Jaheim Alvarez. Modified over 2 years ago.

1

2
A SIMPLE INTRODUCTION TO DYNAMIC PROGRAMMING

3
PROBLEM SET-UP
- The problem is arrayed as a set of decisions made over time.
- The system has a discrete state.
- Each decision results in some reward or cost, and moves the system to another state.
- There is usually a finite number of transitions.
- Transitions can be probabilistic, as can the rewards.
- The solution is a decision strategy that maximizes summed reward (minimizes cost).

4
Notation
- N = finite planning horizon
- S_n(x) = cost of optimally operating from n to N, given state x at time n
- d_n*(x) = the optimal policy at stage n, given state x at time n
- x(d_n) = the state resulting from deciding d at stage n
- c(d_n) = the cost of taking decision d_n

5
EXAMPLE
- You have moved to Singapore, and you need to operate a car for 3 yrs.
- You plan to sell the car when you leave.
- Your quality of life (QOL) is not affected by your wheels.
- Cost/resale of cars and operating costs are below.
[Table: sale price and op cost for car ages 0, 1, 2, 3]

6
MAPPING TO THE NOTATION
- State: age of your car
- Stage: years you have been in Singapore
- Policy: age of the car you buy at the END of the year

7
COST EXAMPLE
- You have a 2 yr old car.
- You operate it for the year ($600).
- You sell your 3 yr old car (-$150).
- You buy a new (to you) 1 yr old used car ($800).
TOTAL: $1250
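The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `year_cost` and its parameter names are made up for illustration, and the three dollar figures are the ones quoted on the slide.

```python
def year_cost(operating_cost, resale_value, purchase_price):
    """Net cost of one year: operate the car, sell it, buy the replacement."""
    return operating_cost - resale_value + purchase_price


# Figures from the slide: operate a 2 yr old car, sell it as a 3 yr old car,
# buy a 1 yr old used car.
total = year_cost(operating_cost=600, resale_value=150, purchase_price=800)
print(total)  # 1250
```

The resale value enters with a minus sign because it is cash received, which matches the (-$150) entry above.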

8
[Figure: diagram running from "start" to "finish" over car ages 0, 1, 2, 3]

9
[Figure: diagram labeled with car age and "cost" at end of yr]

10
CONTINUED COST EXAMPLE
It’s the beginning of yr 2, and you possess a 2 yr old car. You can...
- operate the car (600 + S_3(3 yr old car))
- operate the car, sell it, buy a new car (600 - 150 + price of a new car + S_3(new))
- operate the car, sell it, buy a 1 yr old car (600 - 150 + 800 + S_3(1 yr old car))
- ...
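The options listed above can be evaluated for every stage and car age by working backward from the last year. The following is a minimal sketch, not the deck's own spreadsheet: the cost table is hypothetical except for the three values quoted on the slides (operating a 2 yr old car costs 600, a 3 yr old car resells for 150, a 1 yr old car costs 800), and the names `OP_COST`, `PRICE`, `RESALE` and `S` are chosen here for illustration. It also assumes that a car older than the oldest age in the operating cost table must be sold.

```python
from functools import lru_cache

YEARS = 3  # planning horizon N: three years in Singapore

# Hypothetical cost table (assumed values), apart from OP_COST[2] = 600,
# RESALE[3] = 150 and PRICE[1] = 800, which are quoted on the slides.
OP_COST = {0: 200, 1: 400, 2: 600, 3: 900}   # cost to run a car of this age for a year
PRICE = {0: 2000, 1: 800, 2: 450, 3: 250}    # cost to buy a car of this age
RESALE = {1: 700, 2: 350, 3: 150, 4: 50}     # cash received for a car of this age


@lru_cache(maxsize=None)
def S(n, age):
    """Return (minimum cost from the start of year n to departure, best decision)
    when holding a car of the given age at the start of year n."""
    aged = age + 1  # the car is one year older at the end of the year
    if n == YEARS:  # last year: operate, then sell the car on leaving
        return OP_COST[age] - RESALE[aged], None
    options = {}
    if aged in OP_COST:  # keep the car if it is still within the table
        options["keep"] = OP_COST[age] + S(n + 1, aged)[0]
    for buy_age, price in PRICE.items():  # or sell it and buy a replacement
        options[f"buy age {buy_age}"] = (
            OP_COST[age] - RESALE[aged] + price + S(n + 1, buy_age)[0]
        )
    best = min(options, key=options.get)
    return options[best], best


# Choose the car to buy on arrival, then read off the first-year decision.
start_age = min(PRICE, key=lambda a: PRICE[a] + S(1, a)[0])
print("buy age", start_age, "total cost", PRICE[start_age] + S(1, start_age)[0])
print("year 1 decision:", S(1, start_age)[1])
```

Memoizing `S` with `lru_cache` means each (year, age) pair is solved once, which is the point of the recursion: the cost-to-go from a state does not depend on how that state was reached.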

11
[Figure: diagram over stages 1, 2, 3 with "cost" at end of yr]

12
[Figure: diagram over stages 1, 2, 3 with "cost" at end of yr, continued]

13
BELLMAN’S EQUATION
Sometimes it’s easy to get your name on something!
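In the notation introduced earlier, the finite-horizon recursion usually referred to as Bellman's equation can be written as below. This is the standard cost-minimizing form, offered as a sketch; the boundary condition is an assumption about how the final stage is valued (in the car example it would include the resale of the car on leaving).

```latex
S_n(x) \;=\; \min_{d_n} \Big\{\, c(d_n) + S_{n+1}\big(x(d_n)\big) \Big\},
\qquad
d_n^*(x) \;=\; \arg\min_{d_n} \Big\{\, c(d_n) + S_{n+1}\big(x(d_n)\big) \Big\},
\qquad
S_N(x) \ \text{given for every state } x .
```

For a reward-maximizing problem, the min is replaced by a max and c(d_n) is read as a profit.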

14
EXEMPLAR
- A specialized tool is available during the period 9am, ..., 3pm.
- Each hour, a bid for the asset is made according to the table below.
- The asset is busy for 3 hr if the bid is accepted.

15-21
[Figures: successive frames of the diagram, each terminating at an "end" node]

22
[Figure: completed diagram terminating at an "end" node]
Note 1: Once the diagram is drawn, the problem can be solved by a shortest (longest) path algorithm.
Note 2: Dynamic Programming = Shortest Path
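To illustrate Note 1, the tool-bidding exemplar can be treated as a longest-path problem on a graph whose nodes are hours: rejecting a bid moves one hour ahead for no reward, and accepting moves three hours ahead and earns the bid. The sketch below rests on assumptions: the bid values are hypothetical (the original table is not reproduced here), bids are assumed to arrive on the hour from 9am to 2pm, and a job is assumed to be acceptable only if it finishes by 3pm. The name `best_schedule` is illustrative.

```python
END = 15   # 3pm, the terminal "end" node
BUSY = 3   # hours the tool is tied up once a bid is accepted

# Hypothetical bids by hour (24h clock); assumed values for illustration only.
BIDS = {9: 5, 10: 9, 11: 4, 12: 7, 13: 6, 14: 8}


def best_schedule(bids, end=END, busy=BUSY):
    """Longest path to the end node, found by backward recursion over the hours."""
    value = {end: 0}   # value[t] = best profit obtainable from hour t to the end
    choice = {}
    for t in sorted(bids, reverse=True):
        reject = value[t + 1]
        # Assumption: a job must finish by the end hour to be accepted.
        accept = bids[t] + value[t + busy] if t + busy <= end else float("-inf")
        if accept > reject:
            value[t], choice[t] = accept, "accept"
        else:
            value[t], choice[t] = reject, "reject"

    # Walk forward along the recorded decisions to recover the path.
    t, accepted = min(bids), []
    while t < end:
        if choice[t] == "accept":
            accepted.append(t)
            t += busy
        else:
            t += 1
    return value[min(bids)], accepted


total, accepted = best_schedule(BIDS)
print(total, accepted)  # 12 [9, 12]
```

With these assumed bids, taking the larger 10am bid would block the noon bid, so the best path accepts at 9am and noon instead.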

23
PROBABILISTIC TRANSITIONS
1. c(d) is a random variable
2. x(d) is random
3. the “trial” takes place after the decision

24
EXEMPLAR (Probabilistic)
- An “asset” is available during the period 8pm, 9pm, ..., 3am.
- Each hour, a bid for the asset is made according to the discrete probability distribution below.
- The asset is busy for 3 hr if the bid is accepted.

25
MANY APPROACHES TO FORMULATION
- N = 4am
- S_n(x) = profit of optimally operating from n to N, given state x at time n
- d_n*(x) = the optimal policy at stage n, given state x at time n (ACCEPT, REJECT)
- c(d_n) = the profit of taking decision d_n
- x(d_n) = the proposed bid (3, 6, 9) or the number of hours left in the remaining engagement (1 hr, 2 hr)
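One way to turn this formulation into a computation is sketched below. It is a sketch under stated assumptions, not the contents of the deck's spreadsheet: the bid probabilities are hypothetical, a bid accepted near the end is assumed to be collected in full, and profit after 4am is assumed to be zero. The "hours left" states are handled implicitly by jumping ahead three hours after an acceptance. The names `expected_free` and `value` are illustrative.

```python
from functools import lru_cache

HORIZON = 28   # 4am written on a running 24h clock (8pm = 20, ..., 3am = 27)
BUSY = 3       # hours the asset is tied up once a bid is accepted

# Hypothetical probabilities for the three bid sizes (assumed values).
BID_PROBS = {3: 0.5, 6: 0.3, 9: 0.2}


@lru_cache(maxsize=None)
def expected_free(t):
    """Expected profit from hour t onward when the asset is free and the
    bid at hour t has not been drawn yet."""
    if t >= HORIZON:
        return 0.0   # assumption: nothing is earned after the horizon
    return sum(p * value(t, bid)[0] for bid, p in BID_PROBS.items())


@lru_cache(maxsize=None)
def value(t, bid):
    """S_t(bid): best expected profit given the bid proposed at hour t,
    together with the decision that achieves it."""
    accept = bid + expected_free(t + BUSY)   # asset is free again BUSY hours later
    reject = expected_free(t + 1)            # wait for the next hour's bid
    return (accept, "ACCEPT") if accept >= reject else (reject, "REJECT")


for bid in BID_PROBS:
    print("8pm, bid", bid, "->", value(20, bid))
```

Because the next bid is drawn only after the decision is made, the decision compares a known bid against an expectation over future bids, which is why `value` calls `expected_free` rather than looking at any particular future bid.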

26
RECURSION
[Table axes: time; hours before asset is available again]
See DP Example.xls

27
UNLOCKING THE JARGON
- x(d) can be governed by a Markov chain
- with a different transition matrix P_i,j for each decision d
- The result is a Markov Decision Process
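With a decision-dependent transition matrix, the recursion takes the usual finite-horizon Markov Decision Process form. The expression below is a sketch in the profit-maximizing convention; the symbols r_i(d) for the expected one-stage profit in state i under decision d, and P_{i,j}(d) for the transition probability from i to j under d, are introduced here for illustration.

```latex
S_n(i) \;=\; \max_{d} \Big\{\, r_i(d) \;+\; \sum_{j} P_{i,j}(d)\, S_{n+1}(j) \Big\},
\qquad
S_N(i) \ \text{given for every state } i .
```

When every row of P(d) puts probability 1 on a single state, this reduces to the deterministic recursion.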
