DPA51 Dynamic Programming Applications Lecture 5.

1 DPA51 Dynamic Programming Applications Lecture 5

2 DPA52 Preview Last time: Structural properties. Today: Optimal stopping & the OLA rule (Secretary problem, Asset selling) Next time: Infinite horizon.

3 DPA53 The RM problem
$J_t(x,i) = \max\{J_{t-1}(x),\; R_i + J_{t-1}(x-1)\} = (R_i - OC_{t-1}(x))^+ + J_{t-1}(x)$
Optimal policy: accept class $i$ iff $R_i \ge OC_{t-1}(x) = J_{t-1}(x) - J_{t-1}(x-1)$
Results:
1. $J_t(x)$ is increasing in $x$ (by induction)
2. $OC_t(x)$ is decreasing in $x$ (single crossing)
3. $OC_t(x)$ is increasing in $t$ (by induction + result 2):
$J_t(x) = \sum_i p_i (R_i - OC_{t-1}(x))^+ + J_{t-1}(x)$
$J_t(x-1) = \sum_i p_i (R_i - OC_{t-1}(x-1))^+ + J_{t-1}(x-1)$
Subtracting: $OC_t(x) - OC_{t-1}(x) = \sum_i p_i [(R_i - OC_{t-1}(x))^+ - (R_i - OC_{t-1}(x-1))^+] \ge 0$, since $OC_{t-1}(x) \le OC_{t-1}(x-1)$ by result 2.
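The recursion on this slide can be checked numerically. The sketch below is only an illustration under assumptions not stated on the slide: $t$ counts periods remaining, $J_0(x) = 0$, $J_t(0) = 0$, at most one request arrives per period, and the two fare classes with their arrival probabilities are made-up numbers. The function names are hypothetical.

```python
from fractions import Fraction


def rm_values(revenues, probs, capacity, horizon):
    """Return J[t][x] for t periods to go and x units of capacity left.

    Assumed boundary conditions (not from the slide): J[0][x] = 0, J[t][0] = 0.
    """
    J = [[Fraction(0)] * (capacity + 1) for _ in range(horizon + 1)]
    for t in range(1, horizon + 1):
        for x in range(1, capacity + 1):
            # Opportunity cost of one unit: OC_{t-1}(x) = J_{t-1}(x) - J_{t-1}(x-1)
            oc = J[t - 1][x] - J[t - 1][x - 1]
            # J_t(x) = sum_i p_i (R_i - OC_{t-1}(x))^+ + J_{t-1}(x)
            gain = sum(p * max(r - oc, 0) for r, p in zip(revenues, probs))
            J[t][x] = J[t - 1][x] + gain
    return J


def opportunity_cost(J, t, x):
    """OC_t(x) = J_t(x) - J_t(x-1)."""
    return J[t][x] - J[t][x - 1]


# Hypothetical data: two classes; with probability 1/4 nobody arrives.
revenues = [100, 50]
probs = [Fraction(1, 4), Fraction(1, 2)]
J = rm_values(revenues, probs, capacity=3, horizon=4)
for t in range(1, 5):
    print(t, [str(opportunity_cost(J, t, x)) for x in range(1, 4)])
```

Each printed row lists $OC_t(1), OC_t(2), OC_t(3)$; reading across a row illustrates result 2 and reading down a column illustrates result 3.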

4 DPA54 The RM problem - results
The optimal policy is characterized by threshold levels $b_i^t$ as follows:
Accept class $i$ at time $t$ iff $0 \le x < b_i^t$, where $b_i^t = \min\{x \mid OC_{t-1}(x) > R_i\}$
Moreover, $b_1^t \ge \dots \ge b_m^t$, where $R_1 \ge \dots \ge R_m$

5 DPA55 Optimal Stopping
At each stage a control is available that stops the evolution of the system. At stage $k$ there are 2 options:
1. Stop the process (and get a certain reward).
2. Continue the process, perhaps at a certain cost, and select one of the next available choices.
If there is only one other choice besides stopping, the policy is characterized by the set of stopping states.

6 DPA56 Secretary Problems
Cayley 1875
Interview $N$ candidates for a job
Must accept/reject at the end of each interview
Objectives:
– Maximize expected ‘score’
– Maximize P(get the best) (you risk hiring nobody!)

7 DPA57 Archetype problem
Make an irrevocable choice from a fixed number of opportunities whose values are revealed sequentially.
Asset selling
Purchasing with a deadline
Exercising stock options (in your next HW)

8 DPA58 Max P(get best)
$W_t$ = history of relative ranks of the candidates seen by time $t$ (inclusive)
$x_t = 1$ if the $t$-th candidate is the best seen so far, $x_t = 0$ otherwise
Relevant: $t$ and $x_t$
Fact: the event $x_t = 1$ and $W_{t-1}$ are statistically independent: $P(x_t = 1 \mid W_{t-1}) = P(x_t = 1) = 1/t$

9 DPA59 Objective
$J_t$ = P(under the optimal policy we select the best candidate, given that we've rejected $t-1$ so far)
$J_t(0)$ = P(under the optimal policy we select the best candidate, given that we've seen $t$ so far and the last one was NOT the best so far)
$J_t(1)$ = …
P(best of $N$ | best of first $t$) = ?

10 DPA510 DP equation
$J_{N+1} = 0$
$J_t = \frac{t-1}{t} J_t(0) + \frac{1}{t} J_t(1)$
$J_t(0) = J_{t+1}$ (must continue)
$J_t(1) = \max(t/N,\; J_{t+1})$ (accept or continue)
Fact 1: $J_{t-1} \ge J_t$
Fact 2: $J_t$ is decreasing in $t$ and $t/N$ is increasing in $t$ => single crossing
Define: $t^* = \min\{t \mid J_{t+1} \le t/N\}$
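The backward recursion on this slide is short enough to run directly. The following is a minimal sketch using exact fractions; the function name `secretary_dp` and the choice $N = 4$ are illustrative assumptions, not part of the lecture.

```python
from fractions import Fraction


def secretary_dp(N):
    """Backward recursion for J_t, t = N, ..., 1, with J_{N+1} = 0."""
    J = {N + 1: Fraction(0)}
    for t in range(N, 0, -1):
        j0 = J[t + 1]                        # J_t(0): must continue
        j1 = max(Fraction(t, N), J[t + 1])   # J_t(1): accept or continue
        J[t] = Fraction(t - 1, t) * j0 + Fraction(1, t) * j1
    # t* = min{t | J_{t+1} <= t/N}
    t_star = min(t for t in range(1, N + 1) if J[t + 1] <= Fraction(t, N))
    return J, t_star


J, t_star = secretary_dp(4)
print({t: str(v) for t, v in sorted(J.items())}, "t* =", t_star)
```

For $N = 4$ the recursion gives $J_1 = 11/24$ and $t^* = 2$, i.e. reject the first candidate and then accept the first one who is best so far.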

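11 DPA511 Recursion
$J_t = J_{t^*}$ if $t < t^*$; $J_t = \frac{t-1}{t} J_{t+1} + \frac{1}{N}$ if $t \ge t^*$
Dividing by $t-1$: $\frac{J_t}{t-1} = \frac{J_{t+1}}{t} + \frac{1}{N(t-1)}$
Therefore: $J_{t+1} = \frac{t}{N} \sum_{s=t}^{N-1} \frac{1}{s}$ (after telescoping)
By definition, $t^*$ is the smallest $t$ s.t. $J_{t+1} \le t/N$, so $t^* = \min\{t \mid \sum_{s=t}^{N-1} \frac{1}{s} \le 1\}$ = ?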
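The closed form can be evaluated without running the full recursion. This sketch assumes $N \ge 3$ (so that $t^* \ge 2$ and the telescoped formula applies at $t = t^* - 1$); the helper names are hypothetical.

```python
from fractions import Fraction


def harmonic_tail(t, N):
    """sum_{s=t}^{N-1} 1/s, computed exactly."""
    return sum(Fraction(1, s) for s in range(t, N))


def threshold(N):
    """t* = min{t | sum_{s=t}^{N-1} 1/s <= 1}."""
    return min(t for t in range(1, N) if harmonic_tail(t, N) <= 1)


def optimal_success(N):
    """J_{t*} from J_{t+1} = (t/N) * sum_{s=t}^{N-1} 1/s at t = t* - 1.

    Assumes N >= 3 so that t* - 1 >= 1.
    """
    t = threshold(N) - 1
    return Fraction(t, N) * harmonic_tail(t, N)


for N in (4, 10, 100):
    print(N, threshold(N), float(optimal_success(N)))
```

Since $J_t = J_{t^*}$ for all $t < t^*$, the value returned by `optimal_success` is also $J_1$, the overall probability of selecting the best candidate.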

12 DPA512 Policy
For large $N$: $\sum_{s=t_0}^{N-1} \frac{1}{s} \approx \log_e(N/t_0)$
Therefore $t_0 \approx N/e$
Policy: interview and reject about $N/e$ candidates, then select the first candidate who is the best seen so far.
P(success) = $J(t_0) \approx t_0/N \approx 1/e \approx 0.3679$
Empirical validation?
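One way to probe the $1/e$ figure is a quick Monte Carlo experiment. The sketch below assumes candidates arrive in uniformly random order and rounds $N/e$ to the nearest integer; the function names, the seed, and the trial count are arbitrary choices for illustration.

```python
import math
import random


def run_once(N, n_reject, rng):
    """Simulate one hiring process; return True if the best candidate is hired."""
    ranks = list(range(N))   # higher is better; N - 1 is the overall best
    rng.shuffle(ranks)
    best_seen = max(ranks[:n_reject], default=-1)
    for value in ranks[n_reject:]:
        if value > best_seen:
            # First candidate who beats everyone seen so far: hire and stop.
            return value == N - 1
    return False             # nobody was hired


def estimate_success(N, trials, seed=0):
    """Estimate P(success) for the reject-about-N/e rule."""
    rng = random.Random(seed)
    n_reject = round(N / math.e)
    wins = sum(run_once(N, n_reject, rng) for _ in range(trials))
    return wins / trials


print(estimate_success(N=100, trials=20000), "vs 1/e =", 1 / math.e)
```

The estimate should land close to $1/e$ for moderately large $N$, up to sampling noise.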

13 DPA513 The Last Shall be First
“... The last person interviewed for a job gets it 55.8% of the time according to Runzheimer Canada, Inc. Early applicants are hired only 17.6% of the time; the management consulting firm suggests that job-seekers who find they are among the first to be grilled ‘tactfully ask to be rescheduled for a later date’. Mondays are also poor days to be interviewed and any day just before quitting time is also bad.” (The Globe and Mail, Sept. 12, 1990, pg. A22)

14 DPA514 Asset selling
Like maximizing interview score, but with discounting/investment
Offers: $w_0, w_1, \dots, w_{N-1}$ i.i.d. with a fixed, known distribution (if not known: inference, learning)
Stage $k$ choices:
1. Accept, and invest the amount $w_k$ at rate $r$
2. Reject, and wait until stage $k+1$
Objective: maximize revenue at the end of period $N$

15 DPA515 Formulation
State: $x_k \ne T$: asset has not been sold, current offer is $x_k$; $x_k = T$: asset has been sold
Decision: $u_k = u$: sell; $u_k = u'$: don't sell
Plant equation: $x_{k+1} = T$ if $x_k = T$, or if $x_k \ne T$ and $u_k = u$ (sell); $x_{k+1} = w_k$ otherwise

16 DPA516 Costs
$g_N(x_N) = x_N$ if $x_N \ne T$; 0 else
$g_k(x_k) = (1+r)^{N-k} x_k$ if $x_k \ne T$ and $u_k = u$; 0 else
$J_N(x_N) = x_N$ if $x_N \ne T$; 0 else
$J_k(x_k) = \max\big((1+r)^{N-k} x_k,\; E_w\{J_{k+1}(w_k)\}\big)$ if $x_k \ne T$; 0 else

17 DPA517 Policy
Accept offer $x_k$ if $x_k > a_k$
Reject offer $x_k$ if $x_k < a_k$
Indifferent if $x_k = a_k$
The optimal policy is determined by the sequence $a_k$: $a_k = E_w\{J_{k+1}(w_k)\} / (1+r)^{N-k}$

18 DPA518 Structural properties
Fact: $a_k \ge a_{k+1}$ for all $k$
Intuition: if an offer is good enough to be acceptable at time $k$, it should be so at time $k+1$.
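The thresholds $a_k$ and their monotonicity can be computed for a small discrete example. This is a sketch under assumptions that are not in the slides: offers take finitely many values with known probabilities, and the three-point uniform distribution below is invented for illustration. The function name is hypothetical.

```python
from fractions import Fraction


def selling_thresholds(offers, probs, r, N):
    """Return {k: a_k} with a_k = E{J_{k+1}(w)} / (1+r)^(N-k)."""
    # E{J_N(w)} = E{w}, because any offer still open at stage N is accepted.
    expected_next = sum(p * w for w, p in zip(offers, probs))
    a = {}
    for k in range(N - 1, -1, -1):
        growth = (1 + r) ** (N - k)
        a[k] = expected_next / growth
        # E{J_k(w)} = E{ max((1+r)^(N-k) w, E{J_{k+1}(w)}) }
        expected_next = sum(
            p * max(growth * w, expected_next) for w, p in zip(offers, probs)
        )
    return a


# Hypothetical data: offers 1, 2, 3 equally likely, 5% interest, N = 5 stages.
offers = [1, 2, 3]
probs = [Fraction(1, 3)] * 3
a = selling_thresholds(offers, probs, r=Fraction(1, 20), N=5)
print({k: float(v) for k, v in sorted(a.items())})
```

The printed thresholds should decrease with $k$, so an offer acceptable at stage $k$ stays acceptable at stage $k+1$.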

19 DPA519 General stopping & OLA
Stopping is mandatory at or before stage $N$
Stationary: the state, control, and disturbance spaces, and the cost per stage, are constant over time
Extra action: go to the termination state at cost $t(x_k)$
DP algorithm: $J_N(x_N) = t(x_N)$; $J_k(x_k) = \min\big(t(x_k),\; \min_{u_k \in U(x_k)} E_w\{g(x_k,u_k,w_k) + J_{k+1}(f(x_k,u_k,w_k))\}\big)$

20 DPA520 Stopping set
It is optimal to stop at time $k$ for states $x$ in the set: $T_k = \{x \mid t(x) \le \min_u E\{g(x,u,w) + J_{k+1}(f(x,u,w))\}\}$
Fact: $J_{N-1}(x) \le J_N(x)$, so $J_{k-1}(x) \le J_k(x)$ for all $k$, $x$.
Cor.: $T_0 \subseteq \dots \subseteq T_k \subseteq T_{k+1} \subseteq \dots \subseteq T_{N-1}$
Question: how to guarantee equality?

21 DPA521 Absorbance
Condition: $T_{N-1}$ is absorbing if, whenever $x \in T_{N-1}$ and termination is not selected, the next state is in $T_{N-1}$. That is, $f(x,u,w) \in T_{N-1}$ for all $x \in T_{N-1}$, $u \in U(x)$, $w$.
Intuition: if you reach a state that's optimal to stop at, but you don't stop, then you move to a state that's also optimal to stop at.
Theorem: If $T_{N-1}$ is absorbing then $T_k = T_{N-1}$ for all $k$.
OLA (one-step look-ahead) policy: stop iff $x \in T_{N-1}$ (the 1-step stopping set); optimal when $T_{N-1}$ is absorbing.
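The theorem can be illustrated on a toy problem that is not from the lecture: selling with recall, where the state $x$ is the best offer so far, each further offer costs $c$, $t(x) = -x$, $g = c$, and $f(x,w) = \max(x,w)$. There is no control besides stop/continue. Here the one-step stopping set is $\{x \mid E\{(w-x)^+\} \le c\}$, and it is absorbing because the state never decreases. All names and numbers below are assumptions made for this sketch.

```python
from fractions import Fraction


def stopping_sets(offers, probs, c, N):
    """Return {k: T_k} for the recall problem described above."""
    states = sorted({0, *offers})                # 0 means "no offer yet"
    J_next = {x: Fraction(-x) for x in states}   # J_N(x) = t(x) = -x
    sets = {}
    for k in range(N - 1, -1, -1):
        J_now, T_k = {}, set()
        for x in states:
            stop = Fraction(-x)
            # Continue: pay c, then the state becomes max(x, w).
            cont = c + sum(p * J_next[max(x, w)] for w, p in zip(offers, probs))
            if stop <= cont:
                T_k.add(x)
            J_now[x] = min(stop, cont)
        sets[k] = T_k
        J_next = J_now
    return sets


# Hypothetical data: offers 1..10 equally likely, cost 1/2 per extra offer.
sets = stopping_sets(range(1, 11), [Fraction(1, 10)] * 10, Fraction(1, 2), N=6)
print({k: sorted(T) for k, T in sorted(sets.items())})
```

In this example every $T_k$ comes out as the same set, so the one-step look-ahead rule agrees with the full DP solution at every stage.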

