
1 Probabilistic Planning via Determinization in Hindsight (FF-Hindsight). Sungwook Yoon, joint work with Alan Fern, Bob Givan and Rao Kambhampati

2 Probabilistic Planning Competition. Client: the participants, who send actions. Server: the competition host, which simulates the actions.

3 The Winner Was... FF-Replan, a replanner: it runs FF after determinizing the probabilistic domain. Interesting contrast: many probabilistic planning techniques work in theory but not in practice; FF-Replan has no theory but works in practice.

4 The Paper's Objective
- A better determinization approach: determinization in hindsight
- Theoretical analysis of the new (hindsight) determinization
- A new view of FF-Replan
- Experimental studies with determinization in hindsight (FF-Hindsight)

5 Probabilistic Planning (goal-oriented)
[Diagram: from initial state I, actions A1 and A2 branch through probabilistic outcomes at Time 1 and Time 2, reaching goal states or dead ends; the objective is to maximize goal achievement. Left outcomes are more likely.]

6 All-Outcome Replanning (FFR_A) [ICAPS-07]
[Diagram: a probabilistic action with Effect 1 (Probability 1) and Effect 2 (Probability 2) is determinized into two deterministic actions, Action1 with Effect 1 and Action2 with Effect 2.]
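To make the determinization on this slide concrete, here is a minimal Python sketch; the ProbabilisticAction/DeterministicAction structures and the example action are illustrative assumptions, not FF-Replan's actual PPDDL machinery.

```python
# A minimal sketch of all-outcome determinization (hypothetical data structures):
# each probabilistic effect of an action becomes its own deterministic action,
# and the probabilities are simply dropped.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ProbabilisticAction:
    name: str
    precondition: frozenset                                 # facts that must hold
    outcomes: List[Tuple[float, frozenset, frozenset]]      # (probability, add-effects, delete-effects)

@dataclass
class DeterministicAction:
    name: str
    precondition: frozenset
    add: frozenset
    delete: frozenset

def all_outcome_determinize(action: ProbabilisticAction) -> List[DeterministicAction]:
    """Split one probabilistic action into one deterministic action per outcome."""
    return [
        DeterministicAction(f"{action.name}-{i + 1}", action.precondition, add, delete)
        for i, (_prob, add, delete) in enumerate(action.outcomes)
    ]

# Example: an action with two outcomes becomes pickup-b1-1 and pickup-b1-2,
# mirroring the A1-1 / A1-2 labels in the diagrams.
pickup = ProbabilisticAction(
    "pickup-b1",
    frozenset({"clear-b1", "handempty"}),
    [(0.8, frozenset({"holding-b1"}), frozenset({"handempty", "clear-b1"})),
     (0.2, frozenset(), frozenset())],   # slip: nothing changes
)
for a in all_outcome_determinize(pickup):
    print(a.name, a.add, a.delete)
```

FF is then run over the deterministic actions only; the dropped probabilities are what the hindsight approach later puts back via sampling.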

7 Probabilistic Planning, All-Outcome Determinization
[Diagram: the same tree, with each probabilistic action Ai replaced by deterministic actions Ai-1 and Ai-2, one per outcome; the determinized problem is solved by finding any path to a goal state.]

8 Probabilistic Planning, All-Outcome Determinization (continued)
[Same diagram as the previous slide.]

9 Problem of FF-Replan, and a Better Alternative: Sampling
FF-Replan's static determinizations do not respect the outcome probabilities. We need a probabilistic, dynamic determinization: sample future outcomes and determinize in hindsight. Each future sample becomes a known-future deterministic problem.
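The phrase "each future sample becomes a known-future deterministic problem" can be made concrete with a small sketch: a future fixes, for every (state, action, time) triple, which outcome will occur. The interface below (an outcome_probs callback and a cached sampler) is a hypothetical illustration, not the paper's implementation.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

def make_future(outcome_probs: Callable[[Hashable, str], List[float]],
                rng: random.Random) -> Callable[[Hashable, str, int], int]:
    """Return a 'future': a function from (state, action, time) to a fixed outcome index.

    Outcomes are drawn lazily according to their true probabilities, but cached so
    the same query always returns the same answer within one sampled future.
    """
    cache: Dict[Tuple[Hashable, str, int], int] = {}

    def future(state: Hashable, action: str, t: int) -> int:
        key = (state, action, t)
        if key not in cache:
            probs = outcome_probs(state, action)
            cache[key] = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        return cache[key]

    return future
```

Once a future is fixed, applying action a at time t in state s always yields the same outcome, so planning under that future is an ordinary deterministic problem.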

10 Probabilistic Planning (goal-oriented)
[Same diagram as slide 5: initial state I, actions A1 and A2, probabilistic outcomes over two time steps, goal states and dead ends; left outcomes are more likely.]

11 Start Sampling. Note: sampling will reveal which action, A1 or A2, is better at state I.

12 Hindsight Sample 1
[Diagram: one sampled deterministic future of the tree. Running tally of sampled futures in which each action can reach the goal: A1: 1, A2: 0.]

13 Hindsight Sample 2
[Diagram: a second sampled future; tally A1: 2, A2: 1.]

14 Hindsight Sample 3
[Diagram: a third sampled future; tally A1: 2, A2: 1.]

15 Hindsight Sample 4
[Diagram: a fourth sampled future; tally A1: 3, A2: 1.]

16 Summary of the Idea: The Decision Process (Estimating the Q-Value Q(s,a))
1. For each action A, draw future samples.
2. Solve the resulting deterministic problems.
3. Aggregate the solutions for each action.
4. Select the action with the best aggregate value, max_A Q(s,A).
Here s is the current state and A(s) → s'. Each sample is a deterministic planning problem; for goal-oriented problems the solution length is used to estimate Q(s,A).
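A compact sketch of this decision loop, under stated assumptions: solve_deterministic stands in for FF run on the determinized, known-future problem and returns a plan length (or None if no plan is found); sample_future, apply_action, and the particular aggregation used here are hypothetical choices rather than the paper's exact implementation.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional

def hindsight_action_selection(
    state: Hashable,
    actions: List[str],
    sample_future: Callable[[random.Random], object],                  # draws one known future
    apply_action: Callable[[Hashable, str, object], Hashable],         # successor under that future
    solve_deterministic: Callable[[Hashable, object], Optional[int]],  # e.g. FF plan length, None if unsolved
    num_samples: int = 30,
    rng: random.Random = random.Random(0),
) -> str:
    """Estimate Q(s, a) for each action by averaging over sampled futures, then act greedily."""
    q: Dict[str, float] = {}
    for a in actions:
        total = 0.0
        for _ in range(num_samples):
            # 1. Draw a fresh future independently for this (action, sample) pair.
            future = sample_future(rng)
            # 2. Take action a under that future, then solve the remaining
            #    known-future problem with a deterministic planner.
            next_state = apply_action(state, a, future)
            plan_len = solve_deterministic(next_state, future)
            # 3. Aggregate: count goal achievement, preferring shorter plans.
            total += 0.0 if plan_len is None else 1.0 / (1.0 + plan_len)
        q[a] = total / num_samples
    # 4. Select the best action; ties are broken at random (see the later
    #    slide on why random tie breaking matters).
    best = max(q.values())
    return rng.choice([a for a, v in q.items() if v == best])
```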

17 Mathematical Summary of the Algorithm
An H-horizon future F_H for M = [S, A, T, R] is a mapping of state, action and time (h < H) to a state: S × A × h → S.
R(s, F_H, π) is the value of policy π under future F_H; each future is a deterministic problem.
Hindsight value: V_HS(s,H) = E_{F_H}[ max_π R(s, F_H, π) ].
Compare this with the real value: V*(s,H) = max_π E_{F_H}[ R(s, F_H, π) ].
FF-Replan (all-outcome) satisfies: V_FFRa(s) = max_F V(s,F) ≥ V_HS(s,H) ≥ V*(s,H).
Hindsight Q-value: Q(s,a,H) = R(a) + E_{F_{H-1}}[ max_π R(a(s), F_{H-1}, π) ].
In our proposal, the computation of max_π R(s, F_{H-1}, π) is done approximately by FF [Hoffmann and Nebel '01].
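A short derivation of the inequality chain on this slide (standard reasoning about expectations and maxima, written out here for completeness):

```latex
% For any fixed policy \pi and any future F_H:  \max_{\pi'} R(s, F_H, \pi') \ge R(s, F_H, \pi).
\begin{align*}
V_{HS}(s,H) &= \mathbb{E}_{F_H}\!\left[\max_{\pi'} R(s, F_H, \pi')\right]
             \;\ge\; \mathbb{E}_{F_H}\!\left[R(s, F_H, \pi)\right] \quad \text{for every } \pi,\\
\Rightarrow\; V_{HS}(s,H) &\ge \max_{\pi} \mathbb{E}_{F_H}\!\left[R(s, F_H, \pi)\right] = V^{*}(s,H),\\
V_{FFRa}(s) &= \max_{F} V(s,F) \;\ge\; \mathbb{E}_{F_H}\!\left[\max_{\pi} R(s, F_H, \pi)\right] = V_{HS}(s,H).
\end{align*}
```

In words: hindsight evaluation is optimistic because the policy is allowed to adapt to the revealed future, and FF-Replan's static determinization is even more optimistic because it behaves like the single best future.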

18 Key Technical Results
- The importance of independent sampling across states, actions and time.
- The necessity of random tie breaking in decision making.
- Theorem 1: when there is a policy that achieves the goal with probability 1 within the horizon, the hindsight decision-making algorithm will reach the goal with probability 1.
- Theorem 2: a polynomial number of samples suffices, with respect to the horizon, the number of actions and the minimum Q-value advantage.
- We characterize FF-Replan in terms of hindsight decision making: V_FFRa(s) = max_F V(s,F).

19 Empirical Results (IPPC-04 problems; numbers are solved trials)
Problem          FFRa   FF-Hindsight
Blocksworld      270    158
Boxworld         150    100
Fileworld        29     14
R-Tireworld      3      0
ZenoTravel       30     0
Exploding BW     5      28
G-Tireworld      7      18
Tower of Hanoi   11     17
For ZenoTravel, when we used importance sampling, the number of solved trials improved to 26.

20 Empirical Results (domains Climber, River, Bus-Fare and Tire1 through Tire6; numbers are the percentage of successful runs)
- FFRa: Climber 60%, River 65%, Bus-Fare 1%, followed by 50% and 0% on the Tire domains
- Paragraph: Climber 100%, River 65%, Bus-Fare 100%, followed by 3%, 1% and 0% on the Tire domains
- FPG: Climber 100%, River 65%, Bus-Fare 22%, Tire1 through Tire6: 100%, 92%, 60%, 35%, 19%, 13%
- FF-HS: Climber 100%, River 65%, Bus-Fare 100%
These domains were developed specifically to defeat FF-Replan. As expected, FF-Replan does poorly here, but FF-Hindsight does very well, showing probabilistic reasoning ability while retaining scalability.

21 Conclusion
[Diagram: the deterministic planning track (classical planning, machine learning for planning, net-benefit optimization, temporal planning), which has scalability, is connected via determinization to the probabilistic planning track (Markov decision processes, machine learning for MDPs, temporal MDPs), bringing scalability to the latter.]

22 Conclusion
- Devised an algorithm that can take advantage of the significant advances in deterministic planning in the context of probabilistic planning.
- Made many deterministic planning techniques available to probabilistic planning:
  - Most learning-for-planning techniques were developed solely for deterministic planning; these techniques are now relevant to probabilistic planning too.
  - Advanced net-benefit style planners can be used for reward-maximization style probabilistic planning problems.

23 Discussion
- Mercier and Van Hentenryck analyzed the difference between
  V*(s,H) = max_π E_{F_H}[ R(s, F_H, π) ]  and  V_HS(s,H) = E_{F_H}[ max_π R(s, F_H, π) ].
- Ng and Jordan analyzed the difference between
  V*(s,H) = max_π E_{F_H}[ R(s, F_H, π) ]  and  V^(s,H) = max_π (1/m) Σ_i R(s, F_H^i, π), where m is the number of sampled futures.
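A toy Monte Carlo illustration of the three quantities on this slide, using a hypothetical one-step, two-action example (action "a" pays 1 with probability 0.5, action "b" pays 0.6 for sure); nothing here is from the paper.

```python
import random

rng = random.Random(0)
m = 10_000

def reward(action: str) -> float:
    # Hypothetical one-step rewards: "a" pays 1 with probability 0.5, "b" pays 0.6.
    return (1.0 if rng.random() < 0.5 else 0.0) if action == "a" else 0.6

samples = {a: [reward(a) for _ in range(m)] for a in ("a", "b")}

v_star = max(0.5, 0.6)                                     # true value: the max of the expectations
v_hat = max(sum(s) / m for s in samples.values())          # Ng & Jordan style max of sample averages
v_hs = sum(max(ra, rb) for ra, rb in zip(samples["a"], samples["b"])) / m  # expectation of the max

print(f"V* = {v_star:.2f}  V^ ~ {v_hat:.2f}  V_HS ~ {v_hs:.2f}")  # V_HS ~ 0.8 overestimates V* = 0.6
```

The sample-average estimate V^ converges to V*, while the hindsight value V_HS stays strictly larger whenever the inner max can adapt to the sampled outcome.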

24 IPPC-2004 Results (numbers are successful runs)
Planners (columns): NMR, C, J1, Classy, NMR, mGPT, C, FFR_S, FFR_A
BW: 2522702553012030210270
Box: 1341501000300150
File: - - - 33031429
Zeno: - - - 30 0
Tire-r: - - - 30
Tire-g: - - - 9163077
TOH: - - - 1500011
Exploding: - - - 00035
Callouts on the slide: Human Control Knowledge, 2nd Place, Winners, Learned Knowledge.
Legend: NMR: Non-Markovian Reward Decision Process Planner; Classy: Approximate Policy Iteration with a Policy Language Bias; mGPT: Heuristic Search Probabilistic Planning; C: Symbolic Heuristic Search.
Winner of IPPC-04: FFR_S.

25 IPPC-2006 Results (numbers are the percentage of successful runs)
Planners (columns): FFR_A, FPG, FOALP, sfDP, Paragraph, FFR_S
BW: 866310029077
Zenotravel: 100270777
Random: 1006500573
Elevator: 93761000093
Exploding: 52432431 52
Drive: 71560090
Schedule: 51540010
PitchCatch: 54230000
Tire: 82758209169
Legend: FPG: Factored Policy Gradient Planner; FOALP: First Order Approximate Linear Programming; sfDP: Symbolic Stochastic Focused Dynamic Programming with Decision Diagrams; Paragraph: a Graphplan-based probabilistic planner.
Unofficial winner of IPPC-06: FFR_A.


27 Sampling Problem: Time-Dependency Issue
[Diagram: states Start, S1, S2, S3, Goal and Dead End; actions A and B leave Start, action C reaches its outcomes with probability p versus 1-p, and action D with probability 1-p versus p.]

28 Sampling Problem: Time-Dependency Issue (continued)
[Same diagram, showing actions A and B from Start.] S3 is a worse state than S1, but it can look as if there is always a path from it to the Goal; outcomes therefore need to be sampled independently across actions.

29 Action Selection Problem: Random Tie Breaking Is Essential
[Diagram: states Start, S1 and Goal; action A always stays in Start, while actions B and C each reach their outcomes with probability p versus 1-p.] In the Start state, action C is clearly better, but A can be used to wait until the C-to-Goal effect is realized in the sampled future, which is why random tie breaking is needed when actions look equally good.

30 Sampling Problem: Importance Sampling (IS)
[Diagram: states Start, Goal and S1; action B reaches the Goal with extremely low probability and S1 with very high probability.]
- Sampling uniformly would find the problem unsolvable.
- Use importance sampling instead.
- Identifying the regions that need importance sampling is left for further study.
- In the benchmarks, ZenoTravel needs the IS idea.
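A hedged sketch of the importance-sampling idea for outcome sampling (the proposal distribution and the interface are illustrative assumptions, not the paper's implementation): rare but goal-critical outcomes are drawn from a proposal that boosts their probability, and each sample carries a weight p/q so that aggregated estimates remain consistent with the true probabilities.

```python
import random
from typing import List, Tuple

def sample_outcome_importance(
    probs: List[float],        # true outcome probabilities p_i
    proposal: List[float],     # boosted proposal probabilities q_i (q_i > 0 wherever p_i > 0)
    rng: random.Random,
) -> Tuple[int, float]:
    """Sample an outcome index from the proposal and return it with its weight p_i / q_i."""
    i = rng.choices(range(len(probs)), weights=proposal, k=1)[0]
    return i, probs[i] / proposal[i]

# Example: an action reaches the goal with probability 0.01. Sampling a handful of
# futures from the true distribution would almost never see that outcome, so every
# future would look unsolvable.
rng = random.Random(0)
true_p = [0.01, 0.99]          # [goal-reaching outcome, other outcome]
boosted = [0.5, 0.5]           # proposal that makes the rare outcome visible

# Weighted fraction of samples in which the goal-reaching outcome occurs;
# the weights keep the estimate consistent with the true 1% probability.
n = 10_000
hits = [sample_outcome_importance(true_p, boosted, rng) for _ in range(n)]
estimate = sum(w for i, w in hits if i == 0) / n
print(f"estimated probability of the rare outcome ~ {estimate:.3f}")  # ~ 0.01
```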

31 Theoretical Results
Theorem 1
- For goal-achieving probabilistic planning problems, if there is a policy that solves the problem with probability 1 within a bounded horizon, then hindsight planning solves the problem with probability 1. If there is no such policy, hindsight planning returns a success ratio of less than 1.
- Intuition: if there is a future in which no plan can achieve the goal, that future can be sampled.
Theorem 2
- The number of future samples needed to correctly identify the best action: w > 4Δ^(-2) T ln(|A|H / δ), where Δ is the minimum Q-advantage of the best action over the other actions and δ is the confidence parameter. The bound follows from the Chernoff bound.
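For a sense of scale, a tiny calculation of the bound exactly as written on this slide; the parameter values (and the role of T, which the slide does not define) are purely illustrative assumptions.

```python
import math

def hindsight_sample_bound(delta_q: float, t: float, num_actions: int, horizon: int, conf: float) -> float:
    """w > 4 * delta_q**-2 * t * ln(|A| * H / conf), transcribed from the slide's Theorem 2."""
    return 4.0 * delta_q ** -2 * t * math.log(num_actions * horizon / conf)

# Hypothetical values: 4 actions, horizon 20, minimum Q-advantage 0.2, confidence 0.05, t = 1.
print(round(hindsight_sample_bound(0.2, 1.0, 4, 20, 0.05)))  # a few hundred samples
```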

32 Probabilistic Planning: Expectimax Solution
[Diagram: the same two-step tree annotated as an expectimax computation: a Max node over actions alternates with Expectation (E) nodes over each action's probabilistic outcomes; the objective is to maximize goal achievement.]
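For contrast with the sampled-futures approach, a generic finite-horizon expectimax sketch (textbook formulation; the actions/outcomes/is_goal callbacks are hypothetical interfaces, not tied to the paper's code):

```python
from typing import Callable, Hashable, List, Tuple

def expectimax(
    state: Hashable,
    horizon: int,
    actions: Callable[[Hashable], List[str]],
    outcomes: Callable[[Hashable, str], List[Tuple[float, Hashable]]],  # (probability, next state)
    is_goal: Callable[[Hashable], bool],
) -> float:
    """Probability of reaching the goal within `horizon` steps under optimal action choice."""
    if is_goal(state):
        return 1.0
    if horizon == 0 or not actions(state):
        return 0.0
    # Max node over actions; expectation node over each action's probabilistic outcomes.
    return max(
        sum(p * expectimax(s2, horizon - 1, actions, outcomes, is_goal)
            for p, s2 in outcomes(state, a))
        for a in actions(state)
    )
```

This exact tree is what the hindsight samples approximate; its size grows exponentially with the horizon, which is why FF-Hindsight samples futures and calls a deterministic planner instead.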

33 Hindsight Sample 1
[Same diagram as slide 12; tally A1: 1, A2: 0.]

34 Hindsight Sample 2
[Same diagram as slide 13; tally A1: 2, A2: 1.]

35 Hindsight Sample 3
[Same diagram as slide 14; tally A1: 2, A2: 1.]

36 Hindsight Sample 4
[Same diagram as slide 15; tally A1: 3, A2: 1.]

