Policy Generation for Continuous-time Stochastic Domains with Concurrency
Håkan L. S. Younes, Reid G. Simmons
Carnegie Mellon University



2 Introduction
- Policy generation for asynchronous stochastic systems
- Rich goal formalism
- Policy generation and repair:
  - Solve relaxed problem using deterministic temporal planner
  - Decision tree learning to generalize plan
  - Sample path analysis to guide repair

3 Motivating Example
- Deliver package from CMU to Honeywell
[map: CMU → PIT (Pittsburgh) → MSP → Honeywell (Minneapolis)]

4 Elements of Uncertainty
- Uncertain duration of flight and taxi ride
- Plane can get full without reservation
- Taxi might not be at airport when arriving in Minneapolis
- Package can get lost at airports
- Asynchronous events ⇒ not semi-Markov

5 Asynchronous Events
- While the taxi is on its way to the airport, the plane may become full
[diagram: state ⟨driving taxi, plane not full⟩ changes to ⟨driving taxi, plane full⟩ when the "fill" event triggers at time t0; the taxi arrival-time distribution changes from F(t) to F(t | t > t0)]
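The conditional distribution F(t | t > t0) is what breaks the semi-Markov property: once the fill event triggers at t0, the taxi's remaining travel time depends on how long it has already been driving. A minimal sketch of sampling from such a conditioned distribution by inverse transform; the Weibull distribution and its parameters are illustrative choices, not taken from the slides:

```python
import math
import random

def weibull_cdf(t, shape, scale):
    """Weibull CDF: F(t) = 1 - exp(-(t/scale)^shape)."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def sample_conditional(t0, shape, scale, rng=random.random):
    """Sample a trigger time T from F(t | T > t0) by inverse transform:
    given T > t0, the survival ratio S(T)/S(t0) is uniform on (0, 1]."""
    s0 = math.exp(-((t0 / scale) ** shape))  # survival probability at t0
    u = 1.0 - rng()                          # uniform on (0, 1]
    return scale * (-math.log(u * s0)) ** (1.0 / shape)
```

For shape = 1 the Weibull is exponential and the conditioning has no effect beyond shifting (memorylessness); for any other shape the elapsed time t0 genuinely changes the remaining-time distribution, which is why the process is not semi-Markov.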

6 Rich Goal Formalism
- Goals specified as CSL formulae:
  Φ ::= true | a | Φ ∧ Φ | ¬Φ | P≥θ(Φ U≤T Φ)
- Goal example: "Probability is at least 0.9 that the package reaches Honeywell within 300 minutes without getting lost on the way"
  P≥0.9(¬lost(pkg) U≤300 at(pkg, honeywell))
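The path formula inside the probability operator can be checked against a single simulated execution. A simplified sketch (the path encoding as timestamped sets of atomic propositions is hypothetical, and the formula is only evaluated at jump points, not over the full continuous trajectory):

```python
def check_until(path, phi1, phi2, T):
    """Check whether one timed sample path satisfies phi1 U<=T phi2.

    path: list of (time, state) pairs with nondecreasing times, where a
    state is a set of atomic propositions; phi1 and phi2 are predicates
    over such sets. Returns True iff phi2 holds at some state reached by
    time T, with phi1 holding at every earlier state.
    """
    for time, state in path:
        if time > T:
            return False        # deadline passed before phi2 held
        if phi2(state):
            return True         # goal reached in time
        if not phi1(state):
            return False        # invariant violated before the goal
    return False
```

For the delivery goal, phi1 would test that the package is not lost and phi2 that it is at Honeywell.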

7 Problem Specification
Given:
- Complex domain model (stochastic discrete-event system)
- Initial state
- Probabilistic temporally extended goal (CSL formula)
Wanted:
- Policy satisfying the goal formula in the initial state

8 Generate, Test and Debug [Simmons, AAAI-88]
- Generate initial policy
- Test if policy is good
- If bad: debug and repair the policy, then test again; repeat until good

9 Generate
Ways of generating an initial policy:
- Generate policy for relaxed problem
- Use existing policy for similar problem
- Start with null policy
- Start with random policy

10 Test [Younes et al., ICAPS-03]
- Use discrete-event simulation to generate sample execution paths
- Use acceptance sampling to verify probabilistic CSL goal conditions
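Acceptance sampling in this line of work is Wald's sequential probability ratio test: keep simulating paths until the evidence is strong enough to accept or reject P ≥ θ, given an indifference region of half-width δ and error bounds α and β. A minimal sketch; the `sample` callback standing in for "simulate one path and check the goal" is an assumption of this illustration:

```python
import math

def sprt(sample, theta, delta, alpha, beta):
    """Wald's SPRT for the hypothesis p >= theta.

    sample() simulates one execution path and returns True iff it
    satisfies the goal. The test distinguishes H0: p >= theta + delta
    from H1: p <= theta - delta, with false-rejection probability at
    most alpha and false-acceptance probability at most beta.
    Returns True to accept p >= theta."""
    p0, p1 = theta + delta, theta - delta
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1 (reject)
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0 (accept)
    log_ratio = 0.0                        # log likelihood ratio H1 vs H0
    while lower < log_ratio < upper:
        if sample():
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1 - p1) / (1 - p0))
    return log_ratio <= lower
```

The appeal for policy testing is that the number of simulated paths adapts to how clear-cut the case is: policies far from the threshold θ are accepted or rejected after very few samples.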

11 Debug
- Analyze sample paths generated in the test step to find reasons for failure
- Change the policy to reflect the outcome of the failure analysis

12 Closer Look at Generate Step
[diagram: the generate-test-debug loop, with the Generate step highlighted]

13 Policy Generation
1. Eliminate uncertainty: probabilistic planning problem → deterministic planning problem
2. Solve using a temporal planner, e.g. VHPOP [Younes & Simmons, JAIR 20] → temporal plan
3. Generate training data by simulating the plan → state-action pairs
4. Decision tree learning → policy (decision tree)
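The slides do not show the learner itself; as a toy illustration of the final step, a tiny ID3-style learner over boolean state propositions. The split criterion (fewest misclassified examples) and all proposition names are mine, not the paper's:

```python
from collections import Counter

def learn_tree(examples, features):
    """Grow a decision tree from (state, action) pairs, where a state is
    a dict of boolean propositions. Leaves are actions; internal nodes
    test one proposition."""
    actions = [a for _, a in examples]
    if len(set(actions)) == 1 or not features:
        return Counter(actions).most_common(1)[0][0]

    def errors(f):
        """Misclassifications if we split on f and predict the majority
        action in each branch."""
        err = 0
        for val in (True, False):
            sub = [a for s, a in examples if s.get(f, False) == val]
            if sub:
                err += len(sub) - Counter(sub).most_common(1)[0][1]
        return err

    best = min(features, key=errors)
    rest = [f for f in features if f != best]
    pos = [(s, a) for s, a in examples if s.get(best, False)]
    neg = [(s, a) for s, a in examples if not s.get(best, False)]
    if not pos or not neg:
        return Counter(actions).most_common(1)[0][0]
    return (best, learn_tree(pos, rest), learn_tree(neg, rest))

def apply_tree(tree, state):
    """Walk the tree to the action prescribed for this state."""
    while isinstance(tree, tuple):
        feat, yes, no = tree
        tree = yes if state.get(feat, False) else no
    return tree
```

The point of learning a tree rather than storing the state-action pairs directly is generalization: the policy also prescribes actions in states the simulated plan never visited.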

14 Conversion to Deterministic Planning Problem
- Assume we can control nature:
  - Exogenous events are treated as actions
  - Actions with probabilistic effects are split into multiple deterministic actions
  - Trigger time distributions are turned into interval duration constraints
- Objective: find some execution trace satisfying the path formula Φ1 U≤T Φ2 of the probabilistic goal P≥θ(Φ1 U≤T Φ2)

15 Generating Training Data
Simulated plan trace (states s0 through s6):
enter-taxi(me, pgh-taxi, cmu); depart-taxi(me, pgh-taxi, cmu, pgh-airport); arrive-taxi(pgh-taxi, cmu, pgh-airport); leave-taxi(me, pgh-taxi, pgh-airport); check-in(me, plane, pgh-airport); depart-plane(plane, pgh-airport, mpls-airport); arrive-plane(plane, pgh-airport, mpls-airport)
Resulting state-action pairs:
- s0: enter-taxi(me, pgh-taxi, cmu)
- s1: depart-taxi(me, pgh-taxi, cmu, pgh-airport)
- s2: idle
- s3: leave-taxi(me, pgh-taxi, pgh-airport)
- s4: check-in(me, plane, pgh-airport)
- s5: idle
- …

16 Policy Tree
[decision tree over state propositions such as at(pgh-taxi, cmu), at(me, cmu), in(me, plane), moving(pgh-taxi, cmu, pgh-airport), at(me, pgh-airport), at(plane, mpls-airport), at(me, mpls-airport), at(mpls-taxi, mpls-airport), and moving(mpls-taxi, mpls-airport, honeywell), with actions enter-taxi, depart-taxi, leave-taxi, check-in, and idle at the leaves]

17 Closer Look at Debug Step
[diagram: the generate-test-debug loop, with the Debug step highlighted]

18 Policy Debugging
1. Sample path analysis on sample execution paths → failure scenarios
2. Solve deterministic planning problem taking the failure scenario into account → temporal plan
3. Generate training data by simulating the plan → state-action pairs
4. Incremental decision tree learning [Utgoff et al., MLJ 29] → revised policy

19 Sample Path Analysis
1. Construct Markov chain from paths
2. Assign values to states (failure: −1; success: +1; all other states: discounted expected value of their successors)
3. Assign values to events: V(s′) − V(s) for each transition s → s′ caused by e
4. Generate failure scenarios

20 Sample Path Analysis: Example
Sample paths:
- s0 -e1-> s1 -e2-> s2
- s0 -e1-> s1 -e4-> s4 -e2-> s2
- s0 -e3-> s3
Markov chain (γ = 0.9): s0 → s1 with probability 2/3, s0 → s3 with 1/3; s1 → s2 with 1/2, s1 → s4 with 1/2; s4 → s2 with 1
State values: V(s0) = −0.213, V(s1) = −0.855, V(s2) = −1, V(s3) = +1, V(s4) = −0.9
Event values:
- V(e1) = 2·(V(s1) − V(s0)) = −1.284
- V(e2) = (V(s2) − V(s1)) + (V(s2) − V(s4)) = −0.245
- V(e3) = V(s3) − V(s0) = +1.213
- V(e4) = V(s4) − V(s1) = −0.045
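The example's numbers can be reproduced mechanically. A sketch, assuming a hypothetical encoding of each path as (state, event, state) transitions and of terminal outcomes as ±1; the discount γ = 0.9 is the slide's:

```python
from collections import defaultdict

def sample_path_analysis(paths, gamma, outcomes):
    """Estimate state and event values from sample execution paths.
    Each path is a list of (state, event, state) transitions; `outcomes`
    maps terminal states to +1 (success) or -1 (failure)."""
    # Empirical transition counts define the Markov chain.
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for s, e, s2 in path:
            counts[s][(e, s2)] += 1

    values = dict(outcomes)

    def value(s):
        """Discounted expected value under the empirical chain."""
        if s not in values:
            total = sum(counts[s].values())
            values[s] = gamma * sum(n / total * value(s2)
                                    for (e, s2), n in counts[s].items())
        return values[s]

    # Event value: sum of V(s') - V(s) over every transition it caused.
    event_vals = defaultdict(float)
    for path in paths:
        for s, e, s2 in path:
            event_vals[e] += value(s2) - value(s)
    return values, dict(event_vals)
```

Run on the slide's three sample paths, this recovers V(s0) = −0.213, V(s1) = −0.855, V(s4) = −0.9 and the event values shown above; e3 (strongly positive) is helpful, while e1 and e2 are the main contributors to failure.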

21 Failure Scenarios
Failure paths:
- s0 -e1-> s1 -e2-> s2
- s0 -e1-> s1 -e4-> s4 -e2-> s2

| Failure path 1 | Failure path 2 | Failure scenario |
| e1.2           | e1.6           | e1.4             |
| e4.4           | e4.5           | e4.6             |
| -              | e4.8           | -                |

22 Additional Training Data
Simulated trace for the failure scenario (states s0 through s6):
leave-taxi(me, pgh-taxi, cmu); make-reservation(me, plane, cmu); enter-taxi(me, pgh-taxi, cmu); depart-taxi(me, pgh-taxi, cmu, pgh-airport); arrive-taxi(pgh-taxi, cmu, pgh-airport); fill-plane(plane, pgh-airport); depart-plane(plane, pgh-airport, mpls-airport)
Resulting state-action pairs:
- s0: leave-taxi(me, pgh-taxi, cmu)
- s1: make-reservation(me, plane, cmu)
- s2: enter-taxi(me, pgh-taxi, cmu)
- s3: depart-taxi(me, pgh-taxi, cmu, pgh-airport)
- s4: idle
- s5: idle
- …

23 Revised Policy Tree
[decision tree: as before, but under at(pgh-taxi, cmu) and at(me, cmu) a new test has-reservation(me, plane) selects enter-taxi/depart-taxi when a reservation is held and make-reservation/leave-taxi otherwise; the rest of the tree is unchanged]

24 Summary
- Planning with stochastic asynchronous events using a deterministic planner
- Decision tree learning to generalize the deterministic plan
- Sample path analysis for generating failure scenarios to guide plan repair

25 Coming Attractions
Decision-theoretic planning with asynchronous events:
- "A formalism for stochastic decision processes with asynchronous events", MDP Workshop at AAAI-04
- "Solving GSMDPs using continuous phase-type distributions", AAAI-04