Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto.


Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

2 Contributions
Conformant Probabilistic Planning
– Uncertainty about the initial state
– Probabilistic actions
– No observations during plan execution
– Find the plan with the highest probability of achieving the goal
Utilize a CSP approach
– Encode the problem as a CSP
– Develop new techniques for solving this kind of CSP
Compare with decision-theoretic algorithms

3 Conformant Probabilistic Planning
A CPP problem consists of:
– S: (finite) set of states (factored representation)
– B: initial probability distribution over S (the belief state)
– A: (finite) set of probabilistic actions (represented as sequential-effect trees)
– G: set of goal states (Boolean expression over state variables)
– n: length/horizon of the plan to be computed
Solution: a plan (finite sequence of actions) that maximizes the probability of reaching a goal state from B
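The five components above can be captured in a small container type. A minimal Python sketch, where the names and representation choices (tuples of Booleans for factored states, outcome lists for actions) are illustrative assumptions rather than the paper's notation:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Tuple[bool, ...]  # factored representation: one Boolean per state variable

@dataclass
class CPP:
    """A Conformant Probabilistic Planning problem (field names are illustrative)."""
    states: List[State]                    # S: finite set of states
    belief: Dict[State, float]             # B: initial distribution over S
    actions: Dict[str, Callable[[State], List[Tuple[float, State]]]]  # A: s -> [(prob, s')]
    goal: Callable[[State], bool]          # G: Boolean expression over state variables
    horizon: int                           # n: length of the plan to be computed
```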

4 Example Problem: SandCastle
S: {moat, castle}, both Boolean
G: castle = true
A: sequential effects tree representation (Littman 1997)
[Figure: sequential effects tree for the Erect-Castle action, with branches R1–R4 testing the castle and moat variables and setting castle:new with probabilities such as 1.0, 0.75, and 0.0.]

5 Constraint Satisfaction Problems
Encode the CPP as a CSP.
A CSP consists of:
– a finite set of variables, each with its own finite domain
– a finite set of constraints, each over some subset of the variables; a constraint is satisfied by only some assignments to those variables
Solution: an assignment to all variables that satisfies all constraints
Standard algorithm: depth-first tree search with constraint propagation
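The standard algorithm above can be sketched as plain depth-first backtracking. This hypothetical minimal version checks each fully-instantiated constraint as variables get assigned; it omits the constraint propagation a real solver would add:

```python
def solve(domains, constraints, assignment=None):
    """Depth-first backtracking search for a CSP.
    domains: {var: [values]}; constraints: [(scope_tuple, predicate)].
    Returns a satisfying assignment as a dict, or None."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(domains):
        return dict(assignment)
    var = next(v for v in domains if v not in assignment)
    for val in domains[var]:
        assignment[var] = val
        # check every constraint whose scope is fully assigned so far
        if all(pred(*[assignment[v] for v in scope])
               for scope, pred in constraints
               if all(v in assignment for v in scope)):
            result = solve(domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]  # backtrack
    return None
```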

6 The Encoding
Variables:
– State variables (usually Boolean)
– Action variables with domain {1, …, |A|}
– Random variables to encode the action probabilities (True/False/Irrelevant); a setting of the random variables makes the actions deterministic, and the random variables are independent
Constraints:
– The initial belief state, represented as an “initial action” (as in partial-order planning)
– Actions constrain the state variables at step i together with those at step i+1; each branch of an effects tree is represented as one constraint
– A constraint representing the goal
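A rough sketch of the variable layout for an n-step encoding. The naming scheme and the fixed per-step count of random variables are assumptions for illustration only:

```python
def cpp_csp_variables(state_vars, num_actions, num_random, n):
    """Build the CSP variables for an n-step plan: state variables at steps
    0..n, plus one action variable and its random variables at each step."""
    variables = {}
    for i in range(n + 1):
        for v in state_vars:
            variables[f"{v}@{i}"] = [True, False]         # state variable at step i
    for i in range(n):
        variables[f"act@{i}"] = list(range(num_actions))  # which action at step i
        for r in range(num_random):
            # random variables make the chosen action's effects deterministic
            variables[f"rnd{r}@{i}"] = ["T", "F", "Irrelevant"]
    return variables
```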

7–15 [Figures: step-by-step construction of the CSP encoding. State variables V1 and V2 (values T/F) appear at each step; an action variable A1 with domain {a1, a2, a3} links consecutive steps; random variables attached to the action make its effects deterministic; the state variables at the final step are constrained to the goal states.]

16 So far
– Encoded the CPP as a CSP
– Solved the CSP
How do we now solve the CPP?

17 [Figure: the layered diagram of state, action, and random variables, with CSP solutions highlighted as paths ending in the goal states.]

18 A CSP solution is an assignment to the state/action/random variables that is
– a valid sequence of transitions
– from an initial state
– to a goal state

19 In a solution, the action variables give the plan executed, and the state variables give the execution path induced by that plan.
[Figure: a solution path through the diagram, labeled with actions a1, a1.]

20–21 [Figures: further solution paths through the diagram, for the action sequences (a1, a2) and (a2, a1).]

22 Probability of a Solution
The product of the probabilities of the solution’s random variables: the probability that this path is traversed by this plan.

23 Value of a Plan
Value of a plan π: the sum of the probabilities of all solutions whose action variables equal π.
After all plans are evaluated, the optimal plan is the one with the highest value.
[Figure: solutions grouped by their plans, e.g. (a1, a2).]
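The value of a single plan can be computed directly by summing, over execution paths, the product of the branch probabilities along each path. A sketch on a toy two-state domain in the spirit of SandCastle; the action names and probabilities here are illustrative assumptions, not the paper's numbers:

```python
# Toy two-state domain; a state is (moat, castle). Each action maps a state
# to a list of (probability, next_state) outcomes.
ACTIONS = {
    "dig-moat": lambda s: [(0.9, (True, s[1])), (0.1, s)],
    "erect-castle": lambda s: ([(0.75, (s[0], True)), (0.25, (s[0], False))] if s[0]
                               else [(0.25, (s[0], True)), (0.75, (s[0], False))]),
}

def plan_value(plan, s):
    """Probability that executing `plan` (a tuple of action names) from state s
    ends in a goal state: the sum over execution paths of the product of the
    branch probabilities along the path."""
    if s[1]:                 # goal: castle built
        return 1.0
    if not plan:
        return 0.0
    return sum(p * plan_value(plan[1:], s2) for p, s2 in ACTIONS[plan[0]](s))
```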

24 Redundant Computations
Due to the Markov property, if the same state is encountered again at step i of the plan, the subtree below it will be the same. If we can cache all the information in this subtree, we need explore it only once.
To compute the best overall n-step plan, we need to know the value of every (n−1)-step plan for all states reachable at step 1.

25 Caching
The probability of success (value) of an i-step plan a·σ in a state s is the expectation of σ’s success probability over the states reached from s by a:
V(a·σ, s) = Σ_{s′} P(s′ | s, a) · V(σ, s′)
If we know the value of σ in each of these states, we can compute its value in s without further search. So, for each state s reached at step i, we cache the value of all (n−i)-step plans.
[Figure: state s, action a, successor states s1, s2, s3 with values V1, V2, V3.]

26 CPplan Algorithm
CPplan():
  Select the next unassigned variable V
  If V is the last state variable of a step:
    If this state/step is cached, return
  Else if all variables are assigned (we must be at a goal state):
    Cache 1 as the value of the previous state/step
  Else, for each value d of V:
    Set V = d and call CPplan()
    If V is some action variable A_i:
      Update the cache for the previous state/step with the values of plans starting with d
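The effect of the caching can be sketched as memoization on (plan suffix, state): by the Markov property, the same suffix value is shared across many full plans. This is a dynamic-programming sketch of the idea on an illustrative toy domain (assumed action names and probabilities), not the authors' CSP tree search:

```python
from functools import lru_cache
from itertools import product

# Toy two-state domain; a state is (moat, castle).
ACTIONS = {
    "dig-moat": lambda s: [(0.9, (True, s[1])), (0.1, s)],
    "erect-castle": lambda s: ([(0.75, (s[0], True)), (0.25, (s[0], False))] if s[0]
                               else [(0.25, (s[0], True)), (0.75, (s[0], False))]),
}

def goal(s):
    return s[1]  # castle built

@lru_cache(maxsize=None)
def value(plan, s):
    """Cached value of a plan suffix in state s. The same (suffix, state)
    pair recurs across many full plans, so each is evaluated only once."""
    if goal(s):
        return 1.0
    if not plan:
        return 0.0
    return sum(p * value(plan[1:], s2) for p, s2 in ACTIONS[plan[0]](s))

def cpplan(belief, n):
    """Return the best n-step conformant plan and its success probability
    under the initial belief, by enumerating all |A|^n plans."""
    best = max(product(sorted(ACTIONS), repeat=n),
               key=lambda pl: sum(b * value(pl, s) for s, b in belief.items()))
    return best, sum(b * value(best, s) for s, b in belief.items())
```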

27 Caching Scheme
Needs a lot of memory
– proportional to |S| · |A|^n
– no known algorithm does better
Other features
– The cache key is simple (state / step)
– Partial caching achieves a good space/time tradeoff

28 MAXPLAN (Majercik & Littman 1998)
– Parallel approach based on stochastic SAT
– Different caching scheme
– Faster than Buridan and other AI planners
– Uses (even) more memory than CPplan
– 2 to 3 orders of magnitude slower than CPplan

29 Results vs. Maxplan
[Plots: runtime (Time) against Number of Steps on SandCastle-67 and Slippery Gripper.]

30 CPP as a Special Case of POMDPs
A POMDP is a model for probabilistic planning in partially observable environments. CPP can be cast as a POMDP in which there are no observations. Value iteration, a standard POMDP algorithm, can therefore be used to compute a solution to CPP.

31 Value Iteration: Intuitions
Value iteration utilizes a powerful form of state abstraction:
– The value of an i-step plan (for every belief state) is represented compactly by a vector of values, one per state; its value on a belief state is the expectation of these values.
– This vector of values is called an α-vector.
– Value iteration need only consider the set of α-vectors that are optimal for some belief state.
– A plan optimal for some belief state is optimal over an entire region of belief states.
– So regions of belief states are managed collectively by a single plan (α-vector) that is i-step optimal for all belief states in the region.
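The abstraction in the first two bullets can be sketched directly: a belief's value under a set of α-vectors is the best expectation over the vectors. A hypothetical minimal version, with beliefs and α-vectors as dictionaries keyed by state:

```python
def belief_value(belief, alpha_vectors):
    """Value of a belief under a set of alpha-vectors: the maximum, over the
    vectors, of the expectation of that vector's per-state values."""
    return max(sum(belief[s] * alpha[s] for s in belief)
               for alpha in alpha_vectors)
```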

32 α-vector Abstraction
The number of α-vectors that need to be considered may grow much more slowly than the number of action sequences. In Slippery Gripper:
– 1 step to go: 2 α-vectors instead of 4 actions
– 2 steps to go: 6 instead of 16 action sequences
– 3 steps to go: 10 instead of 64
– 10 steps to go: 40 instead of >10^6

33 Results vs. POMDP
[Plots: runtime (Time) against Number of Steps on Slippery Gripper and a 10x10 Grid.]

34 Dynamic Reachability
POMDP methods evaluate only a small portion of all possible plans, but over all belief states, including those not reachable from the initial belief state. Combinatorial planners (CPplan, Maxplan) must evaluate all |A|^n plans, but their tree search performs dynamic reachability and goal-attainability analysis, so plans are only evaluated on the states reachable at each step.
Example: in Grid 10x10, only 4 states are reachable in 1 step.

35 Conclusion and Future Work
– A new approach to CPP, better than previous AI planning techniques (Maxplan, Buridan)
– An analysis of the respective benefits of decision-theoretic and AI planning techniques
– Future work: ways to combine abstraction with dynamic reachability for POMDPs and MDPs
