1 (Chapter 3 of) Planning and Control in Stochastic Domains with Imperfect Information by Milos Hauskrecht CS594 Automated Decision Making Course Presentation.

Presentation transcript:

1 (Chapter 3 of) Planning and Control in Stochastic Domains with Imperfect Information by Milos Hauskrecht. CS594 Automated Decision Making Course Presentation. Professor: Piotr. Presenter: Kaidi Zhao, Ph.D. candidate, Computer Science Dept., UIC.

2 Agenda
- Brief review of the MDP
- Introduce the POMDP
- Information State
- Information State MDP
- Value Functions
- Construct Information State MDP: Forward Triggered, Backward Triggered, Mixed, Delayed
- Summary and Review

3 Brief review of the MDP
Formally, an MDP model is a 4-tuple (S, A, T, R) where:
- S is a finite set of world states
- A is a finite set of actions
- T: S x A x S -> [0, 1] defines the transition probability distribution P(s' | s, a) that describes the effect of actions on the world state
- R: S x A x S -> R defines a reward model that describes the payoffs associated with a state transition under some action
So what is the MDP missing?
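To make the 4-tuple concrete, here is a minimal sketch in Python/NumPy of a toy two-state, two-action MDP together with a few sweeps of value iteration; the states, numbers, and discount factor are invented for illustration and are not from the paper.

```python
import numpy as np

# Toy MDP with 2 states and 2 actions (all numbers are illustrative).
# T[a, s, s'] = P(s' | s, a); R[a, s, s'] = reward for that transition.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.1, 0.9]]])  # action 1
R = np.array([[[1.0, 0.0], [0.0, 0.0]],   # action 0
              [[0.0, 5.0], [0.0, 1.0]]])  # action 1
gamma = 0.95  # discount factor (assumed)

# Value iteration: V(s) <- max_a sum_s' P(s'|s,a) * (R(s,a,s') + gamma * V(s'))
V = np.zeros(2)
for _ in range(200):
    Q = (T * (R + gamma * V)).sum(axis=2)  # Q[a, s]
    V = Q.max(axis=0)
policy = Q.argmax(axis=0)  # greedy action for each state
print("V:", V, "policy:", policy)
```

The same tabular representation is what the POMDP definition below extends with an observation set and an observation function.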

4 Brief review of the MDP
Where the MDP applies: it requires perfectly observable states. The agent can be uncertain about the possible outcomes of its actions, but it must know with certainty which state it is currently in.
However, real life is not that easy.
Where the POMDP applies:
- Uncertainty about the action outcome
- Uncertainty about the world state due to imperfect (partial) information

5 Agenda
- Brief review of the MDP
- Introduce the POMDP
- Information State
- Information State MDP
- Value Functions
- Construct Information State MDP: Forward Triggered, Backward Triggered, Mixed, Delayed
- Summary and Review

6 POMDP
A partially observable Markov decision process is defined as a tuple (S, A, Ω, T, O, R) where:
- S corresponds to a finite set of world states
- A is a finite set of actions
- Ω is a finite set of observations
- T: S x A x S -> [0, 1] defines the transition probability distribution P(s' | s, a) that describes the effect of actions on the state of the world
- O: Ω x S x A -> [0, 1] defines the observation probability distribution P(o | s, a) that models the effect of actions and states on observations
- R: S x A x S -> R corresponds to the reward model that describes the payoffs incurred by state transitions under specific actions
Reminder: an MDP is (S, A, T, R).
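Continuing the toy NumPy example, a POMDP adds the observation set and the observation function to the same tabular representation; this sketch is illustrative only (the class name, array shapes, and numbers are assumptions, not from the paper).

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class POMDP:
    T: np.ndarray   # T[a, s, s'] = P(s' | s, a)
    O: np.ndarray   # O[a, s', o] = P(o | s', a): observation emitted after reaching s' under a
    R: np.ndarray   # R[a, s, s'] = reward for the transition
    gamma: float

# Toy instance: 2 states, 2 actions, 2 observations (illustrative numbers).
pomdp = POMDP(
    T=np.array([[[0.9, 0.1], [0.2, 0.8]],
                [[0.5, 0.5], [0.1, 0.9]]]),
    O=np.array([[[0.8, 0.2], [0.3, 0.7]],
                [[0.8, 0.2], [0.3, 0.7]]]),
    R=np.array([[[1.0, 0.0], [0.0, 0.0]],
                [[0.0, 5.0], [0.0, 1.0]]]),
    gamma=0.95,
)
# Each slice O[a, s', :] is a probability distribution over observations (rows sum to 1).
```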

7 Influence Diagrams. (Figures not reproduced in this transcript: the influence diagram of the MDP, as a reminder, and the influence diagram of the POMDP.)

8 Information State. Since in a POMDP the underlying process state is not known with certainty and can only be inferred from past observations, actions, and any prior information available, we need to differentiate between the "true process state" and the "information (perceived) state".

9 Agenda
- Brief review of the MDP
- Introduce the POMDP
- Information State
- Information State MDP
- Value Functions
- Construct Information State MDP: Forward Triggered, Backward Triggered, Mixed, Delayed
- Summary and Review

10 Information State. An information state represents all the information available to the agent at decision time that is relevant for selecting the optimal action. It consists of either a complete history of actions and observations or a corresponding sufficient statistic.

11 Information State MDP. A sequence of information states defines a controlled Markov process in which every new information state is computed as a function of the previous information state, the previous step's action, and the newly seen observation (schematically, I_t = tau(I_{t-1}, a_{t-1}, o_t)). The process defined over information states is called the information state MDP.

12 Information State MDP. (Figures not reproduced; panels: MDP reminder, POMDP, POMDP with information states, Information State MDP.)

13 Info. State Representation 1/3
Complete Information State: consists of all information available to the agent before the action at time t is made, namely:
- the prior belief on states at time 0
- all observations available up to time t
- all actions performed before time t

14 Info. State Representation 1/3. Major hindrance: the complete information state keeps expanding in dimension and size. The remedy is to replace complete information states with quantities that are sufficient statistics with regard to control. These quantities satisfy the Markov property and preserve the information content of the complete state that is relevant for finding the optimal control.

15 Info. State Representation 2/3. Sufficient Information State process: let P = {I_0, I_1, ..., I_t, ...} be a sequence of information vectors describing the information process. P is a sufficient information process with regard to optimal control when, for every component I_t in P, the conditions sketched below hold.
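The defining conditions are not legible in this transcript; the following is a reconstruction from the standard definition of a sufficient information-state process, so the exact notation and ordering may differ from Hauskrecht's:

\begin{align*}
(1)\quad & I_t = \tau(I_{t-1}, a_{t-1}, o_t) && \text{(updateable from the previous information state, action, and observation)}\\
(2)\quad & P(s_t \mid I_t) = P(s_t \mid I_t^{C}) && \text{(preserves the belief over process states)}\\
(3)\quad & P(o_t \mid I_{t-1}, a_{t-1}) = P(o_t \mid I_{t-1}^{C}, a_{t-1}) && \text{(preserves the predictive observation distribution)}
\end{align*}

where $I_t^{C}$ denotes the complete information state at time $t$.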

16 Info. State Representation 3/3. Belief States as Sufficient Info. States: the quantity most often used as a sufficient statistic in POMDPs is the belief state. The belief state assigns a probability to every process state and reflects the extent to which each state is believed to be the current one. The belief vector b_t at time t is sketched below.
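The defining formula is not legible here; in standard notation the belief vector is (a reconstruction, notation assumed):

\[
  b_t(s) = P\big(s_t = s \mid o_t, a_{t-1}, o_{t-1}, \ldots, a_0, b_0\big), \qquad \sum_{s \in S} b_t(s) = 1 .
\]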

17 Value Functions. Value functions for the MDP can be directly applied to the information state MDP. The n-steps-to-go value function for a fixed plan has the same form as in the MDP case (reminder sketched below).
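The formula itself is not legible in this transcript; for reference, the standard MDP n-steps-to-go value function for a fixed policy $\pi$ is (a reconstruction, with $\rho$ denoting the expected one-step reward):

\[
  V_n^{\pi}(s) = \rho\big(s, \pi(s)\big) + \gamma \sum_{s' \in S} P\big(s' \mid s, \pi(s)\big)\, V_{n-1}^{\pi}(s'),
  \qquad \rho(s,a) = \sum_{s' \in S} P(s' \mid s, a)\, R(s, a, s') .
\]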

18 Value Functions. Using the expected one-step cost for an information state I_n and an action a, and the next-step information state I_{n-1} obtained from the update function, the value function can be rewritten purely over information states (sketched below).
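The three formulas are not legible here; a reconstruction in belief-state notation follows (the slide may use general information states $I_n$ rather than beliefs $b$, so treat the notation as an assumption):

\begin{align*}
\rho(b, a) &= \sum_{s \in S} b(s) \sum_{s' \in S} P(s' \mid s, a)\, R(s, a, s') && \text{expected one-step cost/reward}\\
b' &= \tau(b, a, o) && \text{next information (belief) state}\\
V_n^{\pi}(b) &= \rho\big(b, \pi(b)\big) + \gamma \sum_{o} P\big(o \mid b, \pi(b)\big)\, V_{n-1}^{\pi}\big(\tau(b, \pi(b), o)\big)
\end{align*}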

19 Value Functions. The optimal value function for the finite n-steps-to-go problem, the corresponding optimal control function, the optimal value function for the infinite discounted-horizon problem, and its optimal control function are the usual Bellman equations taken over information states (sketched below).
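The four equations are not legible in this transcript; in belief-state notation the standard forms are (a reconstruction, not copied from the slides):

\begin{align*}
V_n^{*}(b) &= \max_{a \in A}\Big[\rho(b,a) + \gamma \sum_{o} P(o \mid b, a)\, V_{n-1}^{*}\big(\tau(b,a,o)\big)\Big]\\
\mu_n^{*}(b) &= \operatorname*{arg\,max}_{a \in A}\Big[\rho(b,a) + \gamma \sum_{o} P(o \mid b, a)\, V_{n-1}^{*}\big(\tau(b,a,o)\big)\Big]\\
V^{*}(b) &= \max_{a \in A}\Big[\rho(b,a) + \gamma \sum_{o} P(o \mid b, a)\, V^{*}\big(\tau(b,a,o)\big)\Big]\\
\mu^{*}(b) &= \operatorname*{arg\,max}_{a \in A}\Big[\rho(b,a) + \gamma \sum_{o} P(o \mid b, a)\, V^{*}\big(\tau(b,a,o)\big)\Big]
\end{align*}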

20 Value Function Mappings. The basic value function equations can also be written in value function mapping form, which enables us to represent the value function compactly (sketched below).
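A sketch of the mapping form (reconstructed; $H$ is an assumed name for the Bellman backup operator over information states):

\[
  (HV)(b) = \max_{a \in A}\Big[\rho(b,a) + \gamma \sum_{o} P(o \mid b,a)\, V\big(\tau(b,a,o)\big)\Big],
  \qquad V_n^{*} = H\, V_{n-1}^{*}, \qquad V^{*} = H\, V^{*} .
\]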

21 Agenda
- Brief review of the MDP
- Introduce the POMDP
- Information State
- Information State MDP
- Value Functions
- Construct Information State MDP: Forward Triggered, Backward Triggered, Mixed, Delayed
- Summary and Review

22 Forward Triggered Observation. A POMDP with standard (forward-triggered) observations assumes that an observation depends solely on the current process state and the previous action. Q: Can the information state MDP be sufficiently represented using the belief state?

23 Forward Triggered Observation. Yes! By definition, a sufficient information state process must satisfy the conditions listed earlier, and with forward-triggered observations the belief state does: it can be updated recursively from the previous belief, the last action, and the new observation (a sketch follows).
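A minimal sketch of the forward-triggered belief update in Python/NumPy; the normalization constant equals P(o | b, a), and all names, shapes, and numbers are illustrative assumptions rather than the paper's.

```python
import numpy as np

# Toy arrays (illustrative): T[a, s, s'] = P(s' | s, a), O[a, s', o] = P(o | s', a)
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.8, 0.2], [0.3, 0.7]]])

def belief_update(b, a, o):
    """Forward-triggered update: b'(s') is proportional to P(o | s', a) * sum_s P(s' | s, a) b(s)."""
    predicted = b @ T[a]                  # predicted[s'] = sum_s b(s) T[a, s, s']
    unnormalized = O[a, :, o] * predicted # weight by the observation likelihood
    norm = unnormalized.sum()             # equals P(o | b, a)
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return unnormalized / norm

b0 = np.array([0.5, 0.5])                 # uniform prior belief
b1 = belief_update(b0, a=0, o=1)
print(b1, b1.sum())                       # updated belief; sums to 1
```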

24 Backward Triggered Observation. In a POMDP with backward-triggered observations, an action a_t performed at time t causes an observation about the process state s_t to be made; that is, the action performed at time t enables an observation that refers to the "before action" state. The major cause is time discretization: which state is better approximated by a new observation, the one that occurred after the action or the one before it?

25 Backward Triggered Observation. The belief update for an action a_{t-1} and an observation that is related to the state at time t-1 but observed (made available) at time t is sketched below.
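The update formula is not legible here; a reconstruction under the stated assumption that the observation o_t is emitted by the previous state s_{t-1} under action a_{t-1} (notation assumed):

\[
  b_t(s) \;\propto\; \sum_{s'} P(s \mid s', a_{t-1})\, P(o_t \mid s', a_{t-1})\, b_{t-1}(s') ,
\]

followed by normalization over s. In effect, the new observation first re-weights the previous belief over s_{t-1} and is then propagated forward through the transition model.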

26 Forward & Backward Combined. The two previous models can be combined: the observation model then consists of two groups of observations, one triggered in the forward fashion and the other in the backward fashion. Assume the observations associated with the same state are independent given that state.

27 POMDP with Delayed Observation. How does this arise? An action issued by the agent at time t is performed only at time t+k, or an observation made at time t becomes available to the agent only at time t+k. In the example model that follows:
- observations are triggered backwards
- observations with different time lags are assumed to be independent given the process state
- at every time t the agent can expect to receive results related to at most k past process states

28 POMDP with K-step Delayed Observation

29 POMDP with K-step Delayed Observation. Reminder: the k-step delayed observation model violates the third prerequisite of the sufficient information state process, so the model needs to be converted before it can be treated as an information state MDP.

30 POMDP with K-step Delayed Observation. Observation vector: the contribution to the belief state at time t-i that comes from observations related to that state and made up to time t. Prior belief vector: the contribution to the belief state at time t-i from all actions made prior to that time, the related observations made up to time t, and the prior belief at time t = 0.

31 POMDP with K-step Delayed Observation. Belief state at time t: it combines the observation-vector and prior-belief-vector contributions defined above, which enables us to convert the POMDP model into an information state MDP.

32 Agenda
- Brief review of the MDP
- Introduce the POMDP
- Information State
- Information State MDP
- Value Functions
- Construct Information State MDP: Forward Triggered, Backward Triggered, Mixed, Delayed
- Summary and Review

33 Summary and Review. I will use several questions to summarize my presentation. Q1: what kind of uncertainty differentiates the POMDP from the MDP?

34 Summary and Review. A1: uncertainty about the world state due to imperfect (partial) information. Q2: what is an "information state"?

35 Summary and Review. A2: an information state represents all the information available to the agent at decision time that is relevant for selecting the optimal action. Q3: what are the value functions for the information state MDP?

36 Summary and Review. A3: the value functions from the MDP can be applied directly to the information state MDP. Q4: which one is the backward-triggered observation?

37 Summary and Review. A4: the left one. Thanks for attending my presentation!