Chapter 17, 2nd Part: Making Complex Decisions --- Decision-Theoretic Agent Design. Xin Lu, 11/04/2002.


2 POMDP: Uncertainty
Uncertainty about the action outcome.
Uncertainty about the world state due to imperfect (partial) information.
--- Huang Hui

3 Outline
POMDP agent: construct a new MDP in which the current probability distribution over states plays the role of the state variable; the resulting state space is characterized by real-valued probabilities and is infinite.
Decision-theoretic agent design for POMDPs: a limited lookahead using the technology of decision networks.

4 Decision cycle of a POMDP agent
1. Given the current belief state b, execute the action a = π*(b).
2. Receive observation o.
3. Set the current belief state to SE(b, a, o) and repeat.
[Figure: the agent maintains belief b via the state estimator SE, sends actions to the world, and receives observations back.]
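As an illustration only, here is a minimal Python sketch of this cycle; the names policy, state_estimator, and environment are placeholders for π*(b), SE, and the world, and are not part of the slide.

def pomdp_agent_loop(b, policy, state_estimator, environment):
    """Decision cycle of a POMDP agent (sketch).

    b               -- current belief state, e.g. a dict {state: probability}
    policy          -- maps a belief state to an action (pi*(b))
    state_estimator -- SE(b, a, o): returns the updated belief state
    environment     -- executes actions and returns observations
    """
    while True:
        a = policy(b)                    # 1. choose an action from the current belief
        o = environment.execute(a)       # 2. act and receive observation o
        b = state_estimator(b, a, o)     # 3. update the belief with SE(b, a, o)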

5 Belief state
b(s) is the probability assigned to the actual state s by the belief state b.

6 Belief MDP
A belief MDP is a tuple \langle B, A, \rho, P \rangle:
B = infinite set of belief states
A = finite set of actions
\rho(b, a) = \sum_{s} b(s)\, R(s, a)   (reward function)
P(b' | b, a) = \sum_{o} P(b' | b, a, o)\, P(o | a, b)   (transition function)
where P(b' | b, a, o) = 1 if SE(b, a, o) = b', and P(b' | b, a, o) = 0 otherwise.
[Figure: belief states b and b' before and after moving West once.]
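A hedged sketch of SE for a discrete POMDP, using the standard update b'(s') = \alpha\, O(s', o) \sum_{s} T(s, a, s')\, b(s); the dictionary representations of T and O below are assumptions made for illustration, not part of the slide.

def state_estimator(b, a, o, T, O):
    """SE(b, a, o): discrete POMDP belief update (sketch).

    b -- belief state: dict mapping state s to probability b(s)
    a -- action just taken
    o -- observation just received
    T -- transition model: T[s][a][s2] = P(s2 | s, a)
    O -- observation model: O[s2][o] = P(o | s2)
    """
    states = list(O.keys())
    b_new = {}
    for s2 in states:
        # prediction: sum over prior states weighted by the transition model
        pred = sum(T[s][a].get(s2, 0.0) * b.get(s, 0.0) for s in states)
        # weight by the probability of the received observation
        b_new[s2] = O[s2].get(o, 0.0) * pred
    alpha = sum(b_new.values())          # normalization constant
    return {s2: p / alpha for s2, p in b_new.items()} if alpha > 0 else b_new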

7 Solutions for POMDPs
The belief MDP reduces the POMDP to an MDP, but the resulting MDP has a continuous state space.
Methods based on value and policy iteration: a policy can be represented as a set of regions of belief-state space, each of which is associated with a particular optimal action. The value function associates a distinct linear function of b with each region. Each value or policy iteration step refines the boundaries of the regions and may introduce new regions.
A method based on lookahead search: decision-theoretic agents.
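Written out, the "distinct linear function of b on each region" is the standard piecewise-linear-and-convex form of the POMDP value function; with \Gamma denoting the set of vectors maintained by value iteration (a symbol added here for clarity, not on the slide):

V(b) = \max_{\alpha \in \Gamma} \sum_{s} \alpha(s)\, b(s)

Each vector \alpha corresponds to one region and one optimal action; taking the maximum over the vectors picks out the region containing b.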

8 Decision theory = probability theory + utility theory
The fundamental idea of decision theory is that an agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all possible outcomes of the action.

9 A decision-theoretic agent
function DECISION-THEORETIC-AGENT(percept) returns action
    calculate updated probabilities for current state, based on available evidence including current percept and previous action
    calculate outcome probabilities for actions, given action descriptions and probabilities of current states
    select action with highest expected utility, given probabilities of outcomes and utility information
    return action
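A possible Python rendering of this pseudocode; the helper names update_belief, outcome_model, and utility are illustrative placeholders, not part of the original agent program.

def decision_theoretic_agent(percept, prev_action, belief, update_belief,
                             outcome_model, utility, actions):
    """One cycle of the agent in the pseudocode above (sketch).

    belief        -- dict {state: probability} over current world states
    update_belief -- update_belief(belief, prev_action, percept) -> new belief
    outcome_model -- outcome_model(s, a) -> dict {s2: P(s2 | s, a)}
    utility       -- utility(s2) -> numeric utility of outcome state s2
    actions       -- iterable of available actions
    """
    # Step 1: update probabilities for the current state given the new evidence.
    belief = update_belief(belief, prev_action, percept)

    # Steps 2-3: compute outcome probabilities for each action and pick the
    # action with the highest expected utility.
    def expected_utility(a):
        return sum(p_s * p_s2 * utility(s2)
                   for s, p_s in belief.items()
                   for s2, p_s2 in outcome_model(s, a).items())

    action = max(actions, key=expected_utility)
    return action, belief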

10 Basic elements of decision-theoretic agent design
Dynamic belief network --- the transition and observation models.
Dynamic decision network (DDN) --- decision and utility nodes.
A filtering algorithm (e.g. Kalman filtering) --- incorporates each new percept and action and updates the belief-state representation.
Decisions are made by projecting forward possible action sequences and choosing the best action sequence.

11 Definition of belief
The belief about the state at time t is the probability distribution over the state given all available evidence:
Bel(X_t) = P(X_t | E_1, \ldots, E_t)    (1)
where X_t is the state variable referring to the current state of the world and E_t is the evidence (percept) variable.

12 Calculation of belief (1)
Assumption 1: the problem is Markovian,
P(X_t | X_{t-1}, X_{t-2}, \ldots, A_{t-1}, A_{t-2}, \ldots) = P(X_t | X_{t-1}, A_{t-1})    (2)
Assumption 2: each percept depends only on the state at the time,
P(E_t | X_t, X_{t-1}, \ldots, E_{t-1}, \ldots) = P(E_t | X_t)    (3)
Assumption 3: the action taken depends only on the percepts the agent has received to date,
P(A_t | X_1, \ldots, X_t, E_1, \ldots, E_t) = P(A_t | E_1, \ldots, E_t)    (4)

13 Calculation of belief (2)
Prediction phase:
\widehat{Bel}(X_t) = \sum_{x_{t-1}} P(X_t | x_{t-1}, A_{t-1})\, Bel(x_{t-1})    (5)
where x_{t-1} ranges over all possible values of the state variables.
Estimation phase:
Bel(X_t) = \alpha\, P(E_t | X_t)\, \widehat{Bel}(X_t)    (6)
where \alpha is a normalization constant.
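Equations (5) and (6) together form a discrete Bayes filter, the same update that SE performs in slide 6 factored into two phases. A small Python sketch follows; representing the transition and sensor models as nested dictionaries is an assumption made for illustration.

def predict(bel, action, transition):
    """Prediction phase, eq. (5): push the belief through the action model."""
    states = list(transition.keys())
    return {x_t: sum(transition[x_prev][action].get(x_t, 0.0) * bel.get(x_prev, 0.0)
                     for x_prev in states)
            for x_t in states}

def estimate(bel_hat, percept, sensor):
    """Estimation phase, eq. (6): weight by the sensor model and normalize."""
    unnormalized = {x: sensor[x].get(percept, 0.0) * p for x, p in bel_hat.items()}
    alpha = sum(unnormalized.values())       # alpha is the normalization constant
    return ({x: p / alpha for x, p in unnormalized.items()}
            if alpha > 0 else unnormalized)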

14 Design for a decision-theoretic agent
function DECISION-THEORETIC-AGENT(E_t) returns an action
    inputs: E_t, the percept at time t
    static: BN, a belief network with nodes X
            Bel(X), a vector of probabilities, updated over time
    update Bel(X) using the prediction and estimation equations (5) and (6), given E_t and the previous action
    select the action with the highest expected utility given Bel(X)
    return action

15 Sensing in uncertain worlds
Sensor model: P(E_t | X_t), describes how the environment generates the sensor data --- vs the observation model O(s, o).
Action model: P(X_t | X_{t-1}, A_{t-1}), describes the effects of actions --- vs the transition model.
Stationary sensor model: P(E | X), where E and X are random variables ranging over percepts and states.
Advantage: the same model can be used at each time step.

16 A sensor model in a belief network
(a) Belief network fragment showing the general relationship between state variables and sensor variables.
[Figure: the Burglary/Earthquake/Alarm network with P(B) = .001 and P(E) = .002, and with JohnCalls and MaryCalls as sensor nodes, each with its conditional probability table given Alarm.]
Next step: break apart the generalized state and sensor variables into their components.

17 (b) An example with pressure and temperature gauges; (c) measuring temperature using two separate gauges.

18 Sensor failure
In order for the system to handle sensor failure, the sensor model must include the possibility of failure.
[Figure: sensor model with nodes Lane Position, Sensor Accuracy, Weather, Terrain, Sensor Failure, and Position Sensor.]
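One plausible way to include the possibility of failure is to mix the working-sensor model with a failure model. In the sketch below, the uniform "failed sensor" distribution and the name sensor_with_failure are assumptions for illustration, not from the slide.

def sensor_with_failure(O_ok, p_fail, observations):
    """Return P(o | s) that accounts for possible sensor failure (sketch).

    O_ok         -- O_ok[s][o] = P(o | s, sensor working)
    p_fail       -- prior probability that the sensor has failed
    observations -- list of possible sensor readings
    """
    uniform = 1.0 / len(observations)   # assumed failure model: the reading is uninformative
    return {s: {o: (1 - p_fail) * O_ok[s].get(o, 0.0) + p_fail * uniform
                for o in observations}
            for s in O_ok}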

19 Dynamic belief network
Markov chain (state evolution model): a sequence of values where each one is determined solely by the previous one:
P(X_t | X_{t-1}, X_{t-2}, \ldots, X_0) = P(X_t | X_{t-1})
Dynamic belief network (DBN): a belief network with one node for each state and sensor variable, for each time step.

20 Generic structure of a dynamic belief network
[Figure: a chain of state nodes State.t-2 through State.t+2 linked by the state evolution model, each with a percept node Percept.t-2 through Percept.t+2 linked by the sensor model.]
Two tasks of the network:
Calculate the probability distribution for the state at time t.
Probabilistic projection: concerned with how the state will evolve into the future.

21 Prediction --- Rollup: remove slice t-1 --- Estimation
[Figure: the network slices before and after each step, showing State and Percept nodes for times t-1 through t+1.]

22 [Figure: two time slices (t and t+1) of the lane-following DBN, with nodes Lane Position, Position Sensor, Sensor Accuracy, Weather, Terrain, and Sensor Failure in each slice.]

23 Dynamic decision networks
A dynamic decision network (DDN) adds utility nodes and decision nodes for actions to a dynamic belief network.

24 The generic structure of a dynamic decision network
[Figure: state nodes State.t through State.t+3, sensor nodes Sense.t through Sense.t+3, decision nodes D.t-1 through D.t+2, and a utility node U.t+3.]
The decision problem involves calculating the value of D_t that maximizes the agent's expected utility over the remaining state sequence.

25 Search tree of the lookahead DDN
[Figure: the lookahead search tree rooted at the current belief state; not included in the transcript.]

26 Some characteristics of the DDN search tree
The search tree of a DDN is very similar to the EXPECTIMINIMAX algorithm for game trees with chance nodes, except that:
1. There can also be rewards at non-leaf states.
2. The decision nodes correspond to belief states rather than actual states.
The time complexity is O(|D|^d \cdot |E|^d), where d is the depth, |D| is the number of available actions, and |E| is the number of possible observations.
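The O(|D|^d \cdot |E|^d) growth can be seen directly in a depth-limited lookahead over belief states. The recursive sketch below assumes reward, observation-probability, and belief-update functions as inputs and is illustrative only.

def lookahead_value(b, depth, actions, observations, reward, obs_prob, update):
    """Depth-limited DDN lookahead over belief states (sketch).

    b                 -- current belief state
    reward(b, a)      -- expected immediate reward of taking a in belief b
    obs_prob(o, b, a) -- P(o | b, a)
    update(b, a, o)   -- SE(b, a, o), the next belief state
    Branching: |D| actions times |E| observations per level, hence O(|D|^d |E|^d).
    """
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:                        # decision node: maximize over actions
        value = reward(b, a)
        for o in observations:               # chance node: expectation over observations
            p = obs_prob(o, b, a)
            if p > 0:
                value += p * lookahead_value(update(b, a, o), depth - 1,
                                             actions, observations,
                                             reward, obs_prob, update)
        best = max(best, value)
    return best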

27 Discussion of DDNs
The DDN promises potential solutions to many of the problems that arise as AI systems are moved from static, accessible, and above all simple environments to dynamic, inaccessible, complex environments that are closer to the real world.
The DDN provides a general, concise representation for large POMDPs, so DDNs can be used as inputs for any POMDP algorithm, including value and policy iteration methods.

28 Perspectives on reducing DDN complexity
Combine lookahead with a heuristic estimate for the utility of the remaining steps.
Many approximation techniques:
1. Using less detailed state variables for states in the distant future.
2. Using a greedy heuristic search through the space of decision sequences.
3. Assuming "most likely" values for future percept sequences rather than considering all possible values.
…