What Are Partially Observable Markov Decision Processes and Why Might You Care? Bob Wall CS 536

POMDPs A generalization of the Markov Decision Process (MDP). In an MDP, the environment is fully observable, and with the Markov assumption for the transition model, the optimal policy depends only on the current state. In a POMDP, the environment is only partially observable.

POMDP Implications Since the current state is not necessarily known, the agent cannot simply execute the optimal policy for that state. A POMDP is defined by the following: –Set of states S, set of actions A, set of observations O –Transition model T(s, a, s’) –Reward model R(s) –Observation model O(s, o) – the probability of perceiving observation o in state s.
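As a rough illustration (not from the original slides), the tuple above can be captured in code; the container name and the tiny toy entries below are made up for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class POMDP:
    states: List[str]                      # S
    actions: List[str]                     # A
    observations: List[str]                # O
    T: Dict[Tuple[str, str, str], float]   # T(s, a, s') = P(s' | s, a)
    R: Dict[str, float]                    # R(s)
    Obs: Dict[Tuple[str, str], float]      # O(s, o) = P(o | s)

# Hypothetical two-state toy problem, just to show the shapes
# (transition entries for the non-"listen" actions are omitted for brevity).
toy = POMDP(
    states=["left", "right"],
    actions=["listen", "open-left", "open-right"],
    observations=["hear-left", "hear-right"],
    T={("left", "listen", "left"): 1.0, ("left", "listen", "right"): 0.0,
       ("right", "listen", "right"): 1.0, ("right", "listen", "left"): 0.0},
    R={"left": -1.0, "right": -1.0},
    Obs={("left", "hear-left"): 0.85, ("left", "hear-right"): 0.15,
         ("right", "hear-right"): 0.85, ("right", "hear-left"): 0.15},
)
```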

POMDP Implications (cont.) The optimal action depends not on the current state but on the agent’s current belief state. –A belief state is a probability distribution over all possible states Given belief state b, if the agent takes action a and perceives observation o, the new belief state is –b’(s’) = α O(s’, o) Σ_s T(s, a, s’) b(s), where α is a normalizing constant The optimal policy π*(b) maps belief states to actions
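A minimal sketch of that update rule (my own illustration, reusing the hypothetical POMDP container from the sketch above; α is obtained by normalizing so the new belief sums to 1):

```python
from typing import Dict

def belief_update(model: POMDP, b: Dict[str, float], a: str, o: str) -> Dict[str, float]:
    """b'(s') = alpha * O(s', o) * sum_s T(s, a, s') * b(s)."""
    b_new = {}
    for s_next in model.states:
        predicted = sum(model.T.get((s, a, s_next), 0.0) * b[s] for s in model.states)
        b_new[s_next] = model.Obs.get((s_next, o), 0.0) * predicted
    norm = sum(b_new.values())          # 1/alpha = P(o | b, a)
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return {s: p / norm for s, p in b_new.items()}

# Starting from a uniform belief, listening and hearing "hear-left"
# shifts the belief toward "left": {'left': 0.85, 'right': 0.15}.
print(belief_update(toy, {"left": 0.5, "right": 0.5}, "listen", "hear-left"))
```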

POMDP Solutions Solving a POMDP over the physical state space is equivalent to solving an MDP over the belief state space. However, the belief space is continuous and very high-dimensional, so solutions are difficult to compute. Even finding approximately optimal solutions is PSPACE-hard (i.e. really hard).

Why Study POMDPs? In spite of the difficulties, POMDPs are still very important. –Many real-world problems and situations are not fully observable, but the Markov assumption is often valid. Active area of research –Google search on “POMDP” returns ~5000 results –A number of current papers on the topic

Some Solution Techniques Most exact solution algorithms (value iteration, policy iteration) use dynamic programming techniques –These techniques transform one value function over the belief space (which is piecewise linear and convex – PWLC) into the next, so that standard MDP solution techniques can be applied –Dynamic programming algorithms: one-pass (1971), exhaustive (1982), linear support (1988), witness (1996) –A better method – incremental pruning (1996)
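The PWLC value function is typically represented as a finite set of α-vectors, one linear function per vector: the value of a belief is the maximum dot product, and the maximizing vector’s associated action is the greedy choice. A small sketch under that assumption (the vectors below are placeholder numbers, not the output of witness or incremental pruning):

```python
import numpy as np

def value_of_belief(alpha_vectors: np.ndarray, b: np.ndarray) -> float:
    """PWLC value function: V(b) = max_i <alpha_i, b>."""
    return float(np.max(alpha_vectors @ b))

def greedy_action(alpha_vectors: np.ndarray, vector_actions: list, b: np.ndarray) -> str:
    """Each alpha-vector carries the action that generated it; act by the maximizing vector."""
    return vector_actions[int(np.argmax(alpha_vectors @ b))]

# Illustrative alpha-vectors for a two-state problem (values are made up).
alphas = np.array([[ 2.0, -5.0],   # associated with "open-left"
                   [-5.0,  2.0],   # associated with "open-right"
                   [ 0.5,  0.5]])  # associated with "listen"
actions = ["open-left", "open-right", "listen"]
b = np.array([0.5, 0.5])
print(value_of_belief(alphas, b), greedy_action(alphas, actions, b))  # 0.5 listen
```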

POMDPs at Work Pattern recognition tasks –SA-POMDP (Single-action POMDP) – the only decision is whether to change state or not –A model was constructed to recognize words within text to which noise was added – i.e. individual letters within the words were randomly corrupted –The SA-POMDP outperformed a pattern recognizer based on Hidden Markov Models, and exhibited better immunity to noise

POMDPs at Work (cont.) Robotics –Mission planning –Robot navigation A POMDP was used to control the movement of an autonomous robot within a crowded environment, and to predict the motion of other objects in the robot’s environment The state space is decomposed into a hierarchy, so that each individual POMDP has a computationally tractable task

POMDPs at Work (cont.) BATmobile – the Bayesian Autonomous Taxi –Its many different tasks make use of a number of AI techniques –POMDPs are used for the actual driving control (as opposed to higher-level trip planning) –Approximation techniques are used to keep the computation efficient

BAT (cont.) Several different techniques are combined: –Dynamic Probabilistic Network (DPN) to maintain the current belief state –Dynamic Decision Network (DDN) to perform bounded lookahead –Hand-coded explicit policy representations – e.g. decision trees –Supervised / reinforcement learning techniques to learn policy decisions

BAT (cont.) The BAT has been constructed in a simulation environment and has been demonstrated to successfully handle a variety of driving problems, such as passing slower vehicles, reacting to unsafe drivers, avoiding stalled vehicles, and merging into traffic.

Resources Tutorial on POMDPs: –…orial/index.html Additional pointers to articles on my web site: –