What Are Partially Observable Markov Decision Processes and Why Might You Care? Bob Wall CS 536.


1 What Are Partially Observable Markov Decision Processes and Why Might You Care? Bob Wall CS 536

2 POMDPs A generalization of the Markov Decision Process (MDP). In an MDP, the environment is fully observable, and with the Markov assumption for the transition model, the optimal policy depends only on the current state. For POMDPs, the environment is only partially observable.

3 POMDP Implications Since the current state is not necessarily known, the agent cannot simply execute the optimal policy for that state. A POMDP is defined by the following: –Set of states S, set of actions A, set of observations O –Transition model T(s, a, s’) –Reward model R(s) –Observation model O(s, o) – probability of observing observation o in state s.
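
As a rough illustration of the definition on slide 3, the components can be collected into a small Python container. This is only a sketch: the class name `POMDP`, the `validate` helper, and the two-state "listen" toy model with its numbers are all invented for the example and do not come from the presentation. O(s, o) is read here as the probability of observing o in state s.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class POMDP:
    """Hypothetical container for the components listed on slide 3."""
    states: List[str]
    actions: List[str]
    observations: List[str]
    T: Dict[Tuple[str, str, str], float]  # T[(s, a, s2)] = P(s2 | s, a)
    R: Dict[str, float]                   # R[s] = reward for being in s
    O: Dict[Tuple[str, str], float]       # O[(s, o)] = P(o | s)

    def validate(self, tol: float = 1e-9) -> bool:
        """Check that every T(s, a, .) and every O(s, .) sums to 1."""
        for s in self.states:
            for a in self.actions:
                total = sum(self.T.get((s, a, s2), 0.0) for s2 in self.states)
                if abs(total - 1.0) > tol:
                    return False
            total = sum(self.O.get((s, o), 0.0) for o in self.observations)
            if abs(total - 1.0) > tol:
                return False
        return True


# Invented toy model: the hidden state never changes, and listening
# reports the correct side 85% of the time.
toy = POMDP(
    states=["left", "right"],
    actions=["listen"],
    observations=["hear_left", "hear_right"],
    T={("left", "listen", "left"): 1.0, ("right", "listen", "right"): 1.0},
    R={"left": -1.0, "right": 1.0},
    O={("left", "hear_left"): 0.85, ("left", "hear_right"): 0.15,
       ("right", "hear_left"): 0.15, ("right", "hear_right"): 0.85},
)
```

Calling `toy.validate()` checks only that the probability tables are normalized; it says nothing about whether the model is a sensible description of any real problem.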

4 POMDP Implications (cont.) Optimal action depends not on the current state but on the agent’s current belief state. –Belief state is a probability distribution over all possible states. Given a belief state b, if the agent does an action a and perceives observation o, the new belief state is –b’(s’) = α O(s’, o) Σ_s T(s, a, s’) b(s), where α is a normalizing constant that makes the new belief sum to 1. Optimal policy π*(b) maps from belief states to actions.
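
The belief update on slide 4 can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the function name `update_belief`, the dictionary layout, and the two-state example numbers are assumptions made here. The normalizing constant α is computed as one over the sum of the unnormalized values.

```python
from typing import Dict, List, Tuple


def update_belief(
    b: Dict[str, float],
    a: str,
    o: str,
    states: List[str],
    T: Dict[Tuple[str, str, str], float],  # T[(s, a, s2)] = P(s2 | s, a)
    O: Dict[Tuple[str, str], float],       # O[(s, o)] = P(o | s)
) -> Dict[str, float]:
    """Return b'(s2) = alpha * O(s2, o) * sum_s T(s, a, s2) * b(s)."""
    unnormalized = {}
    for s2 in states:
        # Prediction step: probability of landing in s2 after action a.
        predicted = sum(T.get((s, a, s2), 0.0) * b[s] for s in states)
        # Correction step: weight by how likely observation o is in s2.
        unnormalized[s2] = O.get((s2, o), 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: p / total for s2, p in unnormalized.items()}


# Invented example: static hidden state, sensor correct 85% of the time.
states = ["left", "right"]
T = {("left", "listen", "left"): 1.0, ("right", "listen", "right"): 1.0}
O = {("left", "hear_left"): 0.85, ("left", "hear_right"): 0.15,
     ("right", "hear_left"): 0.15, ("right", "hear_right"): 0.85}
b0 = {"left": 0.5, "right": 0.5}
b1 = update_belief(b0, "listen", "hear_left", states, T, O)
```

Starting from the uniform belief `b0`, a single "hear_left" observation moves the belief to roughly 0.85 for "left" and 0.15 for "right", which is just Bayes' rule applied to the sensor model.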

5 POMDP Solutions Solving a POMDP on a physical state space is equivalent to solving an MDP on the belief state space. However, the belief state space is continuous and very high-dimensional, so solutions are difficult to compute. Even finding approximately optimal solutions is PSPACE-hard (i.e. really hard).

6 Why Study POMDPs? In spite of the difficulties, POMDPs are still very important. –Many real-world problems and situations are not fully observable, but the Markov assumption is often valid. Active area of research –Google search on “POMDP” returns ~5000 results –A number of current papers on the topic

7 Some Solution Techniques Most exact solution algorithms (value iteration, policy iteration) use dynamic programming techniques –Each dynamic programming step transforms one value function over belief states, which is piecewise linear and convex (PWLC), into another, in the same way as an MDP solution technique applied to the belief state space –Dynamic programming algorithms: one-pass (1971), exhaustive (1982), linear support (1988), witness (1996) –Better method – incremental pruning (1996)
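
To make the PWLC idea concrete: a piecewise linear and convex value function over belief states can be stored as a finite set of vectors (often called alpha vectors), with the value of a belief being the largest dot product. The sketch below is an illustration under that assumption, with invented function names and numbers. It removes only vectors that are dominated everywhere by a single other vector, which is a much simpler check than the incremental pruning algorithm named on the slide.

```python
from typing import List, Sequence


def value(belief: Sequence[float],
          alpha_vectors: Sequence[Sequence[float]]) -> float:
    """PWLC value: the maximum over alpha vectors of alpha . belief."""
    return max(
        sum(a_i * b_i for a_i, b_i in zip(alpha, belief))
        for alpha in alpha_vectors
    )


def prune_pointwise_dominated(
    alpha_vectors: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Drop vectors that another single vector matches or beats in every
    component. Exact duplicates keep only their first occurrence."""
    kept = []
    for i, v in enumerate(alpha_vectors):
        dominated = False
        for j, w in enumerate(alpha_vectors):
            if i == j:
                continue
            at_least = all(w_k >= v_k for w_k, v_k in zip(w, v))
            strictly = any(w_k > v_k for w_k, v_k in zip(w, v))
            if at_least and (strictly or j < i):
                dominated = True
                break
        if not dominated:
            kept.append(list(v))
    return kept


# Invented two-state example.
alphas = [[1.0, 0.0], [0.0, 1.0], [0.4, 0.4], [-0.5, -0.5]]
pruned = prune_pointwise_dominated(alphas)
```

In this example only `[-0.5, -0.5]` is removed. The vector `[0.4, 0.4]` survives even though it is never the maximizer at any belief, because no single vector beats it in every component; detecting that kind of redundancy is what the more careful pruning methods are for.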

8 POMDPs at Work Pattern Recognition tasks –SA-POMDP (Single-action POMDP) – only decision is whether to change state or not –Model constructed to recognize words within text to which noise was added – i.e. individual letters within the words were –SA-POMDP outperformed a pattern recognizer based on Hidden Markov Models, and exhibited better immunity to noise

9 POMDPs at Work (cont.) Robotics –Mission planning –Robot navigation: POMDP used to control the movement of an autonomous robot within a crowded environment; used to predict the motion of other objects within the robot’s environment; state space decomposed into a hierarchy, so individual POMDPs have a computationally tractable task

10 POMDPs at Work (cont.) BATmobile – the Bayesian Autonomous Taxi –Many different tasks make use of a number of AI techniques –POMDPs used for the actual driving control (as opposed to higher level trip planning) –Uses approximation techniques to keep the computation efficient

11 BAT (cont.) Several different techniques combined: –Dynamic Probabilistic Network (DPN) to maintain current belief state –Dynamic Decision Network (DDN) to perform bounded lookahead –Hand-coded explicit policy representations – i.e. decision trees –Supervised / reinforcement learning techniques to learn policy decisions

12 BAT (cont.) The BAT has been constructed in a simulation environment and has been demonstrated to successfully handle a variety of driving problems, such as passing slower vehicles, reacting to unsafe drivers, avoiding stalled vehicles, and merging into traffic.

13 Resources Tutorial on POMDPs: –http://www.cs.brown.edu/research/ai/pomdp/tutorial/index.html Additional pointers to articles on my web site: –http://www.cs.montana.edu/~bwall/cs536

