Classical Situation (figure: a grid world with one goal cell labeled "heaven" and one trap cell labeled "hell"). World deterministic; state observable.

MDP-Style Planning (figure: the same grid world with "heaven" and "hell" cells). World stochastic; state observable. The solution is a policy, also called a universal plan or navigation function. [Koditschek 87, Barto et al. 89]

Stochastic, Partially Observable (figure: the grid world with a sign; it is unknown which side is heaven and which is hell). [Sondik 72] [Littman/Cassandra/Kaelbling 97]

Stochastic, Partially Observable (figure: the two possible worlds, one with heaven on the left and hell on the right, the other reversed; the sign indicates which world the robot is in).

Stochastic, Partially Observable (figure: from the start location each of the two worlds occurs with 50% probability; reading the sign resolves which is which).

Robot Planning Frameworks
Classical AI/robot planning:
  State/actions: discrete & continuous
  State: observable
  Environment: deterministic
  Plans: sequences of actions
  Completeness: yes
  Optimality: rarely
  State space size: huge, often continuous, 6 dimensions
  Computational complexity: varies

MDP-Style Planning (figure: the grid world with "heaven" and "hell" cells, revisited). World stochastic; state observable. The solution is a policy, also called a universal plan or navigation function. [Koditschek 87, Barto et al. 89]

Markov Decision Process (discrete) (figure: a five-state MDP with states s1, ..., s5, stochastic transitions between them, and rewards r attached to the states). [Bellman 57] [Howard 60] [Sutton/Barto 98]
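
The diagram itself does not survive in the transcript; for reference, the discrete MDP it illustrates is the standard tuple below (a reconstruction of the textbook definition, not text taken from the slide):

```latex
% Standard discrete MDP definition (reconstructed, not the slide's own rendering)
\[
\text{MDP} = (S, A, p, r, \gamma), \qquad
p(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s,\ a_t = a), \qquad
r(s, a) \in \mathbb{R}, \qquad \gamma \in [0, 1).
\]
```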

Value Iteration. Value function of policy π; Bellman equation for the optimal value function; value iteration: recursively estimating the value function; greedy policy. [Bellman 57] [Howard 60] [Sutton/Barto 98]
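
The slide's equations are images and are lost in the transcript; the standard forms they refer to, assuming discount factor γ and reward r(s, a), are:

```latex
% Standard MDP value-iteration equations (reconstructed; the slide's own rendering is lost)
\begin{align*}
V^{\pi}(s) &= E\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r_t \;\Big|\; s_0 = s,\ \pi\Big]
  && \text{value function of policy } \pi\\
V^{*}(s) &= \max_a \Big[ r(s,a) + \gamma \textstyle\sum_{s'} p(s' \mid s,a)\, V^{*}(s') \Big]
  && \text{Bellman equation for the optimal value function}\\
\hat{V}(s) &\leftarrow \max_a \Big[ r(s,a) + \gamma \textstyle\sum_{s'} p(s' \mid s,a)\, \hat{V}(s') \Big]
  && \text{value-iteration update}\\
\pi(s) &= \arg\max_a \Big[ r(s,a) + \gamma \textstyle\sum_{s'} p(s' \mid s,a)\, \hat{V}(s') \Big]
  && \text{greedy policy}
\end{align*}
```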

Value Iteration for Motion Planning (assumes knowledge of robot’s location)
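
A minimal sketch of value iteration for motion planning on a small grid, assuming the robot's location is known, deterministic four-connected moves, a goal cell with a terminal reward, and a hypothetical per-step cost; the grid, rewards, and discount below are illustrative assumptions, not numbers from the slides.

```python
import numpy as np

# Hypothetical 4x5 grid: 0 = free, 1 = obstacle. Goal in the top-right corner.
grid = np.array([[0, 0, 0, 0, 0],
                 [0, 1, 1, 0, 0],
                 [0, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0]])
goal = (0, 4)
gamma, step_cost, goal_reward = 0.95, -1.0, 100.0
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]       # up, down, left, right

V = np.zeros(grid.shape)
for _ in range(200):                              # value-iteration sweeps
    V_new = V.copy()
    for r in range(grid.shape[0]):
        for c in range(grid.shape[1]):
            if grid[r, c] == 1:
                continue                          # obstacle cells are never entered
            if (r, c) == goal:
                V_new[r, c] = goal_reward         # absorbing goal state
                continue
            best = -np.inf
            for dr, dc in moves:                  # deterministic motion model
                nr, nc = r + dr, c + dc
                if not (0 <= nr < grid.shape[0] and 0 <= nc < grid.shape[1]) or grid[nr, nc] == 1:
                    nr, nc = r, c                 # blocked moves keep the robot in place
                best = max(best, step_cost + gamma * V[nr, nc])
            V_new[r, c] = best
    if np.max(np.abs(V_new - V)) < 1e-6:          # converged
        break
    V = V_new

# The greedy policy follows the gradient of V toward the goal: a navigation function.
```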

Continuous Environments. From: A. Moore & C. G. Atkeson, "The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State Spaces," Machine Learning, 1995.

Approximate Cell Decomposition [Latombe 91]. From: A. Moore & C. G. Atkeson, "The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State Spaces," Machine Learning, 1995.

Parti-Game [Moore 96]. From: A. Moore & C. G. Atkeson, "The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Continuous State Spaces," Machine Learning, 1995.

Robot Planning Frameworks (Classical AI/robot planning | Value Iteration in MDPs | Parti-Game)
  State/actions: discrete & continuous | discrete | continuous
  State: observable | observable | observable
  Environment: deterministic | stochastic | stochastic
  Plans: sequences of actions | policy | policy
  Completeness: yes | yes | yes
  Optimality: rarely | yes | no
  State space size: huge, often continuous, 6 dimensions | millions | n/a
  Computational complexity: varies | quadratic | n/a

Stochastic, Partially Observable (figure: the two possible worlds again; from the start location each occurs with 50% probability, and the robot must detour to the sign to learn which world it is in before heading for heaven).

A Quiz: how large is the belief space?
  # states | actions | sensors | size of belief space
  3: s1, s2, s3 | deterministic | perfect | 3
  3: s1, s2, s3 | stochastic | perfect | 3
  3: s1, s2, s3 | deterministic | abstract states | 7: s1, s2, s3, s12, s13, s23, s123
  3: s1, s2, s3 | stochastic | stochastic | 2-dim continuous*: p(S=s1), p(S=s2)
  3: s1, s2, s3 | deterministic | none | 2-dim continuous*: p(S=s1), p(S=s2)
  1-dim continuous | deterministic | stochastic | ∞-dim continuous*
  1-dim continuous | stochastic | stochastic | aargh! (∞-dim continuous)
  *) countable, but for all practical purposes continuous

Introduction to POMDPs (1 of 3) (figure: a two-state problem with states s1 and s2 and two actions a and b; each action's payoff depends on the true state, and its expected payoff is plotted as a linear function of the belief p(s1)). [Sondik 72, Littman, Kaelbling, Cassandra '97]
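
The payoff diagram does not survive in the transcript; the property it illustrates is that, in a two-state problem, the expected payoff of each action is linear in the belief p(s1), so the optimal value is the upper envelope of lines and hence piecewise linear and convex. With generic payoffs r(s, u) (the slide's specific numbers are not reproduced here):

```latex
% Expected payoff of an action as a function of the belief (standard property, generic payoffs)
\[
V_u(b) = b(s_1)\, r(s_1, u) + \big(1 - b(s_1)\big)\, r(s_2, u),
\qquad
V(b) = \max_{u \in \{a,\, b\}} V_u(b),
\]
\[
\text{so } V(b) \text{ is piecewise linear and convex in } b(s_1).
\]
```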

Introduction to POMDPs (2 of 3) (figure: adding a third action c that changes the state stochastically, e.g. with 80%/20% transition probabilities; the belief p(s1) is mapped to a new belief p(s1'), and the value function is propagated back through this transition). [Sondik 72, Littman, Kaelbling, Cassandra '97]

Introduction to POMDPs (3 of 3) (figure: adding observations A and B with state-dependent probabilities, e.g. 70%/30%; after acting, the observation is used in Bayes rule to produce the posterior belief p(s1'|A) or p(s1'|B), and the value function is averaged over the possible observations). [Sondik 72, Littman, Kaelbling, Cassandra '97]
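
A minimal sketch of the belief update these figures describe, for a two-state problem: a prediction step through the stochastic action, followed by a Bayes correction on the observation. The transition and observation probabilities below are placeholders, not the numbers from the slides.

```python
import numpy as np

# Hypothetical two-state model. Rows index the current state, columns the next state.
P_action_c = np.array([[0.2, 0.8],    # p(s' | s = s1, action c)
                       [0.8, 0.2]])   # p(s' | s = s2, action c)
# p(observation | state): columns are observations A and B.
P_obs = np.array([[0.7, 0.3],         # in s1, observation A is more likely
                  [0.3, 0.7]])        # in s2, observation B is more likely

def belief_update(b, obs_index):
    """One POMDP belief update: predict through action c, then correct with Bayes rule."""
    b_pred = b @ P_action_c                 # prediction: p(s') = sum_s p(s'|s,c) b(s)
    b_new = P_obs[:, obs_index] * b_pred    # correction: weight by p(o|s')
    return b_new / b_new.sum()              # normalize

b = np.array([0.5, 0.5])                    # uniform prior over s1, s2
print(belief_update(b, obs_index=0))        # belief after acting and observing A
```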

Value Iteration in POMDPs: substitute the belief b for the state s in the MDP construction. Value function of policy π; Bellman equation for the optimal value function; value iteration: recursively estimating the value function; greedy policy.
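
Written out, "substitute b for s" gives the belief-space analogues of the MDP equations (standard forms, reconstructed since the slide's rendering is lost):

```latex
% Value iteration with the belief b in place of the state s
\begin{align*}
V^{*}(b) &= \max_a \Big[ r(b, a) + \gamma \int p(b' \mid b, a)\, V^{*}(b')\, db' \Big]\\
\hat{V}(b) &\leftarrow \max_a \Big[ r(b, a) + \gamma \int p(b' \mid b, a)\, \hat{V}(b')\, db' \Big]\\
\pi(b) &= \arg\max_a \Big[ r(b, a) + \gamma \int p(b' \mid b, a)\, \hat{V}(b')\, db' \Big]
\end{align*}
```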

Missing Terms in Belief Space: the expected reward and the next-state (next-belief) density. The next belief is computed by Bayes filters, so the next-belief density is a mixture of Dirac distributions, one per possible observation.
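
The two missing terms can be written as follows (standard POMDP identities, reconstructed because the slide's equations are lost): the expected reward averages the state reward under the belief, and the next-belief density places a Dirac point mass at the Bayes-filter posterior for each observation.

```latex
\begin{align*}
r(b, a) &= \sum_{s} b(s)\, r(s, a)
  && \text{expected reward}\\
p(b' \mid b, a) &= \sum_{o} p(o \mid b, a)\; \delta\!\big(b' - B(b, a, o)\big)
  && \text{next-belief density}\\
B(b, a, o)(s') &\propto p(o \mid s') \sum_{s} p(s' \mid s, a)\, b(s)
  && \text{Bayes filter update}
\end{align*}
```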

Value Iteration in Belief Space (figure: backup diagram; from belief state b, an action takes the underlying state s to a next state s' with reward r', an observation o is received, the belief is updated to the next belief state b', and Q(b, a) is computed from the value function by backing up max_a Q(b', a)).

Why is This So Complex? (figure: state-space planning, with no state uncertainty, compared to belief-space planning, with full state uncertainty).

Augmented MDPs: keep the conventional state space and add one extra dimension measuring uncertainty (the entropy of the belief). [Roy et al, 98/99]
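
A minimal sketch of the Augmented-MDP idea: compress the full belief to a low-dimensional statistic, here the most likely state plus the belief's (discretized) entropy, and plan in that augmented space. The function name, entropy binning, and example beliefs are illustrative assumptions, not details from Roy et al.

```python
import numpy as np

def augmented_state(belief, n_entropy_bins=8):
    """Map a full belief to an Augmented-MDP state: (most likely state, discretized entropy)."""
    belief = np.asarray(belief, dtype=float)
    belief = belief / belief.sum()
    most_likely = int(np.argmax(belief))             # conventional state estimate
    p = belief[belief > 0]
    entropy = float(-(p * np.log(p)).sum())          # uncertainty of the belief
    max_entropy = np.log(len(belief))                # uniform belief has maximal entropy
    entropy_bin = min(int(n_entropy_bins * entropy / max_entropy), n_entropy_bins - 1)
    return most_likely, entropy_bin

# Example: a fairly certain belief and a confused one map to different augmented states.
print(augmented_state([0.9, 0.05, 0.05]))   # same most-likely state, low-entropy bin
print(augmented_state([0.4, 0.3, 0.3]))     # same most-likely state, high-entropy bin
```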

Path Planning with Augmented MDPs (figure: a conventional planner's path compared to the probabilistic planner's path, which trades extra distance for information gain). [Roy et al, 98/99]

Robot Planning Frameworks (Classical AI/robot planning | Value Iteration in MDPs | Parti-Game | POMDP | Augmented MDP)
  State/actions: discrete & continuous | discrete | continuous | discrete | discrete
  State: observable | observable | observable | partially observable | partially observable
  Environment: deterministic | stochastic | stochastic | stochastic | stochastic
  Plans: sequences of actions | policy | policy | policy | policy
  Completeness: yes | yes | yes | yes | no
  Optimality: rarely | yes | no | yes | no
  State space size: huge, often continuous, 6 dimensions | millions | n/a | dozens | thousands
  Computational complexity: varies | quadratic | n/a | exponential | O(N^4)