
Intelligent Adaptive Mobile Robots
Georgios Theocharous, MIT AI Laboratory
with Terran Lane and Leslie Pack Kaelbling (PI)
DARPA MARS PI Meeting, 18 September 2002

Project Overview
Hierarchical POMDP models
- Parameter learning; heuristic action selection (Theocharous)
- Near-optimal action selection; hierarchical structure learning (Theocharous, Murphy)
- Near-deterministic abstractions for MDPs (Lane)
- Near-deterministic abstractions for POMDPs (Theocharous, Lane)
Visual map building
- Optic flow-based navigation (Temizer, Steinkraus)
- Automatic spatial decomposition (Temizer)
- Selecting visual landmarks (Temizer)
Enormous simulated robotic domain (Steinkraus)
Learning low-level behaviors with human training (Smart)

Today's Talk
Hierarchical POMDP models
- Parameter learning; heuristic action selection (Theocharous)
- Near-optimal action selection; hierarchical structure learning (Theocharous, Murphy)
- Near-deterministic abstractions for MDPs (Lane)
- Near-deterministic abstractions for POMDPs (Theocharous, Lane)
Visual map building
- Optic flow-based navigation (Temizer, Steinkraus)
- Automatic spatial decomposition (Temizer)
- Selecting visual landmarks (Temizer)
Enormous simulated robotic domain (Steinkraus)
Learning low-level behaviors with human training (Smart)

The Problem
How do we sequentially select actions in a very large, uncertain domain?
- Markov decision processes (MDPs) are a good formalization for planning under uncertainty.
- But optimization algorithms for MDPs are polynomial in the size of the state space, which is itself exponential in the number of state variables!
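To make the scaling issue concrete, here is a minimal value iteration sketch (not from the slides; the enumeration of binary state variables is an illustrative assumption). Each sweep touches every state and every action, so the cost grows with |S| per iteration, and |S| itself blows up as state variables are added.

```python
# Minimal value iteration sketch (illustrative; not the project's code).
# Cost per sweep is O(|S| * |A| * branching); |S| is exponential in the
# number of state variables, which is the scaling problem the slide points to.
import itertools

def value_iteration(states, actions, T, R, gamma=0.95, eps=1e-4):
    """T(s, a) -> list of (next_state, prob); R(s, a) -> reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:                      # one sweep touches every state...
            best = max(
                R(s, a) + gamma * sum(p * V[s2] for s2, p in T(s, a))
                for a in actions              # ...and every action
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# With n binary state variables the enumerated state space has 2**n states:
n = 10
states = list(itertools.product([0, 1], repeat=n))   # already 1024 states
```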

Abstraction and Decomposition
Our only hope is to divide and conquer:
- State abstraction: treat sets of states as if they were a single state
- Decomposition: solve restricted problems in sub-parts of the state space
- Action abstraction: treat sequences of actions as if they were atomic
- Teleological abstraction: goal-based abstraction

Hierarchical POMDPs
[Figure: hierarchical state-transition diagram]
- Abstract states
- Product states, which generate observations
- Entry states
- Exit states
- Vertical transitions
- Horizontal transitions
- Actions
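A minimal data-structure sketch of the components listed above; the class and field names are illustrative assumptions, not taken from the thesis implementation.

```python
# Sketch of the HPOMDP components named on the slide; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class ProductState:
    name: str
    obs_model: dict            # observation -> probability (generates observations)

@dataclass
class AbstractState:
    name: str
    entry_states: list         # entry points when a vertical transition enters
    exit_states: list          # exit points returned to the parent level
    children: list = field(default_factory=list)   # ProductState or AbstractState
    vertical: dict = field(default_factory=dict)   # entry state -> P(child state)
    horizontal: dict = field(default_factory=dict) # (child, action) -> P(next child or exit)
```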

Representing Spatial Environments as HPOMDPs
- States
- Vertical transitions
- Horizontal transitions
- Observation models
The approach was easy to transfer from the MSU Engineering Building to the 7th floor of the AI Lab.

Hierarchical Learning and Planning in Partially Observable Markov Decision Processes for Robot Navigation (Theocharous, Ph.D. thesis, MSU, 2002)
Derived the HPOMDP model and its learning/planning algorithms.
Learning HPOMDPs for robot navigation: exact and approximate algorithms provide
- faster convergence
- less time per iteration
- better learned models
- better robot localization
- ability to infer structure
Planning with HPOMDPs for robot navigation: spatial and temporal abstraction
- robot computes plans faster
- able to reach locations in the environment starting from an unknown location
- performance scales gracefully to large-scale environments

POMDP macro-actions can be formulated as policy graphs (Theocharous, Lane)
POMDP macro-actions can be represented as policy graphs (finite-state controllers) with a termination condition.
[Figure: policy graph with a termination condition]
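A minimal sketch of a policy graph (finite-state controller) with a termination test; the class and method names are assumptions for illustration, not the thesis implementation.

```python
# Illustrative finite-state controller (policy graph) for a POMDP macro-action.
# Memory states map to primitive actions; edges are followed on observations;
# the macro terminates when a designated condition on the observation holds.
class PolicyGraph:
    def __init__(self, action_of, next_node, terminate, start):
        self.action_of = action_of    # node -> primitive action
        self.next_node = next_node    # (node, observation) -> node
        self.terminate = terminate    # observation -> bool (termination condition)
        self.node = start

    def step(self, observation):
        """Advance the controller on an observation; return (action, done)."""
        if self.terminate(observation):
            return None, True
        self.node = self.next_node[(self.node, observation)]
        return self.action_of[self.node], False
```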

Reward and transition models of macro-actions
The reward model is the expected sum of discounted reward for executing controller C from hidden state s and memory state x until termination. The transition models can be computed in a similar manner.
[Figure: example controller with memory states x1, x2 over hidden states S1-S7]
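The slide describes these quantities in words; a hedged reconstruction in standard notation (the symbols R, F, and the random termination time T are my assumptions) is:

```latex
% Hedged reconstruction of the macro-action models described on the slide.
% R(s, x, C): expected discounted reward from hidden state s and memory state x,
% executing controller C until its (random) termination time T.
\[
R(s, x, C) \;=\; \mathbb{E}\!\left[\; \sum_{t=0}^{T-1} \gamma^{\,t}\, r_t \;\middle|\; s_0 = s,\; x_0 = x,\; \text{follow } C \right]
\]
% The transition model is computed analogously, folding in the discount over T:
\[
F(s' \mid s, x, C) \;=\; \mathbb{E}\!\left[\, \gamma^{\,T}\, \Pr(s_T = s') \;\middle|\; s_0 = s,\; x_0 = x,\; \text{follow } C \right]
\]
```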

Planning and Execution
Planning (solve it as an MDP):
- States: the hidden states
- Actions: pairs of controller and memory state (C, x)
- Reward and transition models: as defined before
Execution: one-step lookahead (the transition models allow us to predict the belief state far into the future)
- Take the action that maximizes the immediate reward from the current belief state plus the discounted value of the next belief state.
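A minimal sketch of the one-step lookahead execution rule over a belief state; the function name and the way the macro models are indexed are assumptions for illustration.

```python
# One-step lookahead over the belief state (illustrative; names are assumed).
# V is the MDP value function over hidden states computed at planning time;
# R[m][s] and F[m][s][s2] are the macro reward / transition models, where
# F is assumed to already fold in the discount over the macro's duration.
def choose_macro(belief, macros, R, F, V):
    def lookahead(m):
        immediate = sum(belief[s] * R[m][s] for s in belief)
        future = sum(belief[s] * F[m][s][s2] * V[s2]
                     for s in belief for s2 in F[m][s])
        return immediate + future
    return max(macros, key=lookahead)
```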

Robot navigation experiments
[Figure: navigation policy graph]
The robot successfully navigates from uniform initial beliefs to goal locations.

Structure Learning & DBN Representations (Theocharous, Murphy)
We are investigating methods for structure learning of HHMMs/HPOMDPs (hierarchy depth, number of states, reusable substructures):
- Bayesian model selection methods
- Approaches for learning compositional hierarchies (e.g., recurrent neural networks, sparse hierarchical n-grams)
- Other language acquisition approaches (e.g., Carl de Marcken)
DBN representation of HPOMDPs:
- Linear-time training
- Extensions to online training
- Factorial HPOMDPs
- Sampling techniques for fast inference

Near Determinism (Lane)
Some common action abstractions:
- put it in the bag
- go to the conference room
- take out the trash
What's important about them?
- Even if the world is highly stochastic, you can very nearly guarantee their success.
- They encapsulate uncertainty at the lower level of abstraction.

Example domain
- 10 floors
- ~1800 locations per floor
- 45 mail drops per floor
- Limited battery
- 11 actions
- Total: |S| > 2^500 states
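A rough back-of-envelope count of where a bound like 2^500 can come from; the factorization below is my assumption, since the slide only states the total.

```latex
% Back-of-envelope count (assumed factorization; the slide states only the total):
% 45 mail drops/floor x 10 floors = 450 packages, one "delivered?" bit each,
% so package status alone contributes 2^{450} combinations; multiplying in the
% robot's location (10 x ~1800 cells, roughly 2^{14}), battery level, and which
% packages are currently carried plausibly pushes |S| past 2^{500}.
\[
|S| \;\gtrsim\; \underbrace{2^{450}}_{\text{delivered bits}} \times \underbrace{2^{14}}_{\text{location}} \times \underbrace{(\text{battery, carried, \dots})}_{\text{remaining variables}} \;>\; 2^{500}
\]
```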

A simple example
- State space: (x, y, b), where b indicates "reached goal"
- Actions: N, S, E, W
- Transition function: noise, walls
- Rewards: a small negative reward per step until b = 1; 0 thereafter
[Figure: grid world with x and y axes]
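A compact encoding of this example as a sketch; the grid size, wall placement, noise level, and step cost are illustrative values, since the slide leaves the per-step penalty unspecified.

```python
# Sketch of the simple example: state (x, y, b), actions N/S/E/W,
# noisy transitions blocked by walls, small step cost until the goal bit is set.
import random

GOAL = (4, 4)
WALLS = {(2, 2), (2, 3)}          # illustrative wall cells
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
NOISE = 0.1                        # assumed probability of a random slip
STEP_COST = -1.0                   # assumed value of the per-step penalty

def step(state, action):
    x, y, b = state
    if b == 1:
        return state, 0.0                      # absorbing: no more cost
    if random.random() < NOISE:                # slip to a random direction
        action = random.choice(list(MOVES))
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    if (nx, ny) in WALLS or not (0 <= nx < 5 and 0 <= ny < 5):
        nx, ny = x, y                          # bump into a wall: stay put
    nb = 1 if (nx, ny) == GOAL else 0
    return (nx, ny, nb), STEP_COST
```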

Macros deliver single packages
- A macro is a plan over a restricted state space
- Defines how to achieve one goal from any location
- Terminates at any goal
- Can be found quickly
- Encapsulates uncertainty
[Figure: restricted grid with goal cell b2]
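A minimal sketch of how one such macro could be computed, by planning only over the restricted (x, y) space and terminating at the goal; this illustrates the idea rather than reproducing the project's algorithm, and the step cost and grid size are assumed.

```python
# Compute a macro (a policy over the restricted (x, y) space) that reaches one
# goal from any location. Restricting to (x, y) is what keeps it cheap to solve.
def plan_macro(goal, width=5, height=5, walls=frozenset(), gamma=0.95, sweeps=200):
    moves = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
    cells = [(x, y) for x in range(width) for y in range(height)
             if (x, y) not in walls]
    V = {c: 0.0 for c in cells}
    policy = {}
    for _ in range(sweeps):                        # value iteration on the small space
        for c in cells:
            if c == goal:
                V[c], policy[c] = 0.0, None        # macro terminates at the goal
                continue
            best_a, best_v = None, float("-inf")
            for a, (dx, dy) in moves.items():
                n = (c[0] + dx, c[1] + dy)
                if n not in V:
                    n = c                          # wall or edge: stay put
                v = -1.0 + gamma * V[n]            # assumed unit step cost
                if v > best_v:
                    best_a, best_v = a, v
            V[c], policy[c] = best_v, best_a
    return policy
```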

Combining Macros
Formally: solve a semi-MDP over {b}^k
- Gets all macro interactions and probabilities right
- Still exponential, though...
But these macros are close to deterministic
- Low probability of delivering the wrong package
- Macros form a graph over {b1, ..., bk}
- Reduce the SMDP to a graph optimization problem
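A sketch of the reduction: treat delivery sites as graph nodes, weight edges by the expected cost of the macro that travels between them, and solve the resulting (near-)deterministic routing problem. The cost function and the brute-force ordering search below are illustrative assumptions, not the paper's method.

```python
# Illustrative reduction of the combined-delivery SMDP to a graph problem:
# nodes are delivery sites, edge weights approximate the expected cost of the
# macro between them; find the cheapest visiting order.
from itertools import permutations

def cheapest_delivery_order(start, sites, macro_cost):
    """macro_cost(a, b): expected cost of the macro that goes from a to b."""
    best_order, best_cost = None, float("inf")
    for order in permutations(sites):             # exact search; fine for small k
        cost, here = 0.0, start
        for site in order:
            cost += macro_cost(here, site)
            here = site
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost
```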

LCSLCS 18 September 2002DARPA MARS PI Meeting Solve deterministic graph problem

But does it work? Yes! (Well, in simulation, anyway...)
Small, randomly generated scenarios:
- Up to ~60k states (up to 6 packages)
- Optimal solution available directly; 5.8% error on average
Larger scenarios, based on a one-floor building model:
- Up to ~2^55 states (~45 packages)
- Can't get the optimal solution
- 600 trajectories; no macro failures

Near Determinism in POMDPs (Lane, Theocharous)
Current research project: near-deterministic abstraction in POMDPs
- Macros map belief states to actions
- Choose macros that reliably achieve subsets of belief states
- "Dovetailing"

Really Big Domain (Steinkraus)

Working in Huge Domains
- Continually remap the huge problem to smaller subproblems of current import
- Decompose along the lines of the utility function; recombine solutions
- Track the belief state approximately; devote more bits to the currently important aspects of the problem, and to those predicted to be important in the future
- Adjust the belief representation dynamically