DARPA-MARS Kickoff
Adaptive Intelligent Mobile Robots
Leslie Pack Kaelbling
Artificial Intelligence Laboratory, MIT

Two projects
- Making reinforcement learning work on real robots
- Solving huge problems
  - dynamic problem reformulation
  - explicit uncertainty management

Reinforcement learning
Given a connection to the environment, find a behavior that maximizes long-run reinforcement.
[Diagram: agent-environment loop; the agent sends actions to the environment and receives observations and reinforcement in return.]
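A minimal sketch of the interface this loop implies. The Agent and Environment classes and their method names are hypothetical, chosen only to make the diagram concrete:

```python
# Minimal agent-environment loop implied by the slide's diagram.
# These interfaces are illustrative, not code from the project.

class Environment:
    def step(self, action):
        """Apply the action; return (observation, reinforcement)."""
        raise NotImplementedError

class Agent:
    def act(self, observation):
        """Choose an action given the latest observation."""
        raise NotImplementedError

    def learn(self, observation, action, reinforcement, next_observation):
        """Update the behavior from one step of experience."""
        raise NotImplementedError

def run(agent, env, first_observation, num_steps):
    obs = first_observation
    for _ in range(num_steps):
        action = agent.act(obs)
        next_obs, reinforcement = env.step(action)
        agent.learn(obs, action, reinforcement, next_obs)
        obs = next_obs
```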

Why reinforcement learning?
- Unknown or changing environments
- Easier for a human to provide a reinforcement function than a whole behavior

Q-Learning
Learn to choose actions because of their long-term consequences.
Given experience $\langle s, a, r, s' \rangle$, update
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
Given a state $s$, take the action $a$ that maximizes $Q(s, a)$.
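As a concrete illustration, here is the standard tabular Q-learning update in Python. The hashable state/action encoding and the parameter values are assumptions for the sketch, not details from the talk:

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch. States and actions are assumed hashable
# (e.g., discretized robot states); ALPHA, GAMMA, EPSILON are
# illustrative values, not values from the project.
Q = defaultdict(float)          # Q[(state, action)] -> estimated value
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def update(state, action, reinforcement, next_state, actions):
    """One Q-learning backup from experience <s, a, r, s'>."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reinforcement + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

def choose_action(state, actions):
    """Epsilon-greedy: usually take the action that maximizes Q(s, a)."""
    if random.random() < EPSILON:
        return random.choice(list(actions))
    return max(actions, key=lambda a: Q[(state, a)])
```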

Does it work?
Yes and no.
- Successes in simulated domains: backgammon, elevator scheduling
- Successes in manufacturing and juggling, with strong constraints
- No strong successes in more general online robotic learning

Why is RL on robots hard?
- Need fast, robust supervised learning
- Continuous input and action spaces
- Q-learning is slow to propagate values
- Need a strong exploration bias

Making RL on robots easier
- Need fast, robust supervised learning → locally weighted regression
- Continuous input and action spaces → search over, and caching of, the optimal action
- Q-learning is slow to propagate values → model-based acceleration
- Need a strong exploration bias → start with a human-supplied policy
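For the first item in the list above, a minimal locally weighted regression sketch (kernel-weighted least squares). The Gaussian kernel, bandwidth, and ridge term are illustrative choices, not specifics from the project:

```python
import numpy as np

def lwr_predict(query, X, y, bandwidth=0.5):
    """Locally weighted linear regression at one query point.

    X: (n, d) inputs, y: (n,) targets. Training points are weighted by
    a Gaussian kernel centered on the query; a weighted least-squares
    linear model is fit and evaluated at the query.
    """
    query = np.asarray(query, float)
    X, y = np.asarray(X, float), np.asarray(y, float)
    w = np.exp(-np.sum((X - query) ** 2, axis=1) / (2 * bandwidth ** 2))
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append bias column
    A = Xb.T @ (w[:, None] * Xb) + 1e-8 * np.eye(Xb.shape[1])  # tiny ridge
    beta = np.linalg.solve(A, Xb.T @ (w * y))
    return float(np.append(query, 1.0) @ beta)
```

Because the fit is local to each query, this kind of learner adapts quickly from few samples, which is the "fast, robust supervised learning" the slide asks for.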

Human policy
Start with a human-provided policy.
[Diagram: the human policy maps the environment's state to an action.]

Do supervised policy learning
[Diagram: while the human policy drives, each (s, a) pair it produces is used to train a learned policy.]

When the policy is learned, let it drive
[Diagram: the learned policy now maps state to action and controls the robot in place of the human policy.]
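A sketch of this supervised step (often called behavioral cloning). The 1-nearest-neighbor learner stands in for whatever regressor is actually used, and all names here are hypothetical:

```python
import numpy as np

# Supervised policy learning from demonstrations: record (state, action)
# pairs while the human policy drives, then imitate them. A 1-nearest-
# neighbor lookup stands in for the real function approximator
# (e.g., locally weighted regression).
class ClonedPolicy:
    def __init__(self):
        self.states, self.actions = [], []

    def record(self, state, action):
        """Store one demonstration step from the human policy."""
        self.states.append(np.asarray(state, float))
        self.actions.append(action)

    def act(self, state):
        """Return the action taken in the most similar recorded state."""
        state = np.asarray(state, float)
        dists = [np.linalg.norm(state - s) for s in self.states]
        return self.actions[int(np.argmin(dists))]
```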

Q-Learning
[Diagram: while the policy drives, each experience is added to a dataset D of (s, a, v) records used to train a Q-value function.]

Acting based on Q values
[Diagram: for state s, evaluate Q(s, a) for candidate actions a_1, ..., a_n and select the one with the maximum value.]

Letting the Q-learner drive
[Diagram: the action maximizing the learned Q-value function is now sent to the environment; experience continues to accumulate in D.]

Train policy with max-Q values
[Diagram: for states s' drawn from D, the maximizing action under the Q-value function is used as a supervised target to retrain the policy.]

Add model learning
[Diagram: the dataset D also trains a model that predicts reward r and next state s from (s, a), alongside the Q-value function and policy.]

When the model is good, train Q with it
[Diagram: simulated experiences (s', a') generated by the learned model are fed to the Q-learner in place of real environment steps.]
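This is in the spirit of Dyna-style model-based acceleration. A sketch under that assumption, reusing the tabular backup from the earlier Q-learning block; the model and bookkeeping arguments are hypothetical:

```python
import random

def model_based_training(q_update, model, visited, actions, num_updates):
    """Dyna-style acceleration: back up Q-values from simulated steps.

    model(state, action) -> (predicted_reward, predicted_next_state);
    visited: the set of states seen so far. Both are assumed to exist.
    Each simulated step uses the same backup as a real environment step,
    so values propagate without extra robot time.
    """
    for _ in range(num_updates):
        s = random.choice(list(visited))
        a = random.choice(list(actions))
        r, s_next = model(s, a)
        q_update(s, a, r, s_next, actions)
```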

Other forms of human knowledge
- hard safety constraints on action choices
- partial models, or constraints on models
- value estimates or value orderings on states

We will have succeeded if
it takes less human effort and total development time to
- provide prior knowledge
- run and tune the learning algorithm
than to write and debug the program without learning.

Test domain
Indoor mobile-robot navigation and delivery tasks
- quick adaptation to new buildings
- quick adaptation to sensor change or failure
- quick incorporation of human information

Solving huge problems
We have lots of good techniques for small-to-medium-sized problems:
- reinforcement learning
- probabilistic planning
- Bayesian inference
Rather than scale them to tackle huge problems directly, formulate right-sized problems on the fly.

Dynamic problem reformulation
[Diagram: perception and action connect through a bounded working memory.]

Reformulation strategy
Dynamically swap variables in and out of working memory:
- a constant-sized problem is always tractable
- adapts to changing situations, goals, etc.
- given more time pressure, decrease problem size
- given less time pressure, increase problem size
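One way to read the resizing rule as code. This is entirely a hypothetical sketch; the relevance-scoring and cost interfaces are my inventions, not the project's:

```python
def reformulate(all_variables, relevance, time_budget, cost_per_variable):
    """Pick the most relevant variables that fit the current time budget.

    relevance(v) scores how much variable v matters to current goals;
    cost_per_variable approximates solver time per included variable.
    More time pressure (a smaller budget) yields a smaller problem.
    """
    k = max(1, int(time_budget / cost_per_variable))
    ranked = sorted(all_variables, key=relevance, reverse=True)
    return ranked[:k]   # the working-memory contents for this cycle
```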

Multiple-resolution plans
- fine view of near-term, high-probability events
- coarse view of distant, low-probability events

Information gathering
Explicit models of the robot's uncertainty allow information-gathering actions:
- drive to the top of a hill for a better view
- open a door to see what's inside
- ask a human for guidance ("Where is the supply depot?" "Two miles up this road.")

Explicit uncertainty modeling
- POMDP work gives us theoretical understanding
- Derive practical solutions from
  - learning explicit memorization policies
  - approximating optimal control
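For context, the core POMDP computation is the belief-state update. Here is the standard discrete Bayes-filter form; the tabular model arrays are assumptions for illustration, not the project's representation:

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Standard POMDP belief update (discrete Bayes filter).

    belief: (n,) probability distribution over states.
    T[a]:   (n, n) transition matrix, T[a][s, s2] = P(s2 | s, a).
    O[a]:   (n, m) observation model, O[a][s2, o] = P(o | s2, a).
    """
    predicted = belief @ T[action]                   # predict next-state dist.
    updated = predicted * O[action][:, observation]  # weight by the evidence
    return updated / updated.sum()                   # renormalize
```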

Huge-domain experiments
Simulation of a very complex task environment:
- large number of buildings and other geographical structures
- concurrent, competing tasks, such as
  - surveillance
  - supply delivery
  - self-preservation
- other agents from whom information can be gathered