DARPA-MARS Kickoff: Adaptive Intelligent Mobile Robots. Leslie Pack Kaelbling, Artificial Intelligence Laboratory, MIT. 8/9/2015.


Presentation transcript:

1 DARPA-MARS Kickoff (8/9/2015): Adaptive Intelligent Mobile Robots. Leslie Pack Kaelbling, Artificial Intelligence Laboratory, MIT

2 Two projects: making reinforcement learning work on real robots; solving huge problems through dynamic problem reformulation and explicit uncertainty management.

3 Reinforcement learning: given a connection to the environment, find a behavior that maximizes long-run reinforcement. [Diagram labels: Environment, Observation, Reinforcement, Action.]
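The interaction described on this slide can be sketched as a simple loop. This is only an illustration: the `ToyCorridor` environment, its reward rule, and the `always_right` behavior are hypothetical and do not come from the presentation.

```python
class ToyCorridor:
    """Hypothetical environment: positions 0..4, reinforcement 1 while at position 4."""

    def __init__(self):
        self.pos = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clip the move
        self.pos = min(4, max(0, self.pos + action))
        reinforcement = 1.0 if self.pos == 4 else 0.0
        return self.pos, reinforcement


def run_episode(env, behavior, steps=10):
    """Run the observe/act loop and return the summed reinforcement."""
    observation, total = env.pos, 0.0
    for _ in range(steps):
        action = behavior(observation)                 # behavior maps observation to action
        observation, reinforcement = env.step(action)  # environment answers
        total += reinforcement
    return total


def always_right(observation):
    return +1


def always_left(observation):
    return -1


print(run_episode(ToyCorridor(), always_right))  # prints 7.0
```

A learning agent would replace the fixed behavior with one that changes over time so that the summed reinforcement grows.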

4 Why reinforcement learning? Unknown or changing environments. Easier for a human to provide a reinforcement function than a whole behavior.

5 Q-Learning: learn to choose actions because of their long-term consequences. Given experience: [update equation lost in extraction]. Given a state s, take the action a that maximizes Q(s, a).
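The equations on this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of the standard tabular Q-learning backup, Q(s, a) ← Q(s, a) + α (r + γ max over a' of Q(s', a') − Q(s, a)), together with greedy action choice. The step size `alpha`, discount `gamma`, and the state and action names are assumptions chosen for illustration, not values from the talk.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning backup for the experience (s, a, r, s_next)."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


def greedy_action(Q, s, actions):
    """Pick the action with the largest estimated long-run value in state s."""
    return max(actions, key=lambda b: Q[(s, b)])


Q = defaultdict(float)          # all values start at 0
actions = ["left", "right"]     # hypothetical action set
q_update(Q, "hall", "right", 1.0, "door", actions)
print(greedy_action(Q, "hall", actions))  # prints right
```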

6 Does it work? Yes and no. Successes in simulated domains: backgammon, elevator scheduling. Successes in manufacturing and juggling with strong constraints. No strong successes in more general online robotic learning.

7 Why is RL on robots hard? Need fast, robust supervised learning. Continuous input and action spaces. Q-learning is slow to propagate values. Need strong exploration bias.

8 Making RL on robots easier: need fast, robust supervised learning → locally weighted regression. Continuous input and action spaces → search and caching of optimal action. Q-learning slow to propagate values → model-based acceleration. Need strong exploration bias → start with human-supplied policy.
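To make the first remedy concrete, here is a minimal one-dimensional sketch of locally weighted regression: each query fits a weighted straight line in which training points near the query count more. The Gaussian kernel, the `bandwidth` parameter, and the sample data are assumptions for illustration; the slides do not say which variant the project uses.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=1.0):
    """Predict y at `query` with a locally weighted linear fit (1-D inputs)."""
    # Gaussian kernel: points close to the query get weights near 1
    w = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw   # weighted mean of x
    my = sum(wi * y for wi, y in zip(w, ys)) / sw   # weighted mean of y
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0       # fall back to a local constant
    return my + slope * (query - mx)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # samples of y = 2x + 1
prediction = lwr_predict(xs, ys, 2.5)
```

Because there is no separate training phase, a new sample can be used as soon as it is stored, which is one reason this family of methods is attractive when learning must be fast.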

9 Start with human-provided policy. [Diagram labels: Human Policy, Environment, action, state.]

10 Do supervised policy learning. [Diagram labels: Human Policy, Train, Environment, Policy, action, state, s, a.]

11 When the policy is learned, let it drive. [Diagram labels: Human Policy, Train, Environment, Policy, action, state.]

12 Q-Learning. [Diagram labels: Train, Environment, Q-Value, RL, Policy, action, state, D, s, a, v.]

13 Acting based on Q values. [Diagram labels: Q-Value, max, index, a_1, a_2, a_n, a, s.]

14 Letting the Q-learner drive. [Diagram labels: Train, Environment, RL, Policy, action, state, D, Q-Value, s, a, v, max.]

15 Train policy with max Q values. [Diagram labels: Train, Environment, RL, Policy, action, state, D, Q-Value, s, a, v, max, s'.]

16 Add model learning. [Diagram labels: Train, Model, Environment, Q-Value, RL, Policy, action, state, D, s, a, r, v.]

17 When model is good, train Q with it. [Diagram labels: Train, Model, Environment, Q-Value, RL, Policy, action, state, D, s, a, v, s', a'.]
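One way to picture training Q with a learned model is a Dyna-style loop: after each real step, record the observed transition in a model and then run extra backups on transitions replayed from it. The slides do not name the algorithm, so the sketch below is an assumption, as are the deterministic table model, the `sweeps` parameter, and the two-step example task.

```python
from collections import defaultdict


def q_backup(Q, s, a, r, s2, actions, alpha=0.5, gamma=0.9):
    """Standard Q-learning backup toward r + gamma * max_b Q(s2, b)."""
    target = r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def learn_with_model(experience, actions, sweeps=20):
    """Learn Q from real steps plus replayed steps from a learned table model."""
    Q = defaultdict(float)
    model = {}  # (s, a) -> (r, s'), assumed deterministic for this sketch
    for s, a, r, s2 in experience:
        q_backup(Q, s, a, r, s2, actions)   # learn from the real step
        model[(s, a)] = (r, s2)             # update the model
        for _ in range(sweeps):             # extra backups using the model only
            for (ms, ma), (mr, ms2) in model.items():
                q_backup(Q, ms, ma, mr, ms2, actions)
    return Q


steps = [("A", "go", 0.0, "B"), ("B", "go", 1.0, "END")]  # hypothetical task
with_model = learn_with_model(steps, ["go"], sweeps=20)
without_model = learn_with_model(steps, ["go"], sweeps=0)
```

With `sweeps=0` the reward seen at B never reaches A from these two real steps, while the replayed backups carry it back to A without further interaction with the environment.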

18 Other forms of human knowledge: hard safety constraints on action choices; partial models or constraints on models; value estimates or value orderings on states.

19 We will have succeeded if it takes less human effort and total development time to provide prior knowledge and to run and tune the learning algorithm than to write and debug the program without learning.

20 Test domain: indoor mobile-robot navigation and delivery tasks. Quick adaptation to new buildings; quick adaptation to sensor change or failure; quick incorporation of human information.

21 Solving huge problems: we have lots of good techniques for small-to-medium sized problems (reinforcement learning, probabilistic planning, Bayesian inference). Rather than scale them to tackle huge problems directly, formulate right-sized problems on the fly.

22 Dynamic problem reformulation. [Diagram labels: working memory, perception, action.]

23 Reformulation strategy: dynamically swap variables in and out of working memory. A constant-sized problem is always tractable; adapt to changing situations, goals, etc. Given more time pressure, decrease problem size. Given less time pressure, increase problem size.
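As a rough illustration of this strategy, the sketch below keeps only the most relevant variables that fit the current time budget, so the working problem shrinks under time pressure and grows when time is plentiful. Everything here is hypothetical: the relevance scores, the variable names, and the linear `cost_per_variable` budget rule are invented for the example and are not described in the slides.

```python
import heapq


def working_memory(relevance, time_available, cost_per_variable=1.0):
    """Return the most relevant variables that fit the current time budget."""
    # Less time means a smaller problem; always keep at least one variable
    size = max(1, int(time_available / cost_per_variable))
    return set(heapq.nlargest(size, relevance, key=relevance.get))


# Hypothetical relevance scores for the current situation and goals
relevance = {"door_open": 0.9, "enemy_nearby": 0.8, "battery": 0.7, "weather": 0.1}

rushed = working_memory(relevance, time_available=2)
relaxed = working_memory(relevance, time_available=3)
```

Re-scoring relevance as the situation or goals change and calling the function again would swap variables in and out while keeping the problem size bounded.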

24 Multiple-resolution plans: fine view of near-term, high-probability events; coarse view of distant, low-probability events.

25 Information gathering: explicit models of the robot's uncertainty allow information-gathering actions: drive to top of hill for better view; open a door to see what's inside; ask a human for guidance ("Where is the supply depot?" "Two miles up this road.").

26 Explicit uncertainty modeling: POMDP work gives us theoretical understanding. Derive practical solutions from learning explicit memorization policies and approximating optimal control.
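In a POMDP the robot's uncertainty is a belief, a probability distribution over hidden states, revised after each action and observation by the standard Bayes update b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The sketch below implements that update for small discrete models. The dictionary layout for `T` and `O` and the supply-room example numbers are assumptions for illustration only.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes update of a discrete belief after taking `action` and seeing `observation`.

    T[(s, a)] maps next states to probabilities; O[(s2, a)] maps observations
    to probabilities. Both layouts are assumptions of this sketch.
    """
    states = list(belief)
    new = {}
    for s2 in states:
        # Prediction step: probability of landing in s2
        predicted = sum(T[(s, action)].get(s2, 0.0) * belief[s] for s in states)
        # Correction step: weight by how well s2 explains the observation
        new[s2] = O[(s2, action)].get(observation, 0.0) * predicted
    total = sum(new.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in new.items()}


# Hypothetical example: look through a door to check for supplies
T = {("supplies", "look"): {"supplies": 1.0}, ("empty", "look"): {"empty": 1.0}}
O = {("supplies", "look"): {"crate": 0.8, "nothing": 0.2},
     ("empty", "look"): {"crate": 0.1, "nothing": 0.9}}
posterior = belief_update({"supplies": 0.5, "empty": 0.5}, "look", "crate", T, O)
```

Here seeing a crate moves the belief in "supplies" from 0.5 to 8/9, which is the sense in which looking is an information-gathering action.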

27 Huge-domain experiments: simulation of a very complex task environment: large number of buildings and other geographical structures; concurrent, competing tasks such as surveillance, supply delivery, and self-preservation; other agents from whom information can be gathered.

