
1 DARPA Mobile Autonomous Robot Software, May 2000
Adaptive Intelligent Mobile Robotics
William D. Smart, Presenter
Leslie Pack Kaelbling, PI
Artificial Intelligence Laboratory, MIT

2 Progress to Date
- Fast bootstrapped reinforcement learning
  - algorithmic techniques
  - demo on robot
- Optical-flow based navigation
  - flow algorithm implemented
  - pilot navigation experiments on robot
  - pilot navigation experiments in simulation testbed

3 Making RL Really Work
Typical RL methods require far too much data to be practical in an online setting.
Address the problem by
- strong generalization techniques
- using human input to bootstrap
Let humans do what they’re good at
Let learning algorithms do what they’re good at

4 JAQL
Learning a value function in a continuous state and action space
- based on locally weighted regression (fancy version of nearest neighbor)
- algorithm knows what it knows
- use meta-knowledge to be conservative about dynamic-programming updates

5 Problems with Q-Learning on Robots
- Huge state spaces/sparse data
- Continuous states and actions
- Slow to propagate values
- Safety during exploration
- Lack of initial knowledge

6 Value Function Approximation
- Use a function approximator instead of a table
  - generalization
  - deals with continuous spaces and actions
- Q-learning with VFA has been shown to diverge, even in benign cases
- Which function approximator should we use to minimize problems?
[Figure: function approximator F takes state s and action a and outputs Q(s,a)]

7 Locally Weighted Regression
- Store all previous data points
- Given a query point, find k nearest points
- Fit a locally linear model to these points, giving closer ones more weight
- Use KD-trees to make lookups more efficient
- Fast learning from a single data point
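
The lookup-and-fit steps on this slide can be sketched in a few lines of Python. This is a minimal sketch for 1-D inputs, not the presenters' implementation: the Gaussian kernel, the default `k`, and the name `lwr_predict` are assumptions, and a linear scan stands in for the KD-tree.

```python
import math

def lwr_predict(xs, ys, query, k=10, bandwidth=0.1):
    """Predict y at `query` with a locally weighted linear fit (1-D inputs)."""
    # Find the k stored points nearest to the query (linear scan, no KD-tree).
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - query))[:k]
    # Gaussian kernel (an assumed choice): closer points get more weight.
    weights = [math.exp(-((x - query) / bandwidth) ** 2) for x, _ in nearest]
    total = sum(weights)
    # Weighted means of inputs and outputs.
    mx = sum(w * x for w, (x, _) in zip(weights, nearest)) / total
    my = sum(w * y for w, (_, y) in zip(weights, nearest)) / total
    # Weighted least-squares slope of the local line y = my + slope * (x - mx).
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, nearest))
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, nearest))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (query - mx)

# Usage: data sampled from a straight line is reproduced by the local fit.
xs = [i / 50 for i in range(51)]
ys = [2 * x + 1 for x in xs]
estimate = lwr_predict(xs, ys, 0.33)
```

Because every stored point is kept, adding one new sample changes predictions near it immediately, which is one reading of "fast learning from a single data point".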

8 Locally Weighted Regression
[Figure: original function]

9 Locally Weighted Regression
[Figure: bandwidth = 0.1, 500 training points]

10 Problems with Approximate Q-Learning
Errors are amplified by backups

11 One Source of Errors

12 Independent Variable Hull
Interpolation is safe; extrapolation is not, so
- construct hull around known points
- do local regression if the query point is within the hull
- give a default prediction if not
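
The slide does not say how the hull is computed. One common formulation from regression diagnostics treats a query as inside the hull when its leverage (hat value) is no larger than the largest leverage among the stored points. The sketch below assumes that formulation, 1-D inputs, and a design row `[1, x]`; the names `hat_value` and `predict_with_ivh` are hypothetical.

```python
def hat_value(x, xs):
    """Leverage of input x against stored 1-D inputs xs, with design rows [1, x]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(v * v for v in xs)
    det = n * sxx - sx * sx  # determinant of the 2x2 matrix X^T X
    if abs(det) < 1e-12:
        raise ValueError("need at least two distinct inputs")
    # [1, x] (X^T X)^{-1} [1, x]^T with the 2x2 inverse written out.
    return (sxx - 2.0 * x * sx + n * x * x) / det

def predict_with_ivh(xs, query, regress, default=0.0):
    """Call regress(query) inside the hull; return the default outside it."""
    threshold = max(hat_value(v, xs) for v in xs)
    if hat_value(query, xs) <= threshold:
        return regress(query)  # interpolation: trust the local model
    return default             # extrapolation: fall back to the default

# Usage with a stand-in regressor (assumed for illustration only).
known = [0.0, 0.5, 1.0]
inside = predict_with_ivh(known, 0.25, lambda q: 2 * q, default=-1.0)
outside = predict_with_ivh(known, 3.0, lambda q: 2 * q, default=-1.0)
```

In one dimension this reduces to a distance-from-the-mean check; with more input dimensions the same test describes an ellipsoid around the stored points.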

13 Recap
- Use LWR to represent the value function
  - generalization
  - continuous spaces
- Use IVH and “don’t know”
  - conservative predictions
  - safer backups
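
One way to read "safer backups" is that a one-step Q-learning target should never be built from an extrapolated value. The sketch below assumes that reading; the finite candidate action set, the `None` return for "don't know", and the name `conservative_target` are illustrative assumptions rather than details from the slides.

```python
def conservative_target(reward, next_state, actions, q_predict,
                        gamma=0.9, default_q=0.0):
    """One-step Q-learning target that substitutes default_q for "don't know".

    q_predict(state, action) returns a float, or None when the query lies
    outside the region supported by stored data (hypothetical interface).
    """
    values = []
    for action in actions:
        q = q_predict(next_state, action)
        values.append(default_q if q is None else q)
    return reward + gamma * max(values)

# Usage: the approximator only "knows" the value of action 0 in this state.
def toy_q(state, action):
    return 2.0 if action == 0 else None

target = conservative_target(1.0, next_state=0.3, actions=[0, 1], q_predict=toy_q)
```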

14 Incorporating Human Input
Humans can help a lot, even if they can’t perform the task very well.
- Provide some initial successful trajectories through the space
- Trajectories are not used for supervised learning, but to guide the reinforcement-learning methods through useful parts of the space
- Learn models of the dynamics of the world and of the reward structure
- Once learned models are good, use them to update the value function and policy as well.

15 Give Some Trajectories
- Supply an example policy
- Need not be optimal and might be very wrong
- Code or human-controlled
- Used to generate experience
- Follow example policy and record experiences
- Shows learner “interesting” parts of the space
- “Bad” initial policies might be better

16 Two Learning Phases: Phase One
[Diagram: Learning System, Supplied Control Policy, Environment; arrows labeled A, R, O]

17 Two Learning Phases: Phase Two
[Diagram: Learning System, Supplied Control Policy, Environment; arrows labeled A, R, O]
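
The two phases can be illustrated on a toy problem. The sketch below is an assumption-heavy stand-in: it uses tabular Q-learning on a ten-cell discrete corridor rather than the continuous-state learner described in the talk, and the supplied policy, learning rate, discount, and exploration rate are all made up for illustration.

```python
import random

N = 10                    # corridor cells 0..N-1; the goal is cell N-1
ACTIONS = (-1, 1)         # step left or right
GAMMA, ALPHA = 0.9, 0.5   # assumed discount and learning rate

def step(state, action):
    nxt = min(max(state + action, 0), N - 1)
    done = nxt == N - 1
    return nxt, (10.0 if done else 0.0), done  # reward only at the goal

def greedy(q, state):
    return max(ACTIONS, key=lambda a: q[(state, a)])

def run_episode(q, choose, max_steps=500):
    """Run one episode; the Q-learning update is the same in both phases."""
    state = N // 2
    for t in range(max_steps):
        action = choose(state)
        nxt, reward, done = step(state, action)
        best_next = 0.0 if done else max(q[(nxt, b)] for b in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt
        if done:
            return t + 1
    return max_steps

def train(seed=0, phase1=20, phase2=20):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    # Phase one: an imperfect supplied policy drives (right 70% of the time);
    # the learner only observes the experience and updates Q.
    for _ in range(phase1):
        run_episode(q, lambda s: 1 if rng.random() < 0.7 else -1)
    # Phase two: the learned Q-function drives, with 10% random exploration.
    for _ in range(phase2):
        run_episode(q, lambda s: rng.choice(ACTIONS) if rng.random() < 0.1 else greedy(q, s))
    return q

q_table = train()
```

The point of the split is that the learner never chooses actions until its Q-function already holds values gathered along the supplied trajectories.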

18 What Does This Give Us?
- Natural way to insert human knowledge
- Keeps robot safe in early stages of learning
- Bootstraps information into the Q-function

19 Experimental Results: Corridor-Following

20 Corridor-Following
- 3 continuous state dimensions
  - corridor angle
  - offset from middle
  - distance to end of corridor
- 1 continuous action dimension
  - rotation velocity
- Supplied example policy
  - Average 110 steps to goal

21 Corridor-Following
Experimental setup
- Initial training runs start from roughly the middle of the corridor
- Translation speed has a fixed policy
- Evaluation on a number of set starting points
- Reward
  - 10 at end of corridor
  - 0 everywhere else

22 Corridor-Following
[Figure: labels “Best” possible, Average training, Phase 1, Phase 2]

23 Corridor Following: Initial Policy

24 Corridor Following: After Phase 1

25 Corridor Following: After Phase 1

26 Corridor Following: After Phase 2

27 Conclusions
- VFA can be made more stable
  - Locally weighted regression
  - Independent variable hull
  - Conservative backups
- Bootstrapping value function really helps
  - Initial supplied trajectories
  - Two learning phases

28 Optical Flow
Get range information visually by computing optical flow field
- nearer objects cause flow of higher magnitude
- expansion pattern means you’re going to hit
- rate of expansion tells you when
- elegant control laws based on center and rate of expansion (derived from human and fly behavior)
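
"Rate of expansion tells you when" can be made concrete with the standard time-to-contact relation: under pure translation toward a surface, an image point at offset r from the focus of expansion moves outward with flow r / tau, where tau is the time until contact. The sketch below fits the expansion rate by least squares. It assumes the focus of expansion is already known, and it is not the control law used in the talk, which the slides do not spell out.

```python
def time_to_contact(points, flows, foe=(0.0, 0.0)):
    """Estimate time to contact from flow vectors at the given image points.

    points: (x, y) image positions; flows: (u, v) flow at each position.
    The result is in frames if flow is measured in pixels per frame.
    """
    num = 0.0
    den = 0.0
    for (x, y), (u, v) in zip(points, flows):
        rx, ry = x - foe[0], y - foe[1]
        num += rx * u + ry * v    # r . flow, equal to |r|^2 / tau for pure expansion
        den += rx * rx + ry * ry  # |r|^2
    if den == 0.0:
        raise ValueError("need at least one point away from the focus of expansion")
    rate = num / den              # least-squares expansion rate, 1 / tau
    return float("inf") if rate <= 0.0 else 1.0 / rate

# Usage: a synthetic expansion pattern generated with tau = 20 frames.
pts = [(1.0, 0.0), (0.0, 2.0), (-3.0, 4.0)]
flw = [(x / 20.0, y / 20.0) for x, y in pts]
tau = time_to_contact(pts, flw)
```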

29 Approaching a Wall

30 Balance Strategy
Simple obstacle-avoidance strategy
- compute flow field
- compute average magnitude of flow in each hemi-field
- turn away from the side with higher magnitude (because it has closer objects)
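
The three steps above can be sketched as a single function, given a flow field that has already been computed. The normalized difference used as the turn command and the sign convention (positive means turn left) are assumptions; the slide only says to turn away from the side with more flow.

```python
import math

def balance_turn(flow):
    """Return a turn command in [-1, 1] from a dense flow field.

    flow: list of rows, each a list of (u, v) flow vectors; needs at least
    two columns. Positive output means turn left (assumed convention).
    """
    width = len(flow[0])
    half = width // 2
    # Flow magnitudes in the left and right hemi-fields (middle column skipped if odd).
    left = [math.hypot(u, v) for row in flow for (u, v) in row[:half]]
    right = [math.hypot(u, v) for row in flow for (u, v) in row[width - half:]]
    mean_left = sum(left) / len(left)
    mean_right = sum(right) / len(right)
    total = mean_left + mean_right
    if total == 0.0:
        return 0.0  # no measurable flow: keep heading
    # More flow on the right means closer objects there, so steer left.
    return (mean_right - mean_left) / total

# Usage: strong flow on the right half only.
near_right = [[(0.0, 0.0), (0.0, 0.0), (3.0, 4.0), (3.0, 4.0)] for _ in range(2)]
command = balance_turn(near_right)
```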

31 Balance Strategy in Action

32 Crystal Space

33 Crystal Space

34 Crystal Space

35 Next Steps
- Extend RL architecture to include model-learning and planning
- Apply RL techniques to tune parameters in optical flow
- Build topological maps using visual information
- Build highly complex simulated environment
- Integrate planning and learning in multi-layer system

