Effective Reinforcement Learning for Mobile Robots, by Smart, W.D. and Kaelbling, L.P.

1 Effective Reinforcement Learning for Mobile Robots, by Smart, W.D. and Kaelbling, L.P.

2 Content
– Background
– Review Q-learning
– Reinforcement learning on mobile robots
– Learning framework
– Experimental results
– Conclusion
– Discussion

3 Background
Coding robot behaviour by hand is difficult to do efficiently and correctly
Reinforcement learning: tell the robot what to do, not how to do it
How well suited is reinforcement learning to mobile robots?

4 Review Q-learning
Discrete states s and actions a
Learn the value function by observing rewards
– True value function: Q*(s,a) = E[R(s,a) + γ max_{a'} Q*(s',a')]
– Update rule (sketched in code below): Q(s_t, a_t) ← (1 − α) Q(s_t, a_t) + α (r_{t+1} + γ max_{a'} Q(s_{t+1}, a'))
The sampling distribution has no effect on the learned policy (Q-learning is off-policy)
Optimal policy: π*(s) = argmax_a Q*(s,a)
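To make the update concrete, here is a minimal tabular Q-learning sketch in Python. The slide supplies the update rule, and the α = 0.2 and γ = 0.99 settings appear later in the talk; the dictionary-based table and the function names are illustrative assumptions.

```python
from collections import defaultdict

ALPHA = 0.2   # learning rate alpha (the paper's setting)
GAMMA = 0.99  # discount factor gamma

Q = defaultdict(float)  # Q[(s, a)] -> current value estimate

def q_update(s, a, r, s_next, actions):
    """One backup: Q(s,a) <- (1 - alpha) Q(s,a) + alpha (r + gamma max_a' Q(s',a'))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + GAMMA * best_next)

def greedy_policy(s, actions):
    """pi*(s) = argmax_a Q(s,a), the optimal policy once the estimates converge."""
    return max(actions, key=lambda a: Q[(s, a)])
```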

5 Reinforcement learning on mobile robots
Sparse reward function
– Almost always zero reward R(s,a)
– Non-zero reward only on success or failure
Continuous environment
– HEDGER is used as the function approximator
– Function approximation is only safe when it never extrapolates from the data (illustrated below)
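The "never extrapolate" requirement can be illustrated with a toy approximator. Real HEDGER uses locally weighted regression and tests whether a query lies inside the hull of the stored training points; the k-nearest-neighbour bounding-box test and the default value below are simplifying assumptions, not the paper's algorithm.

```python
import numpy as np

class NonExtrapolatingApproximator:
    """Toy value-function approximator that refuses to extrapolate."""

    def __init__(self, default_value=0.0, k=5):
        self.X, self.y = [], []       # stored (input, value) samples
        self.default = default_value  # returned for out-of-range queries
        self.k = k                    # neighbourhood size

    def add(self, x, target):
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(float(target))

    def predict(self, x):
        if len(self.X) < self.k:
            return self.default
        X = np.stack(self.X)
        x = np.asarray(x, dtype=float)
        idx = np.argsort(np.linalg.norm(X - x, axis=1))[:self.k]
        nbrs = X[idx]
        # Interpolation test: the query must lie inside the neighbours'
        # axis-aligned bounding box (a cheap stand-in for HEDGER's hull test).
        if np.any(x < nbrs.min(axis=0)) or np.any(x > nbrs.max(axis=0)):
            return self.default
        # Distance-weighted average of the neighbours' stored values.
        w = 1.0 / (np.linalg.norm(nbrs - x, axis=1) + 1e-6)
        return float(np.dot(w, np.array(self.y)[idx]) / w.sum())
```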

6 Reinforcement learning on mobile robots
Q-learning can only succeed once a state with positive reward has been found
A sparse reward function and a continuous environment make the reward states hard to find by trial and error
Solution: show the robot how to find the reward states

7 Learning framework
Split learning into two phases (sketched in code below):
– Phase one: actions are chosen by an external controller, the teacher; the learning algorithm only passively observes
– Phase two: the learning algorithm takes control and learns the optimal policy
By 'showing' the robot where the interesting states are, learning should be quicker
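Both phases can run the same learning loop; only the source of the actions changes. This sketch reuses the q_update and greedy_policy helpers from the Q-learning slide; the env interface (reset/step returning state, reward, done) and the teacher_action callback are illustrative assumptions.

```python
def run_episode(env, choose_action, actions):
    """Run one episode, applying the same Q-learning backup either way."""
    s = env.reset()
    done = False
    while not done:
        a = choose_action(s)                # teacher in phase 1, learner in phase 2
        s_next, r, done = env.step(a)
        q_update(s, a, r, s_next, actions)  # the learner always observes and updates
        s = s_next

# Phase 1: the teacher (joystick or control code) drives; the learner
# passively updates its value function from the observed experience.
#   run_episode(env, teacher_action, actions)
#
# Phase 2: the learned policy drives and keeps improving on its own.
#   run_episode(env, lambda s: greedy_policy(s, actions), actions)
```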

8 Experimental setup
Two experiments on a B21r mobile robot
– Movement speed is fixed by an outside force
– Rotation speed has to be learned
– Settings: α = 0.2, γ = 0.99 or 0.90
Performance is measured after every 5 training runs
– The robot does not learn from these test runs
– Starting position and orientation are similar, but not identical

9 Experimental Results: Corridor Following Task
State space (a possible encoding is sketched below):
– distance to the end of the corridor
– distance to the left wall, as a fraction of corridor width
– angle to the target point
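As a concrete illustration, the three state variables might be packed into a small record like the one below; the class and field names are hypothetical, not from the paper.

```python
from dataclasses import dataclass

@dataclass
class CorridorState:
    dist_to_end: float      # distance to the end of the corridor (m)
    wall_offset: float      # distance to the left wall / corridor width, in [0, 1]
    angle_to_target: float  # heading error toward the target point (rad)

    def as_vector(self):
        """Feature vector handed to the function approximator."""
        return (self.dist_to_end, self.wall_offset, self.angle_to_target)
```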

10 Experimental Results: Corridor Following Task
Computer-controlled teacher
– Rotation speed is a fixed fraction of the angle to the target point, i.e. a simple proportional controller (sketched below)
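The slide describes a plain proportional controller: turn at a rate proportional to the heading error. The gain and the speed limit in this sketch are illustrative assumptions; the slide only fixes the proportional form.

```python
GAIN = 0.3     # hypothetical proportional gain
MAX_ROT = 1.0  # hypothetical rotation-speed limit (rad/s)

def teacher_rotation_speed(angle_to_target):
    """Rotation command (rad/s) proportional to the heading error (rad)."""
    cmd = GAIN * angle_to_target
    return max(-MAX_ROT, min(MAX_ROT, cmd))
```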

11 Experimental Results: Corridor Following Task
Human-controlled teacher
– Trained in a different corridor than the computer-controlled teacher

12 Experimental Results: Corridor Following Task Results
Performance drops briefly after the teacher-controlled training ends
– Phase 2 supplies more novel experiences
The sloppy human controller leads to faster convergence than the rigid computer controller
– Fewer phase 1 and phase 2 runs are needed
– The human controller supplies more varied data

13 Experimental Results: Corridor Following Task Results
Simulated performance without the advantage of teacher examples

14 Experimental Results: Obstacle Avoidance Task
State space:
– direction and distance to the obstacles
– direction and distance to the target

15 Experimental Results: Obstacle Avoidance Task Results
Human-controlled teacher
– The robot starts 3 m from the target, with a random orientation

16 Experimental Results: Obstacle Avoidance Task Results
Simulation without teacher examples
– No obstacles present; the robot only has to reach the goal
– The simulated robot starts in the right orientation, 3 meters from the target
– Only 18.7% of runs reached the target within one week of simulated time, taking 6.54 hours on average

17 Conclusion
Passive observation of appropriate state-action behaviour can speed up Q-learning
The teacher needs no knowledge of the robot or of the learning algorithm
Any demonstrated solution will work; providing a good solution is not necessary

18 Discussion

