
1 Laboratory for Perceptual Robotics, University of Massachusetts Amherst, Department of Computer Science
Learning Prospective Robot Behavior
Shichao Ou and Roderic Grupen
Laboratory for Perceptual Robotics, University of Massachusetts Amherst

2 A Developmental Approach
Infant Learning
– In stages
   Maturation processes
– Parents provide constrained learning contexts
   Protect
   Easy → Complex
– Motion mobile for newborns
– Use brightly colored, easy to pick up objects
– Use building blocks
– Association of words and objects

3 Application in Robotics
Framework for Robot Developmental Learning
– Role of teacher: set up learning contexts that make the target concept conspicuous
– Role of robot: acquire concepts, generalize to new contexts by autonomous exploration, provide feedback
Control Basis
– Robot actions are created using combinations of
– Establish stages of learning by time-varying constraints on resources
   Easy → Complex

4 Example
Learning to Reach for Objects
– Stage 1: SearchTrack
   Focus attention using a single brightly colored object (σ)
   Limit DOF (τ) to use head ONLY
– Stage 2: ReachGrab
   Limit DOF (τ) to use one arm ONLY
– Stage 3: Handedness, Scale-Sensitive
Hart et al., 2008
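The staged constraints above can be pictured as a schedule that restricts which sensor (σ) and effector (τ) resources a behavior may use at each stage. The sketch below is only an illustration: the `Stage` class, the `allowed` helper, and all resource and behavior names are hypothetical and do not come from the control basis framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One learning stage: the resources the teacher leaves available."""
    name: str
    sensors: frozenset    # sigma: sensor resources the robot may attend to
    effectors: frozenset  # tau: degrees of freedom the robot may move


def allowed(stage, candidates):
    """Keep only (behavior, sensor, effector) triples permitted in this stage."""
    return [c for c in candidates
            if c[1] in stage.sensors and c[2] in stage.effectors]


# Hypothetical resource names, chosen to mirror the slide.
STAGE_1 = Stage("SearchTrack", frozenset({"colored_object"}), frozenset({"head"}))
STAGE_2 = Stage("ReachGrab", frozenset({"colored_object"}), frozenset({"right_arm"}))

CANDIDATES = [
    ("track", "colored_object", "head"),
    ("reach", "colored_object", "right_arm"),
    ("reach", "colored_object", "left_arm"),
]

stage_1_actions = allowed(STAGE_1, CANDIDATES)
stage_2_actions = allowed(STAGE_2, CANDIDATES)
```

With these assumed names, Stage 1 leaves only the head-tracking behavior available and Stage 2 leaves only the single-arm reach, which is the easy-to-complex progression the slide describes.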

5 Prospective Learning
Infants adapt to new situations by prospectively looking ahead, predicting failure, and then learning a repair strategy

6 Robot Prospective Learning with Human Guidance
[Figure: a policy drawn three times as a chain of states S_0, S_1, ..., S_i, ..., S_j, ..., S_n linked by actions a_0, a_1, ..., a_{n-1}. One copy shows a sub-task with states S_i1, ..., S_ij, ..., S_in; another shows a branch labeled g(f)=1 and g(f)=0. Label: Challenge]

7 A 2D Navigation Domain
Problem
   30x30 map
   6 doors, randomly closed
   6 buttons
   1 start and 1 goal
   3-bit door sensor on robot

8 Flat Learning Results
Flat Q-Learning
– 5-component state (x, y, door-bit1, door-bit2, door-bit3)
– 4 actions
   up, down, left, right
– Reward
   1 for reaching the goal
   -0.01 for every step taken
– Learning parameters α=0.1, γ=1.0, ε=0.1
Learned solutions after 30,000 episodes
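For reference, a minimal sketch of tabular Q-learning with ε-greedy exploration, using the learning parameters and reward values from the slide. The 5x5 open grid, the helper names, the step cap per episode, and the choice to add the goal reward on top of the step cost are assumptions made for illustration; the slides use a 30x30 map with doors and do not give these details.

```python
import random


def q_learning(step, start, actions, episodes, alpha=0.1, gamma=1.0,
               epsilon=0.1, max_steps=200, seed=0):
    """Tabular Q-learning; step(state, action) returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated return; missing entries count as 0.0
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            if rng.random() < epsilon:          # explore
                action = rng.choice(actions)
            else:                               # exploit the current estimate
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in actions)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


# Hypothetical 5x5 open grid, much smaller than the 30x30 map in the slides.
SIZE, GOAL = 5, (4, 4)
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
ACTIONS = list(MOVES)


def grid_step(state, action):
    """Move one cell, staying in place at the walls."""
    dx, dy = MOVES[action]
    nxt = (min(max(state[0] + dx, 0), SIZE - 1),
           min(max(state[1] + dy, 0), SIZE - 1))
    reward = -0.01                  # cost for every step taken
    if nxt == GOAL:
        reward += 1.0               # assumed: goal reward added to the step cost
    return nxt, reward, nxt == GOAL


def greedy_path(q, start, limit=50):
    """Follow the greedy policy from start until the goal or the step limit."""
    path, state = [start], start
    for _ in range(limit):
        if state == GOAL:
            break
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        state = grid_step(state, action)[0]
        path.append(state)
    return path


q_table = q_learning(grid_step, (0, 0), ACTIONS, episodes=2000)
path = greedy_path(q_table, (0, 0))
```

On the full domain the table is indexed by position together with the three door bits, which is one reason the flat learner needs far more episodes than this toy grid.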

9 Prospective Learning
Stage 1
– All doors open
– Constrain resources to use only (x,y) sensors
– Allow the agent to learn a policy from start to goal
[Figure: policy chain S_0, S_1, ..., S_i, ..., S_j, ..., S_n with action labels Right, Down, Right, Up, Right]
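A quick count shows what restricting Stage 1 to the (x,y) sensors buys. The helper below is a hypothetical illustration that treats every combination of grid cell and door-sensor reading as a distinct state, which is an upper bound rather than a figure from the slides.

```python
def table_size(width, height, door_bits, n_actions):
    """Return (number of states, number of Q-table entries) for a grid world."""
    n_states = width * height * 2 ** door_bits
    return n_states, n_states * n_actions


# Flat learner: position plus the 3-bit door sensor, 4 actions.
flat_states, flat_entries = table_size(30, 30, 3, 4)
# Stage 1 of prospective learning: position only, 4 actions.
stage1_states, stage1_entries = table_size(30, 30, 0, 4)

print(flat_states, flat_entries)      # 7200 28800
print(stage1_states, stage1_entries)  # 900 3600
```

Under this counting, the Stage 1 learner works with one eighth of the states the flat learner has to cover.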

10 Prospective Learning
Stage 2
– Close 1 door
– Robot learns the cause of the failure
– Robot backtracks and finds an earlier indicator of this cause

11 Prospective Learning
Stage 2
– Close 1 door
– Robot learns the cause of the failure
– Robot backtracks and finds an earlier indicator of this cause
– Create a sub-task
– Learn a new policy for the sub-task

12 Prospective Learning
Stage 2
– Close 1 door
– Robot learns the cause of the failure
– Robot backtracks and finds an earlier indicator of this cause
– Create a sub-task
– Learn a new policy for the sub-task
– Resume original policy
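The Stage 2 steps can be sketched in a few lines. This is not the authors' algorithm: the slides do not say how the indicator is found, so the sketch substitutes a simple comparison between one successful and one failed run along the same path. The function names, the `door_bit` feature, and the `press` action are all hypothetical.

```python
def earliest_indicator(success_obs, failure_obs):
    """Return (step, feature) for the first sensor reading that differs, or None.

    Each argument is a list of dicts, one dict of sensor readings per step.
    """
    for i, (ok, bad) in enumerate(zip(success_obs, failure_obs)):
        for name in sorted(ok):
            if ok[name] != bad.get(name):
                return i, name
    return None


def plan_with_repair(base_actions, branch_step, predicts_failure, sub_task_actions):
    """Splice the sub-task in at branch_step, then resume the original policy."""
    plan = []
    for i, action in enumerate(base_actions):
        if i == branch_step and predicts_failure:
            plan.extend(sub_task_actions)   # repair before continuing
        plan.append(action)                 # original policy is kept as-is
    return plan


# Hypothetical traces: the door bit changes at step 2, before the robot
# actually runs into the closed door later in the path.
success = [{"door_bit": 0}] * 5
failure = [{"door_bit": 0}, {"door_bit": 0}, {"door_bit": 1},
           {"door_bit": 1}, {"door_bit": 1}]

step_index, feature = earliest_indicator(success, failure)
base = ["right", "right", "down", "right", "up"]
repaired = plan_with_repair(base, step_index, True, ["left", "press", "right"])
print(step_index, feature)
print(repaired)
```

The original action sequence survives unchanged in the repaired plan; only the sub-task is spliced in at the point where the indicator first appears.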

13 Prospective Learning Results
Learned solutions in fewer than 2,000 episodes

14 Humanoid Robot Manipulation Domain
Benefits of Prospective Learning
– Adapt to new contexts by maintaining the majority of the existing policy
– Automatically generates sub-goals
– Sub-task can be learned in a completely different state space
– Supports interactive learning

15 Conclusion
   A developmental view of robot learning
   A framework that enables interactive, incremental learning in stages
   An extension to the control basis learning framework using the idea of prospective learning

