
1 Learning From Demonstration
Atkeson and Schaal
Dang, RLAB, Feb 28th, 2007

2 Goal
Robot learning from demonstration
– Small number of human demonstrations
– Task-level learning (learn intent, not just mimicry)
Explore
– Parametric vs. nonparametric learning
– Role of a priori knowledge

3 Known Task
Pendulum swing-up task
– Like pole balancing, but more complex
– Difficult, but easy to evaluate success
Simplified
– Restricted to horizontal motion
– Important variables picked out: pendulum angle, pendulum angular velocity, hand location, hand velocity, hand acceleration

4 Implementation details
SARCOS 7-DOF arm
Stereo vision, colored ball indicators
0.12 s vision delay overcome with a Kalman filter
– Idealized pendulum dynamics
Redundant inverse kinematics and real-time inverse dynamics for control
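Since the delayed quantity is the pendulum state, one plausible reading of this slide is a predictor that rolls the delayed visual measurement forward through the idealized dynamics. A minimal sketch, assuming a frictionless pendulum on a horizontally accelerating hand with the angle measured from upright; the constants, function names, and the pure-prediction structure (rather than a full Kalman filter) are illustrative assumptions, not the paper's implementation:

    import numpy as np

    G = 9.81       # gravity (m/s^2)
    L_PEND = 0.35  # assumed pendulum length (m)
    DT = 0.01      # integration step (s)
    DELAY = 0.12   # vision delay to compensate (s)

    def pendulum_step(theta, theta_dot, hand_acc, dt=DT):
        """One Euler step of the idealized pendulum; theta measured from upright,
        hand_acc is the horizontal acceleration of the hand holding the pivot."""
        theta_ddot = (G * np.sin(theta) - hand_acc * np.cos(theta)) / L_PEND
        return theta + theta_dot * dt, theta_dot + theta_ddot * dt

    def predict_current_state(theta_meas, theta_dot_meas, recent_hand_accs):
        """Roll a 0.12 s old measurement forward through the model using the
        hand accelerations commanded during the delay interval."""
        theta, theta_dot = theta_meas, theta_dot_meas
        for a in recent_hand_accs:  # len(recent_hand_accs) == DELAY / DT
            theta, theta_dot = pendulum_step(theta, theta_dot, a)
        return theta, theta_dot

    # usage: estimate where the pendulum is now from a delayed measurement
    recent = [0.0] * int(round(DELAY / DT))
    theta_now, theta_dot_now = predict_current_state(np.pi - 0.1, 0.0, recent)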

5 Learning
Task composed of two subtasks; the belief is that subtask learning accelerates learning of new tasks
– 1. Pole swing-up (open loop)
– 2. Upright balance (feedback)
Focus here on swing-up
– Balancing already learned
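A minimal sketch of how the two subtasks could be stitched together at run time, using the hand-off condition stated on slide 9 (within 0.5 rad of upright, angular velocity below 3 rad/s); the linear balance gains and the function name are placeholders, not the learned balance controller from the paper:

    def select_hand_acceleration(t, theta, theta_dot, swingup_plan, dt=0.01):
        """Run the open-loop swing-up plan until the pendulum reaches the
        balance region, then switch to feedback balancing (theta from upright)."""
        if abs(theta) < 0.5 and abs(theta_dot) < 3.0:
            # placeholder linear balance feedback: accelerate the hand toward the fall
            k_theta, k_theta_dot = 40.0, 8.0
            return k_theta * theta + k_theta_dot * theta_dot
        # otherwise replay the planned open-loop swing-up acceleration
        index = min(int(t / dt), len(swingup_plan) - 1)
        return swingup_plan[index]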

6 First approach
Directly mimic the human hand movement
Fails, due to:
– Differences in human and robot capabilities
– Improper demonstration (not horizontal)
– Imprecise mimicry

7 Approach the second
Learn a reward function
Learn a task model
Use the human demonstration as a seed so a planner can find a good policy

8 Learn Task Model
Parametric
– Learn the parameters via linear regression
Nonparametric
– Use Locally Weighted Learning
– Given the desired variable and a set of possibly relevant input variables
– Cross-validation to tune meta-parameters
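A minimal sketch of the nonparametric route, locally weighted linear regression: a fresh local linear model is fit around each query point, with the kernel bandwidth as the meta-parameter that cross-validation would tune. This is generic LWR, not the paper's locally weighted learning code, and the Gaussian kernel and leave-one-out procedure are assumptions:

    import numpy as np

    def lwr_predict(X, y, x_query, bandwidth):
        """Predict y at x_query with a locally weighted linear fit.
        X: (n, d) inputs, y: (n,) targets, bandwidth: kernel width (the meta-parameter)."""
        dists = np.linalg.norm(X - x_query, axis=1)
        w = np.exp(-0.5 * (dists / bandwidth) ** 2)      # Gaussian kernel weights
        Xb = np.hstack([X, np.ones((len(X), 1))])        # append a bias column
        W = np.diag(w)
        beta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y   # weighted least squares
        return float(np.append(x_query, 1.0) @ beta)

    def loocv_error(X, y, bandwidth):
        """Leave-one-out cross-validation error, used to pick the bandwidth."""
        errors = []
        for i in range(len(X)):
            pred = lwr_predict(np.delete(X, i, axis=0), np.delete(y, i), X[i], bandwidth)
            errors.append((pred - y[i]) ** 2)
        return float(np.mean(errors))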

9 Swing up
Transition to balance occurs at ±0.5 radians with angular velocity < 3 rad/s
Reward function set to make the robot want to be like the demonstrator
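The slide's actual formula did not survive the transcript; as a hedged sketch, a reward that "makes the robot want to be like the demonstrator" is often written as a quadratic penalty on deviation from the demonstrated states plus a command penalty, where x_k^d is the demonstrated state at step k and Q, R are assumed weighting matrices (not from the original slide):

    r(x_k, u_k) = -(x_k - x_k^{d})^{\top} Q\,(x_k - x_k^{d}) - u_k^{\top} R\, u_k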

10 Parametric
Parameters learned from failure data
Trajectory optimized using the human trajectory as a seed
SUCCESS
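A minimal sketch of "trajectory optimized using the human trajectory as a seed": parameterize the plan as a sequence of hand accelerations, initialize it with the demonstrated accelerations, simulate the model, and hand the imitation cost to a generic optimizer. The model constants, the cost, and the use of scipy.optimize.minimize are illustrative assumptions rather than the paper's planner; a real run would substitute the learned model and recorded demonstration data for the placeholders:

    import numpy as np
    from scipy.optimize import minimize

    G, L_PEND, DT = 9.81, 0.35, 0.01   # assumed model constants

    def rollout(hand_accs, theta0=np.pi, theta_dot0=0.0):
        """Simulate the idealized pendulum model under a hand-acceleration plan."""
        theta, theta_dot, states = theta0, theta_dot0, []
        for a in hand_accs:
            theta_ddot = (G * np.sin(theta) - a * np.cos(theta)) / L_PEND
            theta = theta + theta_dot * DT
            theta_dot = theta_dot + theta_ddot * DT
            states.append((theta, theta_dot))
        return np.array(states)

    def imitation_cost(hand_accs, demo_states):
        """Quadratic deviation from the demonstrated states plus a small command cost."""
        states = rollout(hand_accs)
        return np.sum((states - demo_states) ** 2) + 1e-3 * np.sum(np.square(hand_accs))

    # placeholders standing in for the recorded human demonstration
    demo_accs = np.zeros(200)
    demo_states = rollout(demo_accs)
    result = minimize(imitation_cost, demo_accs, args=(demo_states,), method="L-BFGS-B")
    optimized_plan = result.x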

11 Feb 28th, 2007Dang, RLAB 11 Nonparametric Slower, but still successful

12 Harder Task
Double-pump swing-up
– The approach above fails
– Believed to be due to improper modeling of the system
– Solved by direct task-level learning (next slide)

13 Direct task-level learning
Learn a correction term Δ to add to the target angle
– Now target ±(0.5 + Δ) rad
– Use binary search on Δ
Worked for the parametric model
Did not work for the nonparametric model
– The trajectory left the region of validity of the local models
– So instead, scale the velocity along the whole trajectory by a coefficient c
– Binary search for c
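A minimal sketch of the binary search over the correction term; run_trial is a placeholder for executing a real swing-up aimed at ±(0.5 + Δ) rad and reporting whether the pendulum overshot upright, and the interval bounds and iteration count are assumptions:

    def binary_search_delta(run_trial, lo=0.0, hi=0.5, iters=8):
        """Shrink the target-angle correction delta toward the value where the
        swing-up just reaches upright: overshoot -> smaller delta, undershoot -> larger."""
        for _ in range(iters):
            delta = 0.5 * (lo + hi)
            if run_trial(delta):      # True if the pendulum overshot upright
                hi = delta
            else:
                lo = delta
        return 0.5 * (lo + hi)

The same search structure applies to the velocity-scaling coefficient c used in the nonparametric case.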

14 Results

15 Summary of Technique
Technique → Succeeds for
– Watch demo, mimic hand → None
– Learn model, optimize demo trajectory → Parametric, single pump
– Tune model, reoptimize → Nonparametric, single pump
– Binary search for delta → Parametric, double pump
– Binary search for c → Nonparametric, double pump

16 Discussion points
Was the reward function given or learned?
Does direct task-level learning make sense?
– Only useful in this task / implementation?
– Is it like the I term in PID?
Nonparametrics do not avoid all modeling errors
– Poor planner?
– Not enough data?
A priori knowledge
– The human selects the inputs, outputs, control system, perception, model selection, reward function, task segmenting, and task factors
It works!

