1 Learning From Demonstration

2 Robot Learning
A good control policy u = π(x,t) is often hard to engineer from first principles.
Reinforcement learning: learn from trial and error.
Direct teaching: have a human guide the robot's motion.
Imitation learning: observe and mimic human demonstrations.

3 Demos

4 Learning “Flavors”
Given demonstrations, learn a dynamics model: a system identification problem.
Given an objective function, optimize the policy: a standard optimal control problem, which can also be solved using reinforcement learning (simulated demonstrations).
Given policy demonstrations, find the objective function: inverse optimal control / inverse reinforcement learning.

5 Learning “Flavors”
[Diagram relating demonstrations, a performance objective, a dynamics model, and a plan or control policy via inverse optimal control, direct policy learning, optimal control, and system identification.]

6 Direct Policy Learning
Wish to learn u = π(x).
Human performances (system traces): {(x,u)_i for i = 1,…,n}.
Learn the mapping π: nearest neighbors, regression, neural networks, locally weighted regression, etc.

7 Nearest Neighbors
Observe {(x,u)_i for i = 1,…,n}.
π(x) = u_i* for i* = argmin_i ||x − x_i||^2.
Extension: k-nearest neighbors.
[Figure: a query point matched to its nearest demonstration states.]
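A minimal sketch of this nearest-neighbor lookup, assuming the demonstrations are stored as NumPy arrays; the array names and the nn_policy helper are illustrative, not from the slides:

```python
import numpy as np

def nn_policy(x, demo_states, demo_controls, k=1):
    """Return the control associated with the k nearest demonstration states to x."""
    # Squared Euclidean distance from the query to every demonstrated state
    dists = np.sum((demo_states - x) ** 2, axis=1)
    # Indices of the k closest demonstrations
    idx = np.argsort(dists)[:k]
    # k = 1 reproduces the slide's argmin rule; k > 1 averages the neighbors' controls
    return demo_controls[idx].mean(axis=0)

# Example: 2-D states, 1-D controls from n = 3 demonstrations (illustrative values)
demo_states = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
demo_controls = np.array([[0.1], [0.5], [-0.2]])
u = nn_policy(np.array([0.9, 0.1]), demo_states, demo_controls)
```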

8 Linear Regression
Hypothesize π_θ(x) = Σ_k θ_k φ_k(x), where the φ_k(x) are basis functions.
Observe {(x,u)_i for i = 1,…,n}.
min_θ Σ_i ||u_i − π_θ(x_i)||^2: a least-squares problem.
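A minimal sketch of the least-squares fit, assuming scalar states and polynomial basis functions φ_k(x) = x^k; the data values and variable names are illustrative, not from the slides:

```python
import numpy as np

# Demonstration data: scalar states x_i and controls u_i (illustrative values)
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
u = np.array([0.1, 0.4, 0.9, 1.7, 2.6])

# Basis functions phi_k(x) = x**k for k = 0..2, evaluated at every demonstrated state
Phi = np.stack([x**k for k in range(3)], axis=1)   # n x 3 design matrix

# Least squares: theta = argmin_theta sum_i ||u_i - Phi_i . theta||^2
theta, *_ = np.linalg.lstsq(Phi, u, rcond=None)

def policy(x_query):
    """Linear-in-parameters policy pi_theta(x) = sum_k theta_k * x**k."""
    return sum(t * x_query**k for k, t in enumerate(theta))
```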

9 Model-based Nonlinear Regression
Hypothesize a model class π_θ(x), e.g., θ are feedback gain parameters.
Observe {(x,u)_i for i = 1,…,n}.
min_θ Σ_i ||u_i − π_θ(x_i)||^2: a nonlinear least-squares problem.
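A minimal sketch of the nonlinear case, assuming an illustrative policy class u = a·tanh(b·x) whose parameters play the role of feedback gains; the data and the policy form are assumptions, not from the slides:

```python
import numpy as np
from scipy.optimize import least_squares

# Demonstration data: scalar states and controls (illustrative values)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
u = np.array([-0.9, -0.6, 0.0, 0.6, 0.9])

def policy(theta, x):
    """Nonlinear policy pi_theta(x) = a * tanh(b * x); theta = (a, b) are gain-like parameters."""
    a, b = theta
    return a * np.tanh(b * x)

def residuals(theta):
    # One residual per demonstration: u_i - pi_theta(x_i)
    return u - policy(theta, x)

# Nonlinear least squares from an initial parameter guess
fit = least_squares(residuals, x0=[1.0, 1.0])
theta_hat = fit.x
```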

10 Inverse Optimal Control
Parsimony hypothesis: goals are better than policies at describing appropriate behavior in an open world.
Two stages: learn the objective from demonstrations, then plan online using the objective and sensory input.
Difficulty: a highly underconstrained learning problem.
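A heavily simplified sketch of the first stage, assuming a linear reward R = w·φ(x,u) and fitting w so the demonstration outscores a set of alternative trajectories by a margin. The max-margin-style update, the feature data, and all names are illustrative choices; the slide does not commit to any particular IOC algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_counts(traj):
    """Total feature vector of a trajectory: sum_t phi(x_t, u_t)."""
    return traj.sum(axis=0)

# Illustrative data: one expert demonstration and several alternative trajectories,
# each a (T, num_features) array of per-step features phi(x_t, u_t)
expert = rng.normal(size=(50, 4)) + 0.5
alternatives = [rng.normal(size=(50, 4)) for _ in range(10)]

# Stage 1: fit reward weights w so the expert's total reward beats every
# alternative by a margin; simple subgradient ascent on that objective
w = np.zeros(4)
for _ in range(200):
    f_exp = feature_counts(expert)
    worst = max(alternatives, key=lambda t: w @ feature_counts(t))
    if w @ f_exp < w @ feature_counts(worst) + 1.0:   # margin violated
        w += 0.01 * (f_exp - feature_counts(worst))

# Stage 2 (not shown): plan online with the learned objective R(x,u) = w . phi(x,u)
```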

11 Example

12 Reinforcement Learning
Have an immediate reward/cost function R(x,u).
Find a policy that maximizes the expected global return.
Use trial and error to improve the return over time: TD methods, Q-learning.
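A minimal tabular Q-learning sketch on a toy chain problem; the environment, step function, and hyperparameters are all illustrative assumptions, not from the slides:

```python
import numpy as np

# Tabular Q-learning on a toy chain MDP
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Toy dynamics: action 1 moves right, action 0 moves left; reward at the last state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

for episode in range(500):
    s = 0
    for t in range(20):
        # Epsilon-greedy trial and error
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        # Q-learning temporal-difference update
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
```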

13 Trajectory Following
Problem 1: learn a reference trajectory from human demonstrations.
Problem 2: learn to follow a reference trajectory under dynamics and disturbances.
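One simple recipe, sketched under assumptions not stated on the slide: average time-aligned demonstrations to obtain the reference (Problem 1), then track it with a PD feedback law (Problem 2). The data, alignment assumption, and gains are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem 1: five time-aligned, noisy demonstrations of the same circular motion,
# shape (num_demos, T, state_dim); averaging them gives a simple reference trajectory
t = np.linspace(0, 2 * np.pi, 100)
demos = np.stack([np.stack([np.sin(t), np.cos(t)], axis=1)
                  + 0.05 * rng.normal(size=(100, 2)) for _ in range(5)])
reference = demos.mean(axis=0)        # (T, state_dim)

# Problem 2: follow the reference with a PD feedback law despite disturbances
def pd_tracking_control(x, x_dot, x_ref, x_ref_dot, kp=10.0, kd=2.0):
    """u = Kp*(x_ref - x) + Kd*(x_ref_dot - x_dot): drive the tracking error to zero."""
    return kp * (x_ref - x) + kd * (x_ref_dot - x_dot)
```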

14 Characterizing Performance
Performance metrics:
Optimality: does the learned policy perform optimally (e.g., track the reference well)?
Generality: does the learned policy perform well in new scenarios (under disturbances)?

15 Discussion
Learning is useful for exotic devices, deforming environments, dynamic tasks, and social robots.
Theory and benchmarking are not as well developed as in classic machine learning: there is a temporal component, training/testing datasets are difficult to gather, and hardware testbeds are nonuniform.

16 Reminder: IU Robotics Open House, April 16, 4-7pm, R-House: 919 E 13th St

