Published by Brenden Garwood. Modified about 1 year ago.

1
Biological Arm Motion through Reinforcement Learning
by Jun Izawa, Toshiyuki Kondo, Koji Ito
Presented by Helmut Hauser

2
helmut igi
Overview
- biological motivation and basic idea
- biological muscle force model
- mathematical formulations
- reaching task
- results and conclusions

3
Biological Motivation
(1) Reinforcement learning in biology (dopamine, …)
- In this framework the state and action spaces are large (curse of dimensionality).
(2) Multiple muscles produce the joint torques
- High redundancy enables the system to maintain robustness and flexibility, but it also increases the space.
Humans can deal with that, but how?

4
Basic Idea
How do humans learn a new motion?
- We coactivate muscles and stiffen our joints.
- Stiffness decreases while learning (feeling "safer").
- Our motions get smoother.
Maybe there exists some preferred domain in the action space that has higher priority in the learning process.
Idea: restrict the learning domain of the action space while learning, then soften the restrictions as performance improves.

5
Muscle force model
[Equation: muscle force as the sum of an elasticity term and a viscosity term; labels: $l_r$ = equilibrium length, "stiffness"]

6
Biological Model
[Figure: two-link arm with upper arm and lower arm; joint angles $\theta_1$, $\theta_2$]

7
Merging two worlds
Muscle force model + dynamic 2-link model
$R = G^{T} K G$ … elasticity
$D = G^{T} B G$ … viscosity
and, after some transformations, $\Theta_v = \lambda R^{-1} G^{T} K$

8
Mathematical Formulation
Remember: $G$ is constant.
$K = \mathrm{diag}(k_0 + k_i u_i)$
$R = G^{T} K G$
$\Theta_v = \lambda R^{-1} G^{T} K$
$D = G^{T} B G$ (constant)
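The relations on this slide can be tried out numerically. The following is a minimal pure-Python sketch, not the authors' implementation: the 6×2 moment-arm matrix `G`, the values `K0 = 1.0` and `K_GAIN = 2.0`, the muscle grouping, and all helper names are made up for illustration. It only shows that raising every muscle activation $u_i$ (coactivation) scales up $K$ and therefore the joint stiffness $R = G^{T} K G$.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def muscle_stiffness(u, k0, k):
    """K = diag(k0 + k_i * u_i): one elasticity value per muscle."""
    gains = [k0 + k_i * u_i for k_i, u_i in zip(k, u)]
    n = len(gains)
    return [[gains[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def joint_stiffness(G, K):
    """R = G^T K G: muscle elasticity mapped into joint space."""
    return matmul(matmul(transpose(G), K), G)


# Hypothetical moment arms (rows: 6 muscles, columns: 2 joints).
G = [[1.0, 0.0], [-1.0, 0.0],    # assumed shoulder pair
     [0.0, 1.0], [0.0, -1.0],    # assumed elbow pair
     [1.0, 1.0], [-1.0, -1.0]]   # assumed two-joint pair
K0, K_GAIN = 1.0, [2.0] * 6      # assumed values, not from the slides

R_relaxed = joint_stiffness(G, muscle_stiffness([0.0] * 6, K0, K_GAIN))
R_coactivated = joint_stiffness(G, muscle_stiffness([0.5] * 6, K0, K_GAIN))
print(R_relaxed)
print(R_coactivated)
```

With these made-up numbers, activating all six muscles equally doubles every entry of $R$, which matches the "coactivate and stiffen" intuition from the Basic Idea slide.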

9
Mathematical Formulation
Orthogonal decomposition (pseudoinverse):
$u = u_1' + u_2'$
$n = n_1' + n_2'$
$\check{n} = n_1' + c \, n_2'$
Note: $0 \le c \le 1$
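The decomposition of the search noise can be illustrated with a small pure-Python sketch. Several things here are assumptions rather than content of the slides: `J` is taken to be a full-row-rank 2×m matrix, $n_1'$ is taken to be the component in the row space of `J` (computed as $J^{+} J n$), $n_2'$ is taken to be the remainder in the null space $N(J)$, and the example matrix and noise vector are invented. The slides do not state which component is which, so treat this as one possible reading.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse_2x2(m):
    """Invert a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("J must have full row rank")
    return [[d / det, -b / det], [-c / det, a / det]]


def pseudoinverse(J):
    """Right pseudoinverse J^+ = J^T (J J^T)^-1 for a 2 x m matrix J."""
    Jt = transpose(J)
    return matmul(Jt, inverse_2x2(matmul(J, Jt)))


def decompose(J, n):
    """Split n into n1 (row space of J) and n2 (null space of J)."""
    column = [[x] for x in n]
    projected = matmul(pseudoinverse(J), matmul(J, column))
    n1 = [row[0] for row in projected]
    n2 = [x - y for x, y in zip(n, n1)]
    return n1, n2


def restricted_noise(J, n, c):
    """Return n1 + c * n2, with 0 <= c <= 1 scaling the second component."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    n1, n2 = decompose(J, n)
    return [x + c * y for x, y in zip(n1, n2)]


# Hypothetical J and noise sample, chosen only to keep the numbers simple.
J = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
noise = [1.0, 3.0, 2.0, 6.0]
n1, n2 = decompose(J, noise)
half = restricted_noise(J, noise, 0.5)
print(n1, n2, half)
```

Setting `c = 0.0` removes the second component entirely, while `c = 1.0` returns the unrestricted noise, so `c` acts as the knob for softening the restriction.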

10
[Figure: action space $u$ with subspaces $N(J)$ and $R(J)$; labels $\rho$, $\theta_v$]

11
[Figure: action space $u$ with subspaces $N(J)$ and $R(J)$; labels $\rho$, $\theta_v$, $c$]

12
Architecture
[Diagram: critic network, actor network, noise generator; signals: motor command $u_t$, $q_{t-1}$, reward, TD error]
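The signal flow in this architecture can be sketched in a few lines of Python. This is a generic actor-critic outline under stated assumptions, not the networks used in the talk: a linear critic stands in for the critic network, the discount factor `gamma`, the learning rate, the Gaussian noise model, and all names are hypothetical, and the actor is reduced to a fixed output vector.

```python
import random


def td_error(reward, value_prev, value_now, gamma=0.9):
    """Standard TD error: reward + gamma * V(q_t) - V(q_{t-1})."""
    return reward + gamma * value_now - value_prev


def noisy_command(actor_output, noise_scale, rng):
    """Motor command u_t = actor output + noise from the noise generator."""
    return [a + rng.gauss(0.0, noise_scale) for a in actor_output]


class LinearCritic:
    """Linear value estimate, used here in place of the critic network."""

    def __init__(self, n_inputs, learning_rate=0.1):
        self.w = [0.0] * n_inputs
        self.lr = learning_rate

    def value(self, q):
        return sum(w * x for w, x in zip(self.w, q))

    def update(self, q_prev, delta):
        # Move V(q_prev) in the direction indicated by the TD error.
        self.w = [w + self.lr * delta * x for w, x in zip(self.w, q_prev)]


rng = random.Random(0)
critic = LinearCritic(n_inputs=2)
q_prev, q_now = [1.0, 0.0], [0.0, 1.0]           # made-up inputs
u_t = noisy_command([0.2, 0.4], noise_scale=0.05, rng=rng)
delta = td_error(1.0, critic.value(q_prev), critic.value(q_now))
critic.update(q_prev, delta)
print(u_t, delta, critic.w)
```

In a full actor-critic loop the same TD error would also drive the actor update; that step is omitted here because the slides do not give its form.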

13
Reaching Task
[Figure: start S and goal (GA)]
Reward model:
$r = 1 - c_E r_E$ for …
$r = -c_E r_E$ for …
$r = -1$ for …
with $r_E = \sum_i u_i^2$ over all 6 muscles.
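The three reward cases and the energy term $r_E$ translate directly into code. In the sketch below, the energy term follows the slide, but the conditions selecting each case did not survive in this transcript, so the outcome labels `"goal"`, `"moving"` and `"failure"` are purely hypothetical placeholders, as are the function names and the example value of `c_E`.

```python
def energy_cost(u):
    """r_E = sum of u_i^2 over all muscle commands."""
    return sum(u_i * u_i for u_i in u)


def reward(u, c_E, outcome):
    """Three-case reward; the `outcome` labels are assumed, not from the slides."""
    r_E = energy_cost(u)
    if outcome == "goal":        # assumed: case with reward 1 - c_E * r_E
        return 1.0 - c_E * r_E
    if outcome == "moving":      # assumed: case with reward -c_E * r_E
        return -c_E * r_E
    if outcome == "failure":     # assumed: case with reward -1
        return -1.0
    raise ValueError(f"unknown outcome: {outcome!r}")


u = [0.5] * 6                    # made-up commands for the 6 muscles
print(energy_cost(u))
print(reward(u, c_E=0.1, outcome="goal"))
```

Because $r_E$ grows with the squared commands, strong coactivation is penalized in the first two cases, which is consistent with the later observation that energy decreases once the target is being hit.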

14
Some implementation facts
- The input q is extended, since the reward model needs u too.
- Stiffness R is set to rather "high" values.
- A neural network (proposed by Shibata) is used as the function approximator (backpropagation).
- In a second experiment, a load with arbitrary orientation (which stays the same within one trial) is applied within a certain region.
- Parameters (such as the noise parameter, $c_E$ of the reward model, …) have to be tuned.

15
Results
- The proposed architecture gets more reward than a standard approach.
- Cumulative reward doesn't tend to zero.
- Energy doesn't change in the early stage and decreases after hitting the target.
- With extra force: the peak of stiffness moves to this area.

16
Conclusions
- Can deal with redundant systems (the typical case in nature).
- The search noise is restricted to a subspace.
- A robust controller has been achieved.
- Some extra tuning was needed (made by evolution?).
Future outlook:
- Applying the approach to hierarchical systems (more stages).
- How to prevent extra tuning?

17
Literature
- "Biological Robot Arm Motion through Reinforcement Learning", Jun Izawa, Toshiyuki Kondo, Koji Ito. Proceedings of the 2002 IEEE International Conference on Robotics & Automation.
- "Motor Learning Model using Reinforcement Learning with Neural Internal Model", Jun Izawa, Toshiyuki Kondo, Koji Ito. Department of Computational Intelligence and Systems.
- "Biological Robot Arm Motion through Reinforcement Learning", Jun Izawa, Toshiyuki Kondo, Koji Ito. Biol. Cybern. 91 (2004), Springer-Verlag 2004.
