Biological Arm Motion through Reinforcement Learning
by Jun Izawa, Toshiyuki Kondo, Koji Ito
Presented by Helmut Hauser
Overview
- biological motivation and basic idea
- biological muscle force model
- mathematical formulations
- reaching task
- results and conclusions
Biological Motivation
(1) Reinforcement learning occurs in biology (dopamine, ...).
In this framework we face a large state and action space (curse of dimensionality).
(2) Multiple muscles produce the joint torques.
This high redundancy lets the system stay robust and flexible, but it further enlarges the action space.
Humans can deal with that, but how?
Basic Idea
How do humans learn a new motion?
- We co-activate muscles and stiffen our joints.
- Stiffness decreases while learning (we feel "safer").
- Our motions get smoother.
Maybe there is a preferred domain in the action space that gets higher priority during learning.
Idea: restrict the learning domain of the action space during learning, then soften the restriction as performance improves.
Muscle Force Model
Figure: the muscle force is the sum of an elasticity term and a viscosity term; l_r is the equilibrium length, and the elasticity coefficient plays the role of the "stiffness".
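The slide's figure can be sketched numerically. This is a minimal single-muscle model assuming activation-dependent stiffness and viscosity; the gains k0, k1, b0, b1 and the equilibrium length are illustrative placeholders, not the paper's values:

```python
# Hypothetical single-muscle force model: force = elasticity + viscosity.
# Stiffness k and viscosity b grow with activation u; l_r is the
# equilibrium (rest) length. All coefficients are illustrative only.

def muscle_force(u, l, l_dot, k0=1.0, k1=3.0, b0=0.1, b1=0.3, l_r=1.0):
    k = k0 + k1 * u                    # activation-dependent "stiffness"
    b = b0 + b1 * u                    # activation-dependent viscosity
    return k * (l_r - l) - b * l_dot   # pulls toward l_r, damps motion

# Higher activation stiffens the muscle: the same stretch produces a
# larger restoring force.
f_low  = muscle_force(u=0.1, l=1.2, l_dot=0.0)
f_high = muscle_force(u=0.9, l=1.2, l_dot=0.0)
```

For a stretched muscle (l > l_r) both forces are negative (pulling back); the strongly activated one pulls harder, which is exactly the co-activation stiffening from the "Basic Idea" slide.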
Biological Model
Figure: a two-link arm (upper arm, lower arm) with joint angles θ_1 and θ_2.
Merging Two Worlds
Combining the muscle force model with the dynamic 2-link model:
R = G^T K G ... elasticity
D = G^T B G ... viscosity
and, after some transformations, the equilibrium posture Θ_v = R^-1 G^T K λ.
Mathematical Formulation
Remember: G is constant.
K = diag(k_0 + k_i u_i)
R = G^T K G
Θ_v = R^-1 G^T K λ
D = G^T B G ... constant
Mathematical Formulation
Orthogonal decomposition via the pseudoinverse:
u = u_1' + u_2'
n = n_1' + n_2'
ň = n_1' + c · n_2'
Note: 0 ≤ c ≤ 1
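The decomposition above can be sketched with a Moore-Penrose pseudoinverse: the exploration noise n is split into a component that actually moves the equilibrium posture (range-related part) and a null-space component with no postural effect, and only the latter is scaled by c. The Jacobian J here is a toy stand-in, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)
J = rng.standard_normal((2, 6))  # toy Jacobian: 6-dim action -> 2-dim effect
n = rng.standard_normal(6)       # exploration noise in action space

J_pinv = np.linalg.pinv(J)       # Moore-Penrose pseudoinverse
P = J_pinv @ J                   # orthogonal projector onto the row space of J
n1 = P @ n                       # component with an effect on the posture
n2 = n - n1                      # null-space component N(J): no postural effect

def restricted_noise(c):
    """n_restricted = n1 + c*n2, 0 <= c <= 1: shrink exploration
    in the redundant (null-space) directions."""
    return n1 + c * n2
```

With c = 0 the search noise lives entirely in the effective subspace (the restricted learning domain of the "Basic Idea" slide); with c = 1 the full noise is recovered.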
Figures: the action space of u decomposed into the range R(J) and the null space N(J); the exploration noise ρ acts on θ_v, and the factor c scales the null-space component.
Architecture
Diagram: a critic network and an actor network, plus a noise generator for exploration. From the state q_{t-1} the actor outputs the motor command u_t; the reward yields the TD error, which trains both networks.
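The diagram implies a standard actor-critic loop. This is a schematic sketch with linear function approximators and made-up learning rates, not the Shibata-style backpropagation network mentioned later in the slides:

```python
import numpy as np

class ActorCritic:
    """Toy actor-critic: the critic estimates a value V(q), the actor
    outputs a motor command u, a noise generator perturbs u for
    exploration, and the TD error trains both parts."""

    def __init__(self, state_dim, action_dim, gamma=0.95, lr=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w_critic = np.zeros(state_dim)               # linear value weights
        self.W_actor = np.zeros((action_dim, state_dim))  # linear policy weights
        self.gamma, self.lr = gamma, lr

    def act(self, q, noise_scale=0.1):
        u = self.W_actor @ q
        noise = noise_scale * self.rng.standard_normal(u.shape)  # noise generator
        return u + noise, noise

    def update(self, q, q_next, reward, noise):
        # TD error: delta = r + gamma * V(q') - V(q)
        td_error = reward + self.gamma * self.w_critic @ q_next - self.w_critic @ q
        self.w_critic += self.lr * td_error * q                   # critic: TD(0)
        self.W_actor += self.lr * td_error * np.outer(noise, q)   # actor: noise
        return td_error                                           # reinforced by delta
```

In the paper's scheme the noise would additionally be restricted by the null-space factor c before being added to u.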
Reaching Task
Move the hand from the start S to the goal area GA.
Reward model:
 1 - c_E r_E   when the hand reaches the goal area GA
 -c_E r_E      otherwise (during the movement)
 -1            for ...
with r_E = Σ u_i^2 over all 6 muscles.
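The reward slide can be written as a small function. The case conditions (goal reached / still moving / failure) are my reading of the partly garbled slide, and the energy weight c_E is a placeholder value; the slides only say it must be tuned:

```python
def reward(u, at_goal, failed, c_E=0.01):
    """Energy-penalized reward for the reaching task.

    r_E is the sum of squared activations over all 6 muscles, so the
    agent is always charged for co-activation, even when it succeeds.
    """
    r_E = sum(ui ** 2 for ui in u)  # energy term r_E = sum(u_i^2)
    if failed:
        return -1.0                 # assumed failure case
    if at_goal:
        return 1.0 - c_E * r_E      # success, minus energy cost
    return -c_E * r_E               # ongoing movement: pure energy cost
```

Because r_E penalizes co-activation, maximizing reward pushes the stiffness down once the target is reliably hit, matching the result reported two slides later.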
Some Implementation Facts
- Extended input q, since the reward model needs u too.
- The stiffness R is set to rather "high" values.
- A neural network (proposed by Shibata) serves as function approximator (backpropagation).
- In a second experiment, a load with arbitrary orientation (constant within one trial) is applied within a certain region.
- Parameters (noise parameters, c_E of the reward model, ...) have to be tuned.
Results
- The proposed architecture collects more reward than a standard approach.
- The cumulative reward does not tend to zero.
- The energy does not change in the early stage; it decreases after hitting the target.
- With the extra force (load), the peak of the stiffness moves to that area.
Conclusions
- The approach can deal with redundant systems (the typical case in nature).
- The search noise is restricted to a subspace.
- A robust controller has been achieved.
- Some extra tuning was needed (done by evolution?).
Future outlook:
- Apply the approach to hierarchical systems (more stages).
- How can the extra tuning be avoided?
Literature
"Biological Robot Arm Motion through Reinforcement Learning", Jun Izawa, Toshiyuki Kondo, Koji Ito, Proceedings of the 2002 IEEE International Conference on Robotics & Automation.
"Motor Learning Model using Reinforcement Learning with Neural Internal Model", Jun Izawa, Toshiyuki Kondo, Koji Ito, Department of Computational Intelligence and Systems.
"Biological Robot Arm Motion through Reinforcement Learning", Jun Izawa, Toshiyuki Kondo, Koji Ito, Biological Cybernetics 91 (2004), Springer-Verlag.