Goal Directed Reaching with the Motor Cortex Model Cheol Han Feb 20, 2007.


Introduction. Goal: a computational model of goal-directed reaching movement with a biologically plausible motor cortex model, which can explain (1) neural coding in the motor cortex, (2) the relationship between skill learning and map formation, and (3) reorganization of the motor cortex after a lesion, accompanied by improvement of movement.

Overview: Dual map (motor output map, motor input map); Models (arm model with Hill-type muscles, cortex model, reinforcement learning framework); Results; Discussion.

Directional coding (Georgopoulos, 1986)

Dual Map. Two views of neural coding in the motor cortex: low-level muscle coding (Evarts…); high-level kinematic coding (Georgopoulos, 1986); both, or intermediate (joint) coding. We hypothesized that the motor output map mainly encodes low-level muscle coding, and that the motor input map encodes high-level kinematic coding.

Dual Map (1). Two competing views of neural coding in the motor cortex. Low-level, muscle coding: Donoghue group’s ICMS experiments (not exactly..); Todorov 2003; (Scott 1997, Scott and Kalaska 1997, Scott et al., 2001). Todorov (2003) noted that directional coding changes with posture (evidence against directional coding). High-level, kinematic coding (Georgopoulos, 1986). However, why just one? How about both? Intermediate-level, joint coding (Kakei et al., 1999, 2001): transformation from kinematic coding to muscle coding in the motor cortex.

Dual Map (2). Different experiments may show different characteristics. ICMS (classic, short and weak) -> Muscle? Artificial stimulation of the motor cortex; the response of the body is marked at the stimulated point. ICMS (longer and stronger) -> Direction or Muscle? Graziano et al., 2002: movements to a specific posture, e.g. ‘moving hands to mouth’; motor primitive or equilibrium point hypothesis. Voluntary movement -> Direction. Activation of the motor cortex; it is induced by the higher level.

Learning goal-directed movements with Actor-Critic. Learning a feed-forward controller using temporal difference learning and the actor-critic architecture (Sutton, 1984; Barto et al., 1983). Biologically plausible (dopamine and/or acetylcholine modulation of LTP in motor cortex). Continuous time and space (K. Doya, 2000). Similar approaches: Bissmarck et al., 2005; Jun Izawa et al., 2004. [Diagram labels: Trajectory Planner; Kinematic Coding; Motor Cortex Model; Motoneurons (Spinal Cord); Arm Model with muscles; Critic (Basal Ganglia); Temporal Difference Learning (Dopamine neurons); Competitive Hebbian Learning]

Motor output map. ICMS may exhibit characteristics of corticospinal projections. Monosynaptic projections from some M1 neurons to motoneurons (Fetz and Cheney, 1980; Lemon et al., 1986); Todorov (2003), Donoghue group’s. [Diagram labels: Motor Cortex Model; Motoneurons (Spinal Cord)]

Motor input map. Motor cortex neural recording during voluntary movements (e.g. Georgopoulos). The activation level during voluntary movement tends to resemble the higher level’s coding, i.e. kinematic coding. [Diagram labels: Kinematic Coding; Motor Cortex Model]

Models. Motor output map: competitive Hebbian learning with a motor cortex model; reversed feature extraction. Motor input map: temporal difference reinforcement learning.

Arm model. Two links in the horizontal plane; 6 muscles with a Hill-type muscle model: Shoulder Extensor (E), Shoulder Flexor (F), Elbow Extensor (O), Elbow Flexor (C), Biarticular Extensor (B) and Flexor (T). An accurate arm model is important: Todorov (2002) noted that characteristics may propagate from the bottom up. Ning Lan (2002), Zajac (1989), Katayama (1993), Cheng et al. (2000), Spoelstra et al. (2000). (Figure from Spoelstra et al., 2000.)

Motor Cortex model (Chernjavsky and Moody, 1990). Two layers, with GABA neurons. Shunting inhibitory GABA neurons; Mexican hat activation; shunting inhibition (Douglas et al., 1995; Prescott et al., 2003). [Diagram labels: PYR; GABA]
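
As a rough illustration of the Mexican hat activation profile named on this slide, the sketch below uses a difference of Gaussians: narrow excitation minus broader inhibition as a function of distance on the cortical sheet. This is an assumption made for illustration only. The slide's model obtains the profile from shunting (divisive) inhibition by GABA neurons, which this subtractive stand-in does not reproduce; the function name and all parameter values are hypothetical.

```python
import math


def mexican_hat(distance, sigma_exc=1.0, sigma_inh=3.0, k_exc=1.0, k_inh=0.5):
    """Difference of Gaussians: narrow excitation minus broad inhibition.

    All parameters are illustrative assumptions, not values from the slides.
    """
    excitation = k_exc * math.exp(-distance ** 2 / (2.0 * sigma_exc ** 2))
    inhibition = k_inh * math.exp(-distance ** 2 / (2.0 * sigma_inh ** 2))
    return excitation - inhibition


# Net lateral influence at increasing distances from the active unit.
profile = [mexican_hat(d) for d in range(10)]
```

With these values the influence is excitatory at the centre, inhibitory at intermediate distances, and fades toward zero far away, which is the qualitative shape that supports local competition.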

Cortex model. SOFM and the Mexican hat activation. Reversed feature extraction. Features: activity-dependent weight decay (reverse of the BCM rule); local competition with the Mexican hat activation. Local repulsiveness may induce map formation. Kinetic coding (including velocity and force), not kinematic coding (excluding velocity and force), because motoneuron activation relates to muscle force generation. [Diagram labels: Feature; Input; Sensory Cortex (SOM); Output; Random Stimulation; Motor Cortex (SOM); Feature; Weights which construct SOM]

Model Diagram. Our motor cortex model includes the inverse dynamics and the inverse muscle model. How do we learn it in a biologically plausible manner? Using reinforcement learning, which provides an evaluation of the movement. Implementation with temporal difference learning based on the actor-critic structure. Similar approaches: Bissmarck et al., 2005; Jun Izawa et al., 2004. [Diagram labels: Trajectory Generator; Inverse Dynamics; Inverse Muscle Model; Motoneurons; Arm; Evaluator of Movement; Joint static level planning; Joint “force” level planning; Muscle level planning; ACTOR; CRITIC; TD error]

Actor-Critic Model (Sutton, 1984). The “Actor” produces a motor command. The motor command feeds into the plant. The “Critic” evaluates how good the movement was, compared with previous expectations (TD error). Update the “Actor” based on the Critic’s evaluation. Update the “Critic”: if the actor has improved, the critic can expect better movements. A movement worse than the critic expected is discarded. [Diagram labels: Trajectory Generator; MOTOR CORTEX; Arm; Evaluator of Movement; ACTOR; CRITIC; TD error]

Actor: computes the motor commands. Example of an actor: Bissmarck et al., 2005. Coding of kinematic variables: distributed coding. Action pool: preferred torques. The layer contains action units, each tuned to a “preferred torque”. These preferred torques compete through a softmax; p_i is the probability of unit i being chosen, shown as a bar in the diagram. Modifiable weights w exist between the kinematic planning signal and the preferred torques. Exploration uses action perturbation. [Diagram labels: Kinematic planning; Torque (Joint Force); p_i; w; TD; Preferred Torque Layer]
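
The softmax competition between preferred-torque units can be sketched as follows. This is a generic softmax actor with a standard policy-gradient style weight update driven by the TD error, written under stated assumptions; it is not taken from Bissmarck et al. (2005). The function names, the three preferred torques, the input vector, the learning rate, and the fixed choice of unit are all hypothetical.

```python
import math


def softmax(preferences, temperature=1.0):
    """Turn unit preferences into selection probabilities p_i."""
    top = max(preferences)  # subtract the maximum for numerical stability
    exps = [math.exp((p - top) / temperature) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def unit_preferences(weights, kin_input):
    """Weighted sum of the kinematic planning signal for each action unit."""
    return [sum(w * x for w, x in zip(row, kin_input)) for row in weights]


def update_weights(weights, kin_input, chosen, probs, td_error, lr=0.1):
    """Assumed update: w[i][j] += lr * delta * (1[i == chosen] - p_i) * x_j."""
    for i, row in enumerate(weights):
        grad = (1.0 if i == chosen else 0.0) - probs[i]
        for j, x in enumerate(kin_input):
            row[j] += lr * td_error * grad * x
    return weights


preferred_torques = [-1.0, 0.0, 1.0]          # hypothetical action pool
weights = [[0.0, 0.0] for _ in preferred_torques]
kin_input = [1.0, 0.5]                        # hypothetical planning signal

probs_before = softmax(unit_preferences(weights, kin_input))
chosen = 2                                    # pretend unit 2 was selected
update_weights(weights, kin_input, chosen, probs_before, td_error=1.0)
probs_after = softmax(unit_preferences(weights, kin_input))
```

A positive TD error raises the probability of the unit that was chosen and lowers the others; a negative TD error would do the opposite.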

Critic: providing the reward prediction error for actor learning. Temporal difference learning: the critic learns the reward prediction error by temporal difference learning. The reward is generally delayed. This prediction of reward helps generate the correct action choice before the reward is received (temporal credit assignment problem). K. Doya, 2000: in continuous time and space. Critic: the basal ganglia and dopamine neurons. Dopamine neurons carry the TD error (Schultz 1998). Reward prediction error is learned in the basal ganglia (O’Doherty, Science 2004).

Models: Reinforcement Learning - Biological Evidence. Schultz W, 1998: dopamine input modulates LTP. This is for the striatum, but the motor cortex also has a similar architecture. Reward prediction error is learned in the basal ganglia (O’Doherty, Science 2004). [Diagram labels: LTP; Post-synaptic input; Pre-synaptic input; Dopamine synapse]

Critic: immediate reward. A large reward is given at the goal. The reward function over space does not have to be continuous. However, if it is continuous, it helps to find a good movement. The reward function below is from Bissmarck et al. (2005).

Critic: Reward prediction error. The total predicted reward at the current state includes discounted future rewards. A critic learns this predicted reward at the current state. Delta shows how much difference the action made between the expected reward and the real reward. If it is positive, the action was good.
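
The slide's equations did not survive in this transcript. As a hedged reminder of the standard discrete-time form of these two quantities (the talk itself cites the continuous-time formulation of Doya, 2000, which is not reproduced here), the predicted reward from time $t_0$ and the TD error can be written as below. The symbols $V$, $r$, $\gamma$ and $\delta$ are the usual textbook ones and are an assumption about the slide's notation.

```latex
\begin{align}
V(t_0) &= \mathbb{E}\!\left[\sum_{t \ge t_0} \gamma^{\,t - t_0}\, r(t)\right],
  \qquad 0 \le \gamma \le 1, \\
\delta(t) &= r(t) + \gamma\, V(t+1) - V(t).
\end{align}
```

Here $\delta(t) > 0$ means the outcome was better than the critic predicted, and $\delta(t) < 0$ means it was worse.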

Critic: Reward prediction error. Example: dopamine neuron. [Figure labels: CS; US; Reward given; No reward] A well-trained critic produced […] just before the reward is expected to be given. If there is no reward, delta becomes negative, because a well-trained critic expected one.
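
The reward-omission effect can be reproduced with a minimal tabular TD(0) critic. The sketch below is an illustration under stated assumptions, not the talk's continuous-time critic: a hypothetical chain of five time steps between cue and reward, a reward of 1.0 at the last step, and illustrative values for the learning rate and discount factor. The positive response at cue onset is not modelled.

```python
def train_critic(n_steps=5, episodes=500, alpha=0.1, gamma=1.0):
    """TD(0) on a chain of states; reward 1.0 arrives on leaving the last state."""
    values = [0.0] * (n_steps + 1)  # values[n_steps] is terminal and stays 0
    for _ in range(episodes):
        for s in range(n_steps):
            reward = 1.0 if s == n_steps - 1 else 0.0
            delta = reward + gamma * values[s + 1] - values[s]
            values[s] += alpha * delta
    return values


def td_errors(values, reward_delivered, gamma=1.0):
    """TD error at each step of one trial, without further learning."""
    n_steps = len(values) - 1
    errors = []
    for s in range(n_steps):
        is_reward_step = s == n_steps - 1
        reward = 1.0 if (is_reward_step and reward_delivered) else 0.0
        errors.append(reward + gamma * values[s + 1] - values[s])
    return errors


values = train_critic()
rewarded = td_errors(values, reward_delivered=True)
omitted = td_errors(values, reward_delivered=False)
```

After training, `rewarded` is close to zero at every step because the reward is fully predicted, while the last entry of `omitted` is close to -1: the critic expected a reward that did not arrive.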

Results (1): Motor output map. Motor output map of the cortex model. The map representation is muscle coding.

Results (2): Motor output map. 50 msec random stimulation of the motor cortex. The motoneuron pattern shows a ‘determined’ preferred direction. Actually, each motoneuron is tuned to a preferred “torque”. However, at a fixed starting posture, preferred torque implies preferred direction.

Results (3): Motor input map. NOT FINISHED, NEED TUNING OF REINFORCEMENT LEARNING. Movement is not fully learned. Motor input map: activation of the motor cortex during a voluntary movement. Broad activation (over the first 20% of movement time). Similar directions have similar patterns.

Results (4): Motor input map. Population code during the first 20% of the movement time. Insignificantly tuned neurons were excluded (about half of the 400 neurons).

Short Discussion. Neural coding and regression. Tuning curve over directions: cosine; sharper than cosine; truncated cosine. Advantage of population coding. Two ways of neural coding.

Neural coding and regression. A cricket detects wind direction with four neurons. c_i is the pre-tuned (preferred) wind direction of the ith neuron, and r_i is its firing rate. Regression error is smallest where a preferred direction exists. (Its tuning curve is a truncated cosine function.) Inference and Computation with Population Codes (Pouget A, Dayan P, Zemel RS. 2003)
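
The cricket example can be sketched as a population vector: each neuron's preferred direction c_i is weighted by its firing rate r_i and the results are summed. The code below is a noise-free illustration under assumptions: the four preferred directions (45, 135, 225 and 315 degrees) and unit peak firing rates are chosen for illustration and are not stated on the slide, and the function names are hypothetical.

```python
import math

# Assumed preferred wind directions c_i of the four neurons, in degrees.
PREFERRED_DEG = [45.0, 135.0, 225.0, 315.0]


def firing_rates(wind_deg):
    """Truncated (half-rectified) cosine tuning: r_i = max(0, cos(wind - c_i))."""
    return [max(0.0, math.cos(math.radians(wind_deg - c))) for c in PREFERRED_DEG]


def population_vector(rates):
    """Sum of preferred-direction unit vectors weighted by firing rate."""
    x = sum(r * math.cos(math.radians(c)) for r, c in zip(rates, PREFERRED_DEG))
    y = sum(r * math.sin(math.radians(c)) for r, c in zip(rates, PREFERRED_DEG))
    return x, y


def decode_deg(rates):
    """Direction of the population vector, in degrees within [0, 360)."""
    x, y = population_vector(rates)
    return math.degrees(math.atan2(y, x)) % 360.0


estimate = decode_deg(firing_rates(100.0))
```

With two orthogonal pairs of opposing neurons and truncated cosine tuning, the noise-free population vector recovers the wind direction up to floating-point rounding, because each opposing pair together supplies one signed Cartesian component.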

Tuning Curve. If the tuning curve is a cosine function, as in Georgopoulos (1986): perfect reconstruction using a basis. If the tuning curve is sharper than a cosine function (sharper tuning curves have recently been reported: Paninski et al., 2004; Scott et al., 2001): distortion exists (regression error).
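
The contrast between cosine and sharper-than-cosine tuning can be checked numerically. The sketch below decodes direction with a population vector from four neurons and compares a truncated cosine tuning curve (exponent 1) with a sharpened one (exponent 2). The four preferred directions, the power-law form of the sharpening, and the function names are assumptions for illustration; they are not the tuning curves reported in the cited papers.

```python
import math

# Assumed preferred directions of four neurons, in degrees.
PREFERRED_DEG = [45.0, 135.0, 225.0, 315.0]


def rates(theta_deg, sharpness):
    """Truncated cosine raised to a power; sharpness > 1 narrows the curve."""
    return [
        max(0.0, math.cos(math.radians(theta_deg - c))) ** sharpness
        for c in PREFERRED_DEG
    ]


def decode_deg(r):
    """Direction of the rate-weighted sum of preferred-direction vectors."""
    x = sum(ri * math.cos(math.radians(c)) for ri, c in zip(r, PREFERRED_DEG))
    y = sum(ri * math.sin(math.radians(c)) for ri, c in zip(r, PREFERRED_DEG))
    return math.degrees(math.atan2(y, x)) % 360.0


def max_error_deg(sharpness):
    """Largest absolute decoding error over all whole-degree directions."""
    worst = 0.0
    for theta in range(360):
        estimate = decode_deg(rates(theta, sharpness))
        # Wrap the difference into [-180, 180) before taking its magnitude.
        error = abs((estimate - theta + 180.0) % 360.0 - 180.0)
        worst = max(worst, error)
    return worst


cosine_error = max_error_deg(1)
sharper_error = max_error_deg(2)
```

In this setup `cosine_error` is zero up to rounding, while `sharper_error` is several degrees: the decoded direction is pulled toward the nearest preferred direction, which is the distortion the slide refers to.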

Advantage of population coding. Low regression error: ideally, if preferred directions exist for all different directions (Pouget et al., 2003). Robust to noisy input (Pouget et al., 2003). Less variability in motor control: assuming signal-dependent noise (SDN), using more muscles gives less variability (Todorov, 2002).
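
A short worked calculation shows why signal-dependent noise favours spreading a command over more muscles. It is a simplified illustration under stated assumptions, not the derivation in Todorov (2002): a total force $F$ is shared equally by $n$ muscles, each muscle's noise has standard deviation proportional to its own command with a hypothetical constant $k$, and the noise terms are independent.

```latex
\begin{align}
u_i &= \frac{F}{n}, \qquad \sigma_i = k\,u_i = \frac{kF}{n}, \\
\operatorname{Var}\!\left(\sum_{i=1}^{n} u_i\right)
  &= \sum_{i=1}^{n} \sigma_i^2
   = n \cdot \frac{k^2 F^2}{n^2}
   = \frac{k^2 F^2}{n}.
\end{align}
```

A single muscle producing the whole force has variance $k^2 F^2$, so under these assumptions sharing the load across $n$ muscles reduces the standard deviation of the total force by a factor of $\sqrt{n}$.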

Two ways of neural coding (1). 1. Set of radial basis functions. No modifiable connections with the inputs. Inputs are fed to all the neurons ‘equally’, then the neurons respond selectively. Each RBF has its own motoneuron pattern: preferred direction ??? [Diagram labels: BROADCAST; Inputs; Radial Basis Functions; Output; Non-Modifiable; Modifiable]

Two ways of neural coding (2). 2. Feed-forward neural network. Modifiable connections with the inputs. A weighted sum of the inputs is fed to the neurons. Each neuron already has a preferred direction (neural coding). If a neuron’s preferred direction is aligned with the desired direction, the connection should be strengthened. SOFM is the way to modify the connections from the inputs. [Diagram label: PROJECTION]

Two ways of neural coding (3). Which one? We chose the second one, because: A) RBF requires supervised learning or reinforcement learning on the output layer, but that is unrealistic for the corticospinal circuit. B) RBF does not have plasticity in the input layer.

Future work. Fine tuning of reinforcement learning. Cerebellum. Concurrent learning of the motor input map and the motor output map. Sensory cortex, which may be related to feedback control. “Premotor cortex” for inverse kinematic coding (action sensory coding, currently implemented with SOM).
