Dopamine, Uncertainty and TD Learning CoSyNe’04 Yael Niv Michael Duff Peter Dayan.


1 Dopamine, Uncertainty and TD Learning CoSyNe’04 Yael Niv Michael Duff Peter Dayan

2 What does Dopamine encode?
Important neuromodulator:
- Neurological/psychiatric disorders
- Drug addiction/self-stimulation
Fundamental role in RL:
- Classical/Pavlovian conditioning
- Instrumental/operant conditioning
DA neurons respond to:
- Unexpected (appetitive) rewards
- Stimuli predicting (appetitive) rewards
- Withdrawal of expected rewards
- Novel/salient stimuli

3 What does Dopamine encode?
=> DA represents some aspect of reward, but not rewards as such.

4 The TD Hypothesis of Dopamine
DA encodes the reward prediction error δ(t).
[Figure: DA response and TD error δ(t) at stimulus and reward times]
- Precise theory for the generation of DA firing patterns
- Compelling account for the role of DA in classical conditioning
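As a rough illustration of what "reward prediction error" means here, the sketch below uses the standard textbook TD form δ(t) = r(t) + γ·V(t+1) − V(t). The slide does not spell out the formula, so the function name `td_error`, the discount `gamma` and the example values are assumptions made for illustration only.

```python
def td_error(reward, value_now, value_next, gamma=1.0):
    """Standard TD error: delta = r + gamma * V(next) - V(now).

    gamma=1.0 (no discounting) is an assumption, not taken from the slides.
    """
    return reward + gamma * value_next - value_now


# Unexpected reward: nothing was predicted, so the error is positive.
unexpected = td_error(reward=1.0, value_now=0.0, value_next=0.0)

# Fully predicted reward: the prediction matches, so there is no error.
predicted = td_error(reward=1.0, value_now=1.0, value_next=0.0)

# Omitted reward: a predicted reward fails to arrive, so the error is negative.
omitted = td_error(reward=0.0, value_now=1.0, value_next=0.0)

# Stimulus that predicts reward: value jumps from 0 to 1 at stimulus onset.
stimulus = td_error(reward=0.0, value_now=0.0, value_next=1.0)
```

These four cases line up with the response types listed for DA neurons earlier in the talk (unexpected rewards, reward-predicting stimuli, withdrawal of expected rewards).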

5 But: Fiorillo, Tobler & Schultz 2003
Introduce inherent uncertainty into the classical conditioning paradigm.
Five visual stimuli indicating different reward probabilities: P = 0, ¼, ½, ¾, 1
- CS = 2 sec visual stimulus
- US (probabilistic) = drops of juice

6 Fiorillo, Tobler & Schultz 2003
At stimulus time: DA represents mean expected reward.
Interesting: a ramp in activity up to reward (highest for p=½).
Hypothesis: DA ramp encodes uncertainty in reward.

7 Dopamine: Uncertainty or TD error?
- No apparent reason for ramp
- The ramp is predictable from the stimulus
- TD predicts away predictable quantities
=> Contradiction!
Side issue: the ramp is like a constantly surprising reward -- it can’t influence action choice

8 A closer look at FTS’s results
At time of reward:
- Prediction errors result from uncertainty
- Crucially: positive and negative errors cancel out
[Figure panels: p = 0.5, p = 0.75]

9 A closer look at FTS’s results
- TD error δ(t) can be positive or negative
- Neuronal firing rate is only positive (negative values are coded relative to base firing rate)
- But: DA base firing rate is low -> asymmetric encoding of δ(t)
[Figure: δ(t) vs. DA response; labels 55% and 270%]

10 Modeling TD with asymmetric errors
[Figure: tapped delay line with inputs x(1), x(2), …, values V(1) … V(20), reward r(t) and error δ(t)]
- Tapped delay line
- Standard online TD learning
- Fixed learning rate
- Negative δ(t) scaled by d=1/6 prior to PSTH
Learning proceeds normally (without scaling):
- Necessary to produce the right predictions
- Can be biologically plausible
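A minimal sketch of the kind of model this slide describes, under stated assumptions: a one-hot tapped delay line with 20 taps, TD(0) without discounting, a unit reward delivered at the last tap with probability p, a fixed learning rate of 0.1, and a burn-in period before averaging. Only the 20 values, the fixed learning rate and d = 1/6 come from the slide; the name `simulate_ramp` and every other parameter are hypothetical choices, and this is not the authors' code.

```python
import random


def simulate_ramp(p, n_steps=20, n_trials=6000, burn_in=1000,
                  alpha=0.1, d=1 / 6, seed=0):
    """Return the trial-averaged, asymmetrically scaled TD error per time step.

    alpha, n_trials, burn_in and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    w = [0.0] * n_steps        # one weight per tap, so V(t) = w[t]
    summed = [0.0] * n_steps   # running sum of scaled errors (a PSTH analogue)

    for trial in range(n_trials):
        reward = 1.0 if rng.random() < p else 0.0
        for t in range(n_steps):
            v_next = w[t + 1] if t + 1 < n_steps else 0.0  # value after trial end is 0
            r = reward if t == n_steps - 1 else 0.0        # reward only at the last tap
            delta = r + v_next - w[t]
            w[t] += alpha * delta  # learning uses the unscaled error
            if trial >= burn_in:
                # Scale negative errors by d only when recording, not when learning.
                summed[t] += delta if delta >= 0 else d * delta

    n_recorded = n_trials - burn_in
    return [s / n_recorded for s in summed]


ramp_half = simulate_ramp(0.5)
ramp_quarter = simulate_ramp(0.25)
ramp_zero = simulate_ramp(0.0)
```

In this sketch the raw errors average out to roughly zero at every step once learning has settled, but the scaled average stays positive and grows toward the time of reward, which is the ramp-like shape the talk attributes to asymmetric coding.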

11 Modeling TD with asymmetric errors
TD learning with asymmetric prediction errors replicates the recorded data accurately.
=> Ramps result from asymmetrically coded prediction errors propagating back to stimulus
- Artifact of summing PSTHs over nonstationary recent reward histories

12 DA: Uncertainty or Temporal Difference?
Analytically deriving the maximum error at the time of the reward, we get: [equation]
=> The ramp is indeed highest for P=½
But: DA encodes nothing but temporal difference error!
Experimental test: is the ramp a within-trial or between-trial phenomenon?
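The slide's own formula is not available here, so the following is only a back-of-the-envelope sketch of why the scaled error at reward time would peak at P=½. It assumes a unit reward, a fully learned prediction V = p just before the reward, and negative errors scaled by d = 1/6. Under those assumptions the mean scaled error is p(1−p) − d·p(1−p) = (1−d)·p(1−p). The function name `expected_scaled_error` is hypothetical.

```python
def expected_scaled_error(p, d=1 / 6):
    """Mean asymmetrically scaled TD error at reward time.

    Assumes a unit reward and a learned prediction V = p (not from the slides).
    """
    rewarded = p * (1 - p)          # prob p, error +(1 - p), unscaled
    omitted = (1 - p) * d * (-p)    # prob 1 - p, error -p, scaled by d
    return rewarded + omitted


probs = [0.0, 0.25, 0.5, 0.75, 1.0]
values = {p: expected_scaled_error(p) for p in probs}
best = max(values, key=values.get)  # probability with the largest mean scaled error
```

Because p(1−p) is largest at p = ½, so is the scaled mean for any d below 1; with d = 1 (symmetric coding) the positive and negative errors would cancel exactly.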

13 Trace conditioning: A puzzle and its resolution
- CS = short visual stimulus
- Trace period
- US (probabilistic) = drops of juice
Same (if not more) uncertainty, but… no DA ramping! (Fiorillo et al.; Morris, Arkadir, Nevet, Vaadia & Bergman)
Resolution: lower learning rate in trace conditioning eliminates ramp

14 Modeling TD with asymmetric errors
Small response for reward at Pr=1 and for stimulus at Pr=0
-> These result from misidentification of stimuli (Morris, Arkadir, Nevet, Vaadia & Bergman)

15 Other sources of uncertainty: Representational Noise (1)
- Rate coding is inherently stochastic
- Add noise to tapped delay line representation
=> TD learning is robust to this type of noise
[Figure panels: prediction error and weights for σ = 0.0577, σ = 0.0866, σ = 0.1155]
Mirenowicz and Schultz (1996)

16 Other sources of uncertainty: Representational Noise (2)
- Neural timing of events is necessarily inaccurate
- Add temporal noise to tapped delay line representation
=> Devastating effects of even small amounts of temporal noise on TD’s predictions!
[Figure panels: ε = 0.05, ε = 0.10]

17 Conclusions
Preserve the TD hypothesis of Dopamine:
- No explicit coding of uncertainty
- Ramping explained by neural constraints
- Explains the disappearance of the ramp in trace conditioning
Important challenges to the TD hypothesis:
- Conditioned inhibition
- Effects of timing

