Introduction: What does phasic dopamine encode?

- With asymmetric coding of errors, the mean TD error at the time of reward is proportional to p(1-p) -> indeed maximal at p = 50%.
- Classical conditioning paradigm (delay conditioning) using probabilistic outcomes -> generates ongoing prediction errors even in a well-learned task.
- Single DA cell recordings in VTA/SNc (from the lab of Wolfram Schultz):
  - At stimulus time, DA represents the mean expected reward (consistent with the TD hypothesis).
  - Surprising ramping of activity during the delay -> Fiorillo et al.'s hypothesis: coding of uncertainty.
- However: there is no prediction error to `justify' the ramp; TD learning predicts away any predictable quantity; and uncertainty is not available for control.
- -> The uncertainty hypothesis seems contradictory to the TD hypothesis.

Overview

Substantial evidence suggests that phasic dopaminergic firing represents a temporal difference (TD) error in the prediction of future reward. Recent experiments probe the way information about outcomes propagates back to the stimuli predicting them. These use stochastic rewards (e.g., Fiorillo et al., 2003), which allow systematic study of persistent prediction errors even in well-learned tasks. We use a novel theoretical analysis to show that across-trial ramping in DA activity may be a signature of this process. Importantly, we address the asymmetric coding in DA activity of positive and negative TD errors, and acknowledge the constant learning that results from ongoing prediction errors.

Selected References

[1] Fiorillo, Tobler & Schultz (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299, 1898-1902.
[2] Morris, Arkadir, Nevet, Vaadia & Bergman (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron, 43, 133-143.
[3] Montague, Dayan & Sejnowski (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci, 16, 1936-1947.
[4] Sutton & Barto (1998). Reinforcement Learning: An Introduction. MIT Press.

Acknowledgements

This research was funded by an EC Thematic Network short-term fellowship to YN and The Gatsby Charitable Foundation.

Asymmetric Coding of Temporal Difference Errors: Implications for Dopamine Firing Patterns

Y. Niv (1,2), M.O. Duff (2) and P. Dayan (2)
(1) Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem, yaelniv@alice.nc.huji.ac.il
(2) Gatsby Computational Neuroscience Unit, University College London

DA encodes a temporally sophisticated reward signal

Comparing an unpredicted reward (neutral/no stimulus) with a predicted reward (learned task) shows that DA encodes a temporally sophisticated reward signal. Computational hypothesis: DA encodes the reward prediction error, the temporal difference (TD) error (Sutton & Barto, 1987; Montague, Dayan & Sejnowski, 1996):

    delta(t) = r(t) + V(t+1) - V(t)

-> Phasic DA encodes the reward prediction error: a precise computational theory for the generation of DA firing patterns, and a compelling account of the role of DA in appetitive conditioning.

Experimental results: measuring propagating errors

Fiorillo et al. (2003): a 2-second visual stimulus indicates the reward probability (100%, 75%, 50%, 25% or 0%), followed by a probabilistic reward (drops of juice).

A TD resolution: ramps result from backpropagating prediction errors

Note that according to TD, activity at the time of reward should cancel out on average -- but it doesn't. This is because:
- Prediction errors can be positive or negative, yet firing rates are positive -> negative errors must be encoded relative to baseline activity.
- But baseline activity in DA cells is low (2-5 Hz) -> asymmetric coding of errors.

Simulating TD with asymmetric coding

[Figure: experiment vs. model PSTHs, including omitted reward (probe trials).]
- Negative delta(t) is scaled by d = 1/6 prior to PSTH summation.
- Learning itself proceeds normally (without scaling): this is necessary to produce the right predictions, and can be biologically plausible.
Trace conditioning: a puzzle solved

- Short visual stimulus, trace period, then probabilistic reward (drops of juice).
- Same (if not more) uncertainty, but no DA ramping (Morris et al., 2004; see also Fiorillo et al., 2003).
- Solution: a lower learning rate in trace conditioning eliminates the ramp. Indeed, the learning rate computed from Morris et al.'s data is near zero (personal communication).

Conclusion: Uncertainty or Temporal Difference?

- There is no need to assume explicit coding of uncertainty: ramping in DA activity is explained by neural constraints.
- This also explains the puzzling absence of a ramp in trace conditioning.
- Experimental tests: Is the ramp a within-trial or a between-trial phenomenon? How does ramp size relate to learning rate (within/between experiments)?
- Challenges to TD remain: TD and noise; conditioned inhibition; additivity...

Visualizing temporal-difference learning

[Figure: stimulus representation x(1), x(2), ..., reward r(t), TD error delta(t) and values V(1)...V(30) after the first trial, after the third trial, once the task is learned (~10 trials), and as learning continues; model delta(t) compared with DA data from Bayer & Glimcher and from the Schultz lab, and PSTHs for p = 25%, 50% and 75%.]

-> Ongoing (intertwined) backpropagation of asymmetrically coded positive and negative errors causes ramps to appear in the summed PSTH.
-> The ramp itself is a between-trial and not a within-trial phenomenon: it results from summation over different reward histories.
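The between-trial ramp, and its disappearance at a near-zero learning rate, can be illustrated with a small simulation: tapped-delay-line TD learning with a probabilistic terminal reward, learning applied to the raw delta(t), and negative delta(t) scaled by d = 1/6 only when accumulating the PSTH. All parameters except d (trial length, reward probability, the two learning rates, trial counts) are illustrative assumptions, not values from the poster:

```python
import numpy as np

def represented_psth(alpha, T=10, p=0.5, d=1.0/6.0,
                     n_burn=3000, n_rec=5000, seed=1):
    """Across-trial mean of the asymmetrically coded TD error delta(t),
    accumulated only after a burn-in so the task is already learned."""
    rng = np.random.default_rng(seed)
    V = np.zeros(T + 1)          # one value per timestep; V[T] = 0 is terminal
    psth = np.zeros(T)
    for trial in range(n_burn + n_rec):
        rewarded = rng.random() < p              # reward delivered at t = T - 1
        for t in range(T):
            r = 1.0 if (rewarded and t == T - 1) else 0.0
            delta = r + V[t + 1] - V[t]
            V[t] += alpha * delta                # learning uses the unscaled error
            if trial >= n_burn:                  # 'record' with asymmetric coding
                psth[t] += delta if delta >= 0 else d * delta
    return psth / n_rec

delay = represented_psth(alpha=0.1)     # delay conditioning: ongoing learning
trace = represented_psth(alpha=0.005)   # trace conditioning: near-zero learning rate
print("delay:", np.round(delay, 4))     # positive, growing toward reward time
print("trace:", np.round(trace, 4))     # reward response remains, ramp shrinks
```

Because learning continues, delta(t) keeps fluctuating from trial to trial; the asymmetric representation means those fluctuations do not average to zero, so the summed PSTH ramps up toward reward time -- a between-trial effect, as argued above. With the near-zero learning rate the fluctuations, and hence the ramp, largely disappear while the reward-time response is unchanged.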

