Models: reinforcement learning & fMRI Nathaniel Daw 11/28/2007.

Presentation transcript:

models: reinforcement learning & fMRI Nathaniel Daw 11/28/2007

overview reinforcement learning model fitting: behavior model fitting: fMRI

overview reinforcement learning –simple example –tracking –choice model fitting: behavior model fitting: fMRI

Reinforcement learning: the problem
Optimal choice is learned by repeated trial and error
– e.g., between slot machines that pay off with different probabilities
But…
– payoff amounts & probabilities may be unknown
– they may additionally be changing
– decisions may be sequentially structured (chess, mazes: we won't consider this today)
A very hard computational problem; computational shortcuts are essential
Interplay between what you can do and what you should do
Both have behavioral & neural consequences

Simple example
n-armed bandit, unknown but IID payoffs
– a surprisingly rich problem
Vague strategy to maximize expected payoff:
1) Predict the expected payoff for each option
2) Choose the best (?)
3) Learn from the outcome to improve predictions

Simple example
1) Predict the expected payoff for each option
– take V_L = the last reward received on option L
– (more generally, some weighted average of past rewards)
– this is an unbiased, albeit lousy, estimator
2) Choose the best
– (more generally, choose stochastically such that the machine judged richer is more likely to be chosen)
Say the left machine pays 10 with probability 10%, 0 otherwise
Say the right machine pays 1 always
What happens? (Niv et al. 2000; Bateson & Kacelnik)

Behavioral anomalies
Apparent risk aversion arises from learning, i.e., from the way payoffs are estimated
– even though the learner is trying to optimize expected reward, and so should be risk neutral
– it is easy to construct other examples producing risk proneness or "probability matching"
Behavioral anomalies can have computational roots
Sampling and choice interact in subtle ways

what can we do?

What can we do?
Exponentially weighted running average of rewards on an option: V_t ≈ α Σ_k (1 − α)^k r_{t−k}
Convenient form because it can be recursively maintained ('exponential filter'): V ← V + α (r − V)
'error-driven learning', 'delta rule', 'Rescorla-Wagner'
[figure: reward-prediction weight as a function of trials into the past, decaying exponentially]
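A minimal sketch of the recursive form in Python, assuming a fixed learning rate alpha; the function name and the example reward stream are illustrative, not from the talk:

```python
import numpy as np

def exponential_filter(rewards, alpha=0.1, v0=0.0):
    """Exponentially weighted running average of past rewards.

    Recursive ('exponential filter') form of the delta rule / Rescorla-Wagner:
    V <- V + alpha * (r - V).
    """
    V = v0
    values = []
    for r in rewards:
        delta = r - V          # prediction error
        V = V + alpha * delta  # error-driven update
        values.append(V)
    return np.array(values)

# Example: rewards from a machine paying 10 with probability 10%, 0 otherwise
rng = np.random.default_rng(0)
rewards = 10.0 * (rng.random(100) < 0.1)
print(exponential_filter(rewards)[-5:])
```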

what should we do? [learning]

Bayesian view
Specify a 'generative model' for payoffs
Assume the payoff following a choice of A is Gaussian with unknown mean μ_A and known variance σ²_payoff
Assume the mean μ_A changes via a Gaussian random walk with zero mean and variance σ²_walk
[figure: the payoff for A drifting across trials]

Bayesian view
Describe prior beliefs about the parameters as a probability distribution
Assume they are Gaussian, with an estimated mean and variance for μ_A
Update beliefs in light of experience with Bayes' rule:
P(μ_A | payoff) ∝ P(payoff | μ_A) P(μ_A)
[figure: prior distribution over the mean payoff for A]

Bayesian belief updating
[figure sequence: the belief distribution over the mean payoff for A is updated step by step after each observed payoff]

Notes on the Kalman filter
Looks like Rescorla-Wagner, but:
– we track uncertainty as well as the mean
– the learning rate is a function of uncertainty (asymptotically constant but nonzero)
Why do we exponentially weight past rewards? (Because the underlying mean is drifting.)
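A sketch of a single-arm Kalman-filter update under the generative model from the previous slides (Gaussian payoff noise plus a Gaussian random walk on the mean); the variance values and the function name are illustrative assumptions:

```python
import numpy as np

def kalman_bandit_update(mu, sigma2, reward, sigma2_payoff=1.0, sigma2_walk=0.01):
    """One Kalman-filter step for the mean payoff of a single arm.

    Prior belief: N(mu, sigma2). The walk variance inflates uncertainty each
    trial, so the learning (Kalman) gain stays nonzero asymptotically --
    which is why exponential weighting of past rewards makes sense here.
    """
    sigma2 = sigma2 + sigma2_walk             # diffusion: belief gets less certain
    gain = sigma2 / (sigma2 + sigma2_payoff)  # uncertainty-dependent learning rate
    mu = mu + gain * (reward - mu)            # Rescorla-Wagner-like error correction
    sigma2 = (1.0 - gain) * sigma2            # posterior uncertainty shrinks
    return mu, sigma2

# Track one arm over a few observed payoffs
mu, sigma2 = 0.0, 1.0
for r in [0.0, 10.0, 0.0, 0.0]:
    mu, sigma2 = kalman_bandit_update(mu, sigma2, r)
    print(round(mu, 3), round(sigma2, 3))
```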

what should we do? [choice]

The n-armed bandit
– n slot machines
– binary payoffs, with unknown but fixed probabilities
– you get some limited (technically: random, exponentially distributed) number of spins
– you want to maximize income
A surprisingly rich problem

The n-armed bandit
1. Track payoff probabilities
Bayesian: learn a distribution over the possible payoff probabilities for each machine
This is easy: it just requires counting wins and losses (Beta posterior), as sketched below
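A small sketch of that bookkeeping, assuming a uniform Beta(1,1) prior; the class name and the outcome sequence are illustrative:

```python
from scipy import stats

class BetaArm:
    """Posterior over a fixed, unknown payoff probability: just count wins and losses."""
    def __init__(self, a=1.0, b=1.0):   # Beta(1,1) = uniform prior
        self.a, self.b = a, b

    def update(self, win):
        if win:
            self.a += 1                  # one more rewarded spin
        else:
            self.b += 1                  # one more unrewarded spin

    def mean(self):
        return self.a / (self.a + self.b)

arm = BetaArm()
for outcome in [1, 0, 0, 1, 1, 0, 1, 0]:   # 4/8 spins rewarded -> posterior mean 50%
    arm.update(outcome)
print(arm.mean(), stats.beta(arm.a, arm.b).var())
```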

The n-armed bandit
2. Choose
This is hard. Why?

The explore-exploit dilemma
2. Choose
Simply choosing the apparently best machine might miss something better: we must balance exploration and exploitation
Simple heuristics exist, e.g., choose at random once in a while (see the sketch below)
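Two such heuristics sketched with illustrative parameter values: epsilon-greedy exploration (choose at random once in a while) and softmax choice (richer-looking machines are more likely to be chosen):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(values, epsilon=0.1):
    """Choose the apparently best machine, but explore at random once in a while."""
    if rng.random() < epsilon:
        return int(rng.integers(len(values)))
    return int(np.argmax(values))

def softmax_choice(values, beta=2.0):
    """Choose stochastically so that machines judged richer are more likely to be chosen."""
    p = np.exp(beta * np.asarray(values))
    p /= p.sum()
    return int(rng.choice(len(values), p=p))

print(epsilon_greedy([0.4, 0.6]), softmax_choice([0.4, 0.6]))
```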

Explore / exploit
Which should you choose?
Left bandit: 4/8 spins rewarded; right bandit: 1/2 spins rewarded
The mean of both distributions is 50%

Explore / exploit
Left bandit: 4/8 spins rewarded; right bandit: 1/2 spins rewarded
The right (green) bandit is more uncertain (its distribution has larger variance)
Which should you choose?

Explore / exploit
Which should you choose?
Although the green bandit has a larger chance of being worse, it also has a larger chance of being better, which would be useful to find out if true
Trade off uncertainty, expected value, and horizon
'Value of information': exploring improves future choices
How to quantify it?

Optimal solution
This is really a sequential choice problem; it can be solved with dynamic programming
Naïve approach: each machine has k 'states' (number of wins/losses so far); the state of the total game is the product over all machines; curse of dimensionality (k^n states)
Clever approach (Gittins 1972): the problem decouples into one with k states – consider continuing on a single bandit versus switching to a bandit that always pays some known amount. The amount at which you'd switch is the 'Gittins index'. It properly balances mean, uncertainty & horizon

overview reinforcement learning model fitting: behavior –pooling multiple subjects –example model fitting: fMRI

Model estimation
What is a model?
– a parameterized stochastic data-generation process
Model m predicts data D given parameters θ: P(D | θ, m)
Estimate parameters: a posterior distribution over θ by Bayes' rule
Typically we use a maximum likelihood point estimate instead, i.e., the parameters for which the data are most likely
We can still study uncertainty around the peak: interactions, identifiability

Application to RL
e.g., D for a subject is an ordered list of choices c_t and rewards r_t
e.g., P(c_t = L) ∝ exp(β · V_L(t)), where V might be learned by an exponential filter with decay rate α
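A sketch of what fitting such a model by maximum likelihood could look like, assuming an exponential-filter value update and a softmax choice rule; the function name, starting values, bounds, and toy data are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, choices, rewards, n_options=2):
    """-log P(choices | alpha, beta) for an exponential-filter + softmax model."""
    alpha, beta = params
    V = np.zeros(n_options)
    nll = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * V)
        p /= p.sum()
        nll -= np.log(p[c])          # likelihood of the observed choice
        V[c] += alpha * (r - V[c])   # delta-rule update of the chosen option only
    return nll

# Maximum-likelihood point estimate of (alpha, beta) for one subject's toy data
choices = [0, 0, 1, 1, 1, 0, 1, 1]
rewards = [1, 0, 1, 1, 0, 0, 1, 1]
fit = minimize(neg_log_likelihood, x0=[0.5, 1.0], args=(choices, rewards),
               bounds=[(1e-3, 1.0), (1e-3, 20.0)])
print(fit.x)   # fitted learning rate and softmax inverse temperature
```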

Example behavioral task
Reinforcement learning for reward & punishment:
– participants (31) repeatedly choose between boxes
– each box has a (hidden, changing) chance of giving money (20p)
– also an independent chance of giving an electric shock (8 on a 1-10 pain scale)
[figure: task display with money and shock outcomes]

What is this good for?
Parameters may measure something of interest
– e.g., learning rate, the monetary value of a shock
They allow us to quantify & study neural representations of subjective quantities
– e.g., expected value, prediction error
Compare models
Compare groups

Compare models
In principle: an 'automatic Occam's razor' from the marginal likelihood P(D | m)
In practice: approximate the integral as maximum likelihood + a penalty: Laplace, BIC, AIC, etc.
Frequentist version: likelihood ratio test
Or: a holdout set; difficult in the sequential case
Good example refs: Ho & Camerer
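For illustration, the penalized-likelihood scores could be computed like this; the fit values below are made-up numbers, not results from the talk:

```python
import numpy as np

def bic(neg_log_lik, n_params, n_trials):
    """Bayesian information criterion: penalizes the max-likelihood fit by model size."""
    return 2.0 * neg_log_lik + n_params * np.log(n_trials)

def aic(neg_log_lik, n_params):
    """Akaike information criterion: a lighter, sample-size-independent penalty."""
    return 2.0 * neg_log_lik + 2.0 * n_params

# Hypothetical fits: a 2-parameter RL model vs. a 10-parameter regression model
print(bic(neg_log_lik=120.0, n_params=2, n_trials=200),
      bic(neg_log_lik=115.0, n_params=10, n_trials=200))
print(aic(neg_log_lik=120.0, n_params=2),
      aic(neg_log_lik=115.0, n_params=10))
```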

Compare groups
How do we model data for a group of subjects?
We want to account for (potential) inter-subject variability in the parameters θ
– this is called treating the parameters as "random effects"
– i.e., random variables instantiated once per subject
– hierarchical model: each subject's parameters are drawn from a population distribution; her choices are drawn from the model given those parameters

Random effects model
Hierarchical model:
– What is θ_s? e.g., a learning rate
– What is P(θ_s | θ_pop)? e.g., a Gaussian, or a mixture of Gaussians
– What is θ_pop? e.g., the mean and variance, over the population, of the regression weights
We are interested in identifying the population characteristics θ_pop
(all multisubject fMRI analyses work this way)

Random effects model
We are interested in identifying the population characteristics θ_pop
– method 1: summary statistics of individual ML fits (cheap & cheerful; used in fMRI; see the sketch below)
– method 2: estimate the integral over the per-subject parameters, e.g., with Monte Carlo
What good is this?
– we can make statistical statements about parameters in the population
– we can compare groups
– we can regularize individual parameter estimates by using the estimated population distribution as a prior: "empirical Bayes"
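A sketch of "method 1" (summary statistics): take each subject's maximum-likelihood estimate of a parameter and test the estimates against zero at the group level; the per-subject numbers below are purely illustrative:

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject maximum-likelihood estimates of one parameter
# (e.g., a regression weight on past shocks), one value per subject.
subject_estimates = np.array([0.31, 0.12, 0.55, -0.05, 0.40, 0.22, 0.18, 0.47])

# Treat the parameter as a random effect: is its population mean nonzero?
t, p = stats.ttest_1samp(subject_estimates, popmean=0.0)
print(f"mean = {subject_estimates.mean():.3f}, t = {t:.2f}, p = {p:.4f}")
```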

Example behavioral task
Reinforcement learning for reward & punishment:
– participants (31) repeatedly choose between boxes
– each box has a (hidden, changing) chance of giving money (20p)
– also an independent chance of giving an electric shock (8 on a 1-10 pain scale)
[figure: task display with money and shock outcomes]

Behavioral analysis
Fit trial-by-trial choices using a "conditional logit" regression model
→ coefficients estimate the effects on choice of past rewards, shocks, & choices (Lau & Glimcher; Corrado et al.)
→ is there a selective effect of acute tryptophan depletion?
value(box) = a weighted sum of that box's past rewards, shocks, and choices
values → choice probabilities using the logistic ('softmax') rule: prob(box 1) ∝ exp(value(box 1))
probabilities → choices, stochastically
Estimate the weights by maximizing the joint likelihood of the choices, conditional on the rewards (see the sketch below)
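A minimal sketch of the value construction and softmax step, assuming two boxes and a fixed number of lags; all weights and histories below are illustrative placeholders, not fitted values:

```python
import numpy as np

def conditional_logit_values(past_rewards, past_shocks, past_choices, w_r, w_s, w_c):
    """value(box) as a weighted sum of that box's recent rewards, shocks, and choices.

    past_* are arrays of shape (n_boxes, n_lags), most recent lag first;
    w_* are the corresponding lag weights (the regression coefficients to estimate).
    """
    return past_rewards @ w_r + past_shocks @ w_s + past_choices @ w_c

def choice_probs(values):
    """Logistic ('softmax') rule: prob(box) proportional to exp(value(box))."""
    p = np.exp(values - values.max())
    return p / p.sum()

# Two boxes, three lags of history, illustrative weights
past_rewards = np.array([[1, 0, 1], [0, 0, 1]], float)
past_shocks  = np.array([[0, 1, 0], [0, 0, 0]], float)
past_choices = np.array([[1, 1, 0], [0, 0, 1]], float)
w_r = np.array([0.8, 0.4, 0.2])    # money weights decay with lag
w_s = np.array([-0.6, -0.3, -0.1]) # shock weights are negative
w_c = np.array([0.3, 0.1, 0.0])    # 'stickiness' toward repeating choices
values = conditional_logit_values(past_rewards, past_shocks, past_choices, w_r, w_s, w_c)
print(choice_probs(values))
```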

Summary statistics of individual ML fits –fairly noisy (unconstrained model, unregularized fits)

Models predict exponential decays in the reward & shock weights, and typically neglect choice-choice autocorrelation

Fit of a TD model (with exponentially decaying choice sensitivity), visualized the same way (5x fewer parameters, essentially as good a fit to the data; estimates are better regularized)

Quantify the value of pain
[figure: fitted monetary values, including £0.20, -£0.12, £0.04]

Effect of acute tryptophan depletion?

Depleted participants are:
– equally shock-driven
– more 'sticky' (driven to repeat choices)
– less money-driven (this effect is less reliable)

Linear effects of blood tryptophan levels
[figure sequence across three slides, with thresholds p > .5; p < .005; p < .01 and p < .005]

overview reinforcement learning model fitting: behavior model fitting: fMRI –random effects –RL regressors

[figure: frontopolar activations (L FP, rFP) at p<0.01 and p<0.001]
What does this mean when there are multiple subjects?
Regression coefficients as random effects: if we drew more subjects from this population, would the expected effect size be > 0?

History
– SPM paper, software released, used for PET; low ratio of samples to subjects (within-subject variance not important)
– Development of fMRI: more samples per subject
1998 – Holmes & Friston introduce the distinction between fixed and random effects analysis in a conference presentation; reveal SPM had been fixed effects all along
1999 – Series of papers semi-defending fixed effects; but the software is fixed

RL & fMRI
Common approach: fit models to behavior, then use the models to generate regressors for the fMRI GLM
– e.g., predicted value; error in predicted value
– where in the brain does the BOLD signal correlate with the computationally generated signal (convolved with the HRF)? (see the sketch below)
– quantify & study the neural representation of subjective factors
[figure: reward prediction error correlates (O'Doherty et al. 2003 and lots of other papers; Schoenberg et al. 2007)]
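A sketch of how a model-generated prediction-error series might be turned into such a regressor, using a simple double-gamma HRF approximation; the HRF parameters, trial onsets, and prediction-error values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr=2.0, duration=32.0):
    """Simple double-gamma haemodynamic response function sampled every TR seconds."""
    t = np.arange(0.0, duration, tr)
    hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)
    return hrf / hrf.sum()

def prediction_error_regressor(prediction_errors, onsets, n_scans, tr=2.0):
    """Place model-generated prediction errors at their trial onsets (in scan units),
    then convolve with the HRF to obtain a parametric GLM regressor."""
    stick = np.zeros(n_scans)
    for pe, onset in zip(prediction_errors, onsets):
        stick[onset] += pe
    return np.convolve(stick, canonical_hrf(tr))[:n_scans]

# Illustrative prediction errors from a fitted RL model, at trial onsets in scans
pes = [0.9, -0.2, 0.5, -0.7]
onsets = [3, 10, 17, 24]
regressor = prediction_error_regressor(pes, onsets, n_scans=40)
print(regressor.round(2))
```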

Examples: value expectation
(exactly the same approach is common in animal physiology)
Sugrue et al. (2004): primate LIP neurons
Daw, O'Doherty et al. (2006): vmPFC activity in humans
[figure: % signal change vs. probability of the chosen action]

Note: one can also fit parametric models to neural signals and compare neural & behavioral fits (Kable et al. 2007; Tom et al. 2007)
Note 2: as always, be suspicious about spurious correlations
– it is still good to use controls (e.g., does the regressor load better in this condition than in another?)

Examples: loss aversion
Tom et al. (2007): compare loss aversion estimated from neural value signals to behavioral loss aversion from choices
[figure: utility as a function of money]

Example: positional uncertainty in a navigation task (Yoshida et al. 2006)
Model: subjects assume they are someplace until proven wrong, then try assuming somewhere else
Estimate where the subject thinks they are at each step

correlate uncertainty in position estimate with BOLD signal

Summary
Trial-and-error learning & choice
– interaction between the two
– rich theory even for simple tasks
Model fits to choice behavior
– hierarchical model of the population
– quantify subjective factors
Same methods for fMRI, ephys
– but keep your wits about you