Dynamics of Reward Bias Effects in Perceptual Decision Making

1 Dynamics of Reward Bias Effects in Perceptual Decision Making
Jay McClelland & Juan Gao. Building on: Newsome and Rorie; Holmes and Feng; Usher and McClelland.

2 Our Questions
Can we trace the effect of reward bias on decision making over time?
Can we determine what would be the optimal policy, and what constraints there are on this policy?
Can we determine how well participants do at achieving optimality?
Can we uncover the processing mechanisms that lead to the observed patterns of behavior?

3 Overview: Experiment; Results; Optimality analysis; Abstract dynamical model; Mechanistic dynamical model

4 Human Experiment: Examining the Reward Bias Effect at Different Time Points after Target Onset
Stimuli are rectangles shifted 1, 3, or 5 pixels left or right of fixation.
The reward cue occurs 750 msec before the stimulus; a small arrowhead is visible for 250 msec.
Only biased reward conditions (2 vs 1 and 1 vs 2 points) are considered.
The response signal occurs at these times after stimulus onset:
The participant receives the reward (one or two points) if the response is correct and occurs within 250 msec of the response signal.
Participants were run for multiple sessions to provide stable data. The data shown are from later sets of sessions, in which the biasing effect of reward appeared to be fairly stable.
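The payoff rule above can be written as a small scoring function. This is a minimal sketch under an assumption the slide does not state: the 250 msec window is taken to open at the response signal, so responses before the signal earn nothing. The names `Trial` and `score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    stim_side: str        # "L" or "R": the side the rectangle is shifted toward
    big_reward_side: str  # side cued (by the arrowhead) to pay 2 points; the other pays 1
    signal_time: float    # response-signal delay after stimulus onset, in seconds


def score(trial: Trial, response_side: str, response_time: float,
          window: float = 0.25) -> int:
    """Return the points (0, 1, or 2) earned on one trial.

    Assumption: the response counts only if it comes at or after the response
    signal and no more than `window` seconds later.
    """
    in_window = 0.0 <= response_time - trial.signal_time <= window
    correct = response_side == trial.stim_side
    if not (in_window and correct):
        return 0
    return 2 if response_side == trial.big_reward_side else 1


# A correct, on-time response toward the cued side earns 2 points.
print(score(Trial("L", "L", signal_time=0.75), "L", response_time=0.9))  # 2
```

Reward-stimulus compatibility, used in the plots that follow, corresponds here to `stim_side == big_reward_side`.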

5 A participant with very little reward bias
Top panel: probability of a response giving the larger reward as a function of actual response time, for each combination of stimulus shift (1, 3, 5 pixels) and reward-stimulus compatibility.
Lower panel: the same data transformed to z scores, corresponding to the theoretical construct
$$z(t) = \frac{\mathrm{mean}(x_1(t) - x_2(t)) + \mathrm{bias}(t)}{\mathrm{sd}(x_1(t) - x_2(t))}$$
where $x_1$ represents the state of the accumulator associated with the greater reward, $x_2$ the same for the lesser reward, and the subject (S) is thought to choose the larger reward if $x_1(t) - x_2(t) + \mathrm{bias}(t) > 0$.
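The z transform in the lower panel is the inverse of the standard normal CDF applied to the choice probability. A minimal sketch using Python's standard-library `statistics.NormalDist`; the function names and the clipping constant `eps` are illustrative assumptions. If $x_1 - x_2$ is Gaussian, the z score recovers (mean + bias)/sd, which the round trip below confirms.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def z_from_probability(p: float, eps: float = 1e-4) -> float:
    """Convert P(choose larger reward) into a z score via the inverse normal CDF.

    Probabilities of exactly 0 or 1 are clipped to [eps, 1 - eps] because the
    inverse CDF is infinite there (the clipping value is an assumption).
    """
    p = min(max(p, eps), 1.0 - eps)
    return _STD_NORMAL.inv_cdf(p)


def probability_from_model(mean_diff: float, bias: float, sd_diff: float) -> float:
    """P(x1 - x2 + bias > 0) when x1 - x2 is Gaussian with the given mean and sd."""
    return _STD_NORMAL.cdf((mean_diff + bias) / sd_diff)


# Round trip: the z score recovers (mean + bias) / sd.
p = probability_from_model(mean_diff=0.6, bias=0.2, sd_diff=1.0)
print(round(z_from_probability(p), 6))  # 0.8
```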

6 Participants Showing Reward Bias

7

8 Abstract optimality analysis

9 Assumptions
At a given time, the decision variable $x$ follows one of two distributions, with means $+\mu$ and $-\mu$ and the same standard deviation $\sigma$.
Choice: $x \gtrless X_c$.
For three difficulty levels: same $\sigma$, means $\mu_i$ ($i = 1, 2, 3$), same $X_c$.
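With a single difficulty level, the reward-maximizing criterion has a closed form. This sketch assumes equal stimulus probabilities, a payoff of `r_a` points for a correct response to the $+\mu$ stimulus and `r_b` for the $-\mu$ stimulus, and no penalty for errors; setting the derivative of expected reward to zero then gives $X_c^* = \sigma^2 \ln(r_b/r_a)/(2\mu)$. The function names and parameter values are hypothetical, and the brute-force grid search is included only as a check.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def expected_reward(xc, mu, sigma, r_a, r_b):
    """Expected points per trial, equal priors, responding A whenever x > xc.

    Stimulus A gives x ~ N(+mu, sigma) and pays r_a if answered A;
    stimulus B gives x ~ N(-mu, sigma) and pays r_b if answered B; errors pay 0.
    """
    p_correct_a = 1.0 - PHI((xc - mu) / sigma)  # P(x > xc | A)
    p_correct_b = PHI((xc + mu) / sigma)        # P(x < xc | B)
    return 0.5 * (r_a * p_correct_a + r_b * p_correct_b)


def optimal_criterion(mu, sigma, r_a, r_b):
    """Closed-form maximizer of expected_reward: sigma^2 * ln(r_b / r_a) / (2 * mu)."""
    return sigma ** 2 * math.log(r_b / r_a) / (2.0 * mu)


mu, sigma = 0.5, 1.0  # hypothetical values
xc_star = optimal_criterion(mu, sigma, r_a=2.0, r_b=1.0)
grid = [k / 1000.0 for k in range(-3000, 3001)]
xc_grid = max(grid, key=lambda c: expected_reward(c, mu, sigma, 2.0, 1.0))
print(round(xc_star, 3), xc_grid)
```

In units of $\sigma$ the optimal criterion is $\ln(r_b/r_a)/d'$ with $d' = 2\mu/\sigma$, so for a fixed 2-vs-1 payoff the shift toward the larger reward shrinks as sensitivity grows.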

10 Only one difficulty level; three difficulty levels.
Subject's sensitivity: $d'$, as defined in the theory of signal detectability.
When the response-signal delay varies, sensitivity is fit for each subject with a function of time.
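When several difficulty levels share one criterion, there is no closed form (slide 16 also notes that no analytical expressions exist for multiple C levels), so the optimum has to be found numerically. A minimal sketch, assuming the three levels and both stimuli are equally frequent, a 2-vs-1 payoff with no error penalty, and hypothetical means $\mu_i$; sensitivity is taken as the standard signal-detection $d' = 2\mu_i/\sigma$ for two equal-variance Gaussians at $\pm\mu_i$.

```python
from statistics import NormalDist

PHI = NormalDist().cdf


def expected_reward_multi(xc, mus, sigma, r_a, r_b):
    """Expected points per trial averaged over equally likely difficulty levels.

    At level i, stimulus A gives x ~ N(+mus[i], sigma) and B gives x ~ N(-mus[i], sigma);
    A and B are equally likely, and one criterion xc (respond A if x > xc) is shared.
    """
    total = 0.0
    for mu in mus:
        p_correct_a = 1.0 - PHI((xc - mu) / sigma)
        p_correct_b = PHI((xc + mu) / sigma)
        total += 0.5 * (r_a * p_correct_a + r_b * p_correct_b)
    return total / len(mus)


def best_common_criterion(mus, sigma, r_a, r_b, lo=-5.0, hi=5.0, steps=10_000):
    """Grid search for the shared criterion that maximizes expected reward."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda c: expected_reward_multi(c, mus, sigma, r_a, r_b))


sigma = 1.0
mus = [0.2, 0.6, 1.0]                        # hypothetical means for the three levels
d_primes = [2.0 * mu / sigma for mu in mus]  # signal-detection sensitivity d' = 2*mu/sigma
xc = best_common_criterion(mus, sigma, r_a=2.0, r_b=1.0)
print([round(d, 2) for d in d_primes], round(xc, 3))
```

The shared criterion lands between the single-level optima $\sigma^2 \ln(r_b/r_a)/(2\mu_i)$, because expected reward at each level rises to the left of that level's optimum and falls to its right.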

11 Subject Sensitivity

12

13 Real “bias” vs. optimal “bias”

14

15 Dynamical analysis, based on a one-dimensional leaky integrator model
Initial condition: x = 0. Choose left if x > 0 when the response signal is detected; otherwise choose right.
Accuracy approximates an exponential approach to asymptote because of leakage.
How is the reward implemented?
A time-varying offset that optimizes reward?
An offset in the initial conditions?
An additional term in the input to the decision variable?
A fixed offset in the value of the decision variable?
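A minimal sketch of why leakage produces an exponential approach to asymptote. The model equation is not written out on the slide; this assumes $dx = (\lambda x + aC)\,dt + s\,dW$ with leak $\lambda < 0$, input gain $a$, stimulus level $C$, and noise sd $s$ (symbols borrowed from the later slides; the values are hypothetical). The mean and variance of this Ornstein-Uhlenbeck process have closed forms, and accuracy is $P(x(t) > 0)$.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def moments(t, lam, drive, s, m0=0.0, v0=0.0):
    """Mean and variance of x(t) for dx = (lam*x + drive) dt + s dW with lam < 0.

    These are the closed-form Ornstein-Uhlenbeck moments from x(0) ~ N(m0, v0).
    """
    e1, e2 = math.exp(lam * t), math.exp(2.0 * lam * t)
    mean = m0 * e1 + (drive / lam) * (e1 - 1.0)
    var = v0 * e2 + s ** 2 * (e2 - 1.0) / (2.0 * lam)
    return mean, var


def accuracy(t, lam, a, c, s):
    """P(x(t) > 0) for input a*C and x(0) = 0; t must be > 0 so the variance is positive."""
    mean, var = moments(t, lam, a * c, s)
    return PHI(mean / math.sqrt(var))


def asymptote(lam, a, c, s):
    """Limit of accuracy as t grows: PHI(a*C*sqrt(2/|lam|)/s)."""
    return PHI(a * c * math.sqrt(2.0 / abs(lam)) / s)


lam, a, s = -2.0, 0.3, 1.0  # hypothetical leak, input gain, and noise sd
for c in (1, 3, 5):
    curve = [round(accuracy(t, lam, a, c, s), 3) for t in (0.1, 0.25, 0.5, 1.0, 2.0)]
    print(c, curve, round(asymptote(lam, a, c, s), 3))
```

The mean approaches $aC/|\lambda|$ and the standard deviation approaches $s/\sqrt{2|\lambda|}$, so accuracy levels off instead of growing without bound.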

16 1. Time-varying term that optimizes rewards (no free parameter for reward bias)
[Figure: P of choice toward larger reward vs. time (s), for reward-stimulus compatibility (RSC) 1 and 0 at difficulty levels (diff) 1, 3, and 5.]
Notes: Equivalent to a time-varying criterion $= -b(t)$.
There is a dip at …
Prediction and test: higher C level → earlier dip.
For multiple C levels, there are no analytical expressions.
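A minimal sketch of model 1 for a single C level. It assumes equal priors, no error penalty, and payoff ratio $r$ (larger/smaller), so the moment-by-moment optimal criterion is $X_c^*(t) = -\sigma(t)^2 \ln r / (2\mu(t))$. With $d'(t) = 2\mu(t)/\sigma(t)$, the z score of P(choose larger reward) becomes $d'/2 + \ln r / d'$ on compatible trials and $-d'/2 + \ln r / d'$ on incompatible trials. The compatible-trial z is smallest where $d' = \sqrt{2 \ln r}$, so in this sketch a dip appears when $d'(t)$ passes that value, which happens earlier for higher C. The integrator is $dx = (\lambda x + aC)\,dt + s\,dW$ from $x(0) = 0$, with hypothetical parameter values.

```python
import math


def d_prime(t, lam, a, c, s):
    """d'(t) = 2*mu(t)/sigma(t) for dx = (lam*x + a*C) dt + s dW, x(0) = 0, lam < 0."""
    e1, e2 = math.exp(lam * t), math.exp(2.0 * lam * t)
    mu = (a * c / lam) * (e1 - 1.0)
    sigma = s * math.sqrt((e2 - 1.0) / (2.0 * lam))
    return 2.0 * mu / sigma


def z_larger(dp, ratio, compatible):
    """z score of P(choose larger reward) under the moment-by-moment optimal criterion."""
    return (dp / 2.0 if compatible else -dp / 2.0) + math.log(ratio) / dp


lam, a, s, ratio = -2.0, 1.0, 1.0, 2.0        # hypothetical parameters; 2-vs-1 payoff
times = [k / 1000.0 for k in range(1, 3001)]  # 1 ms to 3 s after stimulus onset
dip_times = {}
for c in (1, 3, 5):
    zs = [z_larger(d_prime(t, lam, a, c, s), ratio, True) for t in times]
    i_min = min(range(len(zs)), key=zs.__getitem__)
    dip_times[c] = times[i_min]
    print(c, dip_times[c], round(d_prime(times[i_min], lam, a, c, s), 3))
print("predicted d' at dip:", round(math.sqrt(2.0 * math.log(ratio)), 3))
```

The incompatible-trial z has no such minimum: it falls monotonically as $d'$ grows.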

17 2. Offset in initial conditions
Notes: The effect of the bias decays away for $\lambda < 0$.
Single C level: a dip at …
Prediction and test: higher C level → earlier dip.
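A minimal sketch of model 2, assuming the leaky integrator $dx = (\lambda x + \text{input})\,dt + s\,dW$ with the decision variable oriented so that $x > 0$ favors the larger-reward alternative, stimulus input $+aC$ on compatible trials and $-aC$ on incompatible ones, and a reward-induced starting point $x(0) = x_0 > 0$. All values are hypothetical. The sum $P_\text{compatible} + P_\text{incompatible} - 1$ serves as a simple index of net bias: it is zero when the starting offset no longer has any effect, and the $x_0 e^{\lambda t}$ term drives it there.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def p_larger(t, lam, a, c, s, x0, compatible):
    """P(choose larger reward) t > 0 s after onset when reward sets x(0) = x0.

    x > 0 favors the larger-reward alternative; stimulus input is +a*C on
    compatible trials and -a*C on incompatible ones; leak lam < 0.
    """
    drive = a * c if compatible else -a * c
    e1, e2 = math.exp(lam * t), math.exp(2.0 * lam * t)
    mean = x0 * e1 + (drive / lam) * (e1 - 1.0)  # the x0 term decays as exp(lam*t)
    sd = s * math.sqrt((e2 - 1.0) / (2.0 * lam))
    return PHI(mean / sd)


lam, a, s, x0, c = -2.0, 1.0, 1.0, 0.3, 1  # hypothetical values
for t in (0.05, 0.2, 0.5, 1.0, 3.0):
    p_comp = p_larger(t, lam, a, c, s, x0, True)
    p_inc = p_larger(t, lam, a, c, s, x0, False)
    print(t, round(p_comp, 3), round(p_inc, 3), round(p_comp + p_inc - 1.0, 3))  # last: net bias
```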

18 3. Reward as a term in the input
The reward signal comes $\tau$ seconds before the stimulus (at $t = -\tau$).
For $t < 0$: input $= b$; noise sd $= s$.
For $t > 0$: input $= b + aC$; noise continues as before.
Notes: The effect of the bias persists, but the bias is sub-optimal initially, and there is no dip.
They forgot the 2 here. Theoretically, the dip should happen at $t^* = \frac{1}{\lambda}\ln\left(\frac{aC - bk}{aCk^2 - bk^2}\right)$, where $k = e^{\lambda\tau}$. The $t$ calculated is negative.
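A minimal sketch of model 3, using the same closed-form leaky-integrator moments with $\lambda < 0$. It assumes $x = 0$ at the reward cue and $\tau = 0.75$ s (the 750 msec cue lead from the experiment); $a$, $b$, and $s$ are hypothetical. Before onset only the reward input $b$ drives $x$; afterward the input is $b \pm aC$ ($+$ on compatible trials) with unchanged noise. Because $b$ stays in the input, the asymptotic mean is shifted by $b/|\lambda|$, so the bias does not decay.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def onset_moments(tau, lam, b, s):
    """Mean and variance of x at stimulus onset after tau s of reward-only input b, from x = 0."""
    e1, e2 = math.exp(lam * tau), math.exp(2.0 * lam * tau)
    return (b / lam) * (e1 - 1.0), s ** 2 * (e2 - 1.0) / (2.0 * lam)


def p_larger(t, tau, lam, a, b, c, s, compatible):
    """P(choose larger reward) t s after onset; input b + a*C (or b - a*C) after onset."""
    m0, v0 = onset_moments(tau, lam, b, s)
    drive = b + (a * c if compatible else -a * c)
    e1, e2 = math.exp(lam * t), math.exp(2.0 * lam * t)
    mean = m0 * e1 + (drive / lam) * (e1 - 1.0)
    var = v0 * e2 + s ** 2 * (e2 - 1.0) / (2.0 * lam)  # noise continues unchanged
    return PHI(mean / math.sqrt(var))


lam, a, b, s, tau = -2.0, 1.0, 0.2, 1.0, 0.75  # tau = 750 ms cue lead; others hypothetical
for t in (0.0, 0.1, 0.3, 1.0, 3.0):
    p_comp = p_larger(t, tau, lam, a, b, 1, s, True)
    p_inc = p_larger(t, tau, lam, a, b, 1, s, False)
    print(t, round(p_comp, 3), round(p_inc, 3), round(p_comp + p_inc - 1.0, 3))
```

Noise accumulated during the reward period gives $x$ a nonzero spread at onset, so the choice probability at $t = 0$ is finite rather than fixed at 0 or 1.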

19 4. Reward as a constant offset in the decision variable
Notes: Equivalent to setting the criterion at $-m_0$.
The effect persists for $\lambda < 0$.
Single C level: a dip at …
Prediction and test: higher C level → earlier dip.
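A minimal sketch of model 4, assuming $x(0) = 0$, stimulus input $\pm aC$ with leak $\lambda < 0$, and a choice of the larger reward whenever $x + m_0 > 0$ (criterion $-m_0$). Because $\sigma(t)$ levels off at $s/\sqrt{2|\lambda|}$, the offset keeps a fixed, nonzero influence on the z score, so the bias persists; early on, when $\sigma(t)$ is small, the offset dominates. With the hypothetical values below, the compatible-trial z falls and then rises again.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def z_larger(t, lam, a, c, s, m0, compatible):
    """z of P(choose larger reward) when the choice rule is x + m0 > 0 (criterion -m0).

    x starts at 0 and integrates +a*C (compatible) or -a*C (incompatible) with leak lam < 0.
    """
    drive = a * c if compatible else -a * c
    e1, e2 = math.exp(lam * t), math.exp(2.0 * lam * t)
    mean = (drive / lam) * (e1 - 1.0)
    sd = s * math.sqrt((e2 - 1.0) / (2.0 * lam))
    return (mean + m0) / sd


lam, a, s, m0, c = -2.0, 1.0, 1.0, 0.2, 1  # hypothetical values
times = [k / 100.0 for k in range(1, 301)]  # 10 ms to 3 s
z_comp = [z_larger(t, lam, a, c, s, m0, True) for t in times]
i_min = min(range(len(z_comp)), key=z_comp.__getitem__)
print("compatible-trial z is lowest at t =", times[i_min])
late = PHI(z_larger(20.0, lam, a, c, s, m0, True)) + PHI(z_larger(20.0, lam, a, c, s, m0, False)) - 1.0
print("net bias long after onset:", round(late, 3))
```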

20 5. Reward as a term in the input, creating variability at stimulus onset
The reward signal comes $\tau$ seconds before the stimulus (at $t = -\tau$).
For $t < 0$: input $= b$; noise sd $= s_b$.
For $t > 0$: input $= b + aC$; noise sd $= s_b + s$.
Notes: The effect of the bias persists. If $s_b = 0$, there is no dip.
Prediction and test: given small $s_b$, a longer reward period → a later and shallower dip.
They forgot the 2 here. Theoretically, the dip should happen at $t^* = \frac{1}{\lambda}\ln\left(\frac{aC - bk}{aCk^2 - bk^2}\right)$, where $k = e^{\lambda\tau}$. The $t$ calculated is negative.
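A minimal sketch of model 5, reusing the closed-form leaky-integrator moments. As on the slide, the noise sd is $s_b$ before onset and $s_b + s$ afterward; $x = 0$ at the reward cue is an assumption. With $s_b = 0$, $x$ is still deterministic at stimulus onset; with $s_b > 0$ it already has spread, and a longer reward period $\tau$ lets that spread grow. All parameter values are hypothetical, so the printed curves illustrate the mechanics rather than test the slide's predictions.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def ou_step(m, v, t, lam, drive, noise_sd):
    """Propagate mean m and variance v of x for t s under dx = (lam*x + drive) dt + noise_sd dW."""
    e1, e2 = math.exp(lam * t), math.exp(2.0 * lam * t)
    return m * e1 + (drive / lam) * (e1 - 1.0), v * e2 + noise_sd ** 2 * (e2 - 1.0) / (2.0 * lam)


def p_larger(t, tau, lam, a, b, c, s, s_b, compatible):
    """P(choose larger reward) t > 0 s after onset; pre-onset noise s_b, post-onset s_b + s."""
    m0, v0 = ou_step(0.0, 0.0, tau, lam, b, s_b)  # reward-only period
    drive = b + (a * c if compatible else -a * c)
    m, v = ou_step(m0, v0, t, lam, drive, s_b + s)
    return PHI(m / math.sqrt(v))


lam, a, b, s, c = -2.0, 1.0, 0.2, 1.0, 1  # hypothetical values
for s_b in (0.0, 0.3):
    for tau in (0.5, 1.5):
        _, v_onset = ou_step(0.0, 0.0, tau, lam, b, s_b)
        curve = [round(p_larger(t, tau, lam, a, b, c, s, s_b, True), 3) for t in (0.05, 0.2, 0.5, 1.5)]
        print(f"s_b={s_b} tau={tau} onset var={v_onset:.3f}", curve)
```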

21 Leaky Competing Integrator Model
Inputs for: reward, stimulus, and response signal.
High threshold for …
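A minimal sketch of a two-unit leaky competing accumulator in the general style of Usher and McClelland's model, simulated with the Euler-Maruyama method. This is not the authors' mechanistic model: the wiring (a constant reward input to the larger-reward unit from the cue onward, stimulus input from onset), the readout (the more active unit at a fixed time, standing in for the response signal and threshold), and all parameter names and values (`k`, `beta`, `noise`, and so on) are assumptions.

```python
import random


def simulate_lca(readout, reward_in, stim_in, k=1.0, beta=1.0, noise=0.3,
                 reward_lead=0.75, dt=0.001, rng=None):
    """Simulate one trial of a two-unit leaky competing accumulator.

    Unit 0 stands for the larger-reward alternative, unit 1 for the smaller.
    reward_in[i] is applied from the reward cue, reward_lead s before stimulus onset;
    stim_in[i] is added from stimulus onset. Returns the index of the more active
    unit `readout` s after onset (0 on ties).
    """
    rng = rng if rng is not None else random.Random()
    x = [0.0, 0.0]
    onset_step = int(round(reward_lead / dt))
    n_steps = onset_step + int(round(readout / dt))
    for step in range(n_steps):
        stim_on = step >= onset_step
        new_x = []
        for i in (0, 1):
            drive = reward_in[i] + (stim_in[i] if stim_on else 0.0)
            # leak (k) and lateral inhibition from the other unit (beta)
            dx = (drive - k * x[i] - beta * x[1 - i]) * dt
            dx += noise * dt ** 0.5 * rng.gauss(0.0, 1.0)
            new_x.append(max(0.0, x[i] + dx))  # activations cannot go negative
        x = new_x
    return 0 if x[0] >= x[1] else 1


# Stimulus favors the smaller-reward unit; estimate how often the larger reward is chosen.
rng = random.Random(1)
trials = 200
p_big = sum(simulate_lca(0.5, (0.3, 0.0), (0.0, 0.6), rng=rng) == 0
            for _ in range(trials)) / trials
print("P(choose larger reward):", p_big)
```

Varying `readout` traces the choice probability over time, in the same form as the response-signal data.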

22

