Published by Derek Robbin. Modified about 1 year ago.

1
Value based decision making: behavior and theory
Institute for Theoretical Physics and Mathematics, Tehran, January 2006

2
Greg Corrado, Leo Sugrue

3
SENSORY INPUT (low level sensory analyzers) → DECISION MECHANISMS → ADAPTIVE BEHAVIOR (motor output structures)

5
SENSORY INPUT (low level sensory analyzers) → DECISION MECHANISMS → ADAPTIVE BEHAVIOR (motor output structures)
REWARD HISTORY: representation of stimulus/action value

6
How do we measure value? (Herrnstein RJ, 1961)

7
The Matching Law
Figure axis: Choice Fraction
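
The matching law (Herrnstein, 1961) states that the fraction of choices allocated to an option equals the fraction of rewards earned from it. The following is a minimal sketch of that comparison; the function names and session totals are made up for illustration and do not come from the talk.

```python
def choice_fraction(choices_a, choices_b):
    """Fraction of all choices allocated to option A."""
    return choices_a / (choices_a + choices_b)


def income_fraction(rewards_a, rewards_b):
    """Fraction of all earned rewards that came from option A."""
    return rewards_a / (rewards_a + rewards_b)


# Hypothetical session totals (made-up numbers for illustration).
choices = {"A": 300, "B": 100}
rewards = {"A": 90, "B": 30}

cf = choice_fraction(choices["A"], choices["B"])
rf = income_fraction(rewards["A"], rewards["B"])

# Under perfect matching the two fractions are equal.
print(f"choice fraction = {cf:.2f}, income fraction = {rf:.2f}")
```

With these totals both fractions come out to 0.75, which is what perfect matching looks like.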

8
Behavior: What computation does the monkey use to ‘match’?
Theory: Can we build a model that replicates the monkeys’ behavior on the matching task? How can we validate the performance of the model? Why is a model useful?
Physiology: What are the neural circuits and signal transformations within the brain that implement the computation?

9
An eye movement matching task
Baiting Fraction: 1:1, 6:1, 1:6, 6:1, 1:2, 2:1, 1:2
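
A rough sketch of how a baited two-target task could be simulated. It assumes the usual baiting rule for this kind of task: on each trial an empty target is armed with some probability, and an armed reward stays available until that target is chosen. The class name, the overall baiting rate of 0.3, and the alternating choice policy are illustrative assumptions, not details from the talk.

```python
import random


class BaitedTask:
    """Two-target task in which an assigned reward persists until collected."""

    def __init__(self, ratio=(6, 1), total_rate=0.3, seed=0):
        a, b = ratio
        # Split an assumed overall baiting rate according to the block's ratio.
        self.p_bait = [total_rate * a / (a + b), total_rate * b / (a + b)]
        self.baited = [False, False]
        self.rng = random.Random(seed)

    def step(self, choice):
        """Bait targets, then return 1 if the chosen target (0 or 1) held a reward."""
        for t in (0, 1):
            if not self.baited[t] and self.rng.random() < self.p_bait[t]:
                self.baited[t] = True
        reward = int(self.baited[choice])
        self.baited[choice] = False  # the chosen target is emptied either way
        return reward


task = BaitedTask(ratio=(6, 1))
rewards = [task.step(choice=trial % 2) for trial in range(1000)]  # simple alternation
print(sum(rewards), "rewards in", len(rewards), "trials")
```

Changing `ratio` between blocks of trials would mimic the sequence of baiting fractions listed on the slide.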

10
Dynamic Matching Behavior

11
Dynamic Matching Behavior (plot: Rewards)

12
Dynamic Matching Behavior (plot: Responses, Rewards)

13
Relation Between Reward and Choice is Local (plot: Responses, Rewards)

14
How do they do this? What local mechanism underlies the monkey’s choices in this game? To estimate this mechanism we need a modeling framework.

15
Linear-Nonlinear-Poisson (LNP) Models of choice behavior
Strategy estimation is straightforward
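
A minimal sketch of the three LNP stages applied to choice: a linear filter over each target's reward history, a nonlinearity that turns the difference in filtered value into a probability, and a stochastic draw. The single exponential filter, the logistic nonlinearity, and the parameter names `tau` and `slope` are assumptions made for illustration; the talk does not specify these forms here.

```python
import math
import random


def lnp_choice_probability(rewards_1, rewards_2, tau=5.0, slope=1.0):
    """Probability of choosing target 1 given each target's past rewards.

    rewards_1, rewards_2: lists of 0/1, most recent trial last.
    tau and slope are illustrative parameters, not fitted values.
    """
    n = len(rewards_1)
    # L stage: exponentially decaying weights, largest for the latest trial.
    weights = [math.exp(-(n - 1 - i) / tau) for i in range(n)]
    v1 = sum(w * r for w, r in zip(weights, rewards_1))
    v2 = sum(w * r for w, r in zip(weights, rewards_2))
    # N stage: squash differential value into a probability (logistic assumed).
    return 1.0 / (1.0 + math.exp(-slope * (v1 - v2)))


def lnp_choose(rewards_1, rewards_2, rng=random):
    """P stage: draw a stochastic choice (1 or 2) from the probability."""
    p1 = lnp_choice_probability(rewards_1, rewards_2)
    return 1 if rng.random() < p1 else 2


p_equal = lnp_choice_probability([1, 0, 1], [1, 0, 1])
p_favor_1 = lnp_choice_probability([0, 1, 1], [1, 0, 0])
print(p_equal, p_favor_1)
```

Identical reward histories give a probability of 0.5, and more recent rewards on target 1 push the probability above 0.5.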

16
Estimating the form of the linear stage
How do animals weigh past rewards in determining current choice?
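
One generic way to estimate how past rewards are weighted is to regress a choice signal on lagged reward differences. The sketch below uses ordinary least squares on synthetic, noise-free data built from a known kernel, so the recovered weights can be checked. This is a stand-in technique, not necessarily the estimation procedure used by the speakers, and every name and value in it is hypothetical.

```python
import numpy as np


def lagged_design_matrix(reward_diff, n_lags):
    """Row t holds reward_diff[t-1], reward_diff[t-2], ..., reward_diff[t-n_lags]."""
    rows = [reward_diff[t - n_lags:t][::-1] for t in range(n_lags, len(reward_diff))]
    return np.array(rows, dtype=float)


def estimate_kernel(reward_diff, choice_signal, n_lags):
    """Least-squares weights relating past reward differences to current choice."""
    X = lagged_design_matrix(reward_diff, n_lags)
    y = np.asarray(choice_signal, dtype=float)[n_lags:]
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


# Synthetic, noise-free example: choice tendency built from a known kernel.
rng = np.random.default_rng(0)
reward_diff = rng.integers(-1, 2, size=500)      # values in {-1, 0, 1}
true_kernel = np.array([0.5, 0.25, 0.125])       # assumed decaying weights
X = lagged_design_matrix(reward_diff, 3)
choice_signal = np.concatenate([np.zeros(3), X @ true_kernel])

recovered = estimate_kernel(reward_diff, choice_signal, n_lags=3)
print(np.round(recovered, 3))
```

With real, noisy binary choices the recovered weights would only approximate the underlying kernel, and a logistic regression might be a more natural fit.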

20
Estimating the form of the nonlinear stage
How is differential value mapped onto the animal’s instantaneous probability of choice?
Figure axis: Differential Value
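
A simple way to look at the shape of this mapping is to bin trials by differential value and compute the empirical probability of choosing one target in each bin. The sketch below does that on made-up trials; the bin width and the data are illustrative assumptions.

```python
from collections import defaultdict


def empirical_choice_curve(diff_values, chose_red, bin_width=1.0):
    """Bin trials by differential value; return {bin_center: P(choose red)}."""
    counts = defaultdict(lambda: [0, 0])  # bin index -> [red choices, trials]
    for dv, red in zip(diff_values, chose_red):
        b = round(dv / bin_width)
        counts[b][0] += int(red)
        counts[b][1] += 1
    return {b * bin_width: red / n for b, (red, n) in sorted(counts.items())}


# Made-up trials: differential value and whether red was chosen.
diff_values = [-2.1, -1.9, -0.2, 0.1, 0.3, 1.8, 2.2, 2.0]
chose_red = [0, 0, 0, 1, 1, 1, 1, 1]

curve = empirical_choice_curve(diff_values, chose_red)
print(curve)
```

A smooth function (for example a sigmoid) could then be fitted to the binned points to serve as the nonlinear stage.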

21
Figure: Probability of Choice (red) vs. Differential Value (rewards); panels: Monkey F, Monkey G

22
Our LNP Model of Choice Behavior
Model Validation
Can the model predict the monkey’s next choice?
Can the model generate behavior on its own?

23
Can the model predict the monkey’s next choice?
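
One way to score next-choice prediction, sketched under the assumption that the model outputs a probability of choosing target 1 on every trial from the history before that trial. The two scores used here (fraction of trials on which the likelier option was chosen, and mean log-likelihood) are common choices, not necessarily the metric used in the talk.

```python
import math


def predictive_scores(p_choose_1, actual_choices):
    """Score trial-by-trial predictions.

    p_choose_1: model probability of choosing target 1 on each trial,
                computed from history before that trial.
    actual_choices: observed choices (1 or 2).
    Returns (fraction of trials where the likelier option was chosen,
             mean log-likelihood per trial).
    """
    hits = 0
    log_lik = 0.0
    for p, c in zip(p_choose_1, actual_choices):
        p_actual = p if c == 1 else 1.0 - p
        hits += p_actual > 0.5
        log_lik += math.log(max(p_actual, 1e-12))  # guard against log(0)
    n = len(actual_choices)
    return hits / n, log_lik / n


accuracy, mean_ll = predictive_scores([0.8, 0.6, 0.3, 0.9], [1, 2, 2, 1])
print(accuracy, mean_ll)
```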

24
Predicting the next choice: single experiment

25
Predicting the next choice: all experiments

26
Can the model generate behavior on its own?

27
Model generated behavior: single experiment

28
Distribution of stay durations summarizes behavior across all experiments
Figure axis: Stay Duration (trials)
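
A stay duration is the number of consecutive trials spent on one target before switching. A minimal sketch of extracting stay durations from a choice sequence (the example sequence is made up):

```python
from itertools import groupby


def stay_durations(choices):
    """Lengths of runs of consecutive identical choices, in trials."""
    return [len(list(run)) for _, run in groupby(choices)]


choices = ["red", "red", "red", "green", "red", "red", "green", "green"]
print(stay_durations(choices))  # [3, 1, 2, 2]
```

The same function can be applied to the monkey's choices and to model-generated choices so that the two distributions can be compared.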

29
Model generated behavior: all experiments
Figure axis: Stay Duration (trials)

31
OK, now that you have a reasonable model, what can you do with it?
1. Explore second order behavioral questions
2. Explore neural correlates of valuation

33
Choice of Model Input
reward history:
choice history:
Surely ‘not getting a reward’ also has some influence on the monkey’s behavior?

34
Choice of Model Input
reward history:
choice history:
hybrid history: includes the value of an unrewarded choice

35
Can we build a better model by taking unrewarded choices into account?
hybrid history:
1. Systematically vary the value of an unrewarded choice
2. Estimate new L and N stages for the model
3. Test each new model’s ability to a) predict choice and b) generate behavior
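
A sketch of how a hybrid history input could be built for one target. The coding is an assumption made for illustration: a rewarded choice contributes 1, an unrewarded choice of the same target contributes a free parameter (here called `unrewarded_value`, a hypothetical name), and trials on which the other target was chosen contribute 0.

```python
def hybrid_history(choices, rewards, target, unrewarded_value=0.0):
    """Input stream for one target in a hybrid reward/choice history.

    choices: chosen target on each trial; rewards: 1 if rewarded, else 0.
    A rewarded choice of `target` contributes 1, an unrewarded choice of
    `target` contributes `unrewarded_value`, and trials on which the other
    target was chosen contribute 0. (This coding is an assumption.)
    """
    stream = []
    for c, r in zip(choices, rewards):
        if c != target:
            stream.append(0.0)
        elif r:
            stream.append(1.0)
        else:
            stream.append(unrewarded_value)
    return stream


choices = [1, 1, 2, 1]
rewards = [1, 0, 1, 0]
for value in (-0.5, 0.0, 0.5):   # systematically vary the unrewarded value
    print(value, hybrid_history(choices, rewards, target=1, unrewarded_value=value))
```

Setting the parameter to 0 reduces the input to a pure reward history, which gives a natural baseline for the comparison.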

36
Unrewarded choices: The value of nothin’
Figure: Predictive Performance and Generative Performance vs. Value of Unrewarded Choices

37
Unrewarded choices: The value of nothin’
Figure: Predictive Performance and Generative Performance (Stay Duration Histogram Overlap, %) vs. Value of Unrewarded Choices
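
The slide does not define how histogram overlap is computed. One common definition, used in this sketch as an assumption, is the summed bin-wise minimum of the two normalized histograms, expressed as a percentage. The stay durations below are made up.

```python
from collections import Counter


def histogram_overlap_percent(durations_a, durations_b):
    """Percent overlap of two normalized stay-duration histograms."""
    hist_a, hist_b = Counter(durations_a), Counter(durations_b)
    n_a, n_b = len(durations_a), len(durations_b)
    shared = sum(
        min(hist_a[d] / n_a, hist_b[d] / n_b)
        for d in set(hist_a) | set(hist_b)
    )
    return 100.0 * shared


monkey = [1, 1, 2, 3]   # made-up stay durations
model = [1, 2, 2, 4]
print(histogram_overlap_percent(monkey, model))  # 50.0
```

Identical distributions score 100 and distributions with no shared durations score 0.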

38
Choice of Model Input
Contrary to our intuition, including information about unrewarded choices does not improve model performance.

39
Optimality of Parameters

40
Weighting of past rewards
Is there an ‘optimal’ weighting function to maximize the rewards a player can harvest in this game?

54
Weighting of past rewards
The tuning of the long component (2) of the L-stage affects foraging efficiency. Monkeys have found this optimum.
The short component (1) of the L-stage does not affect foraging efficiency. Why do monkeys overweight recent rewards?
The tuning of the nonlinear function relating value to p(choice) affects foraging efficiency. The monkeys have found this optimum also.
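
A sketch of a reward-weighting function with a short and a long component, as discussed above. The mixture-of-two-exponentials form, the names `tau_short`, `tau_long`, and `w_short`, and their values are all assumptions for illustration; the slide text does not give the functional form or fitted values.

```python
import math


def two_component_kernel(n_lags, tau_short=2.0, tau_long=20.0, w_short=0.5):
    """Normalized weights for rewards 1..n_lags trials in the past.

    The mixture-of-two-exponentials form and all parameter values are
    assumptions for illustration.
    """
    raw = [
        w_short * math.exp(-lag / tau_short)
        + (1.0 - w_short) * math.exp(-lag / tau_long)
        for lag in range(1, n_lags + 1)
    ]
    total = sum(raw)
    return [w / total for w in raw]


kernel = two_component_kernel(n_lags=50)
print([round(w, 3) for w in kernel[:5]])
```

Sweeping one time constant while holding the other fixed, and measuring rewards harvested by a simulated player, is one way the optimality question could be explored.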

58
The differential model is a better predictor of monkey choice

59
Monkeys match; best LNP model
Model predicts and generates choices
Monkeys find the optimal long component (2) of the L-stage and the optimal nonlinearity; the short component (1) is not critical
Unrewarded choices have no effect
Differential value predicts choices better than fractional value

60
?

61
Best LNP model: Candidate decision variable, differential value: g(v1 - v2) = pc
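
A small sketch contrasting the two candidate decision variables named in the talk, differential value (v1 - v2) and fractional value (v1 / (v1 + v2)). The sigmoid used for g is a placeholder assumption, since the fitted form of g is not given in the slide text.

```python
import math


def differential_value(v1, v2):
    """Candidate decision variable: difference of the two target values."""
    return v1 - v2


def fractional_value(v1, v2):
    """Alternative decision variable: value of target 1 as a share of the total."""
    return v1 / (v1 + v2)


def g(x, slope=1.0):
    """Placeholder sigmoid standing in for the fitted nonlinearity g."""
    return 1.0 / (1.0 + math.exp(-slope * x))


# Same 2:1 value ratio at two overall reward levels.
for v1, v2 in [(0.2, 0.1), (2.0, 1.0)]:
    print(fractional_value(v1, v2), g(differential_value(v1, v2)))
```

Fractional value is identical in the two cases, while differential value (and so the predicted choice probability) grows with the overall reward level. That is where the two models make different predictions.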

64
Aside: what would Bayes do?
1) maintain beliefs over baiting probabilities
2) be greedy or use dynamic programming
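
A simplified sketch of the greedy variant: keep a Beta posterior over each target's reward probability and choose the target with the highest posterior mean. This treats each choice as an independent Bernoulli draw and ignores the fact that baited rewards accumulate on an unvisited target, so it illustrates the idea only. The dynamic programming alternative is not shown, and all names are hypothetical.

```python
class BetaBelief:
    """Beta posterior over one target's reward probability."""

    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha = alpha  # 1 + number of rewarded choices
        self.beta = beta    # 1 + number of unrewarded choices

    def update(self, rewarded):
        if rewarded:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self):
        return self.alpha / (self.alpha + self.beta)


def greedy_choice(beliefs):
    """Index of the target with the highest posterior mean."""
    return max(range(len(beliefs)), key=lambda i: beliefs[i].mean())


beliefs = [BetaBelief(), BetaBelief()]
for target, rewarded in [(0, True), (0, False), (1, True), (1, True)]:
    beliefs[target].update(rewarded)

print([b.mean() for b in beliefs], greedy_choice(beliefs))
```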

65
Firing rates in LIP are related to target value on a trial-by-trial basis
Figure: LIP gm020b; into RF and out of RF vs. Target Value

66
The differential model also accounts for more variance in LIP firing rates

67
What I’ve told you:
How we control/measure value: The matching law
An experimental task based on that principle: A dynamic foraging task
A simple model of value based choice: Our Linear-Nonlinear-Poisson model
How we validate that model: Predictive and generative validation
How we use the model to explore behavior: Hybrid models, optimality of reward weights
How we use the model to explore value related signals in the brain: Neural firing in area LIP correlates with ‘differential value’ on a trial-by-trial basis

69
Foraging Efficiency Varies as a Function of the Long Component (2) of the L-stage

70
Foraging Efficiency Does Not Vary as a Function of the Short Component (1) of the L-stage

71
What do animals do? Animals match.
Matching is a probabilistic policy.
Matching is almost optimal within the set of probabilistic policies.

73
+ the change over delay

74
Greg Corrado

84
How do we implement the change over delay? Only one ‘live’ target at a time.
