Revisiting James March’s Exploration-Exploitation Trade-off With a Neurobiological Basis
Chiara Chelini, University of Turin
ESA World Meeting, Rome, 28 June - 1 July 2007

2 Content Reinterpretation of James March’s exploration-exploitation trade-off [March, Organization Science, 1991] from the point of view of the recent neuroscience literature [Daw et al., 2006, Nature]. Learning mechanisms in organizational and individual decision-making.

3 Content
– Theoretical and Methodological Background
– Successes and Cognitive Traps in Learning
– Empirical Evidence from a Neuroeconomics Experiment
– Towards a Conclusion: Would March Agree with This New Interpretation?

4 Theoretical and Methodological Background Neuroeconomics is a useful field of research for better understanding decision making, because it gives decision making a biological basis. New evidence on the results of economic games: e.g. recent replications of the Ultimatum Game have shown that a traditional, fully rational economic explanation may no longer hold [Camerer et al., 1999].

5 Successes and Cognitive Traps in Learning James March and Herbert Simon, “Organizations” [1958]: routine as a procedure for decision making. Macro level: “repetitive organizational procedures”. Micro level: “individual activities automatically triggered on the basis of stable mental models”. Intuition and tacit component: much of the behaviour we observe in organizations is “intuitive” in the sense that it occurs immediately upon recognition of a situation; hence many of the operations we observe in organizational action come not from explicit analysis but from rules. Recognition is in fact the capacity to distinguish “familiar significant cues, and to retrieve stored knowledge about how to use them” [March and Simon, 1958].

6 Successes and Cognitive Traps in Learning Consistent optimistic bias: aspirations adjust downward more slowly than they adjust upward. For instance, when a new idea or a new technology is introduced and fails, a learning process starts in order to find a better solution. If a second failure follows, then a third and so on, a potentially “endless cycle of failure and unrewarding change” [Levinthal and March, 1993] may be triggered. Success or competence trap: a “potentially self destructive product of learning”, because “exploitation drives out exploration”. When an organisation accumulates greater and greater competence in a particular field and finds that it is rewarding, this positive feedback makes the organisation engage more and more in the same activity, but also stop innovating [Levinthal and March, 1993].

7 Successes and Cognitive Traps in Learning “In such a world, we must give an account not only of substantive rationality - the extent to which appropriate courses of action are chosen - but also procedural rationality - the effectiveness, in light of human cognitive powers and limitations, of the procedures used to choose actions.” [Simon, 1978].

8 James March’s Exploration-Exploitation Trade-off Model of mutual learning between an organization and its individuals: external reality has m dimensions, each of which takes the value 1 or -1, independently and with equal probability 0.5. In each time period, each of the n individuals in the organization holds a belief about every dimension of external reality. In March’s model each belief takes the value 1, 0 or -1, so it may or may not match the corresponding aspect of reality. Each organization has a “code”, which can be described as a set of “procedures, norms, rules and forms” in which knowledge is stored, and which is involved in two kinds of learning: learning from the code and learning by the code.

9 James March’s Exploration-Exploitation Trade-off
– Learning from the code: p1 is the probability that an individual’s belief changes to that of the code. It measures the effectiveness of socialization.
– Learning by the code: p2 governs how the code adapts to the beliefs of those individuals whose beliefs correspond to reality on more dimensions than the code’s do. It measures the effectiveness of learning by the code.

10 James March's Simulation March ran a simulation with 30 dimensions of reality (m), 50 individuals (n) and 80 iterations. He explains that the equilibrium level of knowledge in organizations results from the interaction of the two learning parameters, p1 (learning from the code) and p2 (learning by the code).
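The mutual-learning model described on the preceding slides can be sketched in a few lines of Python. This is a simplified illustration, not March's original program. Several details are assumptions that the slides do not state: the function name `simulate_march`, the `seed` argument, the default values of `p1` and `p2`, initial beliefs drawn uniformly from {-1, 0, 1}, an initial code of all zeros, and the rule that the code adopts the majority belief of the better-informed individuals with probability 1 - (1 - p2)^k, where k is the size of the majority margin.

```python
import random


def simulate_march(m=30, n=50, periods=80, p1=0.1, p2=0.9, seed=0):
    """Return the code's knowledge (share of dimensions matching reality)."""
    rng = random.Random(seed)
    # Reality: m independent dimensions, each 1 or -1 with probability 0.5.
    reality = [rng.choice((-1, 1)) for _ in range(m)]
    # Assumption: initial individual beliefs are uniform over {-1, 0, 1}.
    beliefs = [[rng.choice((-1, 0, 1)) for _ in range(m)] for _ in range(n)]
    # Assumption: the code starts with no beliefs (all zeros).
    code = [0] * m

    def score(vector):
        # Number of dimensions on which a belief vector matches reality.
        return sum(1 for b, r in zip(vector, reality) if b == r)

    for _ in range(periods):
        code_score = score(code)
        # Individuals who match reality on more dimensions than the code.
        superior = [ind for ind in beliefs if score(ind) > code_score]

        # Learning from the code: each differing belief adopts the code's
        # non-zero value with probability p1 (socialization).
        new_beliefs = [
            [c if (c != 0 and b != c and rng.random() < p1) else b
             for b, c in zip(ind, code)]
            for ind in beliefs
        ]

        # Learning by the code: move towards the majority belief of the
        # superior group (simplified margin rule, see the assumptions above).
        new_code = list(code)
        for d in range(m):
            plus = sum(1 for ind in superior if ind[d] == 1)
            minus = sum(1 for ind in superior if ind[d] == -1)
            if plus == minus:
                continue
            dominant = 1 if plus > minus else -1
            if dominant == code[d]:
                continue
            k = abs(plus - minus)
            if rng.random() < 1 - (1 - p2) ** k:
                new_code[d] = dominant

        beliefs, code = new_beliefs, new_code

    return score(code) / m


if __name__ == "__main__":
    for p1 in (0.1, 0.9):
        for p2 in (0.1, 0.9):
            print(p1, p2, simulate_march(p1=p1, p2=p2))
```

A single run per parameter pair is noisy; comparing slow and fast socialization in the way March does would require averaging over many seeds.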

11 James March's Result In particular, “when socialization is slow, more rapid learning by the code leads to greater knowledge at equilibrium; but when socialization is rapid, greater equilibrium knowledge is achieved through slower learning by the code. By far the highest equilibrium knowledge occurs when the code learns rapidly from individuals whose socialization to the code is slow” [March, 1991, page 6].

12 James March's Result A learning process cannot be purely exploitative, because it would be myopic: “tendencies to increase exploitation and reduce exploration make adaptive processes potentially self-destructive” [March, 1991]. An adequate balance between exploitation and exploration is needed. The dilemma then concerns whether the correct mix of the two can be calculated, and how to determine the moment at which a routine must be replaced because of its poor performance.

13 Four-Armed Bandit Experiment [Daw et al., 2006] Daw et al. [2006] investigate which specific brain areas are activated during explorative and exploitative tasks in a “four-armed bandit problem”. Their experiment involved 14 healthy subjects, who made repeated choices between four differently coloured slot machines shown on a screen. The pay-offs the subjects can obtain lie between 1 and 100 and are drawn from a Gaussian distribution that the subject does not know. These features of the experimental design allow explorative and exploitative decisions to be studied under uniform conditions, in the context of a single task [Daw et al., 2006].
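As a rough illustration of the pay-off scheme just described, the sketch below draws one pay-off per slot machine from a Gaussian and keeps it within the 1 to 100 range. The arm means, the standard deviation, the rounding to whole points and the name `draw_payoff` are hypothetical choices made for this example; the slides do not give the parameters used by Daw et al.

```python
import random


def draw_payoff(mean, sd, rng):
    """Draw a Gaussian pay-off, rounded and clipped to the range 1..100."""
    raw = rng.gauss(mean, sd)
    return min(100, max(1, round(raw)))


rng = random.Random(0)
arm_means = [30.0, 45.0, 60.0, 75.0]  # hypothetical means, one per slot machine
payoff_sd = 4.0                       # hypothetical pay-off noise

# One simulated pay-off from each of the four slot machines.
payoffs = [draw_payoff(mu, payoff_sd, rng) for mu in arm_means]
print(payoffs)
```

The subject sees only the pay-off of the machine she chose, so the means have to be learned by sampling, which is what creates the tension between exploring and exploiting.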

14 Experimental Findings In follow-up interviews the subjects explained their strategies: most of them (11 of 14) reported that they occasionally tried the different slots to work out which currently had the highest pay-offs (exploring), while at other times they chose the slot they thought had the highest pay-off (exploiting). The aim of the research is to give a quantitative basis to reinforcement learning strategies for exploration, which differ in “how exploratory actions are directed” [Daw et al., 2006].

15 Reinforcement Learning Rules
– ε-greedy rule: select the action with the maximum value function most of the time, but choose randomly among the remaining options with a small probability (ε) [Lee, 2006]. The maximizing choice exploits, while the random choice is the exploratory move. When the agent chooses randomly, she chooses equally among all options, that is, “uniformly and independently of the action-value estimates”.
– Softmax rule: the agent weighs the differences in the value function of the options and chooses each option with a probability that increases with its estimated value, so that the option she thinks is the best is chosen most often. While in ε-greedy the random selection is equally distributed among all options, the softmax rule uses a Gibbs, or Boltzmann, distribution.
Therefore, with the ε-greedy rule “it is as likely to choose the worst-appearing action as it is to choose the next-to-best action. In tasks where the worst actions are very bad, this may be unsatisfactory.”
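The difference between the two rules is easier to see in code. The following is a minimal sketch, assuming the agent already holds a list of estimated action values. The function names, the inverse-temperature parameter `beta` and the example values are illustrative assumptions, not taken from Daw et al. The exploratory branch of ε-greedy here picks uniformly among all options, following the "uniformly and independently" description.

```python
import math
import random


def epsilon_greedy(values, epsilon, rng):
    """With probability epsilon explore uniformly, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))  # exploratory move
    return max(range(len(values)), key=values.__getitem__)  # greedy move


def softmax_probs(values, beta):
    """Gibbs/Boltzmann choice probabilities with inverse temperature beta."""
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (v - top)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_choice(values, beta, rng):
    """Sample an action in proportion to its softmax probability."""
    probs = softmax_probs(values, beta)
    return rng.choices(range(len(values)), weights=probs)[0]


rng = random.Random(0)
estimates = [10.0, 50.0, 60.0, 90.0]  # hypothetical value estimates

print(epsilon_greedy(estimates, 0.1, rng))
print(softmax_probs(estimates, 0.1))
print(softmax_choice(estimates, 0.1, rng))
```

Under softmax the worst-looking option receives the smallest probability, whereas an ε-greedy exploratory move treats it like any other option. A larger `beta` makes softmax more exploitative; `beta = 0` makes it uniform.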

16 Empirical Evidence from the Experiment Daw et al. [2006] classify each trial as explorative or exploitative and show the brain activity associated with each kind of choice. Striatum and ventromedial prefrontal cortex: exploitative decision making. Frontopolar cortex, “a region considered important for the control of cognitive functions” [Lee, 2006], and intraparietal sulcus: explorative decisions. Moreover, activity in regions of the medial orbitofrontal cortex (mOFC) correlates significantly with the number of points the subject receives, which suggests that it signals immediate reinforcement. This is also consistent with research indicating that the OFC is the main brain area encoding economic value [Padoa-Schioppa, 2006].

17 Towards a Conclusion: Would James March agree with this interpretation? In my opinion, he would: experimental economics offers a more realistic approach. In the exploration-exploitation dilemma the move to explore can be seen as a random shock, which we can call “ε”, and it is not possible to predict correctly the specific, contingent moment at which it will happen. The model of bounded and procedural rationality finds a field of application here, in a particular kind of rationality that we would like to call “residual rationality”: the subject exercises cognitive control over the routine already developed, learns from experience and makes inferences, but is not able to forecast exactly when the random shock that makes the routine change will take place.
