
1 11. System level organization and coupled networks
Lecture Notes on Brain and Computation
Byoung-Tak Zhang
Biointelligence Laboratory, School of Computer Science and Engineering
Graduate Programs in Cognitive Science, Brain Science and Bioinformatics
Brain-Mind-Behavior Concentration Program
Seoul National University
E-mail: btzhang@bi.snu.ac.kr
This material is available online at http://bi.snu.ac.kr/
Fundamentals of Computational Neuroscience, T. P. Trappenberg, 2002.

2 (C) 2009 SNU CSE Biointelligence Lab, http://bi.snu.ac.kr
Outline
11.1 System level anatomy of the brain
11.2 Modular mapping networks
11.3 Coupled attractor networks
11.4 Working memory
11.5 Attentive vision
11.6 An interconnecting workspace hypothesis

3 11.1 System level anatomy of the brain
The brain is more than just a big neural network with completely interconnected neurons
Combine the basic networks
  - Associative and competitive networks
Global architectures reflecting large-scale organizations of the brain
Modular networks resulting from combining the basic networks
  - Display some structure within their architecture, as opposed to completely interconnected networks

4 11.1.1 Large-scale anatomical and functional organization in the brain
Fig. 11.1 Example of a map of connectivities between cortical areas involved in visual processing.

5 11.1.2 Advantages of modular organizations
Modular specialization is used in the brain
The functional significance of modular specialization in visual processing
  - The cortex uses inhibition to sharpen various visual attributes
  - Color, edges or orientations
  - Local inhibition
The separate attentional amplification of separate features
Learning speed, generalization abilities, representation capabilities and task realizations
Modular mapping networks
Modular attractor networks

6 11.2 Modular mapping networks
11.2.1 Mixture of experts
Fig. 11.2 An example of a type of modular mapping network called mixture of experts. Each expert, the gating network, and the integration network are usually mapping networks. The input layer of the integration network is composed of sigma-pi nodes, as the output of the gating network weights (modulates) the output of the expert networks to form the inputs of the integration network.

7 11.2.2 Divide-and-conquer
Fig. 11.3 (A) Illustration of the absolute function f(x) = |x|, which is difficult to approximate with a single mapping network. (B) A modular mapping network in the form of a mixture of experts that can represent the absolute function accurately.
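The divide-and-conquer idea for f(x) = |x| can be illustrated with a very small sketch. It assumes two hand-written linear experts and a hard, hand-written gate; in the mixture of experts described on the slides these would be trained mapping networks, so the function names and the fixed gate below are illustrative assumptions only.

```python
def expert_pos(x):
    """Hypothetical expert that handles the branch x >= 0 (identity mapping)."""
    return x


def expert_neg(x):
    """Hypothetical expert that handles the branch x < 0 (sign flip)."""
    return -x


def gate(x):
    """Hard gating weights (assumption: a learned gate would be soft)."""
    return (1.0, 0.0) if x >= 0 else (0.0, 1.0)


def mixture(x):
    """Gate-weighted sum of expert outputs, as in a sigma-pi integration node."""
    weights = gate(x)
    outputs = (expert_pos(x), expert_neg(x))
    return sum(g * o for g, o in zip(weights, outputs))


if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 2.5):
        print(x, mixture(x))
```

Each expert only has to represent a linear function, which is easy for a mapping network; the nonlinearity at x = 0 is carried entirely by the gate.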

8 11.2.3 Training modular structures
Training such networks
  - To solve specific tasks in a flexible manner
Training the experts alone has two components:
  - To assign each expert to a particular task
  - To train each expert on its designated task
Training the gating network
  - Credit-assignment problem
Biological systems
  - The task assignment phase and the expert training phase are not separated

9 11.2.4 The ‘what-and-where’ task
The ventral visual pathway: ‘what’
The dorsal visual pathway: ‘where’
Performing object recognition and determining the location of objects in a modular network
Retina of 5 x 5 cells
Object of 3 x 3 patterns
26 input channels
  - 25 for retina inputs
  - 1 for task specification
18 output nodes
  - 9 for object patterns
  - 9 for locations
36 hidden nodes
Back-propagation learning
Fig. 11.4 Example of the ‘what-and-where’ tasks. (A) 5 x 5 model retina with 3 x 3 image of an object as an example.
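The input and output layout listed above can be made concrete with a short sketch that builds one training pair. The coding choices are assumptions made for illustration: one-hot codes for object identity and location, a task bit of 0 for ‘what’ and 1 for ‘where’, and a target that is zero in the half not requested by the task. The example pattern is hypothetical and is not taken from the original study.

```python
RETINA = 5  # the retina has 5 x 5 cells
OBJ = 3     # each object is a 3 x 3 pattern


def make_example(pattern, obj_id, row, col, task):
    """Return (inputs, target) for one 'what' or 'where' training example.

    pattern: 3 x 3 nested list of 0/1 values
    obj_id:  index of the object, 0..8
    row/col: top-left offset of the object on the retina, each 0..2
    task:    'what' or 'where'
    """
    retina = [[0] * RETINA for _ in range(RETINA)]
    for r in range(OBJ):
        for c in range(OBJ):
            retina[row + r][col + c] = pattern[r][c]

    task_bit = 0 if task == "what" else 1               # assumed coding
    inputs = [v for line in retina for v in line] + [task_bit]  # 25 + 1

    what = [0] * 9
    where = [0] * 9
    if task == "what":
        what[obj_id] = 1
    else:
        where[row * 3 + col] = 1                        # 3 x 3 possible offsets
    return inputs, what + where                         # 9 + 9 outputs


if __name__ == "__main__":
    cross = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]           # hypothetical object
    x, t = make_example(cross, obj_id=4, row=1, col=2, task="where")
    print(len(x), len(t), t)
```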

10 11.2.5 Temporal and spatial cross-talk
Conflicting training information
Temporal cross-talk
  - The network will quickly adapt to reasonable performance on the ‘what’ task if we first train the network entirely on this task
  - The representations of the hidden layers will change in a subsequent learning period on the ‘where’ task, which is likely to conflict with the representation necessary for the ‘what’ task
  - Training sets with conflicting training patterns
Spatial cross-talk
  - Conflicting information within one training example, due to the distributed nature of the representations
The division of the tasks into separate networks
  - Abolishes both problematic cross-talk conflicts

11 11.2.6 Task decomposition
Modular networks can learn task decomposition
Solution of Jacobs, Jordan and Barto
  - Use a gating network that increases the strength of the expert network that significantly improves the output of the system
  - The back-propagated error signal is modulated by the gating weights
  - The module that contributed most to the answer adapts most to the new example
  - Specialized experts
Solution of Jacobs and Jordan
  - Assign a physical location to the nodes in a single mapping network
  - Use a distance-dependent term in the objective function
  - Leads to a weight decay favoring short connections
  - The objective (or error) function: Eq. (11.1)
Fig. 11.4 (B) Connection weights between hidden and output nodes in a single mapping network. Positive weights are shown by solid squares, while open squares symbolize negative values. (C) Connection weights between hidden and output nodes in a single mapping network when trained with a bias toward short connections.
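A distance-dependent term in the objective function can be sketched as follows. The exact form of Eq. (11.1) is not recoverable from the slide text, so the penalty used here, lambda times the sum of d_ij * w_ij^2 added to a squared error, is an assumption chosen only to show how longer connections are penalized more strongly. The one-dimensional node positions and all parameter names are likewise hypothetical.

```python
def objective(outputs, targets, weights, pos_out, pos_hidden, lam=0.01):
    """Squared error plus an assumed distance-weighted decay term.

    weights[i][j] connects hidden node j to output node i.
    pos_out[i] and pos_hidden[j] are assumed 1-D physical node positions.
    """
    error = 0.5 * sum((y - t) ** 2 for y, t in zip(outputs, targets))
    penalty = 0.0
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            distance = abs(pos_out[i] - pos_hidden[j])
            penalty += distance * w * w   # long connections cost more
    return error + lam * penalty


if __name__ == "__main__":
    # Same weight magnitude on a short and on a long connection.
    short = objective([1.0], [1.0], [[1.0, 0.0]], [0.0], [0.5, 4.0])
    long_ = objective([1.0], [1.0], [[0.0, 1.0]], [0.0], [0.5, 4.0])
    print(short, long_)
```

Minimizing such an objective by gradient descent shrinks long connections faster than short ones, which is the weight decay favoring short connections mentioned above.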

12 11.2.7 Product of experts
The output can be interpreted as the probability that an input vector has a certain feature
If the summed outputs of the expert networks are normalized
  - Modular networks can easily be generalized to allow similar interpretations
View the mixture of experts as a collection of experts whose weighted opinions are averaged to determine the probability of the feature value
The probability of independent events
The advantages of a product of experts relative to the weighted mean calculated by the mixture of experts
  - A large probability assigned by one expert can be largely suppressed by low probabilities assigned by other experts
  - A large probability is only indicated if there is some agreement between the experts
  - Allows the individual experts to assign unreasonably large probabilities to some event as long as other experts represent such events more accurately with low probabilities
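The difference between averaging and multiplying expert opinions can be seen in a small numerical sketch. It assumes a binary feature, so that the product is renormalized over the two outcomes "feature present" and "feature absent"; the function names and the example probabilities are illustrative and not taken from the source.

```python
def mixture_of_experts(probs, gates):
    """Gate-weighted mean of the experts' probabilities."""
    total = sum(gates)
    return sum(g * p for g, p in zip(gates, probs)) / total


def product_of_experts(probs):
    """Normalized product for a binary feature (assumed renormalization).

    Undefined if the experts contradict each other with certainty
    (one expert says exactly 1 and another exactly 0).
    """
    present = 1.0
    absent = 1.0
    for p in probs:
        present *= p
        absent *= 1.0 - p
    return present / (present + absent)


if __name__ == "__main__":
    opinions = [0.9, 0.01]   # one confident expert, one strong dissenter
    print(mixture_of_experts(opinions, [1.0, 1.0]))
    print(product_of_experts(opinions))
```

With these values the averaged opinion stays near one half, while the product is pulled well below 0.1: a single low probability is enough to suppress the confident expert.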

13 11.3 Coupled attractor networks
Fig. 11.5 Coupled (or subdivided) recurrent neural networks. The nodes in this example are divided into two groups (the nodes of each group are indicated with different shadings). There are connections within the nodes of each group and between nodes of different groups.

14 11.3.1 Imprinted and composite patterns
One single point attractor network versus two single point attractor networks
Imprint into this system all objects
  - Objects represented by two independent feature vectors
  - m possible feature values for each feature of an object
  - Build m^2 possible objects
A network with 1000 nodes can store around 138 patterns
  - Hebbian rule
  - The storage capacity α_c ≈ 0.138 (see Chapter 8)
  - A network with two such independent subnetworks could store the number of patterns given by Eq. (11.2), where N is the number of nodes
  - The number of patterns that can be stored in a single network is only that given by Eq. (11.3)
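The capacity comparison can be worked through numerically. Equations (11.2) and (11.3) are not recoverable from the slide text, so the sketch below rests on a stated assumption: each of the two independent subnetworks has N/2 nodes and stores α_c N/2 feature values, and every pairing of stored feature values counts as one composite object.

```python
ALPHA_C = 0.138  # storage capacity of a Hebbian point attractor network


def single_network_capacity(n_nodes):
    """Patterns storable in one undivided network, rounded to whole patterns."""
    return round(ALPHA_C * n_nodes)


def two_module_composite_capacity(n_nodes):
    """Composite objects from two independent halves (assumed reading)."""
    per_module = round(ALPHA_C * n_nodes / 2)
    return per_module ** 2


if __name__ == "__main__":
    n = 1000
    print(single_network_capacity(n))        # about 138, as stated on the slide
    print(two_module_composite_capacity(n))  # 69 feature values per module, squared
```

Under this reading, the modular system represents far more objects than the single network because the number of composite patterns grows with the square of the per-module capacity.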

15 11.3.2 Signal-to-noise analysis
The behavior of coupled attractor networks using a signal-to-noise analysis
An overall network of N nodes that can be divided into modules, each having the same number of nodes N′
The weights are trained with the Hebbian rule
Modulate the weight values between the modules with a factor g
Define a new weight matrix with components
A modulation matrix g
Equations (11.4) to (11.7)
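The construction of the modulated weight matrix can be sketched as below. The details are assumptions, since Eqs. (11.4) to (11.7) are not recoverable from the slide text: patterns are taken as ±1 vectors, the Hebbian weights are normalized by 1/N with no self-coupling, and the modulation matrix is 1 within a module and g between modules.

```python
def hebbian_weights(patterns):
    """Hebbian weights w_ij = (1/N) * sum over patterns of xi_i * xi_j."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:                      # assumed: no self-coupling
                    w[i][j] += xi[i] * xi[j] / n
    return w


def modulation_matrix(n, module_size, g):
    """1 for node pairs in the same module, g for pairs in different modules."""
    return [[1.0 if i // module_size == j // module_size else g
             for j in range(n)] for i in range(n)]


def coupled_weights(patterns, module_size, g):
    """Element-wise product of the modulation matrix and the Hebbian weights."""
    w = hebbian_weights(patterns)
    n = len(w)
    gm = modulation_matrix(n, module_size, g)
    return [[gm[i][j] * w[i][j] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    pats = [[1, -1, 1, -1]]                     # toy pattern, two modules of size 2
    for row in coupled_weights(pats, module_size=2, g=0.5):
        print(row)
```

Setting g = 1 recovers a single undivided attractor network, and g = 0 gives fully independent modules.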

16 11.3.3 Imprinted pattern
To evaluate the stability of the imprinted pattern
Separate the signal term from the noise terms
To simplify the formulas
The capacity bound
Equations (11.8) to (11.11)
Fig. 11.6 Coupled attractor neural networks: results from signal-to-noise analysis. (A) Dependence of the load capacity of the imprinted pattern on relative intermodule coupling strength g for different numbers of modules m.

17 11.3.4 Composite pattern (1)
All kinds of combinations of patterns in different modules
All the subpatterns in the modules are chosen to be the first training pattern except for the module to which the node under consideration belongs
Equations (11.12) to (11.15)
Fig. 11.6 (B) Bounds on relative intermodule coupling strength g. For g values greater than the upper curve the imprinted patterns are stable. For g less than the lower curve the composite patterns are stable. In the narrow band in between we can adjust the system to have several composite and some imprinted patterns stable. This band gets narrower as the number of modules m increases and vanishes for networks with many modules.

18 11.3.4 Composite pattern (2)
The reverse case
Equations (11.16) to (11.19)
Fig. 11.6 (B) Bounds on relative intermodule coupling strength g. For g values greater than the upper curve the imprinted patterns are stable. For g less than the lower curve the composite patterns are stable. In the narrow band in between we can adjust the system to have several composite and some imprinted patterns stable. This band gets narrower as the number of modules m increases and vanishes for networks with many modules.

19 11.4 Working memory
A specific hypothesis on the implementation of working memory in the brain
Working memory is a construct that is hard to define precisely
  - Workspace that provides the necessary information to solve complex tasks
  - Language comprehension
  - Mental arithmetic
  - Reasoning for problem-solving and decision-making
  - A specific form of short-term memory (STM)

20 11.4.1 Distributed model of working memory
A conceptual model of working memory
  - Based on a modular structure
How the interaction of these modules enables the brain to utilize this kind of information
Fig. 11.7 A modular system of short-, intermediate-, and long-term memory, which are associated with functionalities of the prefrontal cortex (PFC), the hippocampus and related areas (HCMP), and the perceptual and motor cortex (PMC), respectively.

21 11.4.2 Limited capacity of working memory
‘31, 27, 4, 18’
‘62, 97, 12, 73, 27, 54, 8’
The limited capacity of working memory
  - ‘Magical number 7±2’
The very limited capacity of working memory is puzzling
Correlated with classical measurements of IQ factors
A larger storage capacity should make us fitter to survive
The search for the reasons behind the limited capacity of working memory is prominent in cognitive neuroscience

22 11.4.3 Spurious synchronization
The reasons behind the limited capacity of working memory
Fig. 11.8 (A) The percentage of correct responses (recall ability) in human subjects in a sequential comparison task of two images with different numbers of objects N_obj (solid squares). The dotted lines illustrate examples of the functional form as suggested by the synchronization hypothesis. The dashed line corresponds to the results with P_SS(2) = 0.04, where P_SS is the probability of spurious synchronization. (B) Illustration of the spurious synchronization hypothesis. The features of the object are represented by different spike trains, so that the number of synchronous spikes within a certain resolution increases with an increasing number of objects.

23 11.4.4 Quantification of the spurious synchronization hypothesis
The spurious synchronization hypothesis
  - Each additional feature of one object would in any case be synchronized with the other features of that object, so that only the number of objects with different spike trains matters
  - The number of pairs: Eq. (11.20)
  - The probability of spurious synchronization between at least two spike trains in a set of N_obj spike trains (patterns): Eq. (11.21)
  - A functional expectation of the percentage of correct recall: Eq. (11.22)
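The first two quantities listed above can be sketched numerically. The number of pairs among N_obj spike trains is N_obj (N_obj - 1) / 2. For the probability of at least one spurious synchronization, the sketch assumes that the pairs synchronize independently with the same pairwise probability P_SS(2); this independence assumption is made here for illustration, since the slide's equations are not recoverable from the text. The recall formula of Eq. (11.22) is not attempted.

```python
def n_pairs(n_obj):
    """Number of distinct pairs among n_obj spike trains."""
    return n_obj * (n_obj - 1) // 2


def p_spurious_sync(n_obj, p_pair):
    """Probability that at least one pair synchronizes spuriously.

    Assumes independent pairs, each with pairwise probability p_pair.
    """
    return 1.0 - (1.0 - p_pair) ** n_pairs(n_obj)


if __name__ == "__main__":
    p2 = 0.04   # pairwise value quoted in the caption of Fig. 11.8
    for n in range(1, 9):
        print(n, n_pairs(n), round(p_spurious_sync(n, p2), 3))
```

Because the number of pairs grows quadratically with the number of objects, the probability of a spurious synchronization rises quickly even for a small pairwise probability.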

24 11.4.5 Other hypotheses on the capacity limit
Lisman and Idiart
  - The limit might be solely due to the limited ability of reverberating neural activity to support short-term memory
  - The representation of different objects is kept in different high-frequency subcycles of low-frequency oscillations found in the brain
Nelson Cowan
  - Based on the limits of an attentional system
  - Thought to be a necessary ingredient in working memory models

25 11.5 Attentive vision
11.5.1 Object recognition versus visual search
Fig. 11.9 Illustration of a visual search and an object recognition task. Each task demands a different strategy in exploring the visual scene.

26 11.5.2 The overall model
Fig. 11.10 Outline of a model of the visual processing in primates to simulate visual search and object recognition. The main parts of the model are inspired by structural features of the cortical areas thought to be central in these processes. These include early visual areas (labeled ‘V1-V4’) that represent the content of the visual field topographically with basic features, the inferior-temporal cortex (labeled ‘IT’) that is known to be central for object recognition, and the posterior parietal cortex (labeled ‘PP’) that is strongly associated with spatial attention.

27 11.5.3 Object representation in early visual areas
The first part of the model labeled ‘V1-V4’ represents early visual areas
  - The striate cortex (V1) and adjacent visual areas
The primary visual area V1 is the main cortical area that
  - Receives visual input from the LGN of the thalamus
  - The LGN, in turn, is the major target of the optic nerves from the eyes
Neuronal response to visual input from the eyes
  - Gabor functions
The principal role
  - The decomposition of the visual field into features
  - Orientation, color, motion, etc.
The modeling point of view
  - The feature representation in this part of the model is topographic
  - Features are represented in modules
  - Modules correspond to the location of the object in the visual field
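A Gabor function, mentioned above as a description of neuronal responses in early visual areas, is a sinusoidal grating multiplied by a Gaussian envelope. The sketch below uses one common parameterization; the parameter names and default values are assumptions for illustration and are not taken from the slides.

```python
import math


def gabor(x, y, wavelength=4.0, theta=0.0, sigma=2.0, gamma=1.0, phase=0.0):
    """Value of a 2-D Gabor function at position (x, y).

    wavelength: period of the sinusoidal carrier
    theta:      preferred orientation in radians
    sigma:      width of the Gaussian envelope
    gamma:      aspect ratio of the envelope
    phase:      phase offset of the carrier
    """
    # Rotate coordinates into the frame of the preferred orientation.
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    carrier = math.cos(2.0 * math.pi * xr / wavelength + phase)
    return envelope * carrier


if __name__ == "__main__":
    # Print a small receptive-field profile along the x axis.
    for x in range(-4, 5):
        print(x, round(gabor(x, 0.0), 3))
```

Varying theta gives filters tuned to different orientations, which is one way to model the decomposition of the visual field into orientation features.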

28 11.5.4 Translation-invariant object recognition with ANN
The representation of the visual field in ‘V1-V4’ in this model feeds into the part labeled ‘IT’
The inferior-temporal cortex
  - Involved in object recognition
The connections between ‘V1-V4’ and ‘IT’
  - Trained with Hebbian learning
Translation-invariant object recognition
  - The point attractor network can ‘recognize’ trained objects in test trials at all locations in the visual field
Cortical magnification

29 11.5.5 Size of the receptive field
The size of the receptive field of inferior-temporal neurons depends on the content of the visual field and the specifics of the task
Fig. 11.11 (A) Example of the average firing rate from recordings of a neuron in the inferior-temporal cortex of a monkey in response to an effective stimulus that is located at various degrees away from the direction of gaze. (B) Simulation results of a model with the essential components of the model shown in Fig. 11.10. The correlation is thereby a measure of overlap between the activity of ‘IT’ nodes with the representation of the target object that was used during training.

30 11.5.6 Attentional bias in visual search and object recognition
Visual search
  - Simulated by supplying an object bias input to the attractor network in ‘IT’
  - Top-down information
  - The additional input of an object bias to ‘IT’ can speed up the recognition process in ‘IT’
The object bias also supports the recognition ability of the input from ‘V1-V4’ that corresponds to the target object
Parallel conclusions
  - An object recognition task in which top-down input to a specific location in ‘PP’ is given
  - Enhances the neural activity in ‘V1-V4’ for the features of the object that is located at the corresponding location

31 11.5.7 Parallel versus serial search
Fig. 11.12 Numerical experiments in which the model simulated a visual search task of a target object (the letter ‘E’) in a visual scene with visual distractors. (A) In one experiment the distractors consist of the letters ‘X’ that are visually very different from the target letter. The activity of a ‘PP’ node that corresponds to the target location increases in these experiments independently of the number of distractors, implying parallel search. (B) The second experiment was done with distractors (letter ‘F’) that were visually similar to the target letter. The reaction times, as measured from ‘PP’ nodes, depend linearly on the number of objects, a feature that is also characteristic of serial search. Both modes are, however, present in the same ‘parallel architecture’.

32 11.6 An interconnecting workspace hypothesis
11.6.1 The global workspace
Fig. 11.13 Illustration of the workspace hypothesis. Two computational spaces can be distinguished, the subnetworks with localized and specific computational specialization, and an interconnecting network that is the platform of the global workspace.

33 11.6.2 Demonstration of the global workspace in the Stroop task
Fig. 11.14 (A) In a Stroop task a word for a color, written in a color that can be different from the meaning of the word, is shown to a subject who is asked to perform either a word-naming or color-naming task. (B) Global workspace model that is able to reproduce several experimental findings in the Stroop task.

34 Conclusion
Fundamental example of modular networks
  - Complex information-processing systems
Expert and gating networks
Demonstrate the number of modules in coupled attractor networks
Working memory with modular networks
Attentive vision
  - Object recognition
  - Visual search
Workspace hypothesis

