
1 Secrets of Neural Network Models Ken Norman Princeton University July 24, 2003 Note: These slides have been provided online for the convenience of students attending the 2003 Merck summer school, and for individuals who have explicitly been given permission by Ken Norman. Please do not distribute these slides to third parties without permission from Ken (which is easy to get… just email Ken at knorman@princeton.edu).

2 The Plan, and Acknowledgements The Plan: I will teach you all of the secrets of neural network models in 2.5 hours. Lecture for the first half; hands-on workshop for the second half. Acknowledgements: Randy O’Reilly; my lab: Greg Detre, Ehren Newman, Adler Perotte, and Sean Polyn

3 The Big Question How does the gray glop in your head give rise to cognition? We know a lot about the brain, and we also know a lot about cognition The real challenge is to bridge between these two levels

4 Complexity and Levels of Analysis The brain is very complex: billions of neurons, trillions of synapses, all changing every nanosecond Each neuron is a very complex entity unto itself We need to abstract away from this complexity! Is there some simpler, higher level for describing what the brain does during cognition?

5 We want to draw on neurobiology for ideas about how the brain performs a particular kind of task Our models should be consistent with what we know about how the brain performs the task But at the same time, we want to include only aspects of neurobiology that are essential for explaining task performance

6 Learning and Development Neural network models provide an explicit, mechanistic account of how the brain changes as a function of experience Goals of learning: To acquire an internal representation (a model) of the world that allows you to predict what will happen next, and to make inferences about “unseen” aspects of the environment The system must be robust to noise/degradation/damage Focus of workshop: Use neural networks to explore how the brain meets these goals

7 Outline of Lecture What is a neural network? Principles of learning in neural networks: Hebbian learning: Simple learning rules that are very good at extracting the statistical structure of the environment (i.e., what things are there in the world, and how are they related to one another) Shortcomings of Hebbian learning: It’s good at acquiring coarse category structure (prototypes) but it’s less good at learning about atypical stimuli and arbitrary associations Error-driven learning: Very powerful rules that allow networks to learn from their mistakes

8 Outline, Continued The problem of interference in neocortical networks, and how the hippocampus can help alleviate this problem Brief discussion of PFC and how networks can support active maintenance in the face of distracting information Background information for the “hands-on” portion of the workshop

9 Overall Philosophy The goal is to give you a good set of intuitions for how neural networks function I will simplify and gloss over lots of things. Please ask questions if you don’t understand what I’m saying...

10 What is a neural network? Neurons measure how much input they receive from other neurons; they “fire” (send a signal) if input exceeds a threshold value. Input is a function of firing rate and connection strength. Learning in neural networks involves adjusting connection strength.

11 What is a neural network? Key simplifications: We reduce all of the complexity of neuronal firing to a single number, the activity of the neuron, that reflects how often the neuron is spiking. We reduce all of the complexity of synaptic connections between neurons to a single number, the synaptic weight, that reflects how strong the connection is.
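
To make these two simplifications concrete, here is a minimal sketch in Python (my own illustration, not code from the workshop); the sigmoid shape, gain, and threshold values are assumptions:

```python
import numpy as np

def unit_activity(sending_acts, weights, threshold=0.5, gain=6.0):
    """Net input = sum over senders of (activity * weight); the unit's
    activity rises smoothly from 0 to 1 as net input passes the threshold."""
    net = np.dot(sending_acts, weights)
    return 1.0 / (1.0 + np.exp(-gain * (net - threshold)))

acts = np.array([0.9, 0.0, 0.8])   # activities of three sending neurons
wts  = np.array([0.5, 0.5, 0.3])   # synaptic weights onto the receiving neuron
print(round(unit_activity(acts, wts), 3))   # above-threshold input -> high activity
```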

12 Neurons are Detectors Each neuron is detecting some set of conditions (e.g., smoke detector). Representation is what is detected.

13 Understanding Neural Components in Terms of the Detector Model

14 Detector Model Neurons feed on each other’s outputs; layers of ever more complicated detectors. Things can get very complex in terms of content, but each neuron is still carrying out the basic detector function.

15 Two-layer Attractor Networks Input/Output Layer Hidden Layer (Internal Representation) Model of processing in neocortex. Circles = units (neurons); lines = connections (synapses). Unit brightness = activity; line thickness = synaptic weight. Connections are symmetric.

16 Two-layer Attractor Networks Input/Output Layer Hidden Layer (Internal Representation) Units within a layer compete to become active. Competition is enforced by inhibitory interneurons that sample the amount of activity in the layer and send back a proportional amount of inhibition. Inhibitory interneurons prevent epilepsy in the network. Inhibitory interneurons are not pictured in subsequent diagrams.
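
One common way to implement this kind of competition in simulations is a k-winners-take-all rule, where the inhibition level is set so that only the k most-excited units stay active. The sketch below is my own illustration of that idea, not the workshop's actual implementation:

```python
import numpy as np

def kwta(net_inputs, k):
    """Crude stand-in for the inhibitory interneurons: place the inhibition
    level between the k-th and (k+1)-th strongest net inputs, so only the
    k most-excited units remain above zero."""
    sorted_net = np.sort(net_inputs)[::-1]
    inhib = (sorted_net[k - 1] + sorted_net[k]) / 2.0
    return np.clip(net_inputs - inhib, 0.0, 1.0)

net = np.array([0.9, 0.2, 0.7, 0.4, 0.1])
print(kwta(net, k=2))   # only the two strongest units stay active
```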

17 Two-layer Attractor Networks Input/Output Layer Hidden Layer (Internal Representation) These networks are capable of sustaining a stable pattern of activity on their own. “Attractor” = a fancy word for “stable pattern of activity”. Real networks are much larger than this; also, more than one unit is active in the hidden layer...

18 Properties of Two-Layer Attractor Networks I will show that these networks are capable of meeting the “learning goals” outlined earlier. Given partial information (e.g., seeing something that has wings and feathers), the networks can make a “guess” about other properties of that thing (e.g., it probably flies). Networks show graceful degradation.

19–22 “Pattern Completion” in two-layer networks [animation frames: network diagram with feature units wings, beak, feathers, flies; a partial pattern is presented and the network fills in the missing features]

23–27 Networks are Robust to Damage, Noise [animation frames: the “beak” unit is removed, yet the network still completes the pattern from wings and feathers to flies]
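
A toy sketch of pattern completion and graceful degradation (my own construction, using binary threshold units rather than the graded units in the diagrams): a single hidden “bird” unit fills in missing features, and still does so when one input unit is damaged:

```python
import numpy as np

# feature units: wings, beak, feathers, flies; one hidden "bird" unit
W = np.array([1.0, 1.0, 1.0, 1.0])      # symmetric weights, feature <-> hidden

def complete(features, steps=5):
    x = features.copy()
    for _ in range(steps):
        h = float(x @ W > 1.5)          # hidden unit fires given enough input
        x = np.maximum(x, (W * h > 0.5).astype(float))  # hidden fills in features
    return x

partial = np.array([1.0, 1.0, 0.0, 0.0])   # wings + beak only
print(complete(partial))                   # -> [1. 1. 1. 1.]: "flies" filled in

damaged = np.array([1.0, 0.0, 1.0, 0.0])   # "beak" unit damaged/missing
print(complete(damaged))                   # still completes: graceful degradation
```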

28 Learning: Overview Learning = changing connection weights. Learning rules: How to adjust weights based on local information (presynaptic and postsynaptic activity) to produce appropriate network behavior. Hebbian learning: building a statistical model of the world, without an explicit teacher... Error-driven learning: rules that detect undesirable states and change weights to eliminate these undesirable states...

29 Building a Statistical Model of the World The world is inhabited by things with relatively stable sets of features. We want to wire detectors in our brains to detect these things. How can we do this? Answer: Leverage correlation. The features of a particular thing tend to appear together, and to disappear together; a thing is nothing more than a correlated cluster of features. Learning mechanisms that are sensitive to correlation will end up representing useful things.

30 Hebbian Learning How does the brain learn about correlations? Donald Hebb proposed the following mechanism: When the pre-synaptic neuron and post-synaptic neuron are active at the same time, strengthen the connection between them (“neurons that fire together, wire together”).

31–33 Hebbian Learning [diagram slides: pre- and post-synaptic neurons active together strengthen their connection]

34 Proposed by Donald Hebb: When the pre-synaptic (sending) neuron and post-synaptic (receiving) neuron are active at the same time, strengthen the connection between them (“neurons that fire together, wire together”). When two neurons are connected and one is active but the other is not, reduce the connection between them (“neurons that fire apart, unwire”).
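
In code, both halves of the rule can be captured by a single local update. The form below is, as far as I know, the CPCA-style Hebbian rule used in O'Reilly's framework; treat the exact form and learning rate as assumptions:

```python
import numpy as np

def hebb_update(w, pre, post, lrate=0.1):
    """When the receiver is active, the weight moves toward the sender's
    activity: fire together -> w grows toward 1 ("wire together");
    receiver active but sender silent -> w shrinks toward 0 ("unwire")."""
    return w + lrate * post * (pre - w)

w = 0.3
print(hebb_update(w, pre=1.0, post=1.0))  # 0.37: fire together, wire together
print(hebb_update(w, pre=0.0, post=1.0))  # 0.27: fire apart, unwire
print(hebb_update(w, pre=1.0, post=0.0))  # 0.30: receiver silent, no change
```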

35–36 Hebbian Learning [diagram slides]

37 Biology of Hebbian Learning: NMDA-Mediated Long-Term Potentiation

38 Biology of Hebbian Learning: Long-Term Depression When the postsynaptic neuron is depolarized, but presynaptic activity is relatively weak, you get weakening of the synapse

39 What Does Hebbian Learning Do? Hebbian learning tunes units to represent correlated sets of input features. Here is why: Say that a unit has 1,000 inputs. In this case, turning a single input feature on and off won’t have a big effect on the unit’s activity. In contrast, turning a large cluster of 900 input features on and off will have a big effect on the unit’s activity.
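
A quick numeric check of this claim (the weight values are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.uniform(0.2, 0.4, size=1000)              # weights from 1,000 inputs
x = rng.integers(0, 2, size=1000).astype(float)   # a random input pattern

base = x @ w                                      # net input to the receiving unit
x_one = x.copy();  x_one[0]     = 1 - x_one[0]    # toggle a single feature
x_gang = x.copy(); x_gang[:900] = 0.0             # turn off a 900-feature cluster

print(abs(x_one @ w - base))    # ~0.3: barely moves the unit
print(abs(x_gang @ w - base))   # ~130: dominates the unit's activity
```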

40–41 Hebbian Learning [diagram slides: a small cluster of active inputs fails to drive the receiving unit]

42 Because small clusters of inputs do not reliably activate the receiving unit, the receiving unit does not learn much about these inputs.

43–47 Hebbian Learning [diagram slides] Big clusters of inputs reliably activate the receiving unit, so the network learns more about big (vs. small) clusters (the “gang effect”).

48 What Does Hebbian Learning Do? Hebbian learning finds the thing in the world that most reliably activates the unit, and tunes the unit to like that thing even more!

49–57 Hebbian Learning [animation frames: two feature clusters, scaly/slithers and wings/beak/feathers/flies, are presented; the unit becomes tuned to whichever cluster most reliably activates it]

58 What Does Hebbian Learning Do? Hebbian learning finds the thing in the world that most reliably activates the unit, and tunes the unit to like that thing even more! The outcome of Hebbian learning is a function of how well different inputs activate the unit, and how frequently they are presented

59 Self-Organizing Learning One detector can only represent one thing (i.e., pattern of correlated features). Goal: We want to present input patterns to the network and have different units in the network “specialize” for different things, such that each thing is represented by at least one unit. Random weights (different initial receptive fields) and competition are important for achieving this goal. What happens without competition...

60–64 No Competition [animation frames: feature units lives under water, scaly, slithers, wings, beak, feathers, flies; without inhibition, every unit drifts toward the same large feature cluster]

65 No Competition: Without competition, all units end up representing the same “gang” of features (wings, beak, feathers, flies); other, smaller correlations (scaly, slithers, lives under water) get ignored.

66–72 Competition is important [animation frames: with inhibition, the unit that best fits the current input wins and learns, leaving other units free to specialize on the remaining feature clusters]

73 Competition is important: When units have different initial “receptive fields” and they compete to represent input patterns, units end up representing different things (e.g., striped/orange/sharp teeth; furry/yellow/chirps; lives under water).

74 Hebbian Learning: Summary Hebbian learning finds the thing in the world that most reliably activates the unit, and tunes the unit to like that thing even more. When: There are multiple hidden units competing to represent input patterns, and each hidden unit starts out with a distinct receptive field. Then: Hebbian learning will tune these units so that each thing in the world (i.e., each cluster of correlated features) is represented by at least one unit.
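
A minimal sketch of this self-organizing recipe (my own toy version: two hidden units, two feature clusters, and winner-take-all competition standing in for the inhibitory dynamics described earlier):

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid = 8, 2
W = rng.uniform(0.3, 0.7, size=(n_hid, n_in))   # random initial receptive fields

bird  = np.array([1, 1, 1, 1, 0, 0, 0, 0], float)  # wings, beak, feathers, flies
snake = np.array([0, 0, 0, 0, 1, 1, 1, 1], float)  # scaly, slithers, ...

for _ in range(100):
    x = bird if rng.random() < 0.5 else snake
    winner = np.argmax(W @ x)               # competition: best-matching unit wins
    W[winner] += 0.1 * (x - W[winner])      # only the winner learns (CPCA-style)

print(np.round(W, 2))   # each row has specialized for one feature cluster
```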

75–84 Problems with Penguins [animation frames: feature units include wings, beak, feathers, flies, waddles, slithers, lives in Antarctica; the penguin pattern activates the same hidden unit as typical birds, so the network wrongly predicts that penguins fly]

85 Problems with Hebb, and Possible Solutions Self-organizing Hebbian learning is capable of discovering the “high-level” (coarse) categorical structure of the inputs. However, it sometimes collapses across more subtle (but important) distinctions, and the learning rule does not have any provisions for fixing these errors once they happen.

86 Problems with Hebb, and Possible Solutions In the penguin problem, if we want the network to remember that typical birds fly but penguins don’t, then penguins and typical birds need to have distinct (non-identical) hidden representations. Hebbian learning assigns the same hidden unit to penguins and typical birds. We need to supplement Hebbian learning with another learning rule that is sensitive to when the network makes an error (e.g., saying that penguins fly) and corrects the error by pulling apart the hidden representations of penguins vs. typical birds.

87 What is an error, exactly? One common way of conceptualizing error is in terms of predictions and outcomes. If you give the network a partial version of a studied pattern, the network will make a prediction as to the missing features of that pattern (e.g., given something that has “feathers”, the network will guess that it probably flies). Later, you learn what the missing features are (the outcome). If the network’s guess about the missing features is wrong, we want the network to be able to change its weights based on the difference between the prediction and the outcome. Today, I will present the GeneRec error-driven learning rule developed by Randy O’Reilly.

88–93 Error-Driven Learning [animation frames] Prediction phase: Present a partial pattern; the network makes a guess about the missing features.

94–97 Error-Driven Learning [animation frames] Outcome phase: Present the full pattern and let the network settle.

98 Error-Driven Learning: We now need to compare these two activity patterns (prediction vs. outcome) and figure out which weights to change.

99 Motivating the Learning Rule The goal of error-driven learning is to discover an internal representation for the item that activates the correct answer. Basically, we want to find hidden units that are associated with the correct answer (in this case, “waddles”). The best way to do this is to examine how activity changes when “waddles” is clamped on during the “outcome” phase. Hidden units that are associated with “waddles” should show an increase in activity in the outcome (vs. prediction) phase. Hidden units that are not associated with “waddles” should show a decrease in activity in the outcome phase (because of increased competition from other units that are associated with “waddles”).

100 Motivating the Learning Rule Hidden units that are associated with “waddles” should show an increase in activity in the outcome (vs. prediction) phase. Hidden units that are not associated with “waddles” should show a decrease in activity in the outcome phase. Here is the learning rule: If a hidden unit shows increased activity (i.e., it’s associated with the correct answer), increase its weights to the input pattern. If a hidden unit shows decreased activity (i.e., it’s not associated with the correct answer), reduce its weights to the input pattern.
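
Expressed as a weight update, this is (to the best of my understanding) the core of the GeneRec rule: the weight change is the presynaptic input activity times the change in postsynaptic hidden activity between the two phases. The sketch below is illustrative; the activity values are made up:

```python
import numpy as np

def generec_update(W, pre_minus, post_minus, post_plus, lrate=0.05):
    """dW_ij = lrate * pre_i(minus) * (post_j(plus) - post_j(minus)):
    hidden units whose activity rose when the outcome was clamped get
    stronger weights from the input; units whose activity fell get weaker ones."""
    return W + lrate * np.outer(pre_minus, post_plus - post_minus)

pre        = np.array([1.0, 0.0, 1.0])   # input activities (prediction phase)
post_minus = np.array([0.8, 0.1])        # hidden activities, prediction phase
post_plus  = np.array([0.2, 0.9])        # hidden activities, outcome phase
W = np.zeros((3, 2))
print(generec_update(W, pre, post_minus, post_plus))
```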

101–104 Error-Driven Learning [animation frames: weights from the penguin input to the hidden units are adjusted based on the prediction/outcome difference]

105 Error-Driven Learning [diagram] Hebb and error have opposite effects on weights here! Error increases the extent to which penguin is linked to the right-hand unit, whereas Hebb reinforced penguin’s tendency to activate the left-hand unit.

106–116 Error-Driven Learning [animation frames: the penguin pattern is presented again; its hidden representation has been pulled apart from the typical-bird representation, so the network now predicts “waddles” rather than “flies”]

117 Catastrophic Interference If you change the weights too strongly in response to “penguin”, then the network starts to behave as if all birds waddle. New learning interferes with stored knowledge... The best way to avoid this problem is to make small weight changes, and to interleave “penguin” learning trials with “typical bird” trials. The “typical bird” trials serve to remind the network to retain the association between wings/feathers/beak and “flies”...
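
A toy demonstration of the trade-off (my own construction; the simple running-average update stands in for the error-driven rule):

```python
import numpy as np

rng = np.random.default_rng(2)
typical_bird = np.array([1, 1, 1, 1, 0], float)  # wings, beak, feathers, flies, -
penguin      = np.array([1, 1, 1, 0, 1], float)  # wings, beak, feathers, -, waddles

def train(schedule, lrate):
    w = typical_bird.copy()        # start with "typical bird" knowledge stored
    for x in schedule:
        w += lrate * (x - w)       # stand-in for an error-driven weight update
    return np.round(w, 2)

# massed penguin trials with a big learning rate: "flies" gets wiped out
print(train([penguin] * 20, lrate=0.5))

# small weight changes, penguins interleaved among typical birds
mixed = [penguin if rng.random() < 0.1 else typical_bird for _ in range(500)]
print(train(mixed, lrate=0.01))    # "flies" stays strong; "waddles" creeps up
```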

118–128 Interleaved Training [animation frames: “penguin” trials alternate with “typical bird” trials; with small weight changes, the network learns that penguins waddle while still remembering that typical birds fly]

129 Gradual vs. One-Trial Learning Problem: It appears that the solution to the catastrophic interference problem is to learn slowly. But we also need to be able to learn quickly!

130 Gradual vs. One-Trial Learning Put another way: There appears to be a trade-off between learning rate and interference in the cortical network Our claim is that the brain avoids this trade-off by having two separate networks: A slow-learning cortical network that gradually develops internal representations that support generalization, prediction, categorization, etc. A fast-learning hippocampal network that is specialized for rapid memorization (but does not support generalization, categorization, etc.)

131 [Diagram: hippocampal anatomy — Dentate Gyrus, CA3, CA1 — sitting atop the Entorhinal Cortex (input and output layers), which connects the hippocampus to lower-level neocortex]

132 Interactions Between Hippo and Cortex According to the Complementary Learning Systems theory (McClelland et al., 1995), the hippocampus rapidly memorizes patterns of cortical activity. The hippocampus manages to learn rapidly without suffering catastrophic interference because it has a built-in tendency to assign distinct, minimally overlapping representations to input patterns, even when they are very similar. Of course, this hurts its ability to categorize.

133 Interactions Between Hippo and Cortex The theory states that, when you are asleep, the hippocampus “plays back” stored patterns in an interleaved fashion, thereby allowing cortex to weave new facts and experiences into existing knowledge structures. Even if something just happens once in the real world, hippocampus can keep re-playing it to cortex, interleaved with other events, until it sinks in... Detailed theory: slow-wave sleep = hippocampal playback to cortex; REM sleep = cortex randomly activates stored representations, which strengthens pre-existing knowledge and protects it against interference.
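
A toy sketch of the replay idea (entirely my own construction; real hippocampal replay is far richer than this): a one-shot memory is replayed to a slow cortical learner, interleaved with old patterns, so the new fact sinks in without erasing old knowledge:

```python
import numpy as np

rng = np.random.default_rng(3)
old_patterns = [rng.integers(0, 2, 10).astype(float) for _ in range(3)]
new_event = rng.integers(0, 2, 10).astype(float)   # experienced only once

hippocampus = [new_event]                  # fast, one-shot storage
cortex_w = np.mean(old_patterns, axis=0)   # stand-in for slow cortical knowledge

for night in range(200):                   # "sleep": interleaved replay
    if rng.random() < 0.25:
        replay = hippocampus[0]                    # replay the new memory...
    else:
        replay = old_patterns[rng.integers(3)]     # ...mixed with old patterns
    cortex_w += 0.01 * (replay - cortex_w)         # small cortical weight changes

print(np.round(cortex_w, 2))   # the new event has been woven in gradually
```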

134–141 Role of the Hippocampus [animation frames: the hippocampus memorizes the penguin pattern (wings, beak, feathers, waddles, lives in Antarctica) after one exposure and replays it to the cortical network, interleaved with typical-bird patterns]

142 Error-Driven Learning: Summary Error-driven learning algorithms are very powerful: So long as the learning rate is small, and training patterns are presented in an interleaved fashion, algorithms like GeneRec can learn internal representations that support good “pattern completion” of missing features. Error-driven learning is not meant to be a replacement for Hebbian learning: The two algorithms can co-exist! Hebbian learning actually improves the performance of GeneRec by ensuring that hidden units represent meaningful clusters of features

143 Error-Driven Learning: Summary Theoretical issues to resolve with error-driven learning: The algorithm requires that the network “know” whether it is in a “prediction” phase or an “outcome” phase; how does the network know this? For that matter, the whole “phases” idea is sketchy. GeneRec based on “prediction/outcome” differences is not the only way to do error-driven learning: alternatives include backpropagation, learning by reconstruction, and Adaptive Resonance Theory (Grossberg & Carpenter).

144 Learning by Reconstruction Instead of doing error-driven learning by comparing predictions and outcomes, you can also do error-driven learning as follows: First, you clamp the correct, full pattern onto the network and let it settle. Then, you erase the input pattern and see whether the network can reconstruct the input pattern based on its internal representation. The algorithm is basically the same; you are still comparing two phases...

145 Learning by Reconstruction [animation] Clamp the to-be-learned pattern onto the input and let the network settle.

146–149 Learning by Reconstruction [animation frames] Next, wipe the input layer clean (but not the hidden layer) and let the network settle.

150–151 Learning by Reconstruction [animation frames] Compare hidden activity in the two phases and adjust the weights accordingly (i.e., if activation was higher with the correct answer clamped, increase the weights; if activation was lower, decrease them).
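
Here is a compact sketch of the two reconstruction phases and the GeneRec-style weight comparison (my own illustration; a single feedforward pass stands in for full settling):

```python
import numpy as np

rng = np.random.default_rng(4)
n_in, n_hid = 6, 3
W = rng.normal(0.0, 0.5, size=(n_in, n_hid))   # symmetric input <-> hidden weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

pattern = np.array([1, 1, 1, 0, 0, 1], float)

# Phase 1: clamp the full pattern and "settle" (one feedforward pass here)
h_clamped = sigmoid(pattern @ W)

# Phase 2: wipe the input layer (keep the hidden layer) and let the network
# reconstruct the input from its internal representation
x_recon = sigmoid(W @ h_clamped)
h_recon = sigmoid(x_recon @ W)

# Compare hidden activity across the two phases: raise weights where activity
# was higher with the correct answer clamped, lower them where it was lower
W += 0.05 * np.outer(x_recon, h_clamped - h_recon)
print(np.round(W, 2))
```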

152–160 Adaptive Resonance Theory [animation frames: the penguin input (wings, beak, feathers, waddles, lives in Antarctica) activates a stored category; the predicted features are compared against the input; a MISMATCH! is detected, signaling that the stored category should not simply be overwritten]

161 Spreading Activation vs. Active Maintenance Spreading activation is generally very useful... it lets us make predictions/inferences/etc. But sometimes you just want to hold on to a pattern of activation without letting activation spread (e.g., a phone number, or a person’s name). How do we maintain specific patterns of activity in the face of distraction?

162 Spreading Activation vs. Active Maintenance As you will see in the “hands-on” part of the workshop, the networks we have been discussing are not very robust to noise/distraction. Thus, there appears to be another tradeoff: Networks that are good at generalization/prediction are lousy at holding on to phone numbers/plans/ideas in the face of distraction

163 Spreading Activation vs. Active Maintenance Solution: We have evolved a network that is optimized for active maintenance: Prefrontal cortex! This complements the rest of cortex, which is good at generalization but not so good at active maintenance. PFC uses isolated representations to prevent spread of activity. There is evidence for such isolated “stripes” in PFC.
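
A toy contrast between the two regimes (my own construction): in a densely interconnected network, activity spreads until the specific pattern is lost, whereas isolated, self-connected units hold a pattern indefinitely:

```python
import numpy as np

def run(W, x0, steps=10):
    x = x0.copy()
    for _ in range(steps):
        x = np.clip(W @ x, 0.0, 1.0)   # recurrent spread of activity
    return x

n = 4
dense    = np.full((n, n), 0.5) + 0.5 * np.eye(n)   # interconnected posterior cortex
isolated = np.eye(n)                                # isolated "stripes" (PFC-like)

x0 = np.array([1.0, 0.0, 0.0, 0.0])
print(run(dense, x0))     # [1. 1. 1. 1.]: activity spreads, the pattern is lost
print(run(isolated, x0))  # [1. 0. 0. 0.]: the pattern is held
```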

164 Tripartite Functional Organization PC = posterior perceptual & motor cortex; FC = prefrontal cortex; HC = hippocampus and related structures

165 Tripartite Functional Organization PC = incremental learning about the structure of the environment; FC = active maintenance, cognitive control; HC = rapid memorization. Roles are defined by functional tradeoffs…

166 Key Trade-offs Extracting what is generally true (across events) vs. memorizing specific events Inference (spreading activation) vs. robust active maintenance

167 Hands-On Exercises The goal of the hands-on part of the workshop is to get a feel for the kinds of representations that are acquired by Hebbian vs. error-driven learning, and for network dynamics more generally.

168 Here is the network that we will be using: Activity constraints: Only 10% of hidden units can be strongly active at once; in the input layer, only one unit per row. Think of each row in the input as a feature dimension (e.g., shape); the units in that row are mutually exclusive features along that dimension (square, circle, etc.).

169 This diagram illustrates the connectivity of the network: Each hidden unit is connected to 50% of the input units; there are also recurrent connections from each hidden unit to all of the other hidden units. Weights are symmetric. Initial weight values were set randomly.
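
A sketch of how one might construct a network with this connectivity (layer sizes and weight ranges are my assumptions; the workshop uses its own pre-built network):

```python
import numpy as np

rng = np.random.default_rng(5)
n_input, n_hidden = 24, 40                      # layer sizes: my assumption

# each hidden unit connects to a random 50% of the input units
mask = rng.random((n_hidden, n_input)) < 0.5
W_in = np.where(mask, rng.uniform(0.25, 0.75, (n_hidden, n_input)), 0.0)

# recurrent connections from each hidden unit to all the others, kept symmetric
W_rec = rng.uniform(0.25, 0.75, (n_hidden, n_hidden))
W_rec = (W_rec + W_rec.T) / 2.0                 # enforce symmetric weights
np.fill_diagonal(W_rec, 0.0)                    # no self-connections (assumption)

k_active = max(1, int(0.10 * n_hidden))         # at most 10% strongly active
print(W_in.shape, W_rec.shape, k_active)
```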

170 I trained up the network on the following 8 patterns: In each pattern, the bottom 16 rows encode prototypical features that tend to be shared across patterns within a category; the top 8 rows encode item-specific features that are unique to each pattern. Each category has 3 “typical” items and one “atypical” item. During training, the network studied typical patterns 90% of the time and atypical patterns 10% of the time.
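
A sketch of a training set with this structure (the 16 + 8 row split, the 2 × 4 item design, and the 90/10 presentation mix follow the slide; the four-units-per-row coding and the way the atypical item deviates from the prototype are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
UNITS_PER_ROW = 4   # "one unit per row"; a row width of 4 is my assumption

def make_category():
    prototype = rng.integers(0, UNITS_PER_ROW, size=16)   # 16 shared-feature rows
    items = []
    for i in range(4):                                    # 3 typical + 1 atypical
        proto = prototype.copy()
        if i == 3:                                        # atypical item deviates
            proto[:4] = rng.integers(0, UNITS_PER_ROW, 4) # on a few rows (assumed)
        specific = rng.integers(0, UNITS_PER_ROW, size=8) # 8 item-specific rows
        items.append(np.concatenate([proto, specific]))
    return items

patterns = make_category() + make_category()      # 8 patterns, 2 categories
typical  = patterns[0:3] + patterns[4:7]
atypical = [patterns[3], patterns[7]]

def sample_pattern():
    """Draw a study item: typical 90% of the time, atypical 10%."""
    pool = typical if rng.random() < 0.9 else atypical
    return pool[rng.integers(len(pool))]

print(sample_pattern())
```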

171 To save time, the networks you will be using have been pre-trained on the 8 patterns (by presenting them repeatedly, in an interleaved fashion). For some of the simulations, you will be using a network that was trained with (purely) Hebbian learning.

172 For other simulations, you will be using a network that was trained with a combination of error-driven (GeneRec) and Hebbian learning. Training of this network used a three-phase design: First, there was a “prediction” (minus) phase where a partial pattern was presented. Second, there was an “outcome” (plus) phase where the full version of the pattern was presented. Finally, there was a “nothing” phase where the input pattern was erased (but not the hidden pattern). Error-driven learning occurred based on the difference in activity between the minus and plus patterns, and based on the difference in activity between the plus and nothing patterns.
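
A sketch of one such three-phase training step (my own reconstruction: a single feedforward pass stands in for settling, and the error/Hebbian mixing coefficients are assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
n_in, n_hid = 6, 4
W = rng.uniform(0.25, 0.75, size=(n_in, n_hid))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

full    = np.array([1, 1, 1, 0, 0, 1], float)   # the complete training pattern
partial = np.array([1, 1, 1, 0, 0, 0], float)   # the pattern with features removed

h_minus = sigmoid(partial @ W)          # "prediction" (minus) phase
h_plus  = sigmoid(full @ W)             # "outcome" (plus) phase
x_recon = sigmoid(W @ h_plus)           # "nothing" phase: input erased; the
h_none  = sigmoid(x_recon @ W)          #   hidden pattern reconstructs and resettles

lrate = 0.01
W += lrate * np.outer(full, h_plus - h_minus)            # minus-vs-plus error term
W += lrate * np.outer(full, h_plus - h_none)             # plus-vs-nothing error term
W += 0.1 * lrate * np.outer(full, h_plus) * (1.0 - W)    # small Hebbian (CPCA-ish) term
print(np.round(W, 2))
```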

173 When you get to the computer room, the simulation should already be open on the computer (some of you may have to double up; I think there are slightly fewer computers than students), and there will be a handout on the desk explaining what to do. You can proceed at your own pace. I will be there to answer questions (about the lecture and about the computer exercises), and my two grad students, Ehren Newman and Sean Polyn, will also be there to answer questions.

174 Your Helpers Ehren Sean me

