Presentation on theme: "Distributed Representation, Connection-Based Learning, and Memory Psychology 209 February 1, 2013."— Presentation transcript:
Distributed Representation, Connection-Based Learning, and Memory Psychology 209 February 1, 2013
The Concept of a Distributed Representation Instead of assuming that an object (concept, etc) is represented in the mind by a single unit, we consider the possibility that it could be represented by patterns of activation over populations of units. The elements of the pattern may represent (approximately) some feature or sensible combination of features but they need not. What is crucial is that no units are dedicated to a single object; in general all units participate in the representation of many different objects. Neurons in the monkey visual cortex appear to exemplify these properties. Note that neurons in some parts of the brain are more selective than others but (in most people’s view) this is just a matter of degree.
Stimuli used by Baylis, Rolls, and Leonard (1991)
Responses of Four Neurons to Face and Non-Face Stimuli in Previous Slide
Responses to various stimuli by a neuron responding to a Tabby Cat (Tanaka et al, 1991)
A ‘Sydney Opera House’ Neuron Figures on this and previous two slides from: Quiroga, Q. et al, 2005, Nature, 435, 1102-1107.
Computational Arguments for the use of Distributed Representations (Hinton et al, 1986) They use the units in a network more efficiently They support generalization on the basis of similarity They can support micro-inferences based on consistent relationships between participating units –E.g. units activated my male facial features would activate units associated with lower-pitched voices. Overlap increases generalization and micro-inferences; less overlap reduces it. There appears to be less overlap in the hippocampus than in other cortical areas – an issue to which we will return in a later lecture.
What is a Memory? The trace left in the memory system by an experience? A representation brought back to mind of a prior event or experience? Note that in some theories, these things are assumed to be one and the same (although there may be some decay or corruption).
Further questions Do we store separate representations of items and categories? –Experiments suggest participants are sensitive to item information and also to the category prototype. Exemplar models store traces of each item encountered. But what is an item? Do items ever repeat? Is it Exemplars all the way down?
A PDP Approach to Memory An experience is a pattern of activation over neurons in one or more brain regions. The trace left in memory is the set of adjustments to the strengths of the connections. –Each experience leaves such a trace, but the traces are not separable or distinct. –Rather, they are superimposed in the same set of connection weights. Recall involves the recreation of a pattern of activation, using a part or associate of it as a cue. Every act of recall is always an act of reconstruction, shaped by the traces of many other experiences.
The Hopfield Network A memory is a random pattern of 1’s and (effectivey) -1’s over the units in a network like the one shown here (there are no self- connections). To learn, a pattern is clamped on the units; weights are learned using the Hebb rule. A set of patterns can be stored in this way. The network is probed by setting the states of the units in an initial state, and then updating the units asynchronously (as in the cube example) until the activations stop changing, using a step function. Input is removed during settling. The result is the retrieved memory. –Noisy or incomplete patterns can be cleaned up or completed. Network itself makes decisions; “no complex external machinery is required” If many memories are stored, there is cross- talk among them. –If random vectors are used, capacity is only about.14*N, N being the number of units.
The McClelland/Rumelhart (1985) Distributed Memory Model Inspired by ‘Brain-State in a Box’ model of James Anderson, which predates the Hopfield net. Uses continuous units with activations between -1 and 1. Uses the same activation function as the iac model without a threshold. Net input is the sum of external plus internal inputs: net i = e i + i i Learning occurs according to the ‘Delta Rule’: i = e i – i i w ij += i a j Short vs long-lasting changes to weights: –As a first approximation to this, weight increments are thought to decay rapidly from initial values to smaller more permanent values.
Basic properties of auto-associator models They can learn multiple ‘memories’ in the same set of weights –Recall: pattern completion –Recognition: strength of pattern activation –Facilitation of processing: how quickly and strongly settling occurs. With the Hebb Rule: –Orthogonal patterns can be stored without mutual contamination (up to n, but the memory ‘whites out’) With the Delta Rule: –Sets of non-orthogonal patterns can be learned, and some of the cross-talk can be eliminated with exposure. –However, over the entire set of patterns, the external input to each unit must be a linear combination of the activations of all of the other units.
Issues addressed by the M&R Distributed Memory Model Memory for general and specific information –Learning a prototype –Learning multiple prototypes in the same network –Learning general as well as specific information
Weights after learning from distortions of a prototype (each with a different ‘name’)
Sending Units Receiving Units Weights after learning Dog, Cat, and Bagel Patterns
Performance after training with several ‘dogs’, plus ‘Rover’ and ‘Fido’
Whittlesea (1983) Examined the effect of general and specific information on identification of letter strings after exposure to varying numbers and degrees of distortions to particular prototype strings.
Whittlesea’s Experiments Each experiment involved different numbers of distortions presented different numbers of times during training. Each test involved other distortions; W never tested the prototype but I did in some of my simulations. Performance measures are per-letter increase in identification compared to base line (E) and increase in dot product of input with activation due to learning (S).
Probability of choosing correct response modeled as:
Example stimuli Spared vs. impaired category vs. item learning in amnesia? This happens in the model if we simply assume amnesia reflects a smaller value of the learning rate parameter (Amnesia is a bit more interesting than this – see later lecture).
Limitations of Auto-Associator Models Capacity is limited –Different variants have different capacities –The sparser the patterns, the larger the number that can be learned Sets of patterns violating linear predictability constraint cannot be learned perfectly. Does not capture effects indicative of representational and behavioral sharpening –‘Strength Mirror Effect’ –Sharpening of neural representations after repetition We will return to these issues in a later lecture, after we have a procedure in hand for training connections into hidden units.