Using Backprop to Understand Aspects of Cognitive Development. PDP Class, Feb 8, 2010.


1 Using Backprop to Understand Aspects of Cognitive Development PDP Class Feb 8, 2010

2 Back propagation algorithm Propagate activation forward. Propagate “error” backward. Calculate ‘weight error derivative’ terms: δ_r a_s. Change weights after –Each pattern –A batch of patterns. At the output level: δ_i = (t_i - a_i) f′(net_i). At other levels: δ_j = f′(net_j) Σ_i δ_i w_ij, etc.
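Below is a minimal sketch of these delta computations for a logistic network with one hidden layer, written in Python with NumPy. The function and variable names (backprop_deltas, w_out, net_hid, and so on) are illustrative, not the names used in the bp simulator.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_deltas(a_out, target, net_out, w_out, net_hid):
    """Compute delta terms for the output layer and one hidden layer.

    a_out, target, net_out: output activations, targets, and net inputs.
    w_out: weights from hidden to output units (shape: n_out x n_hid).
    net_hid: net inputs of the hidden units.
    """
    f_prime = lambda net: sigmoid(net) * (1.0 - sigmoid(net))  # derivative of the logistic
    delta_out = (target - a_out) * f_prime(net_out)            # delta_i = (t_i - a_i) f'(net_i)
    delta_hid = f_prime(net_hid) * (w_out.T @ delta_out)       # delta_j = f'(net_j) sum_i delta_i w_ij
    return delta_out, delta_hid

def weight_error_derivatives(delta_r, a_s):
    """Weight error derivative for the connection from sender s to receiver r: delta_r * a_s."""
    return np.outer(delta_r, a_s)
```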

3 Variants/Embellishments to back propagation We can include weight decay and momentum: Δw_rs = ε Σ_p δ_rp a_sp - ω w_rs + α Δw_rs(prev). An alternative error measure has both conceptual and practical advantages: CE_p = -Σ_i [t_ip log(a_ip) + (1 - t_ip) log(1 - a_ip)]. If targets are actually probabilistic, minimizing CE_p causes activations to match the probability of the observed target values. This also eliminates the ‘pinned output unit’ problem.
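A sketch of the corresponding weight update and cross-entropy measure; the function names and the default values for the learning rate (ε), weight decay (ω), and momentum (α) parameters are assumptions for illustration.

```python
import numpy as np

def update_weights(w, wed, prev_dw, epsilon=0.1, decay=0.0001, momentum=0.9):
    """One update with weight decay and momentum:
    dw = epsilon * (summed weight error derivatives) - decay * w + momentum * previous dw."""
    dw = epsilon * wed - decay * w + momentum * prev_dw
    return w + dw, dw

def cross_entropy(targets, activations, eps=1e-12):
    """CE = -sum_i [t_i log(a_i) + (1 - t_i) log(1 - a_i)], summed over output units."""
    a = np.clip(activations, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(targets * np.log(a) + (1.0 - targets) * np.log(1.0 - a))
```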

4 Is backprop biologically plausible? Neurons do not send error signals backward across their weights through a chain of neurons, as far as anyone can tell. But we shouldn’t be too literal minded about the actual biological implementation of the learning rule. Some neurons appear to use error signals, and there are ways to use differences between activation signals to carry error information. (We will explore this in a later lecture.)

5 Why is back propagation important? Provides a procedure that allows networks to learn weights that can solve any deterministic input-output problem. –Contrary to expectation, it does not get stuck in local minima except in cases where the network is exceptionally tightly constrained. –Allows networks with multiple hidden layers to be trained, although learning tends to proceed slowly (later we will learn about procedures that can fix this). Allows networks to learn how to represent information as well as how to use it. Raises questions about the nature of representations and of what must be specified in order to learn them.

6 The Time-Course of Cognitive Development Networks trained with back-propagation address several issues in development, including –Whether innate knowledge is necessary as a starting point for learning. –Aspects of the time course of development. –What causes changes in the pattern of responses children make at different times during development? –What allows a learner to reach the point of being ready to learn something s/he previously was not ready to learn?

7 Two Example Models Rumelhart’s semantic learning model –Addresses most of the issues above –Available as the “semnet” script in the bp directory Model of child development in a ‘naïve physics’ task (Piaget’s balance scale task) –Addresses stage transitions and readiness to learn new things –We will not get to this; see readings if interested

8 Quillian’s (1969) Hierarchical Propositional Model

9 The Rumelhart (1990) Model

10 The Training Data: All propositions true of items at the bottom level of the tree, e.g.: Robin can {fly, move, grow}
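As a concrete illustration, the training set can be thought of as a mapping from (item, relation) inputs to the set of attributes that should be active on the output units. A minimal sketch in Python; only the ‘robin can’ pattern is taken from the slide, the rest are placeholders.

```python
# Each training pattern pairs an (item, relation) input with the attributes
# that are true of that item. Only the 'robin can' example comes from the
# slide; the commented lines are placeholders for the remaining propositions.
training_patterns = {
    ("robin", "can"): {"fly", "move", "grow"},
    # ("robin", "is"):  {...},
    # ("robin", "has"): {...},
    # ... and likewise for the other items at the bottom level of the tree.
}
```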

11 The Rumelhart Model: Target output for ‘robin can’ input

12 The Rumelhart Model

13

14 [Figure: learned item representations at three points in training, labeled Early, Later, and Later Still, plotted as a function of experience.]

15 Inference and Generalization in the PDP Model A semantic representation for a new item can be derived by error propagation from given information, using knowledge already stored in the weights.

16 Start with a neutral representation on the representation units. Use backprop to adjust the representation to minimize the error.
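A minimal sketch of this ‘backprop to representation’ procedure, simplified to a single trained weight layer from the representation units to the output units (the full Rumelhart model also has the relation input and a hidden layer); the names, the neutral starting value, and the gradient-step details are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def infer_representation(W_rep_to_out, given_mask, given_targets,
                         n_rep, lr=0.5, steps=500):
    """Backprop to representation: with the trained weights W_rep_to_out held
    fixed, adjust only the representation vector so that the outputs match the
    propositions we were given (given_mask marks which outputs are known).
    This single weight layer is a simplification of the full Rumelhart net."""
    rep = np.full(n_rep, 0.5)                      # neutral (midrange) starting representation
    for _ in range(steps):
        out = sigmoid(W_rep_to_out @ rep)          # forward pass from representation to outputs
        err = (given_targets - out) * given_mask   # error only on the given propositions
        delta = err * out * (1.0 - out)            # delta at the output layer
        grad_rep = W_rep_to_out.T @ delta          # error propagated back to the representation
        rep += lr * grad_rep                       # adjust the representation, not the weights
    return rep
```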

17 The result is a representation similar to that of the average bird…

18 Use the representation to infer what this new thing can do.

19 Some Phenomena in Conceptual Development Progressive differentiation of concepts Illusory correlations and U-shaped developmental trajectories Domain- and property-specific constraints on generalization Reorganization of Conceptual Knowledge

20

21

22 Waves of differentiation reflect sensitivity to patterns of coherent covariation of properties across items. Patterns of coherent covariation are reflected in the principal components of the property covariance matrix. Figure shows attribute loadings on the first three principal components: –1. Plants vs. animals –2. Birds vs. fish –3. Trees vs. flowers Same color = features covary in component Diff color = anti-covarying features What Drives Progressive Differentiation?
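One way to compute such attribute loadings is a principal components analysis of the property covariance matrix. A sketch, assuming the training data are available as a binary item-by-property matrix; the function name and arguments are illustrative.

```python
import numpy as np

def property_pc_loadings(item_by_property, n_components=3):
    """Attribute loadings on the leading principal components of the property
    covariance matrix. Rows of item_by_property are items, columns are
    properties (0/1)."""
    X = item_by_property - item_by_property.mean(axis=0)   # center each property
    cov = X.T @ X / (X.shape[0] - 1)                       # property covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)                 # eigendecomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]                      # largest eigenvalues first
    return eigvecs[:, order[:n_components]]                # columns = loadings on PC1, PC2, PC3
```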

23 Coherent Covariation The tendency of properties of objects to co- occur in clusters. e.g. –Has wings –Can fly –Is light Or –Has roots –Has rigid cell walls –Can grow tall

24 Coherence Training Patterns No labels are provided. Each item and each property occurs with equal frequency. [Figure: training matrix of 16 items by ‘is’, ‘can’, and ‘has’ properties, divided into coherent and incoherent property sets.]
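A sketch of how such a training set might be constructed, with coherent properties that co-occur in item clusters and incoherent properties that occur equally often but do not covary with anything else; the item and property counts here are assumptions, not the values used in the actual simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_clusters = 16, 4

# Coherent properties: each property is shared by all and only the items in one
# cluster, so these properties co-occur in blocks.
cluster = np.repeat(np.arange(n_clusters), n_items // n_clusters)
coherent = np.stack([(cluster == c).astype(float) for c in range(n_clusters)], axis=1)

# Incoherent properties: each occurs equally often, but which items have it is
# arbitrary, so it does not covary with the other properties.
incoherent = np.stack([rng.permutation(coherent[:, c]) for c in range(n_clusters)], axis=1)

items_by_properties = np.concatenate([coherent, incoherent], axis=1)  # 16 x 8 matrix
```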

25 Effects of Coherence on Learning [Figure: learning curves for coherent vs. incoherent properties.]

26 Effect of Coherence on Representation

27 Effects of Coherent Variation on Learning in Connectionist Models Attributes that vary together create the acquired concepts that populate the taxonomic hierarchy, and determine which properties are central and which are incidental to a given concept. –Labeling of these concepts or their properties is in no way necessary, but it may contribute additional ‘covarying’ information, and can affect the pattern of differentiation. Arbitrary properties (those that do not co-vary with others) are very difficult to learn. –And it is harder to learn names for concepts that are only differentiated by such arbitrary properties.

28 Sensitivity to Coherence Requires Convergence [Figure]

29 Illusory Correlations Rochel Gelman found that children think that all animals have feet. –Even animals that look like small furry balls and don’t seem to have any feet at all.

30 [Figure: illusory correlation in the model, e.g. ‘pine has leaves’: a typical property that a particular object lacks, compared with an infrequent, atypical property.]

31 Domain Specificity What constraints are required for development and elaboration of domain-specific knowledge? –Are domain specific constraints required? –Or are there general principles that allow for acquisition of conceptual knowledge of all different types?

32 Differential Importance (Macario, 1991) 3-4 yr old children see a puppet and are told he likes to eat, or play with, a certain object (e.g., top object at right) –Children then must choose another one that will “be the same kind of thing to eat” or that will be “the same kind of thing to play with”. –In the first case they tend to choose the object with the same color. –In the second case they tend to choose the object with the same shape.

33 –Can the knowledge that one kind of property is important for one type of thing, while another is important for a different type of thing, be learned? –It can be in the PDP model, since the model is sensitive to domain-specific patterns of coherent covariation.

34 Adjustments to Training Environment Among the plants: –All trees are large –All flowers are small –Either can be bright or dull Among the animals: –All birds are bright –All fish are dull –Either can be small or large In other words: –Size covaries with properties that differentiate different types of plants –Brightness covaries with properties that differentiate different types of animals

35 Testing Feature Importance After partial learning, the model is shown eight test objects: –Four “Animals”: All have skin. One is large and bright; one small and bright; one large and dull; one small and dull. –Four “Plants”: All have roots. Same 4 combinations as above. Representations are generated using backprop to representation. Representations are then compared to see which ‘animals’ are treated as most similar, and which ‘plants’ are treated as most similar.
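The comparison step can be as simple as computing pairwise cosine similarities among the eight obtained representation vectors (for example, the outputs of the infer_representation sketch above); a minimal sketch, with illustrative names.

```python
import numpy as np

def similarity_matrix(reps):
    """Pairwise cosine similarities of the representations obtained for the
    test objects; reps is an (n_objects x n_rep) array, one row per object."""
    norm = reps / np.linalg.norm(reps, axis=1, keepdims=True)  # unit-length rows
    return norm @ norm.T                                       # entry [i, j] = similarity of i and j
```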

36 The Rumelhart Model

37 Similarities of Obtained Representations Size is relevant for Plants Brightness is relevant for Animals

38 Additional Properties of the Model The model is sensitive to the amount and type of exposure, addressing frequency effects and expertise effects, and capturing different types of expertise. The model’s pattern of generalization varies as a function of the type of property as well as the domain. The model can reorganize its knowledge: –It will first learn about superficial appearance properties if these are generally available; later, it can re-organize its knowledge based on coherent covariation among properties that only occur in specific contexts.

