Modelling Language Evolution Lecture 2: Learning Syntax Simon Kirby University of Edinburgh Language Evolution & Computation Research Unit.


1 Modelling Language Evolution Lecture 2: Learning Syntax Simon Kirby University of Edinburgh Language Evolution & Computation Research Unit

2 Multi-layer networks For many modelling problems, multi-layer networks are used. Three layers are common: an input layer, a hidden layer, and an output layer. What do the hidden-node activations correspond to? An internal representation: for some problems, networks need to compute an "intermediate" representation of the data.

3 XOR network - step 1 XOR is the same as OR but not AND: calculate OR, calculate NOT AND, then AND the results.

4 XOR network - step 2 [Diagram: a bias node and two input nodes feed two hidden nodes (computing OR and NOT AND), which feed the output node (computing AND); the connection weights shown are 10, -7.5, -5, 7.5, 5, 5 and -7.5.]
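The two-step construction above can be hand-wired as a tiny network with threshold units. The weights below are one plausible assignment that implements OR, NOT AND, and AND, not necessarily the exact values shown on the slide:

```python
# Hand-wired XOR network: hidden node 1 computes OR, hidden node 2
# computes NOT AND (NAND), and the output node ANDs the two together.
# Weights are illustrative, chosen so each threshold unit does its job.

def step(x):
    """Threshold activation: fire (1) if net input is positive, else 0."""
    return 1 if x > 0 else 0

def xor_net(i1, i2):
    bias = 1
    h_or = step(5 * i1 + 5 * i2 - 2.5 * bias)        # OR of the inputs
    h_nand = step(-5 * i1 - 5 * i2 + 7.5 * bias)     # NOT AND of the inputs
    return step(5 * h_or + 5 * h_nand - 7.5 * bias)  # AND of the hidden units

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))
```

Note that no single threshold unit can compute XOR directly (it is not linearly separable), which is exactly why the hidden layer is needed.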

5 Simple example (Smith 2003) Smith wanted to model a simple language-using population, so he needed a model that learned vocabulary. 3 "meanings": (1 0 0), (0 1 0), (0 0 1). 6 possible signals: (0 0 0), (1 0 0), (1 1 0) … Used networks for reception (signal → meaning) and production (meaning → signal). After training, knowledge of the language is stored in the weights; during reception/production, the internal representation is in the activations of the hidden nodes.
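A minimal sketch of the production/reception idea, assuming the network sizes from the slide; the actual architecture and training rule follow Smith (2003) and are not reproduced here, so the random weights stand in for a trained network:

```python
import numpy as np

# Hypothetical sketch: "knowledge of language" lives entirely in the
# weight matrix; production and reception just read it in two directions.
rng = np.random.default_rng(0)

meanings = np.eye(3)                 # the 3 meanings (1 0 0), (0 1 0), (0 0 1)
n_signals = 6                        # the 6 possible signals

# After training this matrix would encode the meaning-signal mapping;
# here it is random, standing in for learned weights.
W = rng.normal(size=(3, n_signals))

def produce(meaning):
    """Production: map a meaning vector to the most active signal."""
    return int(np.argmax(meaning @ W))

def receive(signal):
    """Reception: map a signal index to the most strongly linked meaning."""
    return int(np.argmax(W[:, signal]))
```

The point of the sketch is the slide's claim: after training, nothing but the weights needs to persist between episodes of use.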

6 Can a network learn syntax? (Elman 1993) An important question for the evolution of language: how much knowledge of grammar are we born with? Modelling can tell us what we can do without. Can we model the acquisition of syntax using a neural network? One problem… sentences can be arbitrarily long.

7 Representing time Imagine we presented words one at a time to a network. Would it matter what order the words were given? No: to the network, each word is a brand-new experience; it has no way of relating each experience to what has gone before. It needs some kind of working memory. Intuitively, each word needs to be presented along with what the network was "thinking about" when it heard the previous word.

8 The Simple Recurrent Net (SRN) At each time step, the input is a new experience plus a copy of the hidden-unit activations from the last time step. [Diagram: input and context layers feed the hidden layer, which feeds the output layer; copy-back connections copy the hidden activations into the context layer.]
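A minimal forward pass for such a network might look like this; the layer sizes and weights are illustrative, not Elman's:

```python
import numpy as np

# Minimal Simple Recurrent Network (SRN) forward pass.
# Sizes are illustrative; weights are random stand-ins for trained ones.
rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 4, 3, 4

W_ih = rng.normal(size=(n_in, n_hidden))      # input   -> hidden
W_ch = rng.normal(size=(n_hidden, n_hidden))  # context -> hidden
W_ho = rng.normal(size=(n_hidden, n_out))     # hidden  -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def srn_step(word, context):
    """One time step: the new input plus the copied-back hidden state."""
    hidden = sigmoid(word @ W_ih + context @ W_ch)
    output = sigmoid(hidden @ W_ho)
    return output, hidden        # hidden becomes the next step's context

context = np.zeros(n_hidden)     # empty working memory at the start
for word in np.eye(n_in):        # a toy "sentence" of one-hot words
    output, context = srn_step(word, context)
```

The copy-back is just the second return value being fed in as `context` on the next step: that is the network's entire working memory.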

9 What inputs and outputs? How do we force the network to learn syntactic relations? Can we do it without an external "teacher"? Answer: the next-word prediction task. Inputs: current word (and context). Outputs: predicted next word. The error signal is implicit in the data.
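The sense in which the error signal is "implicit in the data" can be made concrete: every training pair comes straight from the corpus, with no teacher needed. The vocabulary here is illustrative:

```python
# Next-word prediction: each training pair is (current word, next word),
# read directly off the corpus. The "error signal" is just the mismatch
# between the network's prediction and the actual next word.
sentence = ["boys", "chase", "dogs"]

pairs = list(zip(sentence[:-1], sentence[1:]))
print(pairs)
```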

10 Long-distance dependencies and hierarchy Elman's question: how much is innate? Many argue that long-distance dependencies and hierarchical embedding are "unlearnable" without an innate language faculty. How well can an SRN learn them? Examples: 1. boys who chase dogs see girls; 2. cats chase dogs; 3. dogs see boys who cats who mary feeds chase; 4. mary walks.

11 First experiments Each word is encoded by turning a single input unit "on" (a localist, one-hot encoding).

12 Initial results How can we tell if the net has learned syntax? Check whether it predicts the correct number agreement. It gets some things right, but makes many mistakes, and seems not to have learned long-distance dependencies, e.g. boys who girl chase see dog.

13 Incremental input Elman tried teaching the network in stages. Five stages: 1. 10,000 simple sentences (x 5); 2. 7,500 simple + 2,500 complex (x 5); 3. 5,000 simple + 5,000 complex (x 5); 4. 2,500 simple + 7,500 complex (x 5); 5. 10,000 complex sentences (x 5). Surprisingly, this training regime led to success!

14 Is this realistic? Elman reasons that this is in some ways like children's behaviour: children seem to learn to produce simple sentences first. Is this a reasonable suggestion? Where is the incremental input coming from? The developmental schedule appears to be a product of changing the input.

15 Another route to incremental learning Rather than the experimenter selecting simple, then complex sentences, could the network do the selecting itself? Children's data isn't changing… children are changing. Elman gets the network to change throughout its "life". What is a reasonable way for the network to change? One possibility: memory.

16 Reducing the attention span of a network Destroy memory by setting the context nodes to 0.5. Five stages of learning (each with both simple and complex sentences): 1. memory blanked every 3-4 words (x 12); 2. memory blanked every 4-5 words (x 5); 3. memory blanked every 5-6 words (x 5); 4. memory blanked every 6-7 words (x 5); 5. no memory limitations (x 5). The network learned the task.
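The memory manipulation above can be sketched as follows; the window sizes follow the slide, but the network itself is omitted, so the helper just shows where the context reset would happen during training:

```python
import random

# Sketch of Elman's memory-limitation regime: the SRN's context layer
# is reset to 0.5 every few words during early training, widening the
# window in later stages. The SRN step itself is elided here.

def blanked_contexts(sentence, window_lo, window_hi, context_size=3):
    """Yield (word, context) pairs, wiping the context every few words."""
    context = [0.0] * context_size
    next_blank = random.randint(window_lo, window_hi)
    for i, word in enumerate(sentence):
        if i == next_blank:
            context = [0.5] * context_size   # destroy working memory
            next_blank += random.randint(window_lo, window_hi)
        # ... one SRN forward/backward step would go here ...
        yield word, list(context)
```

Stage 1 would call this with `window_lo=3, window_hi=4`, stage 2 with `4, 5`, and so on, with no blanking at all in the final stage.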

17 Counter-intuitive conclusion: starting small A fully functioning network fails to learn the syntax; a network that is initially limited (but matures) learns well. This seems a strange result, suggesting that networks aren't good models of language learning after all. On the other hand… children mature during learning; infancy in humans is prolonged relative to other species; and ultimate language ability seems to be related to how early learning starts, i.e., there is a critical period for language acquisition.

18 Next lecture We’ve seen how we can model aspects of language learning in simulations What about evolution? Cultural evolution Individual learning Biological evolution

