
1 Connectionist Models of Language Development: Grammar and the Lexicon Steve R. Howell McMaster University, 1999

2 Overview
- Description of research plan
- Explanation of research goals
- Examine inspiration for this research, in both the connectionist and language sub-fields
- Methods
- Results (preliminary)
- Discussion & future directions

3 Overall Research Plan
- Pursuit of an integrated, multi-level, connectionist model of language development
- “Multi-level” = dealing with several different levels or parts of the language task
- “Integrated” = non-modular; homogeneous functioning throughout the multi-level design

4 Research Goals
- Better understanding of the language development process
- Ability to test different interventions on a successful model instead of on children, including possibly lesioning the model
- A functional language-learning model for AI and software (e.g. “chatterbots” on the net)

5 Connectionist Inspiration
- Work of Jeff Elman on models of grammar learning using Simple Recurrent Networks (SRNs)
- Work of Landauer et al. on the acquisition of semantic information (i.e. the lexicon) through analysis of many weak word-to-word relations in real-world text

6 Language-Domain Inspiration
- Evidence against a sharp divide between the acquisition of the lexicon and of grammar (e.g. Bates)
- The lexicon develops first, but grammar development overlaps with it and seemingly proceeds in step with it
- Hence the present focus on homogeneous mechanisms to explain the two

7 Method
- Computer simulation of a connectionist (neural network) model
- Base algorithm and structure is Elman’s (1990) Simple Recurrent Network
- Modifications include sub-word-level input, a multi-level architecture, and automated localist-to-distributed representation conversion

8 Diagram of SRN

9 Parts of an Elman SRN
- Input layer of units
- Larger (usually) hidden layer of units
- Context layer (‘memory’) connected to the hidden layer
- Output layer of units, the same size as the input layer
- Uses the back-propagation learning algorithm
- Uses a prediction task to provide a more plausible teaching signal
- Recurrent context units take a copy of the hidden units at each time step
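
As a rough, purely illustrative sketch of these parts (not the original simulation code; the class name, layer sizes, and learning rate are assumptions), one step of an Elman-style SRN, with the hidden-to-context copy and a back-propagation update, might look like this in Python:

    import numpy as np

    class ElmanSRN:
        """Minimal Elman (1990)-style SRN: input -> hidden (+ context copy) -> output."""
        def __init__(self, n_in, n_hidden, lr=0.1, seed=0):
            rng = np.random.default_rng(seed)
            # Output layer is the same size as the input layer (prediction task).
            self.W_ih = rng.normal(0, 0.1, (n_hidden, n_in))      # input   -> hidden
            self.W_ch = rng.normal(0, 0.1, (n_hidden, n_hidden))  # context -> hidden
            self.W_ho = rng.normal(0, 0.1, (n_in, n_hidden))      # hidden  -> output
            self.context = np.zeros(n_hidden)                     # 'memory' units
            self.lr = lr

        @staticmethod
        def _sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def step(self, x, target):
            """One time step: predict the next input, learn by back-propagation,
            then copy the hidden layer into the context layer."""
            h = self._sigmoid(self.W_ih @ x + self.W_ch @ self.context)
            y = self._sigmoid(self.W_ho @ h)
            # Back-propagate the prediction error (squared error, sigmoid units).
            d_out = (y - target) * y * (1 - y)
            d_hid = (self.W_ho.T @ d_out) * h * (1 - h)
            self.W_ho -= self.lr * np.outer(d_out, h)
            self.W_ih -= self.lr * np.outer(d_hid, x)
            self.W_ch -= self.lr * np.outer(d_hid, self.context)
            self.context = h.copy()   # context units take a copy of the hidden units
            return y

On the prediction task, x would be the vector for the current word (or triple) and target the vector for the next one; training simply walks through the text one step at a time.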

10 Modifications: Sub-word Input
- Triples (Mozer, Wickelgren), or artificial phonemes
- Recently completed simulation demonstrating the superiority of triple- or phoneme-level word representations over whole-word localist representations for grammar learning (phonics?)

11 Representations of Words
- Localist: [0 0 0 0 0 0 0 1], [0 0 0 1 0 0 0 0]
- Binary distributed: [0 1 0 0 1 0 1 1], [1 0 1 1 1 1 1 1]
- Fully distributed: [0.43 0.23 0.03 0.1 0.04], [0.22 0.12 0.04 0.42 0.5]
- Elman (1990) uses localist; triples use binary distributed; semantic encoding uses fully distributed
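
A small sketch of the three vector forms (the toy vocabulary, vector lengths, and encoding rules below are arbitrary choices for illustration, not those used in the actual model):

    import numpy as np

    # Illustrative only: a toy vocabulary and vector sizes chosen for the example.
    vocab = ["dog", "cat", "chased", "the"]

    def localist(word):
        """One unit per word: a single 1 in a vector as long as the vocabulary."""
        v = np.zeros(len(vocab))
        v[vocab.index(word)] = 1.0
        return v

    def binary_distributed(word, n_bits=8):
        """Several units share in coding each word (bits derived arbitrarily
        from the word's letters here, just to show the form)."""
        v = np.zeros(n_bits)
        for i, ch in enumerate(word):
            v[(ord(ch) + i) % n_bits] = 1.0
        return v

    def fully_distributed(word, n_dims=5):
        """Graded real values on every unit, e.g. a semantic encoding."""
        rng = np.random.default_rng(sum(ord(c) for c in word))
        return rng.random(n_dims).round(2)

    print(localist("dog"), binary_distributed("dog"), fully_distributed("dog"))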

12 Route to Multi-level Architecture
- The Elman SRN showed how word co-occurrence information could be used to learn word relationships (simple grammar)
- Learning mapped previous words (the context) to the next word predicted
- Even with a sub-word distributed representation, prediction is still of the next word
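
The word-level prediction task itself is simple to set up; a minimal, hypothetical data-preparation sketch:

    # At each step the network sees the current word and must predict the next one.
    sentence = "the dog chased the cat".split()

    def prediction_pairs(tokens):
        """Yield (current word, next word) training pairs for next-word prediction."""
        return list(zip(tokens[:-1], tokens[1:]))

    print(prediction_pairs(sentence))
    # [('the', 'dog'), ('dog', 'chased'), ('chased', 'the'), ('the', 'cat')]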

13 Elman (1990) Clustering Results

14 Sub-word Prediction
- If we use a ‘sliding window’ on the input text (e.g. five letters for three-letter triples), then we are predicting the next triple from the previous triples; true sub-word prediction
- e.g. The dog chased the cat...
- Time 1 - “The_d” = The, he_, e_d
- Time 2 - “he_do” = he_, e_d, _do
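
A minimal sketch of this sliding-window triple scheme (the '_' padding of word boundaries and the exact window/target convention are assumptions for illustration, not the original code):

    def triples(window):
        """Break a text window into overlapping three-letter 'triples'."""
        return [window[i:i + 3] for i in range(len(window) - 2)]

    text = "The dog chased the cat".replace(" ", "_")  # '_' marks word boundaries

    # Slide a five-letter window along the text, one letter per time step;
    # each step's triples are the input, and the following triple is the target.
    window_size = 5
    for t in range(len(text) - window_size):
        window = text[t:t + window_size]
        next_triple = text[t + window_size - 2:t + window_size + 1]
        print(f"Time {t + 1}: {window!r} -> {triples(window)} predict {next_triple!r}")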

15 Sub-word Advantages
- Richer representations, accessing more of the data inherent in the text or speech stream
- Makes prediction/internal representation easier
- Eliminates the need for artificial pre-processing of text into word vectors; letters are translated automatically into triple vectors

16 Sub-word Disadvantages
- Cannot output words easily; we just have a collection of triples
- Must stack a “clean-up” net on top in order to reach word representations from the existing triple representations
- Hence the multi-layer approach: combine prediction at two time-scales and levels of granularity, but using the same method

17 Multi-layer SRN Diagram

18 Multi-layer SRN
- Triples or letters layer: Input Layer 1, Hidden Layer 1, Context Layer 1, Output Layer 1; learns to predict triples/phonemes
- Word layer: Input Layer 2 = Hidden Layer 1, Hidden Layer 2, Context Layer 2, Output Layer 2; predicts words from triples/phonemes
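
A forward-pass sketch of this two-level wiring (layer sizes are hypothetical and learning is omitted; this illustrates the arrangement, not the actual model):

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    # Hypothetical sizes: 200 possible triples, 50 words in the lexicon.
    n_triples, n_hid1, n_hid2, n_words = 200, 30, 40, 50

    # Level 1 (triples/letters): standard SRN, output layer predicts the next triple.
    W_ih1 = rng.normal(0, 0.1, (n_hid1, n_triples))
    W_ch1 = rng.normal(0, 0.1, (n_hid1, n_hid1))
    W_ho1 = rng.normal(0, 0.1, (n_triples, n_hid1))
    context1 = np.zeros(n_hid1)

    # Level 2 (words): its input layer *is* level 1's hidden layer.
    W_ih2 = rng.normal(0, 0.1, (n_hid2, n_hid1))
    W_ch2 = rng.normal(0, 0.1, (n_hid2, n_hid2))
    W_ho2 = rng.normal(0, 0.1, (n_words, n_hid2))
    context2 = np.zeros(n_hid2)

    def step(triple_vec):
        """One forward time step through both levels."""
        global context1, context2
        hid1 = sigmoid(W_ih1 @ triple_vec + W_ch1 @ context1)
        next_triple = sigmoid(W_ho1 @ hid1)        # level 1 output: next triple
        hid2 = sigmoid(W_ih2 @ hid1 + W_ch2 @ context2)
        next_word = sigmoid(W_ho2 @ hid2)          # level 2 output: predicted word
        context1, context2 = hid1.copy(), hid2.copy()
        return next_triple, next_word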


