Sentence Processing using a Simple Recurrent Network EE 645 Final Project Spring 2003 Dong-Wan Kang 5/14/2003

Contents
1. Introduction - Motivations
2. Previous & Related Works
   a) McClelland & Kawamoto (1986)
   b) Elman (1990, 1993, & 1999)
   c) Miikkulainen (1996)
3. Algorithms (Williams and Zipser, 1989) - Real Time Recurrent Learning
4. Simulations
5. Data & Encoding Schemes
6. Results
7. Discussion & Future Work

Motivations
Can a neural network recognize the lexical classes in sentences and learn the various types of sentences?
From a cognitive science perspective:
- comparison between human language learning and neural network learning patterns
- e.g., learning the English past tense (Rumelhart & McClelland, 1986), grammaticality judgment (Allen & Seidenberg, 1999), embedded sentences (Elman, 1993; Miikkulainen, 1996; etc.)

Related Works
McClelland & Kawamoto (1986)
- Sentences mapped to case-role assignments and semantic features using the backpropagation algorithm
- output: 2500 case-role units for each sentence
- e.g., input: "the boy hit the wall with the ball"; output: [Agent Verb Patient Instrument] + [other features]
- Limitation: imposes a hard limit on the size of the input.
- Alternative: instead of detecting input patterns displaced in space, detect patterns displaced in time (sequential inputs).

Related Works (continued)
Elman (1990, 1993, & 1999)
- Simple Recurrent Network: a partially recurrent network that uses context units
- A network with a dynamic memory
- The context units at time t hold a copy of the hidden-unit activations from the previous time step, t-1.
- The network can recognize sequences and is trained to predict the next word:
  input:  many years ago boy and girl …
  output: years ago boy and girl …

Related Works (continued)
Miikkulainen (1996)
- SPEC architecture (Subsymbolic Parser for Embedded Clauses), built from recurrent networks
- Parser, Segmenter, and Stack modules process center- and tail-embedded sentences: 98,100 sentences built from 49 different sentence templates, using case-role assignments
- e.g., sequential input and output with case roles:
  input:  …, the girl, who, liked, the dog, saw, the boy, …
  output: …, [the girl, saw, the boy] [the girl, liked, the dog]
  case roles: (agent, act, patient) (agent, act, patient)

Algorithms
Recurrent Networks
- Unlike feedforward networks, they allow connections in both directions between a pair of units, and even from a unit to itself.
- Backpropagation through time (BPTT): unfolds the temporal operation of the network into a layered feedforward network, one layer per time step (Rumelhart et al., 1986).
- Real Time Recurrent Learning (RTRL): two versions (Williams and Zipser, 1989):
  1) update the weights only after the whole sequence has been processed;
  2) on-line: update the weights while the sequence is being presented.
- Simple Recurrent Network (SRN): a partially recurrent network in both time and space. It has context units that store the previous outputs of the hidden units (Elman, 1990). (It can be obtained as a modification of the RTRL algorithm.)

Real Time Recurrent Learning
Williams and Zipser (1989)
- This algorithm computes the derivatives of the states and outputs with respect to all weights as the network processes the sequence.
Summary of the algorithm: in a recurrent network where any unit may be connected to any other, with external input x_i(t) at node i at time t, the dynamic update rule is
  y_k(t+1) = f_k( s_k(t) ),   s_k(t) = Σ_{l∈U} w_kl y_l(t) + Σ_{l∈I} w_kl x_l(t),
where U is the set of units, I the set of external input lines, and f_k a sigmoid squashing function.

RTRL (continued)
Error measure: with target outputs d_k(t) defined for some k's and t's,
  e_k(t) = d_k(t) − y_k(t)  if d_k(t) is defined at time t;  e_k(t) = 0 otherwise.
Total cost function over t = 0, 1, …, T:
  E_total = Σ_t E(t),   where   E(t) = (1/2) Σ_k [e_k(t)]².

RTRL (continued)
The gradient of E separates in time; to do gradient descent, we define the sensitivities
  p_ij^k(t) = ∂y_k(t) / ∂w_ij.
Differentiating the dynamic update rule gives the recursion
  p_ij^k(t+1) = f_k′(s_k(t)) [ Σ_{l∈U} w_kl p_ij^l(t) + δ_ki z_j(t) ],
with initial condition p_ij^k(0) = 0 at t = 0, where z_j(t) denotes either an external input x_j(t) or a unit output y_j(t), and δ_ki is the Kronecker delta. The weight change is then Δw_ij = η Σ_t Σ_k e_k(t) p_ij^k(t).

RTRL (continued)
Depending on when the weights are updated, there are two versions of RTRL:
1) update the weights only after the sequence is completed (at t = T);
2) update the weights after each time step (on-line).
Elman's "tlearn" simulator for the Simple Recurrent Network, which is used for this project, is implemented from the classical backpropagation algorithm together with a modification of this RTRL algorithm.
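To make the equations above concrete, here is a minimal NumPy sketch of the on-line variant of RTRL for a fully recurrent network of logistic units. It is my own illustration written directly from the update and sensitivity equations, not the tlearn implementation, and the names (rtrl_online, n_units, eta) are assumptions.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def rtrl_online(inputs, targets, n_units, eta=0.1, rng=None):
    """On-line RTRL for a fully recurrent network of logistic units.

    inputs  : array of shape (T, n_in)
    targets : array of shape (T, n_units); NaN marks "no target at this step"
    Returns the trained weight matrix W of shape (n_units, n_units + n_in + 1).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T, n_in = inputs.shape
    n_z = n_units + n_in + 1                       # recurrent + external + bias
    W = rng.uniform(-0.1, 0.1, size=(n_units, n_z))
    y = np.zeros(n_units)                          # unit outputs y_k(t)
    # p[k, i, j] = dy_k / dw_ij, initialised to zero at t = 0
    p = np.zeros((n_units, n_units, n_z))

    for t in range(T):
        z = np.concatenate([y, inputs[t], [1.0]])  # z_l(t): unit outputs, inputs, bias
        s = W @ z                                  # net input s_k(t)
        y_new = logistic(s)                        # y_k(t+1) = f(s_k(t))
        fprime = y_new * (1.0 - y_new)             # logistic derivative f'

        # error for the newly computed outputs; zero where no target is defined
        e = np.nan_to_num(targets[t] - y_new)

        # sensitivity recursion:
        # p_ij^k(t+1) = f'(s_k) [ sum_l w_kl p_ij^l(t) + delta_ki z_j(t) ]
        p_new = np.einsum('kl,lij->kij', W[:, :n_units], p)
        p_new[np.arange(n_units), np.arange(n_units), :] += z
        p_new *= fprime[:, None, None]

        # on-line weight update: dw_ij = eta * sum_k e_k p_ij^k
        W += eta * np.einsum('k,kij->ij', e, p_new)

        y, p = y_new, p_new

    return W
```

The batch variant of the algorithm would accumulate the weight changes inside the loop and apply them once at t = T instead of updating W at every step.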

Simulation
Based on Elman's data and Simple Recurrent Network (1990, 1993, & 1999), simple sentences and embedded sentences are simulated using the publicly available "tlearn" neural network program (backpropagation plus a modified version of the RTRL algorithm).
Questions:
1. Can the network discover the lexical classes from word order?
2. Can the network recognize the relative pronouns and predict them?

Network Architecture
- 31 input nodes
- 31 output nodes
- 150 hidden nodes
- 150 context nodes
* black arrows: distributed, learnable connections
* dotted blue arrows: linear, one-to-one connections with the hidden nodes
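A minimal sketch of this 31-150-31 architecture in NumPy, assuming logistic hidden and output units: only the solid (black-arrow) weights are learnable, while the context layer is a verbatim copy of the previous hidden state (the dotted one-to-one connections). This is my own illustration of the architecture, not the tlearn network file.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

class SRN:
    """Simple Recurrent Network: 31 inputs, 150 hidden, 150 context, 31 outputs."""

    def __init__(self, n_in=31, n_hidden=150, n_out=31, seed=0):
        rng = np.random.default_rng(seed)
        # trainable weights (the solid black arrows)
        self.W_ih = rng.uniform(-0.1, 0.1, (n_hidden, n_in))      # input   -> hidden
        self.W_ch = rng.uniform(-0.1, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_ho = rng.uniform(-0.1, 0.1, (n_out, n_hidden))     # hidden  -> output
        self.context = np.zeros(n_hidden)                         # context units

    def step(self, x):
        """One time step: compute the output, then copy hidden -> context."""
        hidden = logistic(self.W_ih @ x + self.W_ch @ self.context)
        output = logistic(self.W_ho @ hidden)
        # dotted arrows: fixed one-to-one copy of the hidden activations
        self.context = hidden.copy()
        return output, hidden

# usage: feed one 31-bit word vector per time step
# out, hid = SRN().step(np.eye(31)[5])
```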

Training Data
Lexicon (31 words):
  NOUN-HUM: man, woman, boy, girl
  NOUN-ANIM: cat, mouse, dog, lion
  NOUN-INANIM: book, rock, car
  NOUN-AGRESS: dragon, monster
  NOUN-FRAG: glass, plate
  NOUN-FOOD: cookie, bread, sandwich
  VERB-INTRAN: think, sleep, exist
  VERB-TRAN: see, chase, like
  VERB-AGPAT: move, break
  VERB-PERCEPT: smell, see
  VERB-DESTROY: break, smash
  VERB-EAT: eat
  RELAT-HUM: who
  RELAT-INHUM: which
Grammar (16 templates):
  NOUN-HUM VERB-EAT NOUN-FOOD
  NOUN-HUM VERB-PERCEPT NOUN-INANIM
  NOUN-HUM VERB-DESTROY NOUN-FRAG
  NOUN-HUM VERB-INTRAN
  NOUN-HUM VERB-TRAN NOUN-HUM
  NOUN-HUM VERB-AGPAT NOUN-INANIM
  NOUN-HUM VERB-AGPAT
  NOUN-ANIM VERB-EAT NOUN-FOOD
  NOUN-ANIM VERB-TRAN NOUN-ANIM
  NOUN-ANIM VERB-AGPAT NOUN-INANIM
  NOUN-ANIM VERB-AGPAT
  NOUN-INANIM VERB-AGPAT
  NOUN-AGRESS VERB-DESTROY NOUN-FRAG
  NOUN-AGRESS VERB-EAT NOUN-HUM
  NOUN-AGRESS VERB-EAT NOUN-ANIM
  NOUN-AGRESS VERB-EAT NOUN-FOOD
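For concreteness, a small Python sketch of how a corpus can be generated from this lexicon and these templates. The dictionaries are transcribed from the slide; the generator itself is my own illustration, not the original data-generation script.

```python
import random

# lexical categories transcribed from the slide (31 distinct words)
LEXICON = {
    "NOUN-HUM":     ["man", "woman", "boy", "girl"],
    "NOUN-ANIM":    ["cat", "mouse", "dog", "lion"],
    "NOUN-INANIM":  ["book", "rock", "car"],
    "NOUN-AGRESS":  ["dragon", "monster"],
    "NOUN-FRAG":    ["glass", "plate"],
    "NOUN-FOOD":    ["cookie", "bread", "sandwich"],
    "VERB-INTRAN":  ["think", "sleep", "exist"],
    "VERB-TRAN":    ["see", "chase", "like"],
    "VERB-AGPAT":   ["move", "break"],
    "VERB-PERCEPT": ["smell", "see"],
    "VERB-DESTROY": ["break", "smash"],
    "VERB-EAT":     ["eat"],
    "RELAT-HUM":    ["who"],
    "RELAT-INHUM":  ["which"],
}

# the 16 sentence templates transcribed from the slide
TEMPLATES = [
    ["NOUN-HUM", "VERB-EAT", "NOUN-FOOD"],
    ["NOUN-HUM", "VERB-PERCEPT", "NOUN-INANIM"],
    ["NOUN-HUM", "VERB-DESTROY", "NOUN-FRAG"],
    ["NOUN-HUM", "VERB-INTRAN"],
    ["NOUN-HUM", "VERB-TRAN", "NOUN-HUM"],
    ["NOUN-HUM", "VERB-AGPAT", "NOUN-INANIM"],
    ["NOUN-HUM", "VERB-AGPAT"],
    ["NOUN-ANIM", "VERB-EAT", "NOUN-FOOD"],
    ["NOUN-ANIM", "VERB-TRAN", "NOUN-ANIM"],
    ["NOUN-ANIM", "VERB-AGPAT", "NOUN-INANIM"],
    ["NOUN-ANIM", "VERB-AGPAT"],
    ["NOUN-INANIM", "VERB-AGPAT"],
    ["NOUN-AGRESS", "VERB-DESTROY", "NOUN-FRAG"],
    ["NOUN-AGRESS", "VERB-EAT", "NOUN-HUM"],
    ["NOUN-AGRESS", "VERB-EAT", "NOUN-ANIM"],
    ["NOUN-AGRESS", "VERB-EAT", "NOUN-FOOD"],
]

def generate_sentence(rng=random):
    """Fill a randomly chosen template with randomly chosen words."""
    template = rng.choice(TEMPLATES)
    return [rng.choice(LEXICON[slot]) for slot in template]

if __name__ == "__main__":
    print(" ".join(generate_sentence()))   # e.g. "dragon eat cookie"
```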

Sample Sentences & Mapping
Simple sentences - 2 types (2-word and 3-word):
- man think (2 words)
- girl see dog (3 words)
- man break glass (3 words)
Embedded sentences - 3 types (RP = relative pronoun):
1. monster eat man who sleep (RP-subject, VERB-INTRAN)
2. dog see man who eat sandwich (RP-subject, VERB-TRAN)
3. woman eat cookie which cat chase (RP-object, VERB-TRAN)
Input-output mapping (predict the next input in the sequence):
  INPUT:  girl see dog man break glass cat …
  OUTPUT: see dog man break glass cat …
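A minimal sketch of how the sentences are concatenated into one word stream and paired up for the prediction task, assuming the function name and representation are my own (the actual project uses tlearn pattern files).

```python
def prediction_pairs(sentences):
    """Concatenate sentences into one stream and pair each word with its successor.

    sentences : list of lists of words, e.g. [["girl", "see", "dog"], ["man", "break", "glass"]]
    Returns (input_word, target_word) pairs: the target at each step is simply
    the next word in the stream; sentence boundaries are not marked.
    """
    stream = [word for sentence in sentences for word in sentence]
    return list(zip(stream[:-1], stream[1:]))

# example: girl see dog man break glass -> (girl, see), (see, dog), (dog, man), ...
pairs = prediction_pairs([["girl", "see", "dog"], ["man", "break", "glass"]])
```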

Encoding scheme
Random word representation:
- a 31-bit vector for each lexical item; each lexical item is represented by a different, randomly assigned bit (localist coding)
- this is not a semantic-feature encoding
- e.g., sleep, dog, woman, … each turn on exactly one distinct bit
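A minimal sketch of this localist (one bit per word) encoding in NumPy; the random bit assignment mirrors what the slide describes, and the function name is an assumption of mine.

```python
import numpy as np

def one_hot_codes(vocabulary, rng=None):
    """Assign each word a distinct, randomly chosen one-hot bit vector."""
    rng = np.random.default_rng(0) if rng is None else rng
    order = rng.permutation(len(vocabulary))      # random bit assignment
    codes = {}
    for word, bit in zip(vocabulary, order):
        vec = np.zeros(len(vocabulary))
        vec[bit] = 1.0                            # exactly one bit set per word
        codes[word] = vec
    return codes

# e.g. codes = one_hot_codes(["sleep", "dog", "woman"])  # 31 words in the real lexicon
```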

Training a network
Incremental input (Elman, 1993): the "starting small" strategy.
Phase I: simple sentences (Elman, 1990, used 10,000 sentences)
- 1,564 sentences generated and encoded as 31-bit vectors
- train on all patterns: learning rate = 0.1, 23 epochs
Phase II: embedded sentences (Elman, 1993, used 7,500 sentences)
- 5,976 sentences generated and encoded as 31-bit vectors
- network loaded with the weights from Phase I
- train on the 1,564 + 5,976 sentences together: learning rate = 0.1, 4 epochs
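A schematic outline of this two-phase "starting small" schedule. The training was actually done in tlearn; `train_epochs` below is a hypothetical helper standing in for one tlearn training run, so this is only a sketch of the schedule, not the real training code.

```python
# Schematic outline only: `train_epochs(net, patterns, learning_rate, epochs)` is a
# hypothetical helper that runs backprop/RTRL over a pattern set, not a real library call.
def starting_small(net, simple_patterns, embedded_patterns, train_epochs):
    # Phase I: train on the simple sentences alone (learning rate 0.1, 23 epochs)
    train_epochs(net, simple_patterns, learning_rate=0.1, epochs=23)

    # Phase II: keep the Phase I weights and continue on the combined set of
    # simple + embedded sentences (learning rate 0.1, 4 epochs)
    train_epochs(net, simple_patterns + embedded_patterns,
                 learning_rate=0.1, epochs=4)
```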

Performance
Network performance was measured by the root mean squared (RMS) error over the P input patterns:
  RMS error = sqrt( (1/P) Σ_{p=1..P} || d_p − y_p ||² ),
where d_p is the target output vector and y_p the actual output vector for pattern p.
Phase I: after 23 epochs, RMS ≈ 0.91
Phase II: after 4 epochs, RMS ≈ 0.84
Why can the RMS not be lowered further? The prediction task is nondeterministic, so the network cannot produce a unique output for each input. For this simulation, RMS is therefore NOT the best measure of performance.
Elman's simulations: RMS ≈ 0.88 (1990); mean cosine was used as the measure in (1993).
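A small NumPy check of the RMS measure as written above; this is my own sketch of the formula, and the exact normalization used by tlearn may differ slightly.

```python
import numpy as np

def rms_error(targets, outputs):
    """Root mean squared error over P input patterns.

    targets, outputs : arrays of shape (P, 31) holding the target and actual
    output vectors for each pattern (normalization assumed, may differ from tlearn).
    """
    per_pattern = np.sum((targets - outputs) ** 2, axis=1)   # ||d_p - y_p||^2
    return np.sqrt(np.mean(per_pattern))
```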

Phase I: RMS ≈ 0.91 after 23 epochs

Phase II: RMS ≈ 0.84 after 4 epochs

Results and Analysis
Sample of the network's predictions (output followed by target; "?" marks an incorrect prediction, and the arrow → marks the start of a sentence):
  … which (target: which), ? (target: lion), ? (target: see), ? (target: boy),
  → ? (target: move), ? (target: sandwich), which (target: which), ? (target: cat), ? (target: see), …,
  ? (target: book), which (target: which), ? (target: man), see (target: see), …,
  ? (target: dog), → ? (target: chase), ? (target: man), ? (target: who), ? (target: smash), ? (target: glass), …
In all positions the word "which" is predicted correctly! But most other words are not predicted, including "who". Why? → Training data.
Since the prediction task is non-deterministic, predicting the exact next word cannot be the best performance measurement. We need to look at the hidden-unit activations for each input, since they reflect what the network has learned about classes of inputs with regard to what they predict. → Cluster analysis, PCA.

Cluster Analysis
<Hierarchical cluster diagram of hidden-unit activation vectors>
- The network successfully recognizes VERB, NOUN, and some of their subcategories.
- WHO and WHICH end up at different distances in the hierarchy.
- VERB-INTRAN failed to fit into the VERB cluster.
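A sketch of how such a cluster diagram can be produced with SciPy, assuming the hidden-unit activations have already been averaged per word (the array and function names here are my assumptions, not part of the original analysis scripts).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

def cluster_hidden_activations(mean_activations, words):
    """Hierarchical cluster diagram of mean hidden-unit activation vectors.

    mean_activations : array of shape (n_words, n_hidden), the hidden-layer
                       activation averaged over every context in which each
                       word occurred (prepared elsewhere; hypothetical input).
    words            : list of the corresponding word labels.
    """
    Z = linkage(mean_activations, method="average", metric="euclidean")
    dendrogram(Z, labels=words, orientation="right")
    plt.tight_layout()
    plt.show()
```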

Discussion & Conclusion
1. The network can discover the lexical classes from word order. NOUN and VERB form distinct classes, except for VERB-INTRAN. The subclasses of NOUN are classified correctly, but some subclasses of VERB are mixed; this is related to the input examples.
2. The network can recognize and predict the relative pronoun "which", but not "who". Why? Because the sentences containing "who" are not of the RP-object type, so "who" is treated just like an ordinary subject in a simple sentence.
3. The organization of the input data is important, and the recurrent network is sensitive to it, since it processes the input sequentially and on-line.
4. In general, recurrent networks trained with RTRL recognized the sequential input, but they require more training time and computational resources.

Future Studies
Recurrent Least Squares Support Vector Machines (Suykens, J.A.K., & Vandewalle, J., 2000)
- provide new perspectives for time-series prediction and nonlinear modeling
- appear more efficient than BPTT, RTRL, and the SRN

References
Allen, J., & Seidenberg, M. (1999). The emergence of grammaticality in connectionist networks. In B. MacWhinney (Ed.), The emergence of language. Hillsdale, NJ: Lawrence Erlbaum.
Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14.
Elman, J.L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48.
Elman, J.L. (1999). The emergence of language: A conspiracy theory. In B. MacWhinney (Ed.), The emergence of language. Hillsdale, NJ: Lawrence Erlbaum Associates.
Hertz, J., Krogh, A., & Palmer, R.G. (1991). Introduction to the theory of neural computation. Redwood City, CA: Addison-Wesley.
McClelland, J.L., & Kawamoto, A.H. (1986). Mechanisms of sentence processing: Assigning roles to constituents of sentences. In J.L. McClelland & D.E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Miikkulainen, R. (1996). Subsymbolic case-role analysis of sentences with embedded clauses. Cognitive Science, 20.
Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1). Cambridge, MA: MIT Press.
Rumelhart, D.E., & McClelland, J.L. (1986). On learning the past tense of English verbs. In J.L. McClelland & D.E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Williams, R.J., & Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1.