Learning linguistic structure with simple recurrent neural networks

Psychology 209, February 5, 2019

Rules or Connections?
How is it that we can process sentences we’ve never seen before? (“Colorless green ideas sleep furiously.”)
Chomsky, Fodor, Pinker, …: abstract, symbolic rules, e.g. S -> NP VP; NP -> (Adj)* N; VP -> V (Adv)
The connectionist alternative: function approximation using distributed representations and knowledge in connection weights
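
To make the symbolic side concrete, here is a minimal sketch that expands the three rules from the slide into sentences; the toy lexicon and the restriction of (Adj)* to at most one adjective are assumptions for illustration only.

```python
import random

# A minimal sketch of the rule-based view: the three rules from the slide,
# plus a small hypothetical lexicon (the word lists are illustrative only).
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["N"], ["Adj", "N"]],   # NP -> (Adj)* N, truncated to 0 or 1 Adj
    "VP": [["V"], ["V", "Adv"]],   # VP -> V (Adv)
}
LEXICON = {
    "N":   ["ideas", "dogs", "children"],
    "Adj": ["colorless", "green", "small"],
    "V":   ["sleep", "run", "dream"],
    "Adv": ["furiously", "quietly"],
}

def expand(symbol):
    """Recursively rewrite a symbol until only words remain."""
    if symbol in LEXICON:
        return [random.choice(LEXICON[symbol])]
    words = []
    for child in random.choice(GRAMMAR[symbol]):
        words.extend(expand(child))
    return words

print(" ".join(expand("S")))  # e.g. "colorless ideas sleep furiously"
```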

Elman’s Simple Recurrent Network (Elman, 1990)
What is the best way to represent time? Slots? Or time itself?
What is the best way to represent language? Units and rules? Or connectionist learning?
Is grammar learnable? If so, are there any necessary constraints?

The Simple Recurrent Network
The network is trained on a stream of elements with sequential structure.
At step n, the target for the output is the element at step n+1.
The pattern on the hidden units is copied back to the context units.
After learning, the network comes to retain information about preceding elements of the string, allowing expectations to be conditioned on prior context.
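
A minimal numpy sketch of this training scheme, assuming one-hot inputs and targets, a squared-error loss, and ordinary backprop in which the copied-back context is treated as just another fixed input (as in Elman's original setup); the layer sizes and the random toy stream are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 10, 16, 10        # hypothetical sizes; one-hot words in and out
W_in  = rng.normal(0, 0.1, (n_hid, n_in))
W_ctx = rng.normal(0, 0.1, (n_hid, n_hid))
W_out = rng.normal(0, 0.1, (n_out, n_hid))
lr = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def srn_step(x, context):
    """One time step: the hidden pattern depends on the input and the copied-back context."""
    hidden = sigmoid(W_in @ x + W_ctx @ context)
    output = sigmoid(W_out @ hidden)
    return hidden, output

def train_step(x, target, context):
    """Backprop at a single step; the context is treated as a fixed extra input."""
    global W_in, W_ctx, W_out
    hidden, output = srn_step(x, context)
    d_out = (output - target) * output * (1 - output)   # squared-error gradient at the output
    d_hid = (W_out.T @ d_out) * hidden * (1 - hidden)
    W_out -= lr * np.outer(d_out, hidden)
    W_in  -= lr * np.outer(d_hid, x)
    W_ctx -= lr * np.outer(d_hid, context)
    return hidden                                        # becomes the next context

# Train on a toy stream: each element's target is simply the next element.
stream = rng.integers(0, n_in, size=200)
context = np.zeros(n_hid)
for t in range(len(stream) - 1):
    x = np.eye(n_in)[stream[t]]
    y = np.eye(n_out)[stream[t + 1]]
    context = train_step(x, y, context)
```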

Learning about sentence structure from streams of words

Learned and imputed hidden-layer representations (average vectors over all contexts). The ‘Zog’ representation is derived by averaging the vectors obtained by inserting the novel item in place of each occurrence of ‘man’.
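
A sketch of the averaging procedure only: hidden_vector is a hypothetical stand-in for the trained SRN's hidden pattern, and the corpus and vocabulary are placeholders, so the numbers mean nothing beyond illustrating the bookkeeping.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB = ["man", "woman", "eat", "sleep", "zog"]          # illustrative vocabulary
EMB = {w: rng.normal(size=8) for w in VOCAB}

def hidden_vector(sentence, position):
    """Placeholder for the SRN hidden pattern after reading the word at `position`."""
    return np.mean([EMB[w] for w in sentence[: position + 1]], axis=0)

corpus = [["man", "eat"], ["woman", "sleep"], ["man", "sleep"]]

# Learned representation of 'man': average its hidden vectors over all contexts.
man_vecs = [hidden_vector(s, i) for s in corpus for i, w in enumerate(s) if w == "man"]
man_rep = np.mean(man_vecs, axis=0)

# Imputed representation of the novel word 'zog': substitute it for each
# occurrence of 'man' and average the resulting hidden vectors.
zog_vecs = []
for s in corpus:
    for i, w in enumerate(s):
        if w == "man":
            s_new = list(s)
            s_new[i] = "zog"
            zog_vecs.append(hidden_vector(s_new, i))
zog_rep = np.mean(zog_vecs, axis=0)
```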

Within-item variation by context

One question in all of this (from Sonja): What is meant by the “co-occurrence structure of the domain”?
Passage from the handbook: “Crucially, the input patterns representing the nouns and verbs were randomly assigned, and thus did not capture in any way the co-occurrence structure of the domain.”
Preceding text: Verbs and nouns fell into different sub-types. There were, for example, verbs of perception (which require an animate subject but can take any noun as object), verbs of consumption (which require something consumable), and verbs of destruction, each of which had different restrictions on the nouns that could occur with it as subject and object.
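
One way to see what the passage means, as a sketch: if each word is assigned an arbitrary one-hot input pattern, the input vectors themselves carry no information about which words co-occur; any such structure has to be discovered in the hidden layer. The vocabulary below is hypothetical.

```python
import numpy as np

# Each word gets an arbitrary one-hot code, so the input encoding carries no
# similarity structure reflecting which words tend to occur together.
words = ["man", "woman", "dragon", "eat", "see", "smash", "cookie", "glass"]
rng = np.random.default_rng(2)
codes = {w: v for w, v in zip(words, np.eye(len(words))[rng.permutation(len(words))])}

# Every pair of distinct words is equally (un)related in the input encoding,
# whether or not they co-occur in sentences: all off-diagonal dot products are 0.
print(codes["man"] @ codes["woman"])   # 0.0
print(codes["eat"] @ codes["cookie"])  # 0.0
```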

Analysis of SRNs using simpler sequential structures (Servan-Schreiber, Cleeremans, & McClelland)
(Figure panels: The Grammar; The Network)
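
For concreteness, a sketch of a string generator for a Reber-type finite-state grammar of the kind used in this work; the particular transition table below is the standard Reber grammar and is assumed here, so its state numbering may not match the paper's figure.

```python
import random

# Each state maps to the (letter, next state) transitions leaving it.
TRANSITIONS = {
    0: [("B", 1)],
    1: [("T", 2), ("P", 3)],
    2: [("S", 2), ("X", 4)],
    3: [("T", 3), ("V", 5)],
    4: [("X", 3), ("S", 6)],
    5: [("P", 4), ("V", 6)],
    6: [("E", None)],
}

def generate_string():
    """Walk the grammar from the start state to the end, picking arcs at random."""
    state, letters = 0, []
    while state is not None:
        letter, state = random.choice(TRANSITIONS[state])
        letters.append(letter)
    return "".join(letters)

print(generate_string())  # e.g. "BTSSXXTVVE" or "BPVVE"
```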

Hidden unit representations with 3 hidden units (figure panels: True Finite State Machine; Graded State Machine)

Training with a Restricted Set of Strings: 21 of the 43 valid strings of length 3-8

Progressive Deepening of the Network’s Sensitivity to Prior Context
Note: prior context is only maintained if it is prediction-relevant at intermediate points.

Relating the Model to Human Data
Experiment: implicit sequence learning. The input is a screen position (corresponding to a letter in the grammar). The response measure is RT, the time from stimulus to button press (very few errors ever occur).
Assumption: anticipatory activation of the output unit reduces RT.
Fit to data: compare the model’s predictions at different time points to human RTs at those time points.
A pretty good fit was obtained after adding two additional assumptions:
Activation carries over (with decay) from the previous time step.
Connection weight adjustments have both a fast and a slow component.
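
An illustrative sketch of the first two linking assumptions (anticipatory activation lowers predicted RT, and activation carries over with decay); it omits the fast/slow weight components, and the decay, baseline, and scale parameters are invented for the example, so this is not the paper's actual fitting procedure.

```python
import numpy as np

DECAY, BASELINE_MS, SCALE_MS = 0.5, 500.0, 300.0   # made-up parameters

def predicted_rts(output_activations, correct_indices):
    """output_activations: (T, n_outputs) model outputs; correct_indices: length-T targets."""
    carryover = np.zeros(output_activations.shape[1])
    rts = []
    for t, target in enumerate(correct_indices):
        effective = output_activations[t] + DECAY * carryover   # carryover with decay
        rts.append(BASELINE_MS - SCALE_MS * effective[target])  # more anticipation, faster RT
        carryover = effective
    return np.array(rts)

# Toy usage: three steps, four possible responses.
acts = np.array([[0.10, 0.20, 0.60, 0.10],
                 [0.30, 0.40, 0.20, 0.10],
                 [0.05, 0.10, 0.10, 0.75]])
print(predicted_rts(acts, [2, 1, 3]))
```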

Results and Model Fit (figure panels: Basic Model Fit; Human Behavior; Extended)

Elman (1991)

NV Agreement and Verb Successor Prediction
Histograms show summed activation for classes of words:
W = “who”; S = period (end of sentence)
V1/V2 and N1/N2/PN indicate singular, plural, or proper forms
For verbs: N = no direct object (DO), O = optional DO, R = required DO
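
A small sketch of how such histograms can be computed, collapsing the network's per-word output activations into per-class totals; the vocabulary, class assignments, and activation values are hypothetical.

```python
import numpy as np

vocab   = ["boy", "boys", "John", "chases", "chase", "who", "."]
classes = {"boy": "N1", "boys": "N2", "John": "PN",
           "chases": "V1", "chase": "V2", "who": "W", ".": "S"}

def summed_activation_by_class(output_activations):
    """Collapse a per-word output vector into per-class totals."""
    totals = {}
    for word, act in zip(vocab, output_activations):
        totals[classes[word]] = totals.get(classes[word], 0.0) + float(act)
    return totals

print(summed_activation_by_class(np.array([0.05, 0.40, 0.10, 0.02, 0.35, 0.05, 0.03])))
```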

Prediction with an embedded clause

What does it mean that ‘RNNs are Turing complete’?
From Stack Exchange: The following paper shows that, for any computable function, there exists a finite recurrent neural network (RNN) that can compute it. Furthermore, there exist finite RNNs that are Turing complete, and can therefore implement any algorithm.
Siegelmann and Sontag (1992). On the computational power of neural nets.

Going Beyond the SRN
Back-propagation through time
The vanishing gradient problem
Solving the vanishing gradient problem with LSTMs
The problem of generalization and overfitting
Solutions to the overfitting problem
Applying LSTMs with dropout to a full-scale version of Elman’s prediction task
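
As a rough sketch of the last point, here is a minimal PyTorch LSTM with dropout trained on next-word prediction; the hyperparameters, toy corpus, and training loop are illustrative assumptions, not the configuration actually used for the full-scale version of Elman's task.

```python
import torch
import torch.nn as nn

class WordLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.drop = nn.Dropout(dropout)                 # dropout regularizes against overfitting
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.lstm(self.drop(self.embed(tokens)))
        return self.out(self.drop(h))                   # logits over the next word at each position

# Toy corpus: random token ids standing in for sentences (hypothetical vocabulary of 20 words).
vocab_size = 20
data = torch.randint(0, vocab_size, (8, 12))            # 8 sequences of 12 tokens
inputs, targets = data[:, :-1], data[:, 1:]             # predict each next word

model = WordLSTM(vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    optimizer.zero_grad()
    logits = model(inputs)                              # (batch, seq, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()                                     # back-propagation through time
    optimizer.step()
```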