Learning linguistic structure with simple recurrent neural networks

Psychology 209, February 5, 2019

Rules or Connections?
How is it that we can process sentences we’ve never seen before? (“Colorless green ideas sleep furiously.”)
Chomsky, Fodor, Pinker, …: abstract, symbolic rules, e.g. S -> NP VP; NP -> (Adj)* N; VP -> V (Adv)
The connectionist alternative: function approximation using distributed representations and knowledge in connection weights
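
To make the symbolic side concrete, here is a minimal sketch that expands the three rules from the slide into sentences; the toy lexicon and the restriction of (Adj)* to at most one adjective are assumptions for illustration only.

```python
import random

# A minimal sketch of the rule-based view: the three rules from the slide,
# plus a small hypothetical lexicon (the word lists are illustrative only).
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["N"], ["Adj", "N"]],   # NP -> (Adj)* N, truncated to 0 or 1 Adj
    "VP": [["V"], ["V", "Adv"]],   # VP -> V (Adv)
}
LEXICON = {
    "N":   ["ideas", "dogs", "children"],
    "Adj": ["colorless", "green", "small"],
    "V":   ["sleep", "run", "dream"],
    "Adv": ["furiously", "quietly"],
}

def expand(symbol):
    """Recursively rewrite a symbol until only words remain."""
    if symbol in LEXICON:
        return [random.choice(LEXICON[symbol])]
    words = []
    for child in random.choice(GRAMMAR[symbol]):
        words.extend(expand(child))
    return words

print(" ".join(expand("S")))  # e.g. "colorless ideas sleep furiously"
```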

Elman’s Simple Recurrent Network (Elman, 1990)
What is the best way to represent time? Slots? Or time itself?
What is the best way to represent language? Units and rules? Or connectionist learning?
Is grammar learnable? If so, are there any necessary constraints?

The Simple Recurrent Network
The network is trained on a stream of elements with sequential structure.
At step n, the target for the output is the element at step n+1.
The pattern on the hidden units is copied back to the context units.
After learning, the network comes to retain information about preceding elements of the string, allowing expectations to be conditioned on prior context.
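
A minimal numpy sketch of this training scheme, assuming one-hot inputs and targets, a squared-error loss, and ordinary backprop in which the copied-back context is treated as just another fixed input (as in Elman's original setup); the layer sizes and the random toy stream are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 10, 16, 10        # hypothetical sizes; one-hot words in and out
W_in  = rng.normal(0, 0.1, (n_hid, n_in))
W_ctx = rng.normal(0, 0.1, (n_hid, n_hid))
W_out = rng.normal(0, 0.1, (n_out, n_hid))
lr = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def srn_step(x, context):
    """One time step: the hidden pattern depends on the input and the copied-back context."""
    hidden = sigmoid(W_in @ x + W_ctx @ context)
    output = sigmoid(W_out @ hidden)
    return hidden, output

def train_step(x, target, context):
    """Backprop at a single step; the context is treated as a fixed extra input."""
    global W_in, W_ctx, W_out
    hidden, output = srn_step(x, context)
    d_out = (output - target) * output * (1 - output)   # squared-error gradient at the output
    d_hid = (W_out.T @ d_out) * hidden * (1 - hidden)
    W_out -= lr * np.outer(d_out, hidden)
    W_in  -= lr * np.outer(d_hid, x)
    W_ctx -= lr * np.outer(d_hid, context)
    return hidden                                        # becomes the next context

# Train on a toy stream: each element's target is simply the next element.
stream = rng.integers(0, n_in, size=200)
context = np.zeros(n_hid)
for t in range(len(stream) - 1):
    x = np.eye(n_in)[stream[t]]
    y = np.eye(n_out)[stream[t + 1]]
    context = train_step(x, y, context)
```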

Learning about sentence structure from streams of words

Learned and imputed hidden-layer representations (average vectors over all contexts). The ‘Zog’ representation is derived by averaging the vectors obtained by inserting the novel item in place of each occurrence of ‘man’.
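
A sketch of the averaging procedure only: hidden_vector is a hypothetical stand-in for the trained SRN's hidden pattern, and the corpus and vocabulary are placeholders, so the numbers mean nothing beyond illustrating the bookkeeping.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB = ["man", "woman", "eat", "sleep", "zog"]          # illustrative vocabulary
EMB = {w: rng.normal(size=8) for w in VOCAB}

def hidden_vector(sentence, position):
    """Placeholder for the SRN hidden pattern after reading the word at `position`."""
    return np.mean([EMB[w] for w in sentence[: position + 1]], axis=0)

corpus = [["man", "eat"], ["woman", "sleep"], ["man", "sleep"]]

# Learned representation of 'man': average its hidden vectors over all contexts.
man_vecs = [hidden_vector(s, i) for s in corpus for i, w in enumerate(s) if w == "man"]
man_rep = np.mean(man_vecs, axis=0)

# Imputed representation of the novel word 'zog': substitute it for each
# occurrence of 'man' and average the resulting hidden vectors.
zog_vecs = []
for s in corpus:
    for i, w in enumerate(s):
        if w == "man":
            s_new = list(s)
            s_new[i] = "zog"
            zog_vecs.append(hidden_vector(s_new, i))
zog_rep = np.mean(zog_vecs, axis=0)
```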

Within-item variation by context

One question in all of this (from Sonja): What is meant by the “co-occurrence structure of the domain”?
Passage from the handbook: “Crucially, the input patterns representing the nouns and verbs were randomly assigned, and thus did not capture in any way the co-occurrence structure of the domain.”
Preceding text: Verbs and nouns fell into different sub-types. There were, for example, verbs of perception (which require an animate subject but can take any noun as object), verbs of consumption (which require something consumable), and verbs of destruction, each of which had different restrictions on the nouns that could occur with it as subject and object.
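
One way to see what the passage means, as a sketch: if each word is assigned an arbitrary one-hot input pattern, the input vectors themselves carry no information about which words co-occur; any such structure has to be discovered in the hidden layer. The vocabulary below is hypothetical.

```python
import numpy as np

# Each word gets an arbitrary one-hot code, so the input encoding carries no
# similarity structure reflecting which words tend to occur together.
words = ["man", "woman", "dragon", "eat", "see", "smash", "cookie", "glass"]
rng = np.random.default_rng(2)
codes = {w: v for w, v in zip(words, np.eye(len(words))[rng.permutation(len(words))])}

# Every pair of distinct words is equally (un)related in the input encoding,
# whether or not they co-occur in sentences: all off-diagonal dot products are 0.
print(codes["man"] @ codes["woman"])   # 0.0
print(codes["eat"] @ codes["cookie"])  # 0.0
```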

Analysis of SRNs using simpler sequential structures (Servan-Schreiber, Cleeremans, & McClelland)
(Figure panels: The Grammar; The Network)
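
For concreteness, a sketch of a string generator for a Reber-type finite-state grammar of the kind used in this work; the particular transition table below is the standard Reber grammar and is assumed here, so its state numbering may not match the paper's figure.

```python
import random

# Each state maps to the (letter, next state) transitions leaving it.
TRANSITIONS = {
    0: [("B", 1)],
    1: [("T", 2), ("P", 3)],
    2: [("S", 2), ("X", 4)],
    3: [("T", 3), ("V", 5)],
    4: [("X", 3), ("S", 6)],
    5: [("P", 4), ("V", 6)],
    6: [("E", None)],
}

def generate_string():
    """Walk the grammar from the start state to the end, picking arcs at random."""
    state, letters = 0, []
    while state is not None:
        letter, state = random.choice(TRANSITIONS[state])
        letters.append(letter)
    return "".join(letters)

print(generate_string())  # e.g. "BTSSXXTVVE" or "BPVVE"
```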

Hidden unit representations with 3 hidden units (figure panels: True Finite State Machine; Graded State Machine)

Training with a Restricted Set of Strings: 21 of the 43 valid strings of length 3-8

Progressive Deepening of the Network’s Sensitivity to Prior Context
Note: prior context is only maintained if it is prediction-relevant at intermediate points.

Relating the Model to Human Data
Experiment: implicit sequence learning. The input is a screen position (corresponding to a letter in the grammar). The response measure is RT, the time from stimulus to button press (very few errors ever occur).
Assumption: anticipatory activation of the output unit reduces RT.
Fit to data: compare the model’s predictions at different time points to human RTs at those time points.
A pretty good fit was obtained after adding two additional assumptions:
Activation carries over (with decay) from the previous time step.
Connection weight adjustments have both a fast and a slow component.
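
An illustrative sketch of the first two linking assumptions (anticipatory activation lowers predicted RT, and activation carries over with decay); it omits the fast/slow weight components, and the decay, baseline, and scale parameters are invented for the example, so this is not the paper's actual fitting procedure.

```python
import numpy as np

DECAY, BASELINE_MS, SCALE_MS = 0.5, 500.0, 300.0   # made-up parameters

def predicted_rts(output_activations, correct_indices):
    """output_activations: (T, n_outputs) model outputs; correct_indices: length-T targets."""
    carryover = np.zeros(output_activations.shape[1])
    rts = []
    for t, target in enumerate(correct_indices):
        effective = output_activations[t] + DECAY * carryover   # carryover with decay
        rts.append(BASELINE_MS - SCALE_MS * effective[target])  # more anticipation, faster RT
        carryover = effective
    return np.array(rts)

# Toy usage: three steps, four possible responses.
acts = np.array([[0.10, 0.20, 0.60, 0.10],
                 [0.30, 0.40, 0.20, 0.10],
                 [0.05, 0.10, 0.10, 0.75]])
print(predicted_rts(acts, [2, 1, 3]))
```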

Results and Model Fit (figure panels: Basic Model Fit; Human Behavior; Extended)

Elman (1991)

NV Agreement and Verb Successor Prediction
Histograms show summed activation for classes of words:
W = “who”; S = period (end of sentence)
V1/V2 and N1/N2/PN indicate singular, plural, or proper forms
For verbs: N = no direct object (DO), O = optional DO, R = required DO
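
A small sketch of how such histograms can be computed, collapsing the network's per-word output activations into per-class totals; the vocabulary, class assignments, and activation values are hypothetical.

```python
import numpy as np

vocab   = ["boy", "boys", "John", "chases", "chase", "who", "."]
classes = {"boy": "N1", "boys": "N2", "John": "PN",
           "chases": "V1", "chase": "V2", "who": "W", ".": "S"}

def summed_activation_by_class(output_activations):
    """Collapse a per-word output vector into per-class totals."""
    totals = {}
    for word, act in zip(vocab, output_activations):
        totals[classes[word]] = totals.get(classes[word], 0.0) + float(act)
    return totals

print(summed_activation_by_class(np.array([0.05, 0.40, 0.10, 0.02, 0.35, 0.05, 0.03])))
```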

Prediction with an embedded clause

What does it mean that ‘RNNs are Turing complete’?
From Stack Exchange: The following paper shows that, for any computable function, there exists a finite recurrent neural network (RNN) that can compute it. Furthermore, there exist finite RNNs that are Turing complete, and can therefore implement any algorithm.
Siegelmann and Sontag (1992). On the computational power of neural nets.

Going Beyond the SRN
Back-propagation through time
The vanishing gradient problem
Solving the vanishing gradient problem with LSTMs
The problem of generalization and overfitting
Solutions to the overfitting problem
Applying LSTMs with dropout to a full-scale version of Elman’s prediction task
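
As a rough sketch of the last point, here is a minimal PyTorch LSTM with dropout trained on next-word prediction; the hyperparameters, toy corpus, and training loop are illustrative assumptions, not the configuration actually used for the full-scale version of Elman's task.

```python
import torch
import torch.nn as nn

class WordLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.drop = nn.Dropout(dropout)                 # dropout regularizes against overfitting
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.lstm(self.drop(self.embed(tokens)))
        return self.out(self.drop(h))                   # logits over the next word at each position

# Toy corpus: random token ids standing in for sentences (hypothetical vocabulary of 20 words).
vocab_size = 20
data = torch.randint(0, vocab_size, (8, 12))            # 8 sequences of 12 tokens
inputs, targets = data[:, :-1], data[:, 1:]             # predict each next word

model = WordLSTM(vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    optimizer.zero_grad()
    logits = model(inputs)                              # (batch, seq, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()                                     # back-propagation through time
    optimizer.step()
```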