Backpropagation in fully recurrent and continuous networks

The story so far…
- Constraint satisfaction networks
- Feed-forward backpropagation networks
- Simple recurrent networks
Question: What do they do, what are they good for, how are they different?

The story so far… Constraint satisfaction networks
- Propagation of activation maximizes "goodness" over the network, i.e. the degree to which the pattern of activation reflects the constraints between units, as encoded in the weight values
- What are the weights and where do they come from?

The story so far… Feed-forward backpropagation networks
- Generate an output pattern from an input pattern
- Minimize prediction error directly through backpropagation learning
- Good for learning any input/output mapping, generalizing to new items…
- Anything they are missing relative to constraint satisfaction networks?

The story so far… Simple recurrent networks
Just like feed-forward networks, except:
- They map a sequence of inputs to a sequence of outputs
- They can generalize to new sequences
- In recurrent layers, the previous state can influence the current state
How does this compare to the other models?

Today: Fully recurrent and continuous networks
Fully recurrent: Any set of units can send connections to, and receive connections from, any other set of units.
- Which means that a unit can influence its own activation either directly or indirectly, through connections with other units!
- Which means that the patterns generated by an input will evolve over time, even if the input is fixed! (Like in the Jets-N-Sharks model.)
With fully recurrent networks, processing an input involves settling over time to a "good" state (like the Jets-N-Sharks model). But fully recurrent models can also (a) include hidden layers and (b) learn by minimizing prediction error (like FF and SRN backpropagation models).
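
To make "settling" concrete, here is a minimal sketch (not from the lecture) of a small fully recurrent network given a fixed input: the units repeatedly update until the pattern of activation stops changing. The weight values, the input, and the logistic activation function are illustrative assumptions.

```python
import numpy as np

def settle(W, external_input, n_steps=50):
    """Iteratively update unit activations under a fixed input until
    the pattern of activation stops changing (the network 'settles')."""
    a = np.zeros(len(external_input))           # start from rest
    for _ in range(n_steps):
        net = W @ a + external_input            # recurrent input + external input
        a_new = 1.0 / (1.0 + np.exp(-net))      # logistic activation
        if np.max(np.abs(a_new - a)) < 1e-4:    # pattern has stopped evolving
            return a_new
        a = a_new
    return a

# Hypothetical 3-unit network: symmetric weights encode mutual constraints
W = np.array([[ 0.0,  2.0, -2.0],
              [ 2.0,  0.0, -2.0],
              [-2.0, -2.0,  0.0]])
print(settle(W, external_input=np.array([1.0, 0.0, 0.0])))
```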

Today: Fully recurrent and continuous networks
Continuous: Units do not instantaneously update their activations, but do so gradually over time.
- A new parameter, dt, controls how quickly a unit can change its activation!
- Allows "cascaded" activation: each layer transmits information as it is coming in, even though it has not fully adapted to its inputs
- Provides a way of thinking about the temporal processing of information.

Today: Fully recurrent and continuous networks
- Constraint satisfaction: inferences from the input "feed back" to generate more inferences; the network "settles" over time into an interpretation of the input (like the Jets-N-Sharks and Room models).
- A way of discovering/learning weight configurations that "do" constraint satisfaction without hand-tuning (like backprop and SRNs).
- The possibility of incorporating hidden units for re-representation of inputs/outputs (like backprop/SRNs).
- The possibility of allowing units to respond gradually to their inputs rather than instantaneously (like in constraint satisfaction models).
- The possibility of accounting for sequential patterns of behavior (like SRNs).

Why?
- All the benefits of constraint satisfaction we have already seen: generalization, content-addressable memory, graceful degradation, etc.
- Coupled with an account of learning and of the time-course of processing.

Learning in fully recurrent networks

Simple Recurrent Network (SRN)
[Diagram: the current input (t) and a copy of the hidden representation at t−1 feed into the hidden units, which produce the current output (t).]
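
A minimal sketch of the single time step this diagram depicts, with made-up layer sizes, random weights, and a tanh hidden activation (the lecture's own networks may use logistic units): the hidden layer combines the current input with a copy of its previous state, and the output is read from the hidden layer.

```python
import numpy as np

def srn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    """One SRN step: the hidden layer sees the current input plus a copy
    of its own state from the previous time step."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)   # hidden rep at time t
    y_t = W_hy @ h_t                            # current output (t)
    return y_t, h_t

# Hypothetical sizes: 4 input units, 3 hidden units, 2 output units
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), rng.normal(size=(2, 3))
h = np.zeros(3)
for x in rng.normal(size=(5, 4)):               # a sequence of 5 inputs
    y, h = srn_step(x, h, W_xh, W_hh, W_hy)
```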

[Diagram, two slides: the same network unfolded in time, with a copy of the network at t=0, t=1, and t=2; each copy receives the current input (t) and produces an output, and the hidden state passes forward from one time step to the next.]

[Diagram, built up over two slides: a small two-unit recurrent network processing an input sequence, with a target given at the end of the sequence. The network is unfolded into copies at Time 1, Time 2, Time 3, … Time n, and a delta (δ) is computed at each time slice. Every copy uses the same weight, so the deltas are summed over time.]

[Diagram, built up over two slides: the same network unfolded over a longer input sequence in which targets are given at several points (e.g. at Time n, Time n+i, Time n+2i) rather than only at the end; at the earlier time steps (Times 1–3) no targets are given.]

[Diagram: a network with a Word layer and a Picture layer, connected through a Hidden layer.]

[Diagram: the network unfolded over Time 0 and Time 1, with the Word given as input and targets applied to the Picture layer at Time 1.]

What about going from picture to word?

[Diagram: the network unfolded over Time 0 and Time 1, now with the Picture given as input and targets applied to the Word layer at Time 1.]

So fully recurrent backpropagation involves:
- "Unfolding" the network in time
- Training the model with standard backpropagation
- Summing the deltas for a given weight over the different time slices
- Changing the weight by the summed delta × the learning rate
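
A minimal numerical sketch of those four steps for a one-hidden-layer recurrent network (the layer sizes, tanh units, squared-error loss, and the helper name bptt_update are assumptions, not the lecture's own code): the forward pass runs through the unfolded copies, the backward pass computes a delta at every time slice, the deltas for each shared weight are summed, and the weight changes by the summed delta times the learning rate. As in the examples above, targets may be given only at some time slices; slices with no target contribute no output error.

```python
import numpy as np

def bptt_update(xs, targets, W_xh, W_hh, W_hy, lr=0.1):
    """One weight update by backpropagation through time (a sketch).
    xs[t] is the input vector at time t; targets[t] is the target output
    at time t, or None when no target is given at that slice."""
    T, h_size = len(xs), W_hh.shape[0]
    hs = [np.zeros(h_size)]                       # hidden state before the first input
    ys = []
    # Forward pass through the unfolded copies of the network
    for t in range(T):
        hs.append(np.tanh(W_xh @ xs[t] + W_hh @ hs[-1]))
        ys.append(W_hy @ hs[-1])
    # Backward pass: every copy shares the same weights, so sum the deltas
    dW_xh, dW_hh, dW_hy = (np.zeros_like(W_xh), np.zeros_like(W_hh),
                           np.zeros_like(W_hy))
    dh_next = np.zeros(h_size)                    # error arriving from time t+1
    for t in reversed(range(T)):
        dy = (ys[t] - targets[t]) if targets[t] is not None else np.zeros_like(ys[t])
        dW_hy += np.outer(dy, hs[t + 1])
        dh = W_hy.T @ dy + dh_next                # error from the output and from the future
        dnet = dh * (1.0 - hs[t + 1] ** 2)        # back through the tanh nonlinearity
        dW_xh += np.outer(dnet, xs[t])            # deltas summed over time slices
        dW_hh += np.outer(dnet, hs[t])
        dh_next = W_hh.T @ dnet
    # Change each weight by its summed delta times the learning rate
    W_xh -= lr * dW_xh
    W_hh -= lr * dW_hh
    W_hy -= lr * dW_hy
    return W_xh, W_hh, W_hy

# Example: a 4-step sequence with a target only at the final time slice
rng = np.random.default_rng(1)
xs = rng.normal(size=(4, 3))
targets = [None, None, None, np.array([1.0, 0.0])]
W_xh, W_hh, W_hy = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), rng.normal(size=(2, 5))
bptt_update(xs, targets, W_xh, W_hh, W_hy)
```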

Some nuances: Time-delayed links
- Net inputs are updated for all groups first
- Then activations are updated
- So, activation "flows" through just one set of weights at a time
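
A sketch of this two-phase update, with invented group names and weights: all net inputs are computed from the activations as they stood at the start of the tick, and only then are the activations replaced, so on any one tick activation crosses a single set of weights.

```python
import numpy as np

def tick(groups, connections):
    """One update tick with time-delayed links: first compute every group's
    net input from the pre-tick activations, then update all activations."""
    # Phase 1: net inputs for all groups, using pre-tick activations only
    net = {name: np.zeros_like(act) for name, act in groups.items()}
    for (src, dst), W in connections.items():
        net[dst] += W @ groups[src]
    # Phase 2: only now are the activations replaced
    return {name: 1.0 / (1.0 + np.exp(-net[name])) for name in groups}

# Hypothetical pair of reciprocally connected groups
groups = {"A": np.array([0.5, 0.5]), "B": np.array([0.0])}
connections = {("A", "B"): np.ones((1, 2)), ("B", "A"): np.ones((2, 1))}
groups = tick(groups, connections)
```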

Some nuances: Continuous units
a(t) = a(t−1) + dt · (a_inst − a(t−1))
- How "fast" should a unit respond to a change in its net input?
- dt = "rate of change" of activation for the unit
- When dt = 1, the unit will fully update on every cycle
- The smaller dt, the more slowly the unit will approach the correct activation
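
A tiny sketch of this update rule for a single unit, with made-up numbers: with dt = 0.3 the unit creeps toward the activation its net input currently calls for (here a_inst = 1.0) rather than jumping there in one cycle.

```python
def continuous_update(a_prev, a_inst, dt):
    """a(t) = a(t-1) + dt * (a_inst - a(t-1)): the unit moves a fraction dt
    of the way toward the activation its current net input calls for."""
    return a_prev + dt * (a_inst - a_prev)

# With dt = 1 the unit would jump straight to a_inst; with dt = 0.3 it creeps there
a = 0.0
for step in range(5):
    a = continuous_update(a, a_inst=1.0, dt=0.3)
    print(round(a, 3))   # 0.3, 0.51, 0.657, 0.76, 0.832
```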

Varieties of temporal networks
- "Cascaded" network: feed-forward connectivity, graded activation updating (dt < 1)
- Simple recurrent network: feed-forward connectivity with one or more "Elman" context layers
- Simple recurrent backprop-through-time: SRN architecture, but error back-propagates through time
- Fully recurrent BPTT: fully recurrent network, BPTT training algorithm, instantaneous updating (dt = 1)
- Continuous recurrent network: fully recurrent connectivity, BPTT training algorithm, graded updating (dt < 1)
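
The same taxonomy can be summarized as a few settings per model. The dictionary below is only an illustrative restatement of the slide (the field names are mine, and the dt / through-time entries not stated explicitly on the slide are filled in from the contrasts it draws).

```python
# Illustrative restatement of the slide above; field names are invented
varieties = {
    "cascaded network":          dict(connectivity="feed-forward",           bptt=False, dt="< 1"),
    "simple recurrent (SRN)":    dict(connectivity="feed-forward + context", bptt=False, dt="1"),
    "SRN backprop-through-time": dict(connectivity="feed-forward + context", bptt=True,  dt="1"),
    "fully recurrent BPTT":      dict(connectivity="fully recurrent",        bptt=True,  dt="1"),
    "continuous recurrent":      dict(connectivity="fully recurrent",        bptt=True,  dt="< 1"),
}
```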

An example…