Backpropagation in fully recurrent and continuous networks


1 Backpropagation in fully recurrent and continuous networks

2 The story so far…
Constraint satisfaction networks
Feed-forward backpropagation networks
Simple recurrent networks
Question: What do they do, what are they good for, and how are they different?

3 The story so far… Constraint satisfaction networks
Propagation of activation maximizes “goodness” over the network, i.e., the degree to which the pattern of activation reflects the constraints between units, encoded in the weight values.
Question: What are the weights and where do they come from?

4 The story so far… Feed-forward backpropagation networks
Generate an output pattern from an input pattern
Minimize prediction error directly through backpropagation learning
Good for learning any input/output mapping and generalizing to new items…
Question: Is there anything they are missing relative to constraint satisfaction networks?

5 The story so far… Simple recurrent networks
Just like feed-forward networks, except that they:
Map a sequence of inputs to a sequence of outputs
Can generalize to new sequences
In recurrent layers, the previous state can influence the current state
Question: How does this compare to the other models?

6 Today: Fully recurrent and continuous networks
Fully recurrent: Any set of units can send connections to, and receive connections from, any other set of units.
This means that a unit can influence its own activation, either directly or indirectly, through connections with other units!
It also means that the patterns generated by an input will evolve over time, even if the input is fixed (like in the Jets-N-Sharks model).
With fully recurrent networks, processing an input involves settling over time to a “good” state (like the Jets-N-Sharks model; see the sketch below).
But fully recurrent models can also (a) include hidden layers and (b) learn by minimizing prediction error (like feed-forward and SRN backpropagation models).
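Below is a minimal sketch (Python/NumPy, with hypothetical hand-set weights and a toy input rather than any example from the slides) of what “settling” looks like: the activations are recomputed from a fixed input over and over until the pattern stops changing.

```python
# Hedged sketch of settling in a tiny fully recurrent network (toy weights/input).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Recurrent weights: W[i, j] is the connection from unit j to unit i.
W = np.array([[ 0.0,  2.0, -1.0],
              [ 2.0,  0.0, -1.0],
              [-1.0, -1.0,  0.0]])
external_input = np.array([1.0, 0.0, 0.0])   # fixed external input, clamped on

a = np.zeros(3)                              # start from a neutral pattern
for step in range(50):
    net = W @ a + external_input             # recurrent + external net input
    new_a = sigmoid(net)
    if np.max(np.abs(new_a - a)) < 1e-4:     # stop once the pattern has settled
        break
    a = new_a

print(step, a)                               # the settled "interpretation" of the input
```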

7 Today: Fully recurrent and continuous networks
Continuous: Units do not instantaneously update their activations, but do so gradually over time.
A new parameter, dt, controls how quickly a unit can change its activation!
This allows “cascaded” activation: each layer transmits information as it comes in, even though it has not yet fully adapted to its inputs (see the sketch below).
Provides a way of thinking about the temporal processing of information.
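A small sketch of “cascaded” activation, assuming a hypothetical two-layer linear network: with dt < 1 each layer only moves part of the way toward its instantaneous activation on every tick, so the output begins to change before the hidden layer has finished responding to the input.

```python
# Hedged sketch of cascaded activation with dt < 1 (toy two-layer linear network).
import numpy as np

dt = 0.2
W_ih = np.array([[1.0]])        # input -> hidden weight (toy value)
W_ho = np.array([[1.0]])        # hidden -> output weight (toy value)

x = np.array([1.0])             # input switched on at t = 0 and held fixed
h = np.zeros(1)
o = np.zeros(1)

for t in range(1, 11):
    h_inst = W_ih @ x           # instantaneous (asymptotic) hidden activation
    o_inst = W_ho @ h           # output is driven by the current, partial hidden state
    h = h + dt * (h_inst - h)   # each layer closes a fraction dt of the gap
    o = o + dt * (o_inst - o)
    print(t, round(h[0], 3), round(o[0], 3))   # output lags behind, rising gradually
```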

8 Today: Fully recurrent and continuous networks
Constraint satisfaction: inferences from the input “feed back” to generate more inferences; the network “settles” over time into an interpretation of the input (like the Jets-N-Sharks and Room models).
A way of discovering/learning weight configurations that “do” constraint satisfaction without hand-tuning (like backprop and SRNs).
The possibility of incorporating hidden units for re-representation of inputs/outputs (like backprop/SRNs).
The possibility of allowing units to respond gradually to their inputs rather than instantaneously (like in constraint satisfaction models).
The possibility of accounting for sequential patterns of behavior (like SRNs).

9 Why? All the benefits of constraint satisfaction we have already seen
Generalization, content-addressable memory, graceful degradation, etc.
Coupled with an account of learning and of the time-course of processing.

10 Learning in fully recurrent networks

11 Simple Recurrent Network (SRN)
[Diagram: the current input (t) and a copy of the hidden representation at t-1 feed the hidden units, which produce the current output (t).]
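The diagram can be read as the following sketch of an SRN forward pass (hypothetical layer sizes and random weights, purely for illustration): at each time step the previous hidden state is copied into a context layer and combined with the current input.

```python
# Hedged sketch of a simple recurrent (Elman) network forward pass (toy sizes/weights).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 3, 4
W_xh = rng.normal(scale=0.5, size=(n_hid, n_in))    # current input (t) -> hidden
W_ch = rng.normal(scale=0.5, size=(n_hid, n_hid))   # copy of hidden rep at t-1 -> hidden
W_hy = rng.normal(scale=0.5, size=(n_out, n_hid))   # hidden -> current output (t)

inputs = [rng.integers(0, 2, n_in).astype(float) for _ in range(5)]   # toy input sequence

context = np.zeros(n_hid)                      # hidden rep at t-1, initially blank
for x in inputs:
    h = sigmoid(W_xh @ x + W_ch @ context)     # hidden state from input + context
    y = sigmoid(W_hy @ h)                      # current output (t)
    context = h.copy()                         # copy hidden state for the next step
    print(y.round(2))
```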

12 [Diagram: the network unfolded over time slices t=0, t=1, and t=2, each with its own current input (t) and output.]

13 [Diagram: the unfolded network again, with the three time slices t=0, t=1, and t=2 drawn side by side, each receiving a current input (t) and producing an output.]

14 [Diagram: an example training pattern: an input sequence with targets 0 and 1.]

15 [Diagram: the two-unit network (units 1 and 2) unfolded over time steps 1 through n, showing the input presented and the δ computed at each time slice. The connection between units 1 and 2 is the same weight at every time slice, so the deltas are summed over time.]

16 [Diagram: an example training sequence with inputs 1 1 0 1 and targets 1 0, 1 1, 0 1.]

17 [Diagram: the two-unit network unfolded across time steps 1, 2, 3, …, n, n+i, n+2i, showing the input (I) and targets (T) at each step; no targets are given at the early time steps, and targets appear at the later ones.]

18 [Diagram: a network in which Word and Picture layers are connected through a Hidden layer.]

19 [Diagram: the word-to-picture mapping unfolded over Time 0 and Time 1: the Word input feeds the Hidden layer, and targets are applied to the Picture layer.]

20 What about going from picture to word?

21 [Diagram: the reverse mapping unfolded over Time 0 and Time 1: the Picture input feeds the Hidden layer, and targets are applied to the Word layer.]

22 So fully recurrent backpropagation involves:
“Unfolding” the network in time
Training the model with standard backpropagation
Summing the deltas for a given weight over the different time slices
Changing the weight by the summed delta * the learning rate (see the sketch below)
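A hedged sketch of the whole recipe on a toy recurrent hidden layer (hypothetical sizes, random weights, and random data; an illustration of the idea rather than the lecture’s exact network): the forward pass unfolds the network over the sequence, the backward pass is standard backpropagation through the unfolded copies, the deltas for each shared weight are summed across the time slices, and each weight then changes by the summed delta times the learning rate.

```python
# Hedged sketch of backpropagation through time (BPTT) on a toy recurrent network.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out, T = 2, 3, 2, 4
W_xh = rng.normal(scale=0.5, size=(n_hid, n_in))   # input -> hidden
W_hh = rng.normal(scale=0.5, size=(n_hid, n_hid))  # hidden -> hidden (recurrent)
W_hy = rng.normal(scale=0.5, size=(n_out, n_hid))  # hidden -> output
lr = 0.1                                           # learning rate

xs = [rng.normal(size=n_in) for _ in range(T)]     # toy input sequence
ds = [rng.normal(size=n_out) for _ in range(T)]    # toy target sequence

# Forward pass: unfold the network over the T time slices.
hs = [np.zeros(n_hid)]                             # blank starting state
ys = []
for x in xs:
    h = np.tanh(W_xh @ x + W_hh @ hs[-1])
    hs.append(h)
    ys.append(W_hy @ h)

# Backward pass: standard backprop through the unfolded net,
# summing the deltas for each (shared) weight over the time slices.
gW_xh, gW_hh, gW_hy = np.zeros_like(W_xh), np.zeros_like(W_hh), np.zeros_like(W_hy)
carry = np.zeros(n_hid)                            # error arriving from the later slice
for t in reversed(range(T)):
    dy = ys[t] - ds[t]                             # output error at time t (squared-error loss)
    gW_hy += np.outer(dy, hs[t + 1])
    dh = W_hy.T @ dy + carry                       # error at the hidden units, time t
    dz = dh * (1.0 - hs[t + 1] ** 2)               # back through the tanh nonlinearity
    gW_xh += np.outer(dz, xs[t])                   # same weight at every slice: sum the deltas
    gW_hh += np.outer(dz, hs[t])
    carry = W_hh.T @ dz                            # pass error back to the previous slice

# Change each weight by the summed delta * the learning rate.
W_xh -= lr * gW_xh
W_hh -= lr * gW_hh
W_hy -= lr * gW_hy
```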

23 Some nuances: Time-delayed links
Net inputs are updated for all groups first
Then activations are updated
So, activation “flows” through just one set of weights at a time (see the sketch below)
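A tiny sketch of that update order, assuming a hypothetical three-group chain A → B → C with identity activation functions: because net inputs for all groups are computed from the old activations before any activation is updated, a new input takes one tick to cross each set of weights.

```python
# Hedged sketch of time-delayed links: net inputs first, then activations.
import numpy as np

W_ab = np.array([[1.0]])            # group A -> group B (toy weight)
W_bc = np.array([[1.0]])            # group B -> group C (toy weight)

a = np.array([1.0])                 # input clamped on group A from tick 0
b = np.zeros(1)
c = np.zeros(1)

for tick in range(1, 4):
    net_b = W_ab @ a                # phase 1: net inputs for every group, from old activations
    net_c = W_bc @ b
    b = net_b                       # phase 2: activations updated
    c = net_c
    print(tick, b[0], c[0])         # C only responds at tick 2: one set of weights per tick
```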

24 Some nuances: Continuous units
$a_t = a_{t-1} + dt \, (a_{inst} - a_{t-1})$
How “fast” should a unit respond to a change in its inputs?
dt = “rate of change” of activation for the unit
When dt = 1, the unit will fully update on every cycle
The smaller dt, the more slowly the unit will approach the correct activation (see the sketch below)
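A minimal sketch of this update rule for a single unit, assuming its instantaneous activation a_inst is held at 1.0: with dt = 1 the unit reaches 1.0 in one cycle, while smaller values of dt approach it more gradually.

```python
# Hedged sketch of the continuous-unit update: a_t = a_{t-1} + dt * (a_inst - a_{t-1}).
a_inst = 1.0                         # instantaneous activation the unit is heading toward
for dt in (1.0, 0.5, 0.1):
    a = 0.0
    trace = []
    for cycle in range(10):
        a = a + dt * (a_inst - a)    # close a fraction dt of the remaining gap each cycle
        trace.append(round(a, 3))
    print(dt, trace)
```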

25 Varieties of temporal networks
“Cascaded” network: feed-forward connectivity, graded activation updating (dt < 1)
Simple recurrent network: feed-forward connectivity with one or more “Elman” context layers
Simple recurrent backprop-through-time: SRN architecture, but error back-propagates through time
Fully recurrent BPTT: fully recurrent network, BPTT training algorithm, instantaneous updating (dt = 1)
Continuous recurrent network: fully recurrent connectivity, BPTT training algorithm, graded updating (dt < 1)

26 An example…

