Presentation on theme: "Recurrent Network InputsOutputs. Motivation Associative Memory Concept Time Series Processing – Forecasting of Time series – Classification Time series."— Presentation transcript:

1 Recurrent Network: Inputs and Outputs

2 Motivation Associative Memory Concept Time Series Processing – Forecasting of time series – Classification of time series – Modeling of time series – Mapping one time series onto another Signal Processing (a field very close to TS processing) Optimization problems (like the Traveling Salesman Problem)

3 Address-addressable vs. content-addressable memory An AM provides a way of storing and retrieving data based on content rather than on a storage address. Storage in a NN is distributed throughout the system in the net’s weights; hence a pattern does not have a single storage address.

4 Auto-associative vs. Hetero-associative Memory Each association is an input/output vector pair, s:f. For two patterns s and f: if s = f the net is an auto-associative memory; if s ≠ f it is a hetero-associative memory.

5 So what’s the difference? The net not only learns the specific pattern pairs that were used for training, but is also able to recall the desired response pattern when given an input stimulus that is similar, but not identical, to a training input. Associative recall: evoke associated patterns; recall a pattern from part of it; evoke/recall with incomplete or noisy patterns.

6 Training an AM NN The original patterns must be converted to a representation suitable for computation: “on” → +1, “off” → 0 (binary representation), or “on” → +1, “off” → -1 (bipolar representation). Two common training methods for single-layer nets are: – the Hebbian learning rule – its variations, e.g., gradient descent

7 Hebbian Learning Rule ”When one cell repeatedly assists in firing another, the axon of the first cell develops synaptic knobs (or enlarges them if they already exist) in contact with the soma of the second cell.” (Hebb, 1949) In an associative neural net, if we compare two pattern components (e.g., pixels) across many patterns and find that they are frequently in: a) the same state, then the arc weight between their NN nodes should be positive b) different states, then the arc weight between their NN nodes should be negative The weights must store the average correlations between all pattern components across all patterns. A net presented with a partial pattern can then use the correlations to recreate the entire pattern. Weights = Average Correlations

8 Quantitative Definition of the Hebbian Learning Rule Auto-association: Δw_ij = x_i x_j — when the two components are the same (different), increase (decrease) the weight. Hetero-association: Δw_ij = i_i o_j, where i = input component and o = output component. Ideally, the weights will record the average correlations across all patterns. Hebbian principle: if all P input patterns are known prior to retrieval time, then initialize the weights as: Auto: w_ij = (1/P) Σ_p x_i^(p) x_j^(p) Hetero: w_ij = (1/P) Σ_p i_i^(p) o_j^(p)
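The auto-associative initialization above can be sketched in a few lines of NumPy (the two bipolar patterns here are made-up examples, not from the slides):

```python
import numpy as np

# Two hypothetical 4-component bipolar patterns (values in {-1, +1}).
patterns = np.array([
    [ 1,  1,  1, -1],
    [ 1,  1, -1,  1],
])

# Auto-associative Hebbian initialization:
# w_ij = (1/P) * sum over patterns of x_i * x_j  (the average correlation).
P = patterns.shape[0]
W = patterns.T @ patterns / P
np.fill_diagonal(W, 0)        # no self-connections

# Components 1 and 2 agree in both patterns, so w_12 = +1 (strong positive
# correlation); components 1 and 3 agree once and differ once, so w_13 = 0.
```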

9 Architectures of AM NN Associative memory NNs fall into two families: static / feed-forward systems, and dynamic / recurrent / iterative systems, i.e., with feedback. Each family can be auto-associative or hetero-associative.

10 Mapping of Inputs to Outputs Information recording: M is expressed in terms of the stored prototype vectors; M is a matrix-type operator. Information retrieval: the mapping x → v may be linear or nonlinear, v = M[x]. Input a key vector x and retrieve a desired vector v previously stored in the network.

11 Static vs. Dynamic In a static memory, the output is produced in a single feed-forward pass. In a recurrent auto-associative memory, the operator M2 operates at the present instant k on the present input x_k and output v_k to produce the output at the next instant k+1; Δ is a unit delay needed for cyclic operation. The pattern is associated to itself (auto-association).

12 Heteroassociative Memory Net Association of pairs (x, v). This AM operates with a cycle of 2Δ. It associates pairs of vectors (x(i), v(i)).

13 Hopfield Model (a recurrent auto-associative network) The input x0 is used to initialize v0, i.e., x0 = v0, and the input is then removed for the following evolution. Operator M2 consists of multiplication by a weight matrix followed by the ensemble of nonlinear mapping operations v_i = f(net_i) performed by the layer of neurons.

14 Hopfield Model

15 Hopfield's Autoassociative Memory (1982, 1984) Distributed representation: info is stored as a pattern of activations/weights; multiple items are imprinted on the same network. Content-addressable memory: patterns are stored in the network by adjusting weights; to retrieve a pattern, specify a portion of it. Distributed, asynchronous control: individual processing elements behave independently. Fault tolerance: a few processors can fail and the network will still work. Processing units are active or inactive, i.e., in one of two states. Units are connected with weighted, symmetric connections.

16 Multiple-loop feedback system with no self-feedback Example of a Hopfield NN for 3-dimensional input data (x1, x2, x3): three neurons, each connected to the other two by weighted links, with no self-feedback; each attribute of the input initializes one neuron. Execution: the input pattern's attributes are the initial states of the neurons; repeatedly update the states of the neurons asynchronously until the states no longer change.

17 Hopfield's Autoassociative Memory Input vector values are in {-1, 1} (or {0, 1}). The number of neurons equals the input dimension. Every neuron has a link from every other neuron (recurrent architecture) except itself (no self-feedback). The activation function used to update a neuron's state is the sign function, but if the input to the activation function is 0, then the new output (state) of the neuron equals the old one. The weights are symmetric.

18 NN Training 1. Storage: Let f_1, f_2, …, f_M denote a known set of N-dimensional fundamental memories, with elements in {-1, +1}. The weights of the network are: w_ji = (1/N) Σ_{μ=1…M} f_{μ,j} f_{μ,i} for j ≠ i, and w_jj = 0, where f_{μ,i} is the i-th component of fundamental memory f_μ, w_ji is the weight from neuron i to neuron j, and x_i(n) denotes the state of neuron i at time n. Once they are computed, the synaptic weights are kept fixed. N: input dimension; M: number of patterns (called fundamental memories) used to compute the weights.
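A minimal sketch of this storage step, using the 1/N scaling from the formula above (the memory vectors are illustrative, not the slides'):

```python
import numpy as np

def store(memories):
    """Hopfield storage: w_ji = (1/N) * sum_mu f_mu[j] * f_mu[i], w_jj = 0.

    memories: array of shape (M, N) with entries in {-1, +1}."""
    F = np.asarray(memories, dtype=float)
    M, N = F.shape
    W = F.T @ F / N
    np.fill_diagonal(W, 0)    # no self-feedback
    return W

# Two fundamental memories of dimension N = 4.
W = store([[1,  1, 1, -1],
           [1, -1, 1, -1]])
```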

19 NN Training Each stored pattern (fundamental memory) establishes a correlation between pairs of neurons: neurons tend to be of the same sign or of opposite sign according to their values in the pattern. If w_ji is large, this expresses an expectation that neurons i and j are positively correlated; if it is small (negative), this indicates a negative correlation. The sum Σ_{i,j} w_ij x_i x_j will thus be large for a state x equal to a fundamental memory (since w_ij will be positive if the product x_i x_j > 0 and negative if x_i x_j < 0). The negative of this sum will thus be small.

20 NN Execution 2. Initialization: Let x_probe denote an input vector (probe) presented to the network. The algorithm is initialized by setting x_j(0) = x_probe,j for j = 1, …, N, where x_probe,j is the j-th element of the probe vector and x_j(0) is the state of neuron j at time t = 0.

21 NN Execution 3. Iteration Until Convergence: Update the elements of the network state vector x(n) asynchronously (i.e., randomly and one at a time) according to the rule x_j(n+1) = sgn( Σ_{i=1…N} w_ji x_i(n) ), keeping the old state when the sum is 0. Repeat the iteration until the state vector x remains unchanged.
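Steps 2 and 3 can be sketched as an asynchronous update loop (a minimal illustration; the stored pattern and corrupted probe are invented):

```python
import numpy as np

def retrieve(W, probe, seed=0):
    """Asynchronous Hopfield retrieval: visit neurons in random order, apply
    the sign rule, keep the old state on a zero net input, and stop once a
    full sweep changes nothing (a fixed point)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(probe, dtype=float).copy()
    N = len(x)
    while True:
        changed = False
        for j in rng.permutation(N):      # random, one-at-a-time updates
            net = W[j] @ x
            if net != 0 and np.sign(net) != x[j]:
                x[j] = np.sign(net)
                changed = True
        if not changed:                   # x(n+1) == x(n): converged
            return x

# Store one pattern, then probe with a one-bit-corrupted copy.
p = np.array([1, 1, 1, -1], dtype=float)
W = np.outer(p, p) / 4
np.fill_diagonal(W, 0)
print(retrieve(W, [1, 1, -1, -1]))   # recovers [1, 1, 1, -1]
```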

22 NN Execution 4. Outputting: Let x_fixed denote the fixed point (or stable state, i.e., such that x(t+1) = x(t)) computed at the end of step 3. The resulting output of the network is y = x_fixed.

23 Pictorial Execution of a Hopfield Net 1. Auto-associative patterns to remember. 2. Distributed storage of all patterns: number of neurons = dimension of the patterns; fully connected; weights = average correlations, across all patterns, of the corresponding units. 3. Retrieval. Node value legend: dark (blue) with x => +1; dark (red) without x => -1; light (green) => 0.

24 Hopfield Network Example 1. Patterns to remember: p1, p2, p3 (4-component bipolar patterns). 2. Per-pattern correlations and their averages:

Weight   p1   p2   p3   Avg
W12       1    1   -1   1/3
W13       1   -1   -1   -1/3
W14      -1    1    1   1/3
W23       1   -1    1   1/3
W24      -1    1   -1   -1/3
W34      -1   -1   -1   -1

3. Build the network with these average weights ([+] positive, [-] negative arcs). 4. Enter a test pattern.

25 Hopfield Network Example (2) 5. Synchronous iteration (update all nodes at once), using the values from the input layer and the discrete output rule sign(sum):

Node   Inputs                Output
1      1, 0, 0, -1/3           1
2      1/3, 0, 0, 1/3          1
3      -1/3, 0, 0, 1           1
4      1/3, 0, 0, -1          -1

The resulting state is p1, which is stable.

26 Matrix Computation Goal: set weights such that an input vector V_i yields itself when multiplied by the weights W. Let X = [V1; V2; …; Vp], where p = number of input vectors (i.e., patterns), one pattern per row. Since Y = X, the Hebbian weight calculation is W = X^T Y = X^T X:

X =  1  1  1 -1          X^T X =  3  1 -1  1
     1  1 -1  1                   1  3  1 -1
     1 -1 -1  1                  -1  1  3 -3
                                  1 -1 -3  3

The common index in the product is the pattern number, so X^T X is the matrix of correlation sums; e.g., w_2,4 = w_4,2 = x^T_2,1 x_1,4 + x^T_2,2 x_2,4 + x^T_2,3 x_3,4.

27 Matrix Computation The upper and lower triangles of the product matrix contain the 6 weights w_i,j = w_j,i. Scale the weights by dividing by p (i.e., averaging); this produces the same weights as in the non-matrix description. Testing with input (1 0 0 -1):

                3  1 -1  1
(1 0 0 -1)  ·   1  3  1 -1   =  (2 2 2 -2)
               -1  1  3 -3
                1 -1 -3  3

Scaling by p = 3 and using 0 as a threshold gives (2/3 2/3 2/3 -2/3) => (1 1 1 -1).
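The matrix computation on the last two slides can be reproduced directly (same three patterns and the same probe):

```python
import numpy as np

# The three example patterns as rows of X.
X = np.array([[1,  1,  1, -1],
              [1,  1, -1,  1],
              [1, -1, -1,  1]])
p = X.shape[0]                      # number of patterns = 3

C = X.T @ X                         # correlation-sum matrix X^T X

probe = np.array([1, 0, 0, -1])     # partial/noisy test pattern
raw = probe @ C                     # -> [2, 2, 2, -2]
recalled = np.where(raw / p > 0, 1, -1)   # threshold at 0 -> [1, 1, 1, -1]
```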

28 Associative Retrieval = Search Back-propagation: search in the space of weight vectors to minimize output error. Associative memory retrieval: search in the space of node values to minimize conflicts between a) node-value pairs and average correlations (weights), and b) node values and their initial values. Input patterns are local (sometimes global) minima, but many spurious patterns are minima as well. There is high dependence on the initial pattern and on the update sequence (if asynchronous).

29 Energy Function The energy of the associative memory should be low when pairs of node values mirror the average correlations (i.e., the weights) on the arcs connecting the node pairs, and when current node values equal their initial values (from the test pattern): E = -(1/2) Σ_j Σ_{k≠j} w_kj x_j x_k - Σ_k I_k x_k When pairs match correlations, w_kj x_j x_k > 0; when current values match input values, I_k x_k > 0. Gradient Descent: a little math shows that asynchronous updates using the discrete rule x_k = sgn( Σ_j w_kj x_j + I_k ) yield a gradient-descent search along the energy landscape for the E defined above.
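The energy can be written down and checked numerically; in this sketch, the weights come from storing one invented pattern, and the external input term is omitted by default:

```python
import numpy as np

def energy(W, x, I=None):
    """Hopfield energy: E = -1/2 * sum_{j,k} w_kj x_j x_k - sum_k I_k x_k.
    With a zero diagonal in W, the double sum covers only j != k pairs."""
    x = np.asarray(x, dtype=float)
    E = -0.5 * x @ W @ x
    if I is not None:
        E -= np.asarray(I, dtype=float) @ x
    return E

p = np.array([1, 1, 1, -1], dtype=float)
W = np.outer(p, p) / 4
np.fill_diagonal(W, 0)

# The stored pattern sits at lower energy than a one-bit-corrupted copy,
# so sign-rule updates (which never increase E) pull the state toward p.
E_stored = energy(W, p)               # -1.5
E_noisy = energy(W, [1, 1, -1, -1])   #  0.0
```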

30 Storage Capacity of Hopfield Networks Capacity = relationship between the number of patterns that can be stored and retrieved without error and the size of the network: capacity = #patterns / #nodes, or #patterns / #weights. Using the following definition of 100% correct retrieval: when any of the stored patterns is entered completely (no noise), that same pattern is returned by the network, i.e., the pattern is a stable attractor. A detailed proof shows that a Hopfield network of N nodes can achieve 100% correct retrieval on P patterns if P < N / (4 ln N):

N       Max P
10      1
100     5
1000    36
10000   271
10^11   ~10^9

In general, as more patterns are added to a network, the average correlations become less likely to match the correlations in any particular pattern; hence the likelihood of retrieval error increases. => The key to perfect recall is selective ignorance!
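The capacity bound reproduces the table above:

```python
import math

def max_patterns(N):
    """Integer part of N / (4 * ln N), the bound for 100% correct retrieval."""
    return int(N / (4 * math.log(N)))

for N in (10, 100, 1000, 10000):
    print(N, max_patterns(N))   # 10 -> 1, 100 -> 5, 1000 -> 36, 10000 -> 271
```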

31 Things to Remember Auto-associative vs. hetero-associative: a wide variety of net topologies; all use Hebbian learning => weights ~ average correlations. One-shot vs. iterative retrieval: iterative gives much better error correction. Asynchronous vs. synchronous state updates: synchronous updates can easily lead to oscillation; asynchronous updates can quickly find a local optimum (attractor); the update order can determine which attractor is reached. Pattern retrieval = search in node-state space: spurious patterns are hard to avoid, since many of them are attractors too; stochasticity helps jiggle out of local minima; as memory load increases, recall error increases. Associative vs. feed-forward nets: associative nets compute a many-to-one mapping, feed-forward nets a many-to-many mapping; backprop is resource-intensive, while a Hopfield iterative update is O(n); gradient descent on an error vs. an energy landscape: backprop => arc-weight space, Hopfield => node-state space.

