Slide 1: Mealy machine
[Figure: Mealy machine — a LOGIC block maps the input I and the current STATE to the output O and the next state; the state register is initialized to Sinit.]
RNNs are special FSMs. The Mealy machine is the most general form: both the output and the next state depend on the input and the current state.

Slide 2: Moore machine
[Figure: Moore machine — the input I and the current STATE feed the transition LOGIC; a separate output LOGIC block maps the state to the output O.]
The Moore machine is a more restricted FSM: the output depends only on the current state, not on the inputs.

Slide 3: Recurrent Neural Net
[Figure: RNN drawn as a Moore machine — an FFNN maps the input x and the STATE to the output y and the next state.]
An RNN is a variation of the Moore machine:
- The logic is replaced by an FFNN (usually a single layer)
- The output logic becomes the identity function
- The initial state is the original network input

Slide 4: Recurrent Networks
A recurrent network is characterized by the following:
- The connection graph of the network has cycles, i.e. the output of a neuron can influence its own input
- There are no natural input and output nodes
- Initially each neuron is given an input state
- Neurons change state using some update rule
- The network evolves until some stable situation is reached
- The resulting state is the output of the network
This describes the deployment of the network, not its training.

Slide 5: Pattern Recognition
Recurrent networks can be used for pattern recognition in the following way:
- The stable states represent the patterns to be recognized
- The initial state is a noisy or otherwise mutilated version of one of the patterns
- The recognition process consists of the network evolving from its initial state to a stable state

Slide 6: Pattern Recognition Example
[Figure: example patterns stored as stable states of the network.]

Slide 7: Pattern Recognition Example (cont'd)
[Figure: a noisy image on the left evolves to the recognized pattern on the right.]

Slide 8: Bipolar Data Encoding
- In bipolar encoding, firing of a neuron is represented by the value 1 and non-firing by the value −1
- In bipolar encoding, the transfer function of the neurons is the sign function sgn (we will avoid applying sgn to zero)
- A bipolar vector x of dimension n satisfies the two (easy) properties sgn(x) = x and x^T x = n
The original Hopfield network uses binary encoding, but nowadays Hopfield networks are usually discussed using bipolar encoding.

Slide 9: Binary versus Bipolar Encoding
The number of orthogonal vector pairs (counting ordered pairs) is much larger with bipolar encoding. In an even n-dimensional vector space:
- Binary encoding: choose m out of the n coordinates of x to be zero; then 2^m bit patterns are orthogonal to x. Summing C(n, m)·2^m over 0 ≤ m ≤ n gives 3^n pairs, by the binomial expansion of (1 + 2)^n.
- Bipolar encoding: a vector is orthogonal to x exactly when it differs from x in half of its coordinates; for n = 2m this can be done in C(2m, m) ways, giving 2^n · C(n, n/2) pairs.
Stirling's formula shows that the bipolar count outgrows the binary count for even n.
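
A brute-force check of these two counts for a small even dimension; this snippet is purely illustrative and simply enumerates all ordered pairs:

```python
from itertools import product
from math import comb

n = 4  # small even dimension, so exhaustive enumeration is feasible

def count_orthogonal(alphabet):
    """Count ordered pairs (x, y) over the alphabet with x . y == 0."""
    vectors = list(product(alphabet, repeat=n))
    return sum(sum(a * b for a, b in zip(x, y)) == 0
               for x in vectors for y in vectors)

print(count_orthogonal((0, 1)))    # 81 = 3^n
print(count_orthogonal((-1, 1)))   # 96 = 2^n * C(n, n/2)
assert count_orthogonal((0, 1)) == 3 ** n
assert count_orthogonal((-1, 1)) == 2 ** n * comb(n, n // 2)
```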

Slide 10: Hopfield Networks
A recurrent network is a Hopfield network when:
- The neurons have discrete output (for convenience we use bipolar encoding, i.e. the activation function is the sign function)
- Each neuron has a threshold
- Each pair of neurons is connected by a weighted connection; the weight matrix is symmetric and has a zero diagonal (no connection from a neuron to itself)
This instantiates the general recurrent network model with specific choices.

Slide 11: Network States
If a Hopfield network has n neurons, then the state of the network at time t is the vector x(t) ∈ {−1, 1}^n whose components x_i(t) describe the states of the individual neurons. Time is discrete, so t ∈ ℕ. The state of the network is updated using a so-called update rule: whether or not a neuron fires at time t + 1 depends on the sign of its total input at time t.

Slide 12: Update Strategies
- In a sequential network, only one neuron at a time is allowed to change its state. In the asynchronous update rule this neuron is randomly selected.
- In a parallel network, several neurons are allowed to change their state simultaneously:
  - Limited parallelism: only neurons that are not connected may change their state simultaneously
  - Unlimited parallelism: connected neurons may also change their state simultaneously
  - Full parallelism: all neurons change their state simultaneously

Slide 13: Asynchronous Update
Select a single neuron k (randomly, under the asynchronous rule) and update only that component:
x_k(t+1) = sgn( Σ_j w_kj x_j(t) − b_k ),  while x_i(t+1) = x_i(t) for i ≠ k.
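
A minimal sketch of this rule in Python (NumPy), using the net input z = Wx − b recalled later in these slides; the function and variable names are illustrative:

```python
import numpy as np

def async_step(x, W, b, rng):
    """One asynchronous update: pick a random neuron k and set
    x_k := sgn((W x)_k - b_k); all other components stay unchanged."""
    k = rng.integers(len(x))
    z_k = W[k] @ x - b[k]          # net input of neuron k
    x = x.copy()
    x[k] = 1 if z_k > 0 else -1    # sign-assumption: z_k is never 0
    return x
```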

Slide 14: Asynchronous Neighborhood
The asynchronous neighborhood of a state x is defined as the set of states N_a(x) = { x − 2 x_k e_k | 1 ≤ k ≤ n }, i.e. all states that differ from x in exactly one component. Recall that w_kk = 0 for all k. Because w_kk = 0, it follows that for every neighboring state x* = x − 2 x_k e_k ∈ N_a(x):
(x*)^T W x* = x^T W x − 4 x_k (Wx)_k.

Slide 15: Synchronous Update
The synchronous update rule corresponds to full parallelism: all neurons are updated at once,
x(t+1) = sgn( W x(t) − b ).
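
The synchronous rule as a one-line NumPy sketch (again with z = Wx − b and names chosen for illustration):

```python
import numpy as np

def sync_step(x, W, b):
    """Synchronous update: every neuron fires at once,
    x(t+1) = sgn(W x(t) - b)."""
    return np.where(W @ x - b > 0, 1, -1)
```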

Slide 16: Sign-Assumption
In order for both update rules to be applicable, we assume that for all neurons i and all states x: Σ_j w_ij x_j ≠ b_i, so that sgn is never applied to zero. Because the number of states is finite, it is always possible to adjust the thresholds such that this assumption holds.

Slide 17: Stable States
A state x is called a stable state when sgn(Wx − b) = x. For both the synchronous and the asynchronous update rule we then have: a state is a stable state if and only if the update rule does not lead to a different state.

Slide 18: Cyclic Behavior in an Asymmetric RNN
[Figure: a small network with asymmetric weights whose state keeps oscillating between configurations of −1 and 1 instead of converging.]
This explains why W has to be symmetric.

Slide 19: Basins of Attraction
[Figure: state space with initial states flowing to stable states.]
There is a link with dynamical systems:
- Attractor: a minimal set of states in which the system gets stuck
- Basin of attraction: the set of initial states that converge to the same attractor

Slide 20: Consensus and Energy
The consensus C(x) of a state x of a Hopfield network with weight matrix W and bias vector b is defined as
C(x) = ½ x^T W x − b^T x,
and the energy E(x) of a Hopfield network in state x is defined as its negation,
E(x) = −C(x).
Hopfield was a physicist; the energy reflects an analogy with statistical mechanics, in particular Ising models for spin states (up and down).
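
Direct transcriptions of the two definitions; a sketch, assuming the ½ x^T W x − b^T x convention that makes the consensus-difference computation on the next slides come out:

```python
import numpy as np

def consensus(x, W, b):
    """C(x) = 1/2 x^T W x - b^T x"""
    return 0.5 * x @ W @ x - b @ x

def energy(x, W, b):
    """E(x) = -C(x)"""
    return -consensus(x, W, b)
```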

Slide 21: Consensus Difference
For any pair of vectors x and x* = x − 2 x_k e_k ∈ N_a(x) we have
C(x*) − C(x) = −2 x_k ( (Wx)_k − b_k ) = −2 x_k z_k.
In the expansion of C(x*), one term vanishes because w_kk = 0 and another because W is symmetric.

Slide 22: Asynchronous Convergence
If in an asynchronous step the state of the network changes from x to x − 2 x_k e_k, then the consensus increases: recall z = Wx − b, and x_k is only modified when sgn(z_k) = −x_k, so C(x*) − C(x) = −2 x_k z_k > 0. Since there are only a finite number of states, the consensus serves as a variant function that shows that a Hopfield network evolves to a stable state when the asynchronous update rule is used.
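
A quick numerical check of this monotonicity argument on a random symmetric, zero-diagonal weight matrix; the sizes and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                 # symmetric ...
np.fill_diagonal(W, 0)            # ... with zero diagonal
b = np.zeros(n)
x = rng.choice([-1.0, 1.0], size=n)

def C(x):
    return 0.5 * x @ W @ x - b @ x

for _ in range(500):
    k = rng.integers(n)
    x_new = x.copy()
    x_new[k] = 1.0 if W[k] @ x - b[k] > 0 else -1.0
    assert C(x_new) >= C(x) - 1e-12   # consensus never decreases
    x = x_new
```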

Slide 23: Stable States and Local Maxima
A state x is a local maximum of the consensus function when C(x*) ≤ C(x) for all x* ∈ N_a(x). Theorem: a state x is a local maximum of the consensus function if and only if it is a stable state.

Slide 24: Stable Equals Local Maximum
Being locally maximal, i.e. C(x*) ≤ C(x) for all neighboring states x* ∈ N_a(x), is equivalent to stability: by the consensus difference, C(x*) − C(x) = −2 x_k z_k ≤ 0 for all k precisely when x_k = sgn(z_k) for all k (using the sign-assumption z_k ≠ 0), which is the definition of a stable state.

Slide 25: Modified Consensus
The modified consensus of a Hopfield network with weight matrix W and bias vector b is defined over a pair of successive states; a standard choice is
C~(x, x*) = (x*)^T W x − b^T (x + x*).
Let x, x*, and x** be successive states obtained with the synchronous update rule. Then
C~(x*, x**) − C~(x, x*) = (x** − x)^T (W x* − b) ≥ 0,
with equality only if x** = x.

Slide 26: Synchronous Convergence
Suppose that x, x*, and x** are successive states obtained with the synchronous update rule. Then the modified consensus does not decrease, and it stays equal only when x** = x. Since there are only finitely many states, a Hopfield network that evolves using the synchronous update rule arrives either in a stable state or in a cycle of length 2.

Slide 27: Storage of a Single Pattern
How does one determine the weights of a Hopfield network, given a set of desired stable states? First we consider the case of a single stable state. Let x be an arbitrary bipolar vector. Choosing the weight matrix W and bias vector b as
W = x x^T − I,  b = 0
makes x a stable state.
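
A sketch verifying this choice; the particular bipolar pattern is just an example:

```python
import numpy as np

x = np.array([1, -1, -1, 1, 1, -1])      # any bipolar pattern
n = len(x)
W = np.outer(x, x) - np.eye(n)           # xx^T with the diagonal zeroed
b = np.zeros(n)

z = W @ x - b                            # equals (n - 1) x
assert np.array_equal(np.sign(z), x)     # so sgn(Wx - b) = x: stable
```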

Slide 28: Proof of Stability
Wx − b = (x x^T − I) x = x (x^T x) − x = (n − 1) x, and since n > 1, sgn((n − 1) x) = x.
The term x x^T alone is sufficient to have x as a stable state. For convergence, however, the diagonal of W needs to be zero, hence the −I term.

Slide 29: Hebb's Postulate of Learning
Biological formulation: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
Mathematical formulation:
- Hebb's postulate captures the plasticity of the synapses
- It does not capture inhibitory synapses
- A simple form of Hebb's rule is the product rule F(y, x) = η y x, with x the presynaptic activity, y the postsynaptic activity, and η the learning rate
- Hopfield networks use Hebbian learning (with a zero diagonal)
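
The product rule as a one-line weight update; a sketch with illustrative names:

```python
def hebb_update(w, x, y, eta=0.1):
    """Product form of Hebb's rule: the weight w between presynaptic
    activity x and postsynaptic activity y grows when the two agree
    in sign, and shrinks when they disagree."""
    return w + eta * y * x
```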

Slide 30: Hebb's Postulate Revisited
Stent (1973), and Changeux and Danchin (1976), have expanded Hebb's rule so that it also models inhibitory synapses:
- If two neurons on either side of a synapse are activated simultaneously (synchronously), then the strength of that synapse is selectively increased.
- If two neurons on either side of a synapse are activated asynchronously, then that synapse is selectively weakened or eliminated.
Here synchrony and asynchrony have to do with firing frequency.

Slide 31: Example
[Figure: a concrete small Hopfield network with its weight matrix.]
The sign-assumption is satisfied because the order of the network is even: every row of W contains an odd number of non-zero entries, hence every inner product of a row with a bipolar vector is non-zero.

Slide 32: State Encoding
[Table: the 16 states 0–15 of the example network, listing the component values x_1, ..., x_4 and the scaled net inputs 4z_1, ..., 4z_4 per state; only half the table is necessary on account of symmetry.]
The vector 4z shows which state transitions are possible. If in state 0 we choose to update x_3, then the next state is 4, from which there is a single state transition to state 6.

Slide 33: Finite State Machine for Asynchronous Update
[Figure: the asynchronous state-transition diagram of the example network.]
- States 6 and 9 are attractors
- Basin of attraction of 6: {2, 4, 7, 14}
- Basin of attraction of 9: {1, 8, 11, 13}
- The remaining 6 states can converge to either 6 or 9 (non-determinism)

Slide 34: Weights for Multiple Patterns
Let { x(p) | 1 ≤ p ≤ P } be a set of patterns, and let W(p) be the weight matrix corresponding to pattern number p. Choose the weight matrix W and the bias vector b for a Hopfield network that must recognize all P patterns as the superposition W = Σ_p W(p) (with a suitable normalization, e.g. W(p) = (x(p) x(p)^T − I)/n) and b = 0. Question: is x(p) indeed a stable state?
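
A sketch that builds this superposition for random patterns and tests the question empirically; the 1/n normalization is one common convention, assumed here:

```python
import numpy as np

rng = np.random.default_rng(1)
n, P = 100, 10
X = rng.choice([-1, 1], size=(P, n))     # P random bipolar patterns

W = (X.T @ X) / n                        # Hebbian superposition
np.fill_diagonal(W, 0)                   # zero diagonal, symmetric
b = np.zeros(n)

# Count how many stored patterns are exactly stable: sgn(Wx - b) = x
stable = sum(np.array_equal(np.sign(W @ x - b), x) for x in X)
print(f"{stable} of {P} patterns are stable states")
```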

Slide 35: Remarks
- It is not guaranteed that a Hopfield network with the weight matrix defined on the previous slide indeed has the patterns as its stable states
- The disturbance caused by the other patterns is called crosstalk; the closer the patterns are to each other, the larger the crosstalk
- This raises the question of how many patterns can be stored in a network before crosstalk gets the upper hand

Slide 36: Weight Matrix Entry Computation
Writing out the superposition gives w_ij = (1/n) Σ_p x_i(p) x_j(p) for i ≠ j, and w_ii = 0.
Note that w_ii = 0 is necessary for convergence; symmetry is evident.

Slide 37: Input of Neuron i in State x(p)
z_i = Σ_{j≠i} w_ij x_j(p) = ((n − 1)/n) x_i(p) + c_i,
where the term for pattern p itself contributes ((n − 1)/n) x_i(p) and c_i collects the contributions of all other patterns.

Slide 38: Crosstalk
The crosstalk term is defined by
c_i = (1/n) Σ_{q≠p} Σ_{j≠i} x_i(q) x_j(q) x_j(p).
Neuron i is stable when −x_i(p) c_i < (n − 1)/n, because then the crosstalk cannot flip the sign of the net input z_i = ((n − 1)/n) x_i(p) + c_i.
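
A direct transcription of the crosstalk definition, handy for experimenting with correlated patterns; names are illustrative:

```python
import numpy as np

def crosstalk(X, p, i):
    """c_i for pattern p: contribution of all other stored patterns
    to the net input of neuron i (1/n normalization assumed)."""
    P, n = X.shape
    total = 0.0
    for q in range(P):
        if q == p:
            continue
        # inner product over j != i
        total += X[q, i] * (X[q] @ X[p] - X[q, i] * X[p, i])
    return total / n
```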

Slide 39: Spurious States
Besides the desired stable states, the network can have additional undesired (spurious) stable states:
- If x is stable and b = 0, then −x is also stable
- Some combinations of an odd number of stable states can be stable
- Moreover, there can be more complicated additional stable states (spin-glass states) that bear no relation to the desired states

Slide 40: Storage Capacity
Question: how many stable states P can be stored in a network of size n?
Answer: that depends on the probability of instability one is willing to accept. Experimentally, P ≈ 0.15 n has been found (by Hopfield) to be a reasonable value.
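
A small Monte-Carlo sketch of how the fraction of unstable bits grows with the load P/n; the sizes and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
for P in (10, 20, 30, 40):               # loads P/n = 0.05 .. 0.20
    X = rng.choice([-1, 1], size=(P, n))
    W = (X.T @ X) / n
    np.fill_diagonal(W, 0)
    # W is symmetric, so row p of X @ W equals (W x(p))^T
    unstable = np.mean(np.sign(X @ W) != X)
    print(f"P/n = {P/n:.2f}: fraction of unstable bits = {unstable:.4f}")
```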

Slide 41: Probabilistic Analysis (1)
Assume that all components of the patterns are independent random variables with equal probability of being 1 and −1. Then it can be shown that √(n/P) · (−x_i(p) c_i) approximately has the standard normal distribution N(0, 1).

Slide 42: Normal Distribution
[Figure: the bell-shaped density of the standard normal distribution.]

Slide 43: Probabilistic Analysis (2)
From these assumptions it follows, by application of the central limit theorem, that
Pr[neuron i is unstable] = Pr[ −x_i(p) c_i > 1 ] ≈ Pr[ y ≥ √(n/P) ]  for y ~ N(0, 1).
The result is approximate because we pretend that there are n·P terms in the crosstalk summation, whereas there are in fact (n − 1)·(P − 1).

Slide 44: Standard Normal Distribution
[Figure: the shaded area under the bell-shaped curve gives the probability Pr[y ≥ 1.5].]

Slide 45: Probability of Instability
Accepted probability of crosstalk | β = √(n/P) | Maximum ratio P/n (stored patterns over network size)
0.05  | 1.645 | 0.370
0.01  | 2.326 | 0.185
0.005 | 2.576 | 0.151
0.001 | 3.090 | 0.105
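
The middle and right columns follow directly from the normal quantile: β = Φ⁻¹(1 − p) and P/n = 1/β². A sketch reproducing the table (SciPy assumed available):

```python
from scipy.stats import norm

print("p       beta    max P/n")
for p in (0.05, 0.01, 0.005, 0.001):
    beta = norm.ppf(1 - p)          # beta = sqrt(n / P)
    print(f"{p:<7} {beta:.3f}   {1 / beta**2:.3f}")
```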

Slide 46: Topics Not Treated
- Reduction of crosstalk for correlated patterns
- Stability analysis for correlated patterns
- Methods to eliminate spurious states
- Continuous Hopfield models
- Other associative memories:
  - Bidirectional Associative Memory (Kosko)
  - Brain State in a Box (Kawamoto, Anderson)