# Feedback Neural Networks

## Presentation on theme: "Feedback Neural Networks"— Presentation transcript:

AI & NN Notes Chapter 9 Feedback Neural Networks

§9.1 Basic Concepts

- Attractor: a state toward which the system evolves in time, starting from certain initial conditions.
- Basin of attraction: the set of initial conditions from which the evolution terminates in the attractor.
- Fixed point: an attractor that consists of a single point in state space.
- Limit cycle: an attractor that consists of a periodic sequence of states.

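Hopfield Network and its Basic Assumptions

1. The network has one layer of $n$ neurons.
2. $T_i$ is the threshold of neuron $i$.
3. $w_{ij}$ is the weight from neuron $j$ to neuron $i$.
4. $v_j$ is the output of neuron $j$.
5. $i_i$ is the external input to the $i$-th neuron.

[Figure: single-layer feedback network of $n$ neurons with outputs $v_1, \dots, v_n$, weights $w_{ij}$, external inputs $i_i$ and thresholds $T_i$]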

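The total input of the $i$-th neuron is

$$net_i = \sum_{\substack{j=1 \\ j \ne i}}^{n} w_{ij} v_j + i_i - T_i = W_i^t v + i_i - T_i, \quad \text{for } i = 1, 2, \dots, n,$$

where

$$W_i = \begin{bmatrix} w_{i1} \\ w_{i2} \\ \vdots \\ w_{in} \end{bmatrix}, \qquad v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$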

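The complete matrix description of the linear portion of the system shown in the figure is given by

$$net = W v + i - t,$$

where

$$net = \begin{bmatrix} net_1 \\ net_2 \\ \vdots \\ net_n \end{bmatrix}, \qquad i = \begin{bmatrix} i_1 \\ i_2 \\ \vdots \\ i_n \end{bmatrix}, \qquad t = \begin{bmatrix} T_1 \\ T_2 \\ \vdots \\ T_n \end{bmatrix}$$

are the activation vector, the vector of external inputs to each neuron, and the threshold vector, respectively.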

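$W$ is an $n \times n$ matrix containing the network weights:

$$W = \begin{bmatrix} W_1^t \\ W_2^t \\ \vdots \\ W_n^t \end{bmatrix} = \begin{bmatrix} 0 & w_{12} & w_{13} & \cdots & w_{1n} \\ w_{21} & 0 & w_{23} & \cdots & w_{2n} \\ w_{31} & w_{32} & 0 & \cdots & w_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & w_{n3} & \cdots & 0 \end{bmatrix},$$

with $w_{ij} = w_{ji}$ and $w_{ii} = 0$.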

§9.2 Discrete-Time Hopfield Network

Assuming that the neuron's activation function is sgn, the transition rule of the $i$-th neuron is

$$v_i \leftarrow \begin{cases} -1, & \text{if } net_i < 0 \text{ (inhibited state)} \\ +1, & \text{if } net_i > 0 \text{ (excitatory state)} \end{cases} \qquad (*)$$

If, at a given time, only a single neuron is allowed to update its output, so that only one entry of the vector $v$ can change, the operation is asynchronous. Each element of the output vector is then updated separately, taking into account the most recent values of the elements that have already been updated and remain stable.

Based on (*), the update rule of a discrete-time recurrent network, for one value of $i$ at a time, becomes

$$v_i^{k+1} = \operatorname{sgn}\left(W_i^t v^k + i_i - T_i\right), \quad \text{for random } i,\ i = 1, 2, \dots, n, \text{ and } k = 0, 1, 2, \dots$$

where $k$ denotes the index of the recursive update. This is referred to as the asynchronous stochastic recursion of the Hopfield network. The update process continues until all $n$ entries of $v$ have been updated, and the recursive computation continues until the output vector remains unchanged with further iterations.

Similarly, for synchronous operation, we have

$$v^{k+1} = \Gamma\left[W v^k + i - t\right], \quad \text{for all neurons},\ k = 0, 1, \dots$$

where $\Gamma[\cdot]$ applies sgn to each component, so that all neurons change their outputs simultaneously.
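The asynchronous recursion can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `sgn_keep` and `async_recall`, the fixed random seed, and the convention that a neuron with `net == 0` keeps its current output are not part of the notes.

```python
import random

def sgn_keep(net, current):
    # Rule (*): -1 for net < 0, +1 for net > 0.
    # Assumption: at net == 0 the neuron keeps its current output.
    if net > 0:
        return 1
    if net < 0:
        return -1
    return current

def async_recall(W, v0, i_ext=None, T=None, seed=0, max_sweeps=100):
    """Asynchronous stochastic recursion.

    Returns the first state that a full sweep over all neurons leaves unchanged
    (or the last state reached after max_sweeps sweeps).
    """
    n = len(v0)
    i_ext = [0] * n if i_ext is None else i_ext
    T = [0] * n if T is None else T
    v = list(v0)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    for _ in range(max_sweeps):
        order = list(range(n))
        rng.shuffle(order)  # random order of single-neuron updates
        changed = False
        for i in order:
            # net_i uses the most recent values of all other outputs
            net = sum(W[i][j] * v[j] for j in range(n)) + i_ext[i] - T[i]
            new = sgn_keep(net, v[i])
            if new != v[i]:
                v[i] = new
                changed = True
        if not changed:
            return v
    return v

# Two mutually inhibiting neurons: the stable states are (1, -1) and (-1, 1).
W = [[0, -1], [-1, 0]]
print(async_recall(W, [1, 1]))
```

Starting from `[1, 1]`, which of the two stable states is reached depends on which neuron happens to be updated first, so the result depends on the order of transitions as well as on the weights.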

Geometrical Explanation

The output vector $v$ is one of the vertices of the $n$-dimensional cube $[-1, 1]^n$ in $E^n$ space. The vector moves during recursions from vertex to vertex until it stabilizes in one of the $2^n$ vertices available. The movement is from a vertex to an adjacent vertex, since the asynchronous update mode allows only a single-component update of the $n$-tuple vector at a time. The final position of $v$ as $k \to \infty$ is determined by the weights, thresholds, inputs, and the initial vector $v^0$, as well as by the order of transitions.

To evaluate the stability property of the dynamical system of interest, the computational energy function is defined in the $n$-dimensional output space $v^n$. If the increments of a certain bounded positive-valued computational energy function under the transition rule are found to be non-positive, then the function can be called a Lyapunov function, and the system would be asymptotically stable. The scalar-valued energy function for the discussed system is a quadratic form:

$$E = -\frac{1}{2} v^t W v - i^t v + t^t v$$

The energy function in asynchronous mode

or, in expanded form,

$$E = -\frac{1}{2} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} w_{ij} v_i v_j - \sum_{i=1}^{n} i_i v_i + \sum_{i=1}^{n} T_i v_i.$$

Assume that the output node $i$ has been updated at the $k$-th instant, so that $v_i^{k+1} - v_i^k = \Delta v_i$. Computing the energy gradient vector gives

$$\nabla E = -\frac{1}{2}\left(W^t + W\right) v - i + t = -W v - i + t, \quad \text{since } W^t = W.$$

The energy increment becomes

$$\Delta E = (\nabla E)^t \Delta v = \left(-W_i^t v - i_i + T_i\right) \Delta v_i,$$

because only the $i$-th output is updated.

Since only the $i$-th output is updated, we have

$$(\Delta v)^t = [\,0 \ \cdots \ \Delta v_i \ \cdots \ 0\,].$$

The energy increment can therefore be rewritten as

$$\Delta E = -\left(\sum_{\substack{j=1 \\ j \ne i}}^{n} w_{ij} v_j + i_i - T_i\right) \Delta v_i,$$

or briefly

$$\Delta E = -net_i \, \Delta v_i.$$

Note that when $net_i < 0$, then $\Delta v_i \le 0$, and when $net_i > 0$, then $\Delta v_i \ge 0$; thus $net_i \, \Delta v_i$ is always non-negative. In other words, any corresponding energy changes $\Delta E$ are non-positive, provided that $w_{ij} = w_{ji}$.
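The non-increasing property can be checked numerically. The sketch below is a plain-Python illustration, not taken from the notes: the names `energy` and `energy_trace`, the example weights, and the choice to leave a neuron unchanged when `net == 0` are assumptions. It records $E$ after every single-neuron update of a small symmetric network with zero diagonal.

```python
import random

def energy(W, v, i_ext, T):
    # E = -1/2 v^t W v - i^t v + t^t v
    n = len(v)
    quad = sum(W[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
    return (-0.5 * quad
            - sum(i_ext[k] * v[k] for k in range(n))
            + sum(T[k] * v[k] for k in range(n)))

def energy_trace(W, v0, i_ext, T, steps=50, seed=1):
    """Run random single-neuron updates and record E after each one."""
    rng = random.Random(seed)
    v = list(v0)
    n = len(v)
    trace = [energy(W, v, i_ext, T)]
    for _ in range(steps):
        i = rng.randrange(n)  # pick one neuron at random
        net = sum(W[i][j] * v[j] for j in range(n)) + i_ext[i] - T[i]
        if net > 0:
            v[i] = 1
        elif net < 0:
            v[i] = -1
        # net == 0: output left unchanged (assumption)
        trace.append(energy(W, v, i_ext, T))
    return trace

# Hypothetical example: symmetric weights, zero diagonal.
W = [[0, 1, -2],
     [1, 0, 1],
     [-2, 1, 0]]
trace = energy_trace(W, [1, 1, 1], i_ext=[1, 0, -1], T=[0, 0, 0])
print(all(b <= a for a, b in zip(trace, trace[1:])))
```

If the symmetry $w_{ij} = w_{ji}$ is broken in `W`, the same check is no longer guaranteed to pass, which is the role the symmetry assumption plays in the argument.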

Further, we can show that the non-increasing energy function has a minimum. Since $W$ is indefinite because of its zero diagonal, $E$ has neither a minimum nor a maximum in unconstrained output space. However, $E$ is bounded on the space consisting of the $2^n$ vertices of the $n$-dimensional cube. Thus, $E$ must eventually reach a minimum under the update algorithm.

Example of recursive asynchronous update of a corrupted digit 4:

[Figure: panels (a) to (e) show the network output at (a) $k=0$, (b) $k=1$, (c) $k=2$, (d) $k=3$, (e) $k=4$]

The initial map is a corrupted digit 4 with 20% of the pixels randomly reversed. For $k>4$, no changes are produced at the network output, since the system has arrived at one of its stable states.

Consider the continuous-time single-layer feedback network. One of its models is given below.

[Figure: continuous-time feedback network of $n$ neurons, with external inputs $i_i$, input potentials $u_i$, outputs $v_i$, conductances $w_{ij}$ and $g_i$, and capacitances $C_i$]

It consists of $n$ neurons, each mapping its input $u_i$ into the output $v_i$ through the activation function $f(u_i)$. Conductance $w_{ij}$ connects the output of the $j$-th neuron to the input of the $i$-th neuron. It is also assumed that $w_{ij} = w_{ji}$ and $w_{ii} = 0$. The KCL equation for the input node having potential $u_i$ can be obtained as

$$i_i + \sum_{\substack{j=1 \\ j \ne i}}^{n} w_{ij} v_j - u_i \left( \sum_{\substack{j=1 \\ j \ne i}}^{n} w_{ij} + g_i \right) = C_i \frac{du_i}{dt}.$$

Defining

$$G_i = \sum_{\substack{j=1 \\ j \ne i}}^{n} w_{ij} + g_i, \qquad C = \operatorname{diag}[C_1, \dots, C_n], \qquad G = \operatorname{diag}[G_1, \dots, G_n]$$

Then we have

$$C \frac{du(t)}{dt} = W v(t) - G u(t) + i \quad \text{and} \quad v(t) = f(u(t)).$$

It can be shown that

$$\frac{dE}{dt} = -\left(C \frac{du}{dt}\right)^t \frac{dv}{dt} \le 0,$$

with equality only where $du/dt = 0$, provided that $f$ is monotonically increasing. It follows that the changes of $E$ in time are in the general direction of lower values of the energy function in $v^n$ space, which is the stability condition.
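The state equation can be explored with a simple forward-Euler integration. The sketch below rests on assumptions that are not in the notes: the activation $f(u) = \tanh(\lambda u)$, the gain `lam`, the step size `dt`, the function name `simulate`, and the two-neuron example values are all chosen only for illustration.

```python
import math

def simulate(W, g, C, i_ext, u0, lam=5.0, dt=0.01, steps=2000):
    """Forward-Euler integration of C du/dt = W v - G u + i, with v = tanh(lam * u)."""
    n = len(u0)
    # G_i = sum_{j != i} w_ij + g_i
    G = [sum(W[i][j] for j in range(n) if j != i) + g[i] for i in range(n)]
    u = list(u0)
    for _ in range(steps):
        v = [math.tanh(lam * x) for x in u]
        du = [(sum(W[i][j] * v[j] for j in range(n)) - G[i] * u[i] + i_ext[i]) / C[i]
              for i in range(n)]
        u = [u[i] + dt * du[i] for i in range(n)]
    return u, [math.tanh(lam * x) for x in u]

# Hypothetical two-neuron network with mutually excitatory conductances.
W = [[0.0, 1.0], [1.0, 0.0]]
u, v = simulate(W, g=[1.0, 1.0], C=[1.0, 1.0], i_ext=[0.0, 0.0], u0=[0.1, 0.05])
print(v)
```

With these values both outputs drift away from the origin and settle near the same saturated positive level, which is the continuous-time counterpart of the network stabilizing at a vertex of the cube.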

§9.4 Feedback Networks for Computational Applications

In principle, any optimization problem whose objective function can be expressed in the form of an energy function can be solved by the convergence of a feedback network. Take the Traveling Salesman Problem (TSP) as an example. Given a set of $n$ cities $A, B, C, \dots$ with pairwise distances $d_{AB}, d_{AC}, \dots$, try to find a closed tour which visits each city once, returns to the starting city, and has a minimum total path length. This is an NP-complete problem.

To map this problem onto the computational network, we require a representation scheme in which the final location of any individual city is specified by the output states of a set of $n$ neurons. E.g., for $n=5$, the neuronal state ($5^2 = 25$ neurons) shown below would represent a tour:

[Table: rows are the cities A, B, C, D, E; columns are the order of each city in the tour]

In the $n \times n$ square representation, this means that in an output state describing a valid tour there can be only one "1" in each row and each column, all other entries being "0". In this scheme, the $n^2$ symbols $v_i$ will be described by double indices, $v_{xj}$: $x$ stands for the city name, $j$ for the position of that city in the tour. To enable the $N = n^2$ neurons to compute a solution to the problem, the network must be described by an energy function in which the lowest energy state corresponds to the best path of the tour.

An appropriate form for this function can be found by considering the high-gain limit, in which all final normalized outputs will be 0 or 1. The space over which the energy function is minimized in this limit is the $2^N$ corners of the $N$-dimensional hypercube defined by $v_i = 0$ or $1$. Consider those corners of this space which are the local minima (stable states) of the energy function

$$E_1 = \frac{A}{2} \sum_{x} \sum_{i} \sum_{j \ne i} v_{xi} v_{xj} + \frac{B}{2} \sum_{i} \sum_{x} \sum_{y \ne x} v_{xi} v_{yi} + \frac{C}{2} \left( \sum_{x} \sum_{i} v_{xi} - n \right)^2,$$

where $A$, $B$, $C$ are positive constants and $v_{xi} \in \{0, 1\}$.

- The first term is 0 iff each city row $x$ contains no more than one "1".
- The second term is 0 iff each position-in-tour column $i$ contains no more than one "1".
- The third term is 0 iff there are exactly $n$ entries of "1" in the entire matrix.

Thus, this energy function, evaluated on the domain of the corners of the hypercube, has minima with $E_1 = 0$ for all state matrices with one "1" in each row and column. All other states have higher energy. Hence, including these terms in an energy function describing a TSP network strongly favors stable states which are at least valid tours in the TSP problem.
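The three constraint terms can be evaluated directly on a 0/1 state matrix. The following plain-Python sketch is an illustration only; the function name `constraint_energy` and the example matrices are assumptions, and the constants reuse the values $A = B = 250$, $C = 100$ that the notes quote for the numerical experiment.

```python
def constraint_energy(v, A, B, C):
    """E_1 for an n x n matrix v, where v[x][i] = 1 means city x is at tour position i."""
    n = len(v)
    # First term: pairs of 1s sharing a row (one city in two positions)
    rows = sum(v[x][i] * v[x][j]
               for x in range(n) for i in range(n) for j in range(n) if j != i)
    # Second term: pairs of 1s sharing a column (two cities in one position)
    cols = sum(v[x][i] * v[y][i]
               for i in range(n) for x in range(n) for y in range(n) if y != x)
    # Third term: total number of 1s must equal n
    total = sum(sum(row) for row in v)
    return A / 2 * rows + B / 2 * cols + C / 2 * (total - n) ** 2

valid = [[1, 0, 0],
         [0, 0, 1],
         [0, 1, 0]]      # one "1" in each row and each column
invalid = [[1, 1, 0],
           [0, 0, 0],
           [0, 0, 1]]    # city 0 occupies two positions

print(constraint_energy(valid, 250, 250, 100))    # 0.0
print(constraint_energy(invalid, 250, 250, 100))  # 250.0
```

The invalid matrix has the right number of 1s and no column conflict, so only the first term is non-zero: the two ordered pairs in row 0 contribute $A/2 \cdot 2 = A$.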

Another requirement, that $E$ favors valid tours representing short paths, is fulfilled by adding one additional term to $E_1$. This term contains information about the length of the path corresponding to a given tour, and its form can be

$$E_2 = \frac{D}{2} \sum_{x} \sum_{y \ne x} \sum_{i} d_{xy} \, v_{xi} \left( v_{y,i+1} + v_{y,i-1} \right),$$

where subscripts are defined modulo $n$, in order to express easily "end effects" such as the fact that the $n$-th city on a tour is adjacent in the tour to both city $(n-1)$ and city 1, i.e., $v_{y,n+j} = v_{y,j}$. Within the domain of states which characterize a valid tour, each tour edge is counted twice in the triple sum, so $E_2$ equals $D$ times the length of the path for that tour (and is numerically equal to the path length when $D = 1$).
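As a quick check of the data term, the sketch below evaluates $E_2$ for two valid tours of four cities and compares it with the tour length. The function name `tour_energy` and the distance matrix are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def tour_energy(v, d, D):
    """E_2 = D/2 * sum_x sum_{y != x} sum_i d[x][y] * v[x][i] * (v[y][i+1] + v[y][i-1])."""
    n = len(v)
    s = 0.0
    for x in range(n):
        for y in range(n):
            if y == x:
                continue
            for i in range(n):
                # position indices are taken modulo n to close the tour
                s += d[x][y] * v[x][i] * (v[y][(i + 1) % n] + v[y][(i - 1) % n])
    return D / 2 * s

# Hypothetical symmetric distances between cities 0..3
d = [[0, 1, 5, 4],
     [1, 0, 2, 6],
     [5, 2, 0, 3],
     [4, 6, 3, 0]]

tour_a = [[1, 0, 0, 0],   # visiting order 0-1-2-3-0, length 1 + 2 + 3 + 4 = 10
          [0, 1, 0, 0],
          [0, 0, 1, 0],
          [0, 0, 0, 1]]
tour_b = [[1, 0, 0, 0],   # visiting order 0-2-1-3-0, length 5 + 2 + 6 + 4 = 17
          [0, 0, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 1]]

print(tour_energy(tour_a, d, 1.0))  # 10.0
print(tour_energy(tour_b, d, 1.0))  # 17.0
```

Each edge of the tour is met twice in the triple sum, once from each of its end cities, and the factor $1/2$ compensates for that, so with `D = 1.0` the value is the plain tour length.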

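If $A$, $B$, and $C$ are sufficiently large, all the really low-energy states of a network described by this function will have the form of a valid tour. The total energy of such a state will be proportional to the length of the tour, and the states with the shortest path will be the lowest-energy states. Using the row/column neuron labeling scheme described above for each of the two indices, the implicitly defined connection matrix is

$$\begin{aligned} T_{xi,yj} = {} & -A \, \delta_{xy} (1 - \delta_{ij}) && \text{"inhibitory connections within each row"} \\ & -B \, \delta_{ij} (1 - \delta_{xy}) && \text{"inhibitory connections within each column"} \\ & -C && \text{"global inhibition"} \\ & -D \, d_{xy} \left( \delta_{j,i+1} + \delta_{j,i-1} \right) && \text{"data term"} \end{aligned}$$

The external inputs are

$$I_{xi} = C n \qquad \text{"excitation bias"}$$

which is the coefficient of the linear term $-Cn \sum_x \sum_i v_{xi}$ obtained by expanding the third term of $E_1$, $\frac{C}{2}\left(\sum_x \sum_i v_{xi} - n\right)^2$.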
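The correspondence between the energy function and the connection matrix can be verified numerically. The sketch below builds $T_{xi,yj}$ from the four terms and checks that $-\frac{1}{2}\sum T v v - \sum I v$ differs from $E_1 + E_2$ only by the constant $Cn^2/2$. It assumes the bias $I_{xi} = Cn$, which is what expanding the $\frac{C}{2}(\cdot)^2$ term yields; the function names and the three-city distance matrix are hypothetical.

```python
import random

def delta(a, b):
    return 1 if a == b else 0

def build_network(d, A, B, C, D):
    """Connection matrix T[(x,i),(y,j)] and bias I[(x,i)] for an n-city problem."""
    n = len(d)
    idx = [(x, i) for x in range(n) for i in range(n)]
    T = {}
    for (x, i) in idx:
        for (y, j) in idx:
            T[(x, i), (y, j)] = (
                -A * delta(x, y) * (1 - delta(i, j))      # within each row
                - B * delta(i, j) * (1 - delta(x, y))     # within each column
                - C                                       # global inhibition
                - D * d[x][y] * (delta(j, (i + 1) % n) + delta(j, (i - 1) % n))  # data term
            )
    I = {k: C * n for k in idx}  # assumed excitation bias
    return idx, T, I

def network_energy(idx, T, I, v):
    quad = sum(T[a, b] * v[a[0]][a[1]] * v[b[0]][b[1]] for a in idx for b in idx)
    lin = sum(I[a] * v[a[0]][a[1]] for a in idx)
    return -0.5 * quad - lin

def direct_energy(v, d, A, B, C, D):
    """E_1 + E_2 evaluated straight from their definitions."""
    n = len(v)
    rows = sum(v[x][i] * v[x][j] for x in range(n) for i in range(n) for j in range(n) if j != i)
    cols = sum(v[x][i] * v[y][i] for i in range(n) for x in range(n) for y in range(n) if y != x)
    total = sum(sum(r) for r in v)
    data = sum(d[x][y] * v[x][i] * (v[y][(i + 1) % n] + v[y][(i - 1) % n])
               for x in range(n) for y in range(n) if y != x for i in range(n))
    return A / 2 * rows + B / 2 * cols + C / 2 * (total - n) ** 2 + D / 2 * data

d = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]   # hypothetical distances, zero diagonal
A = B = D = 250
C = 100
n = len(d)
idx, T, I = build_network(d, A, B, C, D)

rng = random.Random(0)
for _ in range(5):
    v = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
    gap = network_energy(idx, T, I, v) + C * n * n / 2 - direct_energy(v, d, A, B, C, D)
    print(abs(gap) < 1e-9)
```

The identity holds for every 0/1 state, valid tour or not, because it is only an algebraic rearrangement of the same quadratic form.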

The "data term" contribution, with $D$, to $T_{xi,yj}$ is the input which describes which TSP problem (i.e., where the cities actually are) is to be solved. The terms with $A$, $B$, $C$ provide the general constraints required for any TSP problem. The "data term" contribution controls which one of the $n!$ sets of these properly constrained final states is actually chosen as the "best" path. The problem as formulated has been solved numerically for the continuous activation function with $\lambda = 50$, $A = B = D = 250$, and $C = 100$, for $10 \le n \le 30$. Quite satisfactory solutions have been found.

Path = D-H-I-F-G-E-A-J-C-B-D

§9.5 Associative Memory

By properly selecting the weights, it is possible to make the stable states of the network exactly the set $M$ of states we want to store. Under this condition, the network's state should not change if the network is initially in a state in $M$; if it is not, the network is expected to settle in the state in $M$ that is closest to the initial state (in the sense of Hamming distance).

There are two categories of associative memory (AM):

1) Auto-AM: if the input is $x' = x_a + v$, where $x_a \in \{x_1, \dots, x_M\}$ and $v$ is noise, then the output is $y = x_a$.

2) Hetero-AM: if the input is $x' = x_a + v$, where the pairs $(x_1, y_1), \dots, (x_M, y_M)$ are stored, then the output is $y = y_a$.

One of the tasks is to find suitable weights such that the network performs the function of an AM. The most frequently used rule for this purpose is the Outer Product Rule. Assume that

- the network has $n$ neurons;
- each activity state $x_i \in \{-1, 1\}$;
- the Hebbian rule is observed: $\Delta w_{ij} = \alpha x_i x_j$, $\alpha > 0$.

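The outer product rule is as follows. For given vectors $M = \{U_1, \dots, U_m\}$, where $U_k = (x_1^k, \dots, x_n^k)^t$, write

$$W = \sum_{k=1}^{m} \left( U_k U_k^t - I \right) = \sum_{k=1}^{m} \begin{bmatrix} 0 & x_1^k x_2^k & \cdots & x_1^k x_n^k \\ x_2^k x_1^k & 0 & \cdots & x_2^k x_n^k \\ \vdots & \vdots & \ddots & \vdots \\ x_n^k x_1^k & x_n^k x_2^k & \cdots & 0 \end{bmatrix} = \begin{bmatrix} 0 & \cdots & \sum_{k=1}^{m} x_1^k x_n^k \\ \vdots & \ddots & \vdots \\ \sum_{k=1}^{m} x_n^k x_1^k & \cdots & 0 \end{bmatrix}$$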

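and this can be implemented by the following procedure:

(1) Set $W = [0]$.

(2) For $k = 1$ to $m$, input $U_k$ and set $w_{ij} = w_{ij} + x_i^k x_j^k$ for all connected pairs $(i, j)$.

Check to see if it is reasonable. Suppose that $U_1, \dots, U_m$ are orthogonal and $m < n$. Then $U_1^t U_1 = n$ and $U_k^t U_1 = 0$ for $k \ne 1$, so

$$\begin{aligned} W U_1 &= \left( U_1 U_1^t - I \right) U_1 + \sum_{k=2}^{m} \left( U_k U_k^t - I \right) U_1 \\ &= U_1 U_1^t U_1 - I U_1 + \sum_{k=2}^{m} \left( U_k U_k^t U_1 - I U_1 \right) \\ &= n U_1 - U_1 + \sum_{k=2}^{m} \left( -I U_1 \right) = (n-1) U_1 - (m-1) U_1 = (n-m) U_1. \end{aligned}$$

Hence

$$\operatorname{Sgn}(W U_1) = \operatorname{Sgn}[(n-m) U_1] = U_1.$$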
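The two-step procedure and the orthogonality check can be sketched in plain Python. The function names `outer_product_weights` and `matvec`, and the two orthogonal example patterns with $n = 4$ and $m = 2$, are assumptions made for illustration.

```python
def outer_product_weights(patterns):
    """Step (1): W = [0]; step (2): w_ij += x_i^k * x_j^k for every pair i != j."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for U in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # the diagonal stays zero, matching the "- I" term
                    W[i][j] += U[i] * U[j]
    return W

def matvec(W, U):
    return [sum(W[i][j] * U[j] for j in range(len(U))) for i in range(len(W))]

# Two orthogonal bipolar patterns, n = 4, m = 2 (hypothetical example)
U1 = [1, 1, 1, 1]
U2 = [1, -1, 1, -1]
W = outer_product_weights([U1, U2])

print(matvec(W, U1))  # [2, 2, 2, 2], i.e. (n - m) * U1
print(matvec(W, U2))  # [2, -2, 2, -2], i.e. (n - m) * U2
```

Because $n - m = 2 > 0$, taking signs of either product returns the stored pattern itself, which is the stability property derived above.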

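i.e., $U_1$ is exactly a stable state of the network, and $W$ thus determined is reasonable.

Example: Given $n = 3$, $m = 3$, and

$$U_1 = (1, 1, -1)^t, \qquad U_2 = (-1, 1, 1)^t, \qquad U_3 = (1, -1, 1)^t.$$

Thus

$$U_1 U_1^t - I = \begin{bmatrix} 0 & 1 & -1 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{bmatrix}, \quad U_2 U_2^t - I = \begin{bmatrix} 0 & -1 & -1 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix}, \quad U_3 U_3^t - I = \begin{bmatrix} 0 & -1 & 1 \\ -1 & 0 & -1 \\ 1 & -1 & 0 \end{bmatrix},$$

$$W = \sum_{k=1}^{3} \left( U_k U_k^t - I \right) = \begin{bmatrix} 0 & -1 & -1 \\ -1 & 0 & -1 \\ -1 & -1 & 0 \end{bmatrix}.$$

Taking a component with zero net input to keep its current value, as in rule (*), we obtain

$$W U_1 = \begin{bmatrix} 0 \\ 0 \\ -2 \end{bmatrix}, \qquad \operatorname{Sgn}(W U_1) = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = U_1.$$

Similarly,

$$W U_2 = \begin{bmatrix} -2 \\ 0 \\ 0 \end{bmatrix}, \quad \operatorname{Sgn}(W U_2) = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = U_2, \qquad W U_3 = \begin{bmatrix} 0 \\ -2 \\ 0 \end{bmatrix}, \quad \operatorname{Sgn}(W U_3) = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} = U_3.$$

Clearly, $U_1$, $U_2$, and $U_3$ are stable memories. The structure of the network is as below:

[Figure: three neurons $u_1$, $u_2$, $u_3$, with every pair connected by a weight of $-1$]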

Applications:

a) Classification. Given the input $U_x = (1, 1, 1)^t$, does $U_x \in \{U_1, U_2, U_3\}$?

$$W U_x = \begin{bmatrix} -2 \\ -2 \\ -2 \end{bmatrix}, \qquad \operatorname{Sgn}(W U_x) = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} \ne U_x.$$

Hence $U_x$ does not belong to the set $\{U_1, U_2, U_3\}$.

b) Associative memory. Given a noisy input $U_x = (\ )^t$, what is $U_x$? Here

$$\operatorname{Sgn}(W U_x) = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = U_1,$$

so $U_x \to U_1$.
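The classification test in a) amounts to checking whether one synchronous update leaves the input unchanged. A minimal sketch follows, using the weight matrix with all off-diagonal entries equal to $-1$ from the network structure shown for this example. The function names `sync_step` and `is_stable`, and the rule that a zero net input keeps the current value, are assumptions.

```python
def sync_step(W, U):
    """One synchronous update: Sgn(W U), keeping a component whose net input is zero."""
    n = len(U)
    out = []
    for i in range(n):
        net = sum(W[i][j] * U[j] for j in range(n))
        out.append(1 if net > 0 else -1 if net < 0 else U[i])
    return out

def is_stable(W, U):
    return sync_step(W, U) == list(U)

# Weights of the three-neuron example: every pair connected by -1
W = [[0, -1, -1],
     [-1, 0, -1],
     [-1, -1, 0]]

print(sync_step(W, [1, 1, 1]))   # [-1, -1, -1]
print(is_stable(W, [1, 1, 1]))   # False: not one of the stored memories
print(is_stable(W, [1, -1, 1]))  # True: U_3 is a stable state
```

Stability is the criterion used in this example; it should not be read as a guarantee that every stable vector was deliberately stored, since the negative of a stored bipolar pattern passes the same test.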

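In general, if the $n$-dimensional vectors $U_1, \dots, U_m$ are orthogonal and $n > m$, then

$$\begin{aligned} W U_k &= \left( U_k U_k^t - I \right) U_k + \sum_{\substack{i=1 \\ i \ne k}}^{m} \left( U_i U_i^t - I \right) U_k \\ &= \left( U_k U_k^t U_k - I U_k \right) + \sum_{\substack{i=1 \\ i \ne k}}^{m} \left( U_i U_i^t U_k - I U_k \right) \\ &= n U_k - U_k + (m-1)(-I U_k) = (n-1) U_k - (m-1) U_k = (n-m) U_k, \end{aligned}$$

$$\operatorname{Sgn}(W U_k) = \operatorname{Sgn}[(n-m) U_k] = U_k, \quad k = 1, 2, \dots, m.$$

Hence $\{U_k\}$ are stable states.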
