1
AI & NN Notes Chapter 9 Feedback Neural Networks

2
§9.1 Basic Concepts
Attractor: a state toward which the system evolves in time, starting from certain initial conditions.
Basin of attraction: the set of initial conditions from which the evolution terminates in the attractor.
Fixed point: an attractor that consists of a single point in state space.
Limit cycle: an attractor that consists of a periodic sequence of states.

3
Hopfield Network and its Basic Assumptions
[Figure: single-layer feedback network with outputs $v_1, \dots, v_n$, thresholds $T_1, \dots, T_n$, external inputs $i_1, \dots, i_n$, and weights $w_{ij}$.]
1. One layer, n neurons
2. $T_i$ -- threshold of neuron i
3. $w_{ij}$ -- weight from neuron j to neuron i
4. $v_j$ -- output of neuron j
5. $i_i$ -- external input to the i-th neuron

4
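The total input of the i-th neuron is
$$net_i = \sum_{j=1,\, j \neq i}^{n} w_{ij} v_j + i_i - T_i = W_i^t V + i_i - T_i, \quad \text{for } i = 1, 2, \dots, n$$
where
$$W_i = [w_{i1} \; w_{i2} \; \cdots \; w_{in}]^t, \qquad V = [v_1 \; v_2 \; \cdots \; v_n]^t$$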

5
The complete matrix description of the linear portion of the system shown in the figure is given by
$$net = WV + i - t$$
where
$$net = [net_1 \; net_2 \; \cdots \; net_n]^t, \quad i = [i_1 \; i_2 \; \cdots \; i_n]^t, \quad t = [T_1 \; T_2 \; \cdots \; T_n]^t$$
are vectors containing the activations, the external inputs to each neuron, and the thresholds, respectively.

6
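W is an n × n matrix containing the network weights:
$$W = \begin{bmatrix} W_1^t \\ W_2^t \\ \vdots \\ W_n^t \end{bmatrix} = \begin{bmatrix} 0 & w_{12} & w_{13} & \cdots & w_{1n} \\ w_{21} & 0 & w_{23} & \cdots & w_{2n} \\ w_{31} & w_{32} & 0 & \cdots & w_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & w_{n3} & \cdots & 0 \end{bmatrix}$$
with $w_{ij} = w_{ji}$ and $w_{ii} = 0$.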

7
§9.2 Discrete-Time Hopfield Network
Assuming that the neuron's activation function is sgn, the transition rule of the i-th neuron is
$$v_i \rightarrow \begin{cases} -1, & \text{if } net_i < 0 \text{ (inhibited state)} \\ +1, & \text{if } net_i > 0 \text{ (excitatory state)} \end{cases} \qquad (*)$$
If, at a given time, only a single neuron is allowed to update its output, so that only one entry of the vector v may change, the operation is asynchronous. Each element of the output vector is then updated separately, taking into account the most recent values of the elements that have already been updated and remain stable.

8
Based on (*), the update rule of a discrete-time recurrent network, for one value of i at a time, becomes
$$v_i^{k+1} = \mathrm{sgn}(W_i^t v^k + i_i - T_i), \quad \text{for random } i,\; i = 1, 2, \dots, n \text{ and } k = 0, 1, 2, \dots$$
where k denotes the index of the recursive update. This is referred to as the asynchronous stochastic recursion of the Hopfield network. The update process continues until all n entries of v have been updated, and the recursive computation continues until the output vector remains unchanged with further iterations.
Similarly, for synchronous operation, we have
$$v^{k+1} = \Gamma[W v^k + i - t], \quad \text{for all neurons, } k = 0, 1, \dots$$
where $\Gamma$ applies sgn to each component and all neurons change their outputs simultaneously.
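The asynchronous recursion above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `sgn_update` and `async_recall`, the `seed` and `max_sweeps` parameters, and the choice to keep the previous output when $net_i = 0$ are all assumptions not stated on the slide.

```python
import random

def sgn_update(net, previous):
    """Transition rule (*): -1 if net < 0, +1 if net > 0.
    Assumption: when net == 0 the neuron keeps its previous output."""
    if net > 0:
        return 1
    if net < 0:
        return -1
    return previous

def async_recall(W, v0, i_ext=None, T=None, seed=0, max_sweeps=100):
    """Update one randomly chosen neuron at a time until a full sweep
    over all n neurons produces no change (a stable state)."""
    n = len(v0)
    i_ext = i_ext or [0.0] * n   # external inputs i_i (default: none)
    T = T or [0.0] * n           # thresholds T_i (default: zero)
    v = list(v0)
    rng = random.Random(seed)
    for _ in range(max_sweeps):
        order = list(range(n))
        rng.shuffle(order)       # random update order within the sweep
        changed = False
        for i in order:
            # net_i uses the most recent values of all other outputs
            net_i = sum(W[i][j] * v[j] for j in range(n)) + i_ext[i] - T[i]
            new = sgn_update(net_i, v[i])
            if new != v[i]:
                v[i] = new
                changed = True
        if not changed:
            return v
    return v
```

For two mutually excitatory neurons, `async_recall([[0, 1], [1, 0]], [1, -1])` settles with both outputs equal; which of the two stable states is reached depends on the update order.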

9
Geometrical Explanation
The output vector v is one of the vertices of the n-dimensional cube $[-1, 1]^n$ in $E^n$ space. The vector moves during recursions from vertex to vertex until it stabilizes in one of the $2^n$ vertices available. The movement is from a vertex to an adjacent vertex, since the asynchronous update mode allows only a single-component update of an n-tuple vector at a time. The final position of v as $k \to \infty$ is determined by the weights, thresholds, inputs, and the initial vector $v^0$, as well as by the order of transitions.

10
To evaluate the stability property of the dynamical system of interest, the computational energy function is defined in the n-dimensional output space $v^n$. If the increments of a certain bounded positive-valued computational energy function under the transition rule are found to be non-positive, then the function can be called a Lyapunov function, and the system would be asymptotically stable. The scalar-valued energy function for the discussed system is a quadratic form:
$$E = -\frac{1}{2} V^t W V - i^t V + t^t V$$

11
or
$$E = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} w_{ij} v_i v_j - \sum_{i=1}^{n} i_i v_i + \sum_{i=1}^{n} T_i v_i$$
The energy function in asynchronous mode. Assume that the output node i has been updated at the k-th instant so that $v_i^{k+1} - v_i^k = \Delta v_i$. Computing the energy gradient vector:
$$\nabla E = -\frac{1}{2}(W^t + W) v - i + t = -Wv - i + t, \quad \text{since } W^t = W$$
The energy increment becomes
$$\Delta E = (\nabla E)^t \Delta v = (-W_i^t v - i_i + T_i)\, \Delta v_i$$
This is because only the i-th output is updated.

12
Therefore we have
$$(\Delta v)^t = [0 \; \cdots \; \Delta v_i \; \cdots \; 0]$$
This can be rewritten as
$$\Delta E = -\Big( \sum_{j=1,\, j \neq i}^{n} w_{ij} v_j + i_i - T_i \Big) \Delta v_i$$
or briefly
$$\Delta E = -net_i \, \Delta v_i$$
Note that when $net_i < 0$, then $\Delta v_i \le 0$; when $net_i > 0$, then $\Delta v_i \ge 0$. Thus $net_i \, \Delta v_i$ is always non-negative. In other words, any corresponding energy changes $\Delta E$ are non-positive provided that $w_{ij} = w_{ji}$.
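The claim that $\Delta E \le 0$ under single-neuron updates can be checked numerically. The sketch below is illustrative only: the helper names `energy` and `async_step`, and the rule of leaving the output unchanged when $net_i = 0$, are assumptions made for the example.

```python
def energy(W, v, i_ext, T):
    """E = -1/2 v^t W v - i^t v + t^t v (W symmetric with zero diagonal)."""
    n = len(v)
    quad = sum(W[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
    return (-0.5 * quad
            - sum(i_ext[k] * v[k] for k in range(n))
            + sum(T[k] * v[k] for k in range(n)))

def async_step(W, v, i_ext, T, i):
    """Return a copy of v after updating only neuron i with the sgn rule.
    Assumption: the output is left unchanged when net_i == 0."""
    net_i = sum(W[i][j] * v[j] for j in range(len(v))) + i_ext[i] - T[i]
    new_v = list(v)
    if net_i > 0:
        new_v[i] = 1
    elif net_i < 0:
        new_v[i] = -1
    return new_v
```

Evaluating `energy` before and after `async_step` for any vertex and any neuron index should never show an increase, as long as W is symmetric with a zero diagonal.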

13
Further, we can show that the non-increasing energy function has a minimum. Since W is indefinite because of its zero diagonal, E has neither a minimum nor a maximum in unconstrained output space. However, E is bounded on the set of $2^n$ vertices of the n-dimensional cube. Thus, E has to reach its minimum eventually under the update algorithm.
Example of recursive asynchronous update of a corrupted digit 4:

14
[Figure: five bitmap panels (a) to (e)] where (a) k=0, (b) k=1, (c) k=2, (d) k=3, (e) k=4. The initial map is a corrupted digit 4 with 20% of the pixels randomly reversed. For k>4, no changes are produced at the network output, since the system has arrived at one of its stable states.

15
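§9.3 Gradient-Type Hopfield Network
Consider continuous-time single-layer feedback networks. One such model is given below.
[Figure: electrical model of the network with n neurons; each input node $u_i$ has capacitance $C_i$, conductance $g_i$, and external current $i_i$; the outputs $v_1, \dots, v_n$ are fed back through conductances $w_{ij}$.]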

16
It consists of n neurons, each mapping its input $u_i$ into the output $v_i$ through the activation function $f(u_i)$. Conductance $w_{ij}$ connects the output of the j-th neuron to the input of the i-th neuron. It is also assumed that $w_{ij} = w_{ji}$ and $w_{ii} = 0$. The KCL equation for the input node having potential $u_i$ can be obtained as
$$i_i + \sum_{j=1,\, j \neq i}^{n} w_{ij} v_j - u_i \Big( \sum_{j=1,\, j \neq i}^{n} w_{ij} + g_i \Big) = C_i \frac{du_i}{dt}$$
Defining
$$G_i = \sum_{j=1,\, j \neq i}^{n} w_{ij} + g_i, \quad C = \mathrm{diag}[C_1, \dots, C_n], \quad G = \mathrm{diag}[G_1, \dots, G_n]$$

17
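Then we have
$$C \frac{du(t)}{dt} = W v(t) - G u(t) + i \quad \text{and} \quad v(t) = f(u(t))$$
It can be shown that
$$\frac{dE}{dt} = -\Big( C \frac{du}{dt} \Big)^t \frac{dv}{dt} \le 0$$
with equality only when $dv/dt = 0$, for a strictly increasing activation function f and positive capacitances. It follows that the changes of E in time are in the general direction toward lower values of the energy function in $v^n$ space -- the stability condition.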
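To make the state equation concrete, the following sketch integrates $C\,du/dt = Wv - Gu + i$ with a forward-Euler step. Several things here are assumptions for illustration and do not come from the slides: the activation $f(u) = \tanh(\lambda u)$, the names `simulate`, `gain`, `dt`, and `steps`, and the specific step size.

```python
import math

def simulate(W, g, C, i_ext, u0, gain=5.0, dt=0.01, steps=2000):
    """Forward-Euler integration of C du/dt = W v - G u + i, v = tanh(gain*u).
    G_i = sum_{j != i} w_ij + g_i, as defined for the KCL equation."""
    n = len(u0)
    G = [sum(W[i][j] for j in range(n) if j != i) + g[i] for i in range(n)]
    u = list(u0)
    for _ in range(steps):
        v = [math.tanh(gain * x) for x in u]
        du = [(sum(W[i][j] * v[j] for j in range(n) if j != i)
               - G[i] * u[i] + i_ext[i]) / C[i]
              for i in range(n)]
        u = [u[i] + dt * du[i] for i in range(n)]
    return u, [math.tanh(gain * x) for x in u]
```

With two mutually excitatory neurons (`W = [[0, 1], [1, 0]]`, `g = [1, 1]`, `C = [1, 1]`, no external input) and the initial state `u0 = [0.1, -0.05]`, both outputs end up close to +1, which is one of the stable states of this small network.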

18
§9.4 Feedback Networks for Computational Applications
In principle, any optimization problem whose objective function can be expressed in the form of an energy function can be solved by the convergence of a feedback network. Take the Traveling Salesman Problem (TSP) as an example. Given a set of n cities A, B, C, … with pairwise distances $d_{AB}$, $d_{AC}$, …, try to find a closed tour which visits each city once, returns to the starting city, and has a minimum total path length. This is an NP-complete problem.

19
To map this problem onto the computational network, we require a representation scheme in which the final location of any individual city is specified by the output states of a set of n neurons. E.g., for n=5, the neuronal state ($5^2$ neurons) shown below would represent a tour:
[Table: rows are the cities A to E, columns give the order in the tour; entries not recovered.]

20
In the n × n square representation, this means that in an output state describing a valid tour there can be only one 1 in each row and each column, all other entries being 0. In this scheme, the $n^2$ symbols $v_i$ will be described by double indices, $v_{xj}$: x stands for the city name, j for the position of that city in the tour. To enable the $N = n^2$ neurons to compute a solution to the problem, the network must be described by an energy function in which the lowest energy state corresponds to the best path of the tour.
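The representation can be illustrated with a small sketch that checks whether an n × n matrix of 0/1 outputs describes a valid tour and, if so, reads the tour off column by column. The function names `is_valid_tour` and `decode_tour` and the five-city matrix in the usage note are hypothetical examples, not the matrix from the slide.

```python
def is_valid_tour(V):
    """True if V is a square 0/1 matrix with exactly one 1 per row and column."""
    n = len(V)
    if any(len(row) != n for row in V):
        return False
    if any(x not in (0, 1) for row in V for x in row):
        return False
    rows_ok = all(sum(row) == 1 for row in V)
    cols_ok = all(sum(V[x][j] for x in range(n)) == 1 for j in range(n))
    return rows_ok and cols_ok

def decode_tour(V, cities):
    """Return the cities in visiting order: column j holds tour position j+1."""
    n = len(V)
    return [cities[next(x for x in range(n) if V[x][j] == 1)] for j in range(n)]
```

For instance, with rows A to E given as `[0,1,0,0,0]`, `[0,0,0,1,0]`, `[1,0,0,0,0]`, `[0,0,0,0,1]`, `[0,0,1,0,0]`, the decoded tour is C, A, E, B, D.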

21
An appropriate form for this function can be found by considering the high-gain limit, in which all final normal outputs will be 0 or 1. The space over which the energy function is minimized in this limit is the $2^N$ corners of the N-dimensional hypercube defined by $v_i = 0$ or 1. Consider those corners of this space which are the local minima (stable states) of the energy function
$$E_1 = \frac{A}{2} \sum_{x} \sum_{i} \sum_{j \neq i} v_{xi} v_{xj} + \frac{B}{2} \sum_{i} \sum_{x} \sum_{y \neq x} v_{xi} v_{yi} + \frac{C}{2} \Big( \sum_{x} \sum_{i} v_{xi} - n \Big)^2$$
where A, B, C are positive constants and $v_{xi} \in \{0, 1\}$.
-- The first term is 0 iff each city row x contains no more than one 1.

22
-- The second term is 0 iff each position-in-tour column i contains no more than one 1.
-- The third term is 0 iff there are n entries of 1 in the entire matrix.
Thus, this energy function evaluated on the domain of the corners of the hypercube has minima with $E_1 = 0$ for all state matrices with one 1 in each row and column. All other states have higher energy. Hence, including these terms in an energy function describing a TSP network strongly favors stable states which are at least valid tours in the TSP problem.

23
Another requirement, that E favors valid tours representing short paths, is fulfilled by adding one additional term to E. This term contains information about the length of the path corresponding to a given tour, and its form can be
$$E_2 = \frac{D}{2} \sum_{x} \sum_{y \neq x} \sum_{i} d_{xy} v_{xi} (v_{y,i+1} + v_{y,i-1})$$
where subscripts are defined modulo n, in order to express easily end effects such as the fact that the n-th city on a tour is adjacent in the tour to both city (n-1) and city 1, i.e., $v_{y,n+j} = v_{y,j}$. Within the domain of states which characterize a valid tour, $E_2$ equals D times the length of the path for that tour (so it is numerically equal to the length when D = 1).
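The two energy terms can be evaluated directly for a given 0/1 state matrix. The sketch below is only an illustration: the function names `constraint_energy` and `length_energy` and the four-city distance table in the usage note are made up for the example.

```python
def constraint_energy(V, A, B, C):
    """E_1: row term, column term, and global count term for an n x n state V."""
    n = len(V)
    row = sum(V[x][i] * V[x][j]
              for x in range(n) for i in range(n) for j in range(n) if j != i)
    col = sum(V[x][i] * V[y][i]
              for i in range(n) for x in range(n) for y in range(n) if y != x)
    total = sum(V[x][i] for x in range(n) for i in range(n))
    return A / 2 * row + B / 2 * col + C / 2 * (total - n) ** 2

def length_energy(V, d, D):
    """E_2: data term; tour positions are taken modulo n."""
    n = len(V)
    s = sum(d[x][y] * V[x][i] * (V[y][(i + 1) % n] + V[y][(i - 1) % n])
            for x in range(n) for y in range(n) if y != x for i in range(n))
    return D / 2 * s
```

For the identity matrix on four cities with distances `d = [[0,1,2,3],[1,0,4,5],[2,4,0,6],[3,5,6,0]]`, the tour 0-1-2-3-0 has length 1 + 4 + 6 + 3 = 14, `constraint_energy` is 0, and `length_energy` with D = 1 returns that same length.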

24
If A, B, and C are sufficiently large, all the really low energy states of a network described by this function will have the form of a valid tour. The total energy of that state will be the length of the tour, and the states with the shortest path will be the lowest energy states. Using the row/column neuron labeling scheme described above for each of the two indices, the implicitly defined connection matrix is
$$\begin{aligned} T_{xi,yj} = {} & -A \, \delta_{xy} (1 - \delta_{ij}) && \text{(inhibitory connections within each row)} \\ & -B \, \delta_{ij} (1 - \delta_{xy}) && \text{(inhibitory connections within each column)} \\ & -C && \text{(global inhibition)} \\ & -D \, d_{xy} (\delta_{j,i+1} + \delta_{j,i-1}) && \text{(data term)} \end{aligned}$$
where $\delta_{ij} = 1$ if $i = j$ and 0 otherwise. The external inputs are
$$I_{xi} = 2Cn \quad \text{(excitation bias)}$$

25
The data term contribution, with D, to $T_{xi,yj}$ is the input which describes which TSP problem (i.e., where the cities actually are) is to be solved. The terms with A, B, C provide the general constraints required for any TSP problem. The data term contribution controls which one of the n! sets of these properly constrained final states is actually chosen as the best path. The problem formulated as shown below has been solved numerically for the continuous activation function with gain $\lambda = 50$, A=B=D=250, and C=100, for $10 \le n \le 30$. Quite satisfactory solutions have been found.

26
[Figure: map of ten cities A to J with the tour found by the network.]
Path = D-H-I-F-G-E-A-J-C-B-D

27
§9.5 Associative Memory
By properly selecting the weights, it is possible to make the stable states of the network exactly the set of states, M, that we want to store. Under this condition, the network's state should not change if the network is initially in a state in M; whereas if it is not in M, the network's stable state is expected to be the one in M closest to the initial state (in the sense of Hamming distance). There are two categories of AM:
1) Auto-AM: If the input is $x = x^k + v$, where $x^k \in \{x^1, \dots, x^m\}$ and v is noise, then the output is $y = x^k$.

28
2) Hetero-AM: If $x = x^k + v$, where the pairs $x^1 \to y^1, \dots, x^m \to y^m$ are stored, then the output is $y = y^k$.
One of the tasks is to find suitable weights such that the network performs the function of an AM. The most frequently used rule for this purpose is the Outer Product Rule. Assume that
-- we consider an n-neuron network;
-- each activity state $x_i \in \{-1, 1\}$;
-- the Hebbian rule is observed: $w_{ij} = \alpha \, x_i x_j$, $\alpha > 0$.

29
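The outer product rule is as follows: For given vectors $M = \{U_1, \dots, U_m\}$, where $U_k = (x_1^k, \dots, x_n^k)^t$, write
$$W = \sum_{k=1}^{m} (U_k U_k^t - I) = \sum_{k=1}^{m} \begin{bmatrix} 0 & x_1^k x_2^k & \cdots & x_1^k x_n^k \\ x_2^k x_1^k & 0 & \cdots & x_2^k x_n^k \\ \vdots & \vdots & \ddots & \vdots \\ x_n^k x_1^k & x_n^k x_2^k & \cdots & 0 \end{bmatrix} = \begin{bmatrix} 0 & \cdots & \sum_{k=1}^{m} x_1^k x_n^k \\ \vdots & \ddots & \vdots \\ \sum_{k=1}^{m} x_n^k x_1^k & \cdots & 0 \end{bmatrix}$$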

30
and this can be implemented by the following procedure:
(1) Set W = [0]
(2) For k=1 to m, input $U_k$, and do $w_{ij} = w_{ij} + x_i^k x_j^k$ for all connected pairs (i, j)
Check to see if it is reasonable:
1) Suppose that $U_1, \dots, U_m$ are orthogonal and $m < n$; then $\mathrm{Sgn}(W U_k) = U_k$ for each stored $U_k$,
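The two-step procedure translates almost line for line into code. This is a minimal sketch; the function name `outer_product_weights` is chosen for the example, and "connected pairs" is interpreted here as all pairs with i ≠ j, which matches the zero diagonal of W.

```python
def outer_product_weights(patterns):
    """Outer product rule: W = sum_k (U_k U_k^t - I) for bipolar patterns."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]          # step (1): W = [0]
    for U in patterns:                       # step (2): present each U_k
        for i in range(n):
            for j in range(n):
                if i != j:                   # no self-connections (w_ii = 0)
                    W[i][j] += U[i] * U[j]
    return W
```

For the three patterns (1, 1, -1), (-1, 1, 1), and (1, -1, 1), every off-diagonal weight comes out as -1.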

31
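i.e., $U_k$ is exactly a stable state of the network, and W thus determined is reasonable.
Example: Given n=3, m=3,
$$U_1 = (1, 1, -1)^t, \quad U_2 = (-1, 1, 1)^t, \quad U_3 = (1, -1, 1)^t$$
Thus
$$U_1 U_1^t - I = \begin{bmatrix} 0 & 1 & -1 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{bmatrix}, \quad U_2 U_2^t - I = \begin{bmatrix} 0 & -1 & -1 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix}, \quad U_3 U_3^t - I = \begin{bmatrix} 0 & -1 & 1 \\ -1 & 0 & -1 \\ 1 & -1 & 0 \end{bmatrix}$$
$$W = \sum_{k=1}^{3} (U_k U_k^t - I) = \begin{bmatrix} 0 & -1 & -1 \\ -1 & 0 & -1 \\ -1 & -1 & 0 \end{bmatrix}$$
$$W U_1 = \begin{bmatrix} 0 \\ 0 \\ -2 \end{bmatrix}, \quad \mathrm{Sgn}(W U_1) = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = U_1$$
(taking $\mathrm{Sgn}(0) = +1$).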

32
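Similarly (taking $\mathrm{Sgn}(0) = +1$),
$$W U_2 = \begin{bmatrix} -2 \\ 0 \\ 0 \end{bmatrix}, \quad \mathrm{Sgn}(W U_2) = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = U_2$$
$$W U_3 = \begin{bmatrix} 0 \\ -2 \\ 0 \end{bmatrix}, \quad \mathrm{Sgn}(W U_3) = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} = U_3$$
Clearly, $U_1$, $U_2$, and $U_3$ are stable memories. The structure of the network is as below:
[Figure: three-neuron network with nodes $u_1$, $u_2$, $u_3$.]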

33
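Applications:
a) Classification
Given input $U_x = (1 \; 1 \; 1)^t$, is $U_x \in \{U_1, U_2, U_3\}$?
$$W U_x = \begin{bmatrix} -2 \\ -2 \\ -2 \end{bmatrix}, \quad \mathrm{Sgn}(W U_x) = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} \neq U_x$$
Hence $U_x$ does not belong to the set $\{U_1, U_2, U_3\}$.
b) Associative Memory
Given a noisy input $U_x = (0 \; 1 \; -1)^t$, what is $U_x$?
$$W U_x = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \quad \mathrm{Sgn}(W U_x) = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = U_1$$
(taking $\mathrm{Sgn}(0) = +1$). Hence $U_x \to U_1$.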
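Both applications amount to one pass of $\mathrm{Sgn}(WU_x)$. The sketch below reproduces them with the weight matrix hard-coded; it assumes W is the outer-product matrix for the three stored vectors (every off-diagonal weight equal to -1) and adopts the convention $\mathrm{Sgn}(0) = +1$, which is an assumption needed to match the results in the example. The names `sgn` and `recall_once` are illustrative.

```python
def sgn(x):
    """Sign function with the assumed convention Sgn(0) = +1."""
    return 1 if x >= 0 else -1

def recall_once(W, u):
    """Return (W u, Sgn(W u)) for one synchronous pass through the network."""
    n = len(u)
    net = [sum(W[i][j] * u[j] for j in range(n)) for i in range(n)]
    return net, [sgn(x) for x in net]

# Outer-product weights for U1=(1,1,-1), U2=(-1,1,1), U3=(1,-1,1)
W = [[0, -1, -1],
     [-1, 0, -1],
     [-1, -1, 0]]
```

Calling `recall_once(W, [1, 1, 1])` gives an output different from the input, so (1, 1, 1) is not a stored state, while `recall_once(W, [0, 1, -1])` maps the noisy input to (1, 1, -1).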

34
In general, if the n-dimensional vectors $U_1, \dots, U_m$ are orthogonal and n>m, then
$$\begin{aligned} W U_k &= (U_k U_k^t - I) U_k + \sum_{i=1,\, i \neq k}^{m} (U_i U_i^t - I) U_k \\ &= U_k (U_k^t U_k) - I U_k + \sum_{i=1,\, i \neq k}^{m} (U_i U_i^t U_k - I U_k) \\ &= n U_k - U_k + (m-1)(-I U_k) \\ &= (n-1) U_k - (m-1) U_k = (n-m) U_k \end{aligned}$$
$$\mathrm{Sgn}(W U_k) = \mathrm{Sgn}[(n-m) U_k] = U_k, \quad k = 1, 2, \dots, m$$
Hence $\{U_k\}$ are stable states.
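The identity $WU_k = (n-m)U_k$ can be checked without forming W, by following the derivation term by term. The sketch below is illustrative: the function name `apply_weights` and the pair of orthogonal 4-dimensional vectors in the usage note are assumptions chosen for the example.

```python
def apply_weights(patterns, u):
    """Compute W u = sum_i (U_i (U_i^t u) - u) without building W explicitly."""
    n = len(u)
    result = [0] * n
    for U in patterns:
        dot = sum(U[j] * u[j] for j in range(n))      # U_i^t u
        for r in range(n):
            result[r] += U[r] * dot - u[r]            # (U_i U_i^t - I) u
    return result
```

With the orthogonal vectors (1, 1, 1, 1) and (1, -1, 1, -1), n = 4 and m = 2, so applying the weights to either stored vector returns that vector scaled by n - m = 2.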
