Feedback Neural Networks


AI & NN Notes -- Chapter 9: Feedback Neural Networks

§9.1 Basic Concepts

Attractor: a state toward which the system evolves in time, starting from certain initial conditions.
Basin of attraction: the set of initial conditions whose evolution terminates in the attractor.
Fixed point: an attractor consisting of a single point in state space.
Limit cycle: an attractor consisting of a periodic sequence of states.

Hopfield Network and its Basic Assumptions

1. A single layer of n fully interconnected neurons.
2. $T_i$ -- threshold of neuron i.
3. $w_{ij}$ -- weight from neuron j to neuron i.
4. $v_j$ -- output of neuron j.
5. $i_i$ -- external input to the i-th neuron.

(Figure: single-layer feedback network in which each output $v_j$ is fed back through weights $w_{ij}$ to the inputs of all other neurons.)

The total input of the i-th neuron is
$$\mathrm{net}_i = \sum_{\substack{j=1\\ j\neq i}}^{n} w_{ij} v_j + i_i - T_i = \mathbf{w}_i^t \mathbf{v} + i_i - T_i, \qquad i = 1, 2, \dots, n$$
where $\mathbf{w}_i = [w_{i1}, w_{i2}, \dots, w_{in}]^t$ and $\mathbf{v} = [v_1, v_2, \dots, v_n]^t$.

The complete matrix description of the linear portion of the system shown in the figure is
$$\mathbf{net} = W\mathbf{v} + \mathbf{i} - \mathbf{t}$$
where
$$\mathbf{net} = [\mathrm{net}_1, \dots, \mathrm{net}_n]^t, \qquad \mathbf{i} = [i_1, \dots, i_n]^t, \qquad \mathbf{t} = [T_1, \dots, T_n]^t$$
are the vectors containing the activations, the external inputs to each neuron, and the thresholds, respectively.

$W$ is an $n \times n$ matrix containing the network weights:
$$W = \begin{bmatrix} 0 & w_{12} & w_{13} & \cdots & w_{1n} \\ w_{21} & 0 & w_{23} & \cdots & w_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & w_{n3} & \cdots & 0 \end{bmatrix}, \qquad w_{ij} = w_{ji}, \quad w_{ii} = 0.$$

§9.2 Discrete-Time Hopfield Network

Assuming that the neuron's activation function is sgn, the transition rule of the i-th neuron is
$$v_i \leftarrow \begin{cases} -1, & \text{if } \mathrm{net}_i < 0 \ \text{(inhibited state)} \\ +1, & \text{if } \mathrm{net}_i > 0 \ \text{(excitatory state)} \end{cases} \tag{*}$$
If at any given time only a single neuron is allowed to update its output, so that only one entry of $\mathbf{v}$ changes, the operation is asynchronous: each element of the output vector is updated separately, taking into account the most recent values of the elements that have already been updated.

Based on (*), the update rule of a discrete-time recurrent network, for one value of i at a time, becomes
$$v_i^{k+1} = \mathrm{sgn}\!\left(\mathbf{w}_i^t \mathbf{v}^k + i_i - T_i\right), \qquad \text{for randomly chosen } i \in \{1, \dots, n\},\ k = 0, 1, 2, \dots$$
where k denotes the index of the recursive update. This is referred to as the asynchronous stochastic recursion of the Hopfield network. The update process continues until all n entries of $\mathbf{v}$ have been updated, and the recursion terminates when the output vector remains unchanged under further iterations. For synchronous operation, all neurons change their outputs simultaneously:
$$\mathbf{v}^{k+1} = \Gamma\!\left[W\mathbf{v}^k + \mathbf{i} - \mathbf{t}\right], \qquad k = 0, 1, \dots$$
where $\Gamma[\cdot]$ applies sgn to each component.
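A minimal sketch of the asynchronous recursion in Python/NumPy; the function name, the sweep-based convergence test, and the choice to keep $v_i$ unchanged when $\mathrm{net}_i = 0$ are illustrative assumptions, not part of the notes:

```python
import numpy as np

def hopfield_recall(W, v0, i_ext=None, t=None, max_sweeps=100, rng=None):
    """Asynchronous stochastic recursion: update one randomly chosen
    neuron at a time until a full sweep produces no change."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(v0)
    i_ext = np.zeros(n) if i_ext is None else i_ext
    t = np.zeros(n) if t is None else t
    v = v0.astype(float).copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(n):          # random update order
            net = W[i] @ v + i_ext[i] - t[i]  # net_i = w_i^t v + i_i - T_i
            new = 1.0 if net > 0 else (-1.0 if net < 0 else v[i])  # keep v_i if net_i = 0
            if new != v[i]:
                v[i] = new
                changed = True
        if not changed:                       # stable state reached
            break
    return v
```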

Geometrical Explanation

The output vector $\mathbf{v}$ is one of the vertices of the n-dimensional cube $[-1, 1]^n$ in $E^n$ space. During the recursion the vector moves from vertex to vertex until it stabilizes in one of the $2^n$ vertices available. Each move is to an adjacent vertex, since the asynchronous update mode changes only a single component of the n-tuple at a time. The final position of $\mathbf{v}$ as $k \to \infty$ is determined by the weights, thresholds, and inputs, by the initial vector $\mathbf{v}^0$, and by the order of transitions.

To evaluate the stability of the dynamical system of interest, a computational energy function is defined on the n-dimensional output space. If the increments of a bounded computational energy function under the transition rule are found to be non-positive, then the function is a Lyapunov function and the system is asymptotically stable. The scalar-valued energy function for the discussed system is the quadratic form
$$E = -\tfrac{1}{2}\,\mathbf{v}^t W \mathbf{v} - \mathbf{i}^t \mathbf{v} + \mathbf{t}^t \mathbf{v}$$
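For reference, the quadratic energy above translates directly into a small helper (a sketch; the function name and NumPy-array arguments are assumptions):

```python
def energy(W, v, i_ext, t):
    """E = -1/2 v^t W v - i^t v + t^t v (the quadratic Lyapunov candidate)."""
    return -0.5 * v @ W @ v - i_ext @ v + t @ v
```

Tracking this quantity during the asynchronous recursion sketched earlier should yield a non-increasing sequence, which is a convenient sanity check.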

The energy function in asynchronous mode. In component form,
$$E = -\tfrac{1}{2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n} w_{ij} v_i v_j - \sum_{i=1}^{n} i_i v_i + \sum_{i=1}^{n} T_i v_i$$
Assume that output node i has been updated at the k-th instant, so that $\Delta v_i = v_i^{k+1} - v_i^k$. Computing the energy gradient vector (and using the symmetry $W = W^t$):
$$\nabla_{\mathbf{v}} E = -\tfrac{1}{2}\left(W^t + W\right)\mathbf{v} - \mathbf{i} + \mathbf{t} = -W\mathbf{v} - \mathbf{i} + \mathbf{t}$$
The energy increment becomes
$$\Delta E = (\nabla_{\mathbf{v}} E)^t\, \Delta\mathbf{v} = \left(-\mathbf{w}_i^t \mathbf{v} - i_i + T_i\right)\Delta v_i$$
This is because only the i-th output is updated, i.e., $(\Delta\mathbf{v})^t = [0\ \cdots\ \Delta v_i\ \cdots\ 0]$.

This can be rewritten as
$$\Delta E = -\Big(\sum_{\substack{j=1\\ j\neq i}}^{n} w_{ij} v_j + i_i - T_i\Big)\Delta v_i$$
or briefly
$$\Delta E = -\,\mathrm{net}_i\, \Delta v_i$$
Note that when $\mathrm{net}_i < 0$ then $\Delta v_i \leq 0$, and when $\mathrm{net}_i > 0$ then $\Delta v_i \geq 0$; thus $\mathrm{net}_i\,\Delta v_i$ is always non-negative. In other words, every energy change $\Delta E$ is non-positive, provided that $w_{ji} = w_{ij}$.
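A quick numerical check of the identity $\Delta E = -\mathrm{net}_i\,\Delta v_i$ for a single-component flip, using the energy helper above; the random test setup and the index choice are assumptions made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.standard_normal((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)  # symmetric, zero diagonal
i_ext, t = rng.standard_normal(n), rng.standard_normal(n)
v = rng.choice([-1.0, 1.0], size=n)

i = 3                                                # flip one arbitrary component
v_new = v.copy(); v_new[i] = -v[i]
dE = energy(W, v_new, i_ext, t) - energy(W, v, i_ext, t)
net_i = W[i] @ v + i_ext[i] - t[i]
print(np.isclose(dE, -net_i * (v_new[i] - v[i])))    # True: dE = -net_i * dv_i
```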

Further, we can show that the non-increasing energy function has a minimum. Since W is indefinite because of its zero diagonal, E has neither a minimum nor a maximum in unconstrained output space. However, E is obviously bounded on the set consisting of the $2^n$ vertices of the n-dimensional cube; thus E must eventually reach a minimum under the update algorithm.

Example of recursive asynchronous update of a corrupted digit 4:

(Figure: five maps (a)-(e) of the digit, at (a) k=0, (b) k=1, (c) k=2, (d) k=3, (e) k=4.) The initial map is a corrupted digit 4 with 20% of the pixels randomly reversed. For k > 4, no changes are produced at the network output, since the system has arrived at one of its stable states.

§9.3 Gradient-Type Hopfield Network

Consider a continuous-time single-layer feedback network. One such model is shown below. (Figure: n amplifiers with input voltages $u_i$, input capacitances $C_i$, input conductances $g_i$, outputs $v_i$, and feedback conductances $w_{ij}$ connecting the output of neuron j to the input of neuron i.)

It consists of n neurons, each mapping its input $u_i$ into the output $v_i$ through the activation function $f(u_i)$. Conductance $w_{ij}$ connects the output of the j-th neuron to the input of the i-th neuron; it is also assumed that $w_{ij} = w_{ji}$ and $w_{ii} = 0$. The KCL equation for the input node having potential $u_i$ is
$$i_i + \sum_{\substack{j=1\\ j\neq i}}^{n} w_{ij} v_j - u_i\Big(\sum_{\substack{j=1\\ j\neq i}}^{n} w_{ij} + g_i\Big) = C_i\,\frac{du_i}{dt}$$
Defining $G_i = \sum_{j\neq i} w_{ij} + g_i$, $\ C = \mathrm{diag}[C_1, \dots, C_n]$, $\ G = \mathrm{diag}[G_1, \dots, G_n]$,

we have
$$C\,\frac{d\mathbf{u}(t)}{dt} = W\mathbf{v}(t) - G\mathbf{u}(t) + \mathbf{i}, \qquad \mathbf{v}(t) = f(\mathbf{u}(t))$$
It can be shown that
$$\frac{dE}{dt} = -\Big(C\,\frac{d\mathbf{u}}{dt}\Big)^{t}\frac{d\mathbf{v}}{dt} < 0$$
It thus follows that the changes of E in time are in the general direction of lower values of the energy function in $\mathbf{v}$-space -- the stability condition.
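A minimal forward-Euler simulation of these dynamics; the time step, the tanh amplifier characteristic, the gain parameter, and the random initial voltages are assumptions for illustration, not values from the notes:

```python
import numpy as np

def simulate_gradient_hopfield(W, i_ext, C, g, lam=1.0, dt=1e-3, steps=5000, rng=None):
    """Forward-Euler integration of C du/dt = Wv - Gu + i, with v = tanh(lam*u)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(i_ext)
    G = W.sum(axis=1) + g              # G_i = sum_{j} w_ij + g_i  (w_ii = 0)
    u = 0.01 * rng.standard_normal(n)  # small random initial voltages
    for _ in range(steps):
        v = np.tanh(lam * u)           # amplifier characteristic v = f(u)
        du = (W @ v - G * u + i_ext) / C
        u = u + dt * du
    return np.tanh(lam * u)
```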

§9.4 Feedback Networks for Computational Applications

In principle, any optimization problem whose objective function can be expressed in the form of the energy function can be solved through the convergence of a feedback network. Take the Traveling Salesman Problem as an example: given a set of n cities A, B, C, ... with pairwise distances $d_{AB}, d_{AC}, \dots$, find a closed tour that visits each city exactly once, returns to the starting city, and has minimum total path length. This is an NP-complete problem.

To map this problem onto the computational network, we require a representation scheme in which the final location of any individual city is specified by the output states of a set of n neurons. E.g., for n = 5, the neuronal state ($5^2$ neurons) shown below represents the tour C-A-E-B-D:

Order:   1  2  3  4  5
City A:  0  1  0  0  0
City B:  0  0  0  1  0
City C:  1  0  0  0  0
City D:  0  0  0  0  1
City E:  0  0  1  0  0

In the nn square representation, this means that in an output state describing a valid tour there can be only one “1” in each row and each column, all other entries being “0”. 2 In this scheme, the n symbols v will be described by double indicies, v : x stands for city name, j for the position of that city in tour. i xj To enable the N neurons to compute a solution to the problem, the network must be described by an energy function in which the lowest energy state corresponds to the best path of the tour.

An appropriate form for this function can be found by considering the high-gain limit, in which all final neuron outputs will be 0 or 1. The space over which the energy function is minimized in this limit is the set of $2^N$ corners of the N-dimensional hypercube defined by $v_i = 0$ or 1. Consider those corners of this space which are the local minima (stable states) of the energy function
$$E_1 = \frac{A}{2}\sum_{x}\sum_{i}\sum_{j\neq i} v_{xi} v_{xj} + \frac{B}{2}\sum_{i}\sum_{x}\sum_{y\neq x} v_{xi} v_{yi} + \frac{C}{2}\Big(\sum_{x}\sum_{i} v_{xi} - n\Big)^2$$
where A, B, C are positive constants and $v_{xi} \in \{0, 1\}$.
-- The first term is 0 iff each city row x contains no more than one "1".

-- The second term is 0 iff each tour-position column i contains no more than one "1".
-- The third term is 0 iff there are exactly n entries of "1" in the entire matrix.
Thus this energy function, evaluated on the domain of the corners of the hypercube, has minima with $E_1 = 0$ for all matrices with exactly one "1" in each row and each column; all other states have higher energy. Hence, including these terms in an energy function describing a TSP network strongly favors stable states that are at least valid tours.

Another requirement, that E favor valid tours representing short paths, is fulfilled by adding one more term to $E_1$. This term contains information about the length of the path corresponding to a given tour, and its form can be
$$E_2 = \frac{D}{2}\sum_{x}\sum_{y\neq x}\sum_{i} d_{xy}\, v_{xi}\left(v_{y,i+1} + v_{y,i-1}\right)$$
where the position subscripts are defined modulo n, in order to express "end effects" such as the fact that the n-th city on a tour is adjacent to both city (n-1) and city 1, i.e., $v_{y,n+j} = v_{y,j}$. Within the domain of states that characterize a valid tour, $E_2$ is numerically equal to the length of the path for that tour.

If A, B, and C are sufficiently large, all the really low-energy states of a network described by this function will have the form of a valid tour; the total energy of such a state is the length of the tour, and the states with the shortest paths are the lowest-energy states. Using the row/column neuron labeling scheme described above for each of the two indices, the implicitly defined connection matrix is
$$T_{xi,yj} = -A\,\delta_{xy}(1-\delta_{ij}) \;-\; B\,\delta_{ij}(1-\delta_{xy}) \;-\; C \;-\; D\,d_{xy}\left(\delta_{j,i+1} + \delta_{j,i-1}\right)$$
(inhibitory connections within each row, inhibitory connections within each column, global inhibition, and the data term, respectively; $\delta$ denotes the Kronecker delta). The external inputs are
$$I_{xi} = 2Cn \qquad \text{(excitation bias)}$$

The "data term" contribution to $T_{xi,yj}$, weighted by D, is the input that describes which TSP instance (i.e., where the cities actually are) is to be solved. The terms with A, B, C provide the general constraints required for any TSP problem, while the data term controls which one of the n! properly constrained final states is actually chosen as the "best" path. The problem formulated as above has been solved numerically for the continuous activation function with gain $\lambda = 50$, A = B = D = 250, C = 100, and $10 \leq n \leq 30$; quite satisfactory solutions have been found.
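A sketch of how the connection matrix and biases above could be assembled in Python. The function and variable names are illustrative; the default parameter values are the ones quoted in the notes, and the formula is transcribed as written (so the global -C term also touches the "diagonal" entries, which one may wish to zero out separately):

```python
import numpy as np

def tsp_hopfield_weights(d, A=250.0, B=250.0, C=100.0, D=250.0):
    """Build T[x,i,y,j] and I[x,i] for the TSP mapping described above."""
    n = d.shape[0]
    delta = np.eye(n)                      # Kronecker delta as an identity matrix
    T = np.zeros((n, n, n, n))
    I = np.full((n, n), 2.0 * C * n)       # excitation bias I_xi = 2Cn
    for x in range(n):
        for i in range(n):
            for y in range(n):
                for j in range(n):
                    T[x, i, y, j] = (
                        -A * delta[x, y] * (1 - delta[i, j])   # one "1" per city row
                        - B * delta[i, j] * (1 - delta[x, y])  # one "1" per position column
                        - C                                    # global inhibition
                        - D * d[x, y] * (delta[j, (i + 1) % n] + delta[j, (i - 1) % n])  # path length
                    )
    return T, I
```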

(Figure: a 10-city example solution; path = D-H-I-F-G-E-A-J-C-B-D.)

§9.5 Associative Memory

By properly selecting the weights, it is possible to make the stable states of the network be exactly the set M of patterns we want to store. Under this condition, the network's state should not change if the network starts in a state belonging to M; if it starts outside M, the network should settle into the stored state closest to the initial state (in the sense of Hamming distance). There are two categories of associative memory (AM):
1) Auto-AM: if the input is $x' = x_a + v$ (a noisy version of a stored pattern), where $x_a \in \{x_1, \dots, x_M\}$, then the output is $y = x_a$.

2) Hetero-AM: if $x' = x_a + v$, where the pairs $(x_1, y_1), \dots, (x_M, y_M)$ are stored, then the output is $y = y_a$.
One of the tasks is to find suitable weights such that the network performs the function of an AM. The most frequently used rule for this purpose is the Outer Product Rule. Assume that:
-- the network has n neurons;
-- each activity state $x_i \in \{-1, 1\}$;
-- the Hebbian rule is observed: $\Delta w_{ij} = \alpha\, x_i x_j$, with $\alpha > 0$.

The outer product rule is as follows. For given vectors $M = \{U_1, \dots, U_m\}$, where $U_k = (x_1^k, \dots, x_n^k)^t$, write
$$W = \sum_{k=1}^{m}\left(U_k U_k^t - I\right) = \sum_{k=1}^{m}\begin{bmatrix} 0 & x_1^k x_2^k & \cdots & x_1^k x_n^k \\ x_2^k x_1^k & 0 & \cdots & x_2^k x_n^k \\ \vdots & & \ddots & \vdots \\ x_n^k x_1^k & x_n^k x_2^k & \cdots & 0 \end{bmatrix}$$
i.e., $w_{ij} = \sum_{k=1}^{m} x_i^k x_j^k$ for $i \neq j$, and $w_{ii} = 0$.

This can be implemented by the following procedure:
(1) Set W = [0].
(2) For k = 1 to m, input $U_k$ and do $w_{ij} \leftarrow w_{ij} + x_i^k x_j^k$ for every connected pair (i, j).
Check that this is reasonable: 1) suppose that $U_1, \dots, U_m$ are orthogonal and m < n. Then
$$W U_1 = (U_1 U_1^t - I)U_1 + \sum_{k=2}^{m}(U_k U_k^t - I)U_1 = U_1 U_1^t U_1 - I U_1 + \sum_{k=2}^{m}\left(U_k U_k^t U_1 - I U_1\right)$$
$$= n\,U_1 - U_1 + \sum_{k=2}^{m}(-U_1) = (n-1)U_1 - (m-1)U_1 = (n-m)\,U_1$$
Hence $\mathrm{Sgn}(W U_1) = \mathrm{Sgn}[(n-m)U_1] = U_1$.
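A minimal sketch of the outer product rule and a one-pass bipolar recall; the function names are illustrative, and the choice to keep the old value when the net input is zero mirrors the transition rule (*) above:

```python
import numpy as np

def outer_product_weights(patterns):
    """W = sum_k (U_k U_k^t - I); symmetric with zero diagonal.
    `patterns` is an (m, n) array of bipolar (+/-1) row vectors."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for u in patterns:
        W += np.outer(u, u) - np.eye(n)
    return W

def recall(W, x):
    """One synchronous pass: y = Sgn(W x), keeping the old value where W x = 0."""
    y = np.sign(W @ x)
    return np.where(y == 0, x, y)
```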

i.e., $U_1$ is exactly a stable state of the network, and the W thus determined is reasonable.

Example. Given n = 3, m = 3 and $U_1 = (1, 1, -1)^t$, $U_2 = (-1, 1, 1)^t$, $U_3 = (1, -1, 1)^t$:
$$U_1 U_1^t - I = \begin{bmatrix} 0 & 1 & -1\\ 1 & 0 & -1\\ -1 & -1 & 0\end{bmatrix}, \quad U_2 U_2^t - I = \begin{bmatrix} 0 & -1 & -1\\ -1 & 0 & 1\\ -1 & 1 & 0\end{bmatrix}, \quad U_3 U_3^t - I = \begin{bmatrix} 0 & -1 & 1\\ -1 & 0 & -1\\ 1 & -1 & 0\end{bmatrix}$$
$$W = \sum_{k=1}^{3}\left(U_k U_k^t - I\right) = \begin{bmatrix} 0 & -1 & -1\\ -1 & 0 & -1\\ -1 & -1 & 0\end{bmatrix}$$
Then $W U_1 = (0, 0, -2)^t$ and, with the convention that a zero net input leaves the output unchanged, $\mathrm{Sgn}(W U_1) = (1, 1, -1)^t = U_1$.

Similarly, $W U_2 = (-2, 0, 0)^t$ with $\mathrm{Sgn}(W U_2) = (-1, 1, 1)^t = U_2$, and $W U_3 = (0, -2, 0)^t$ with $\mathrm{Sgn}(W U_3) = (1, -1, 1)^t = U_3$. Clearly, $U_1$, $U_2$, and $U_3$ are stable memories. The structure of the network is as below. (Figure: three neurons $u_1$, $u_2$, $u_3$, each pair connected by a weight of -1.)

Applications:
a) Classification. Given the input $U_x = (1, 1, 1)^t$, is $U_x \in \{U_1, U_2, U_3\}$?
$$W U_x = (-2, -2, -2)^t, \qquad \mathrm{Sgn}(W U_x) = (-1, -1, -1)^t$$
which equals none of the stored patterns; hence $U_x$ does not belong to the set $\{U_1, U_2, U_3\}$.
b) Associative Memory. Given a noisy input $U_x = (0, 1, -1)^t$, what is $U_x$?
$$W U_x = (0, 1, -1)^t, \qquad \mathrm{Sgn}(W U_x) = (1, 1, -1)^t = U_1$$
(resolving the zero entry toward +1), so $U_x \approx U_1$: the noisy pattern is recalled as $U_1$.
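Using the sketch above, the worked example could be reproduced roughly as follows (illustrative only; note that the simple one-pass recall leaves a zero net input unresolved rather than forcing it to +1):

```python
import numpy as np

U = np.array([[ 1,  1, -1],
              [-1,  1,  1],
              [ 1, -1,  1]])            # stored patterns U_1, U_2, U_3
W = outer_product_weights(U)            # [[0,-1,-1],[-1,0,-1],[-1,-1,0]]

print(recall(W, np.array([1, 1, 1])))   # [-1,-1,-1]: not one of the stored patterns
x_noisy = np.array([0, 1, -1])          # noisy input with the first bit erased
print(W @ x_noisy)                      # net inputs [0, 1, -1]
print(recall(W, x_noisy))               # one-pass recall; the zero entry is left unchanged
```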

In general, if the n-dimensional vectors $U_1, \dots, U_m$ are orthogonal and n > m, then
$$W U_k = (U_k U_k^t - I)U_k + \sum_{\substack{i=1\\ i\neq k}}^{m}(U_i U_i^t - I)U_k = U_k U_k^t U_k - I U_k + \sum_{\substack{i=1\\ i\neq k}}^{m}\left(U_i U_i^t U_k - I U_k\right)$$
$$= n\,U_k - U_k + (m-1)(-U_k) = (n-1)U_k - (m-1)U_k = (n-m)\,U_k$$
$$\mathrm{Sgn}(W U_k) = \mathrm{Sgn}[(n-m)U_k] = U_k, \qquad k = 1, 2, \dots, m$$
Hence the $\{U_k\}$ are stable states.