1 Self Organization: Hebbian Learning CS/CMPE 333 – Neural Networks

2 Introduction
So far, we have studied neural networks that learn from their environment in a supervised manner. Neural networks can also learn in an unsupervised manner, which is known as self-organized learning.
Self-organized learning discovers significant features or patterns in the input data through general rules that operate locally.
Self-organizing networks typically consist of two layers with feedforward connections and elements that facilitate 'local' learning.

3 Self-Organization
"Global order can arise from local interactions" – Turing (1952)
The input signal produces activity patterns in the network, and the weights are modified in response (a feedback loop).
Principles of self-organization:
1. Modifications in the weights tend to self-amplify.
2. Limited resources lead to competition and the selection of the most active synapses, while less active synapses are disregarded.
3. Modifications in the weights tend to cooperate.

4 Hebbian Learning
A self-organizing principle was proposed by Hebb in 1949 in the context of biological neurons.
Hebb's principle: when a neuron repeatedly excites another neuron, the threshold of the latter neuron is decreased, or the synaptic weight between the neurons is increased, in effect increasing the likelihood of the second neuron firing.
Hebbian learning rule: Δw_ji = η y_j x_i
- No desired or target signal is required in the Hebbian rule, hence it is unsupervised learning.
- The update rule is local to the weight.
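As a concrete illustration, here is a minimal numpy sketch of this update for a single linear neuron; the function name and the learning rate eta = 0.01 are illustrative choices, not from the slides.

```python
import numpy as np

def hebbian_update(w, x, eta=0.01):
    """One Hebbian step for a single linear neuron: Delta w_i = eta * y * x_i."""
    y = float(w @ x)          # post-synaptic activity (linear activation y = w^T x)
    return w + eta * y * x    # strengthen each weight in proportion to x_i * y
```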

5 Hebbian Update
Consider the update of a single weight w (x and y are the pre- and post-synaptic activities):
w(n + 1) = w(n) + η x(n) y(n)
For a linear activation function (y = w x):
w(n + 1) = w(n) [1 + η x²(n)]
- The weight grows without bound. If the initial weight is negative, it grows in the negative direction; if positive, it grows in the positive direction.
- Hebbian learning is therefore intrinsically unstable, unlike error-correction learning with the BP algorithm.
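A quick numerical sketch of this instability, using made-up data and an illustrative learning rate: the single weight grows monotonically in magnitude because each step multiplies it by (1 + η x²).

```python
import numpy as np

rng = np.random.default_rng(0)
w, eta = 0.1, 0.1            # single weight and an illustrative learning rate
for n in range(200):
    x = rng.normal()         # pre-synaptic activity
    y = w * x                # linear activation
    w += eta * x * y         # equivalent to w(n+1) = w(n) * (1 + eta * x^2(n))
print(w)                     # |w| has grown by orders of magnitude; the sign never flips
```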

6 Geometric Interpretation of Hebbian Learning
Consider a single linear neuron with p inputs:
y = w^T x = x^T w, and Δw = η [x_1 y, x_2 y, …, x_p y]^T
The dot product can be written as y = |w| |x| cos(α), where α is the angle between the vectors x and w.
- If α is near zero (x and w are 'close'), y is large; if α is 90° (x and w are orthogonal, i.e. 'far'), y is zero.

7 Similarity Measure
A network trained with Hebbian learning creates a similarity measure (the inner product) in its input space according to the information contained in the weights.
- The weights capture (memorize) the information in the data during training.
During operation, when the weights are fixed, a large output y signifies that the present input is "similar" to the inputs x that shaped the weights during training.
Similarity measures:
- Hamming distance
- Correlation

8 Hebbian Learning as Correlation Learning
Hebbian learning (pattern-by-pattern mode):
Δw(n) = η x(n) y(n) = η x(n) x^T(n) w(n)
Batch mode:
Δw = η [Σ_{n=1..N} x(n) x^T(n)] w(0)
The term Σ_{n=1..N} x(n) x^T(n) is a sample approximation of the autocorrelation matrix of the input data.
- Thus Hebbian learning can be thought of as learning the autocorrelation of the input space.
Correlation is a well-known operation in signal processing and statistics. In particular, second-order statistics completely describe signals defined by Gaussian distributions.
- This gives Hebbian learning applications in signal processing.
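A minimal sketch of the batch-mode form, assuming the data is stacked row-wise in a matrix X; names and the learning rate are illustrative.

```python
import numpy as np

def batch_hebbian_update(X, w, eta=0.01):
    """Batch-mode Hebbian update: Delta w = eta * (sum_n x(n) x(n)^T) w.

    X has shape (N, p); X.T @ X is the sum of outer products, i.e. the
    (unnormalized) sample autocorrelation of the input data.
    """
    R = X.T @ X
    return w + eta * R @ w
```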

9 Oja's Rule
The simple Hebbian rule causes the weights to increase (or decrease) without bound.
The weights can be normalized to unit length after each update:
w_ji(n + 1) = [w_ji(n) + η x_i(n) y_j(n)] / √( Σ_i [w_ji(n) + η x_i(n) y_j(n)]² )
- This effectively constrains the squared weights at each neuron to sum to 1 (a unit-norm weight vector).
Oja approximated this normalization (for small η) as:
w_ji(n + 1) = w_ji(n) + η y_j(n) [x_i(n) – y_j(n) w_ji(n)]
- This is Oja's rule, the normalized Hebbian rule.
- It includes a 'forgetting' term that prevents the weights from growing without bound.
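A minimal sketch of Oja's update for a single linear neuron (illustrative names and learning rate):

```python
import numpy as np

def oja_update(w, x, eta=0.01):
    """One step of Oja's rule: Delta w_i = eta * y * (x_i - y * w_i)."""
    y = float(w @ x)                    # linear output
    return w + eta * y * (x - y * w)    # -eta * y^2 * w is the 'forgetting' term
```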

10 Oja's Rule – Geometric Interpretation
The simple Hebbian rule finds the direction in the input space along which the data has the largest variance.
- However, the magnitude of the weight vector grows without bound.
Oja's rule has a similar interpretation; the normalization only changes the magnitude, while the direction of the weight vector stays the same.
- The magnitude converges to one.
Oja's rule converges asymptotically, unlike the Hebbian rule, which is unstable.


12 The Maximum Eigenfilter
A linear neuron trained with Oja's rule produces a weight vector that converges to the principal eigenvector of the input autocorrelation matrix, and the variance of its output equals the largest eigenvalue.
In other words, a linear neuron trained with Oja's rule solves the eigenvalue problem:
R e_1 = λ_1 e_1
- R = autocorrelation matrix of the input data
- e_1 = eigenvector of the largest eigenvalue, which corresponds to the weight vector w obtained by Oja's rule
- λ_1 = largest eigenvalue, which corresponds to the variance of the network's output
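The following sketch checks this claim numerically on made-up, zero-mean 2-D data; the data, learning rate, and seed are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Zero-mean 2-D data with most of its variance along the first axis
X = rng.normal(size=(5000, 2)) * np.array([3.0, 1.0])

# Train a single linear neuron with Oja's rule
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for x in X:
    y = w @ x
    w += 1e-3 * y * (x - y * w)

# Compare with the eigendecomposition of the sample autocorrelation matrix
R = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
e1 = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue

print(abs(w @ e1))                       # close to 1: w is aligned with e1 (up to sign)
print(eigvals[-1], np.var(X @ w))        # largest eigenvalue vs. variance of the output
```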

13 Principal Component Analysis (1)
Oja's rule applied to a single neuron extracts the principal component of the input space, in the form of the weight vector.
How can we find the other components of the input space that carry significant variance?
In statistics, PCA is used to obtain the significant components of data in the form of orthogonal principal axes.
- PCA is also known as Karhunen–Loève (K-L) filtering in signal processing.
- It was first proposed in 1901; later developments occurred in the 1930s, 1940s, and 1960s.
A Hebbian network with Oja's rule can perform PCA.

14 Principal Component Analysis (2)
PCA: consider a set of vectors x with zero mean. There exists an orthogonal transformation y = Q^T x such that the covariance matrix of y is diagonal:
Λ = E[y y^T], with Λ_ij = λ_i if i = j and Λ_ij = 0 otherwise
- λ_1 ≥ λ_2 ≥ … ≥ λ_p are the eigenvalues of the covariance matrix of x, C = E[x x^T]
- The columns of Q are the corresponding eigenvectors
- The components of y are the principal components: the first has maximum variance, and the components are mutually uncorrelated.
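For reference, a minimal sketch of classical PCA by eigendecomposition of the sample covariance matrix; the function name and interface are illustrative.

```python
import numpy as np

def pca(X, m):
    """PCA by eigendecomposition of the sample covariance matrix.

    X: (N, p) zero-mean data. Returns the m largest eigenvalues, the matrix Q
    whose columns are the corresponding eigenvectors, and the projections
    y = Q^T x for every sample (as rows of the last return value).
    """
    C = X.T @ X / len(X)                   # sample estimate of E[x x^T]
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending order
    order = np.argsort(eigvals)[::-1][:m]  # indices of the m largest eigenvalues
    Q = eigvecs[:, order]
    return eigvals[order], Q, X @ Q
```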

15 PCA – Example

16 Hebbian Network for PCA

17 Hebbian Network for PCA
Procedure:
- Use Oja's rule to find the principal component.
- Project the data onto the subspace orthogonal to the principal component.
- Use Oja's rule on the projected data to find the next major component.
- Repeat the above for m ≤ p components (m = desired number of components; p = input space dimensionality).
How do we find the projection onto the orthogonal subspace?
- Deflation method: subtract the principal-component projection from the input (a minimal sketch follows this slide).
Oja's rule can be modified to perform this operation automatically; this is Sanger's rule.
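A small sketch of the deflation step, assuming the data is stacked row-wise in X and w is the unit-norm principal direction found by Oja's rule; the function name is illustrative.

```python
import numpy as np

def deflate(X, w):
    """Project the data onto the subspace orthogonal to the unit-norm direction w.

    Each row x becomes x - (w . x) w, so the deflated data has no variance
    along w and Oja's rule can then extract the next principal component.
    """
    return X - np.outer(X @ w, w)
```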

18 Sanger's Rule
Sanger's rule is a modification of Oja's rule that implements the deflation method for PCA.
- Classical PCA involves matrix operations.
- Sanger's rule implements PCA in an iterative fashion suited to neural networks.
Consider p inputs and m outputs, where m < p:
y_j(n) = Σ_{i=1..p} w_ji(n) x_i(n),  j = 1, …, m
and the update (Sanger's rule):
Δw_ji(n) = η [ y_j(n) x_i(n) – y_j(n) Σ_{k=1..j} w_ki(n) y_k(n) ]
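A minimal sketch of this update for all m output neurons at once, with illustrative names and learning rate; the lower-triangular matrix encodes the sum over k ≤ j.

```python
import numpy as np

def sanger_update(W, x, eta=0.01):
    """One step of Sanger's rule for all m output neurons.

    W has shape (m, p): row j holds the weights of output neuron j.
    Delta w_ji = eta * ( y_j x_i - y_j * sum_{k<=j} w_ki y_k )
    """
    y = W @ x                          # outputs y_1 .. y_m
    lower = np.tril(np.outer(y, y))    # entry (j, k) = y_j * y_k for k <= j, else 0
    return W + eta * (np.outer(y, x) - lower @ W)
```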

19 PCA for Feature Extraction
PCA is the optimal linear feature extractor for reconstruction: no other linear system can provide features that reconstruct the input with lower mean-squared error.
- PCA may or may not be the best preprocessing for pattern classification or recognition. Classification requires good discrimination, which PCA might not provide.
Feature extraction: transform the p-dimensional input space to an m-dimensional space (m < p) such that the m dimensions capture the information with minimal loss.
The mean-squared reconstruction error is the sum of the discarded eigenvalues:
e² = Σ_{i=m+1..p} λ_i
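The sketch below checks this error formula numerically on made-up zero-mean data; the data, the seed, and the choice m = 2 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Zero-mean toy data with per-axis standard deviations 4, 2, 1, 0.5
X = rng.normal(size=(20000, 4)) * np.array([4.0, 2.0, 1.0, 0.5])

C = X.T @ X / len(X)
eigvals, Q = np.linalg.eigh(C)                    # ascending order
eigvals, Q = eigvals[::-1], Q[:, ::-1]            # sort descending

m = 2
X_rec = (X @ Q[:, :m]) @ Q[:, :m].T               # keep m components, then reconstruct
mse = np.mean(np.sum((X - X_rec) ** 2, axis=1))   # average squared reconstruction error
print(mse, eigvals[m:].sum())                     # the two values agree (approximately)
```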

20 PCA for Data Compression
PCA identifies an orthogonal coordinate system for the input data such that the variance of the projection onto the principal axis is largest, followed by the next major axis, and so on.
By discarding some of the minor components, PCA can be used for data compression: a p-dimensional input is encoded in an m < p dimensional space.
The weights are computed by Sanger's rule on typical inputs.
The decompressor (receiver) must know the weights of the network to reconstruct the original signal: x' = W^T y.
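A minimal sketch of this sender/receiver protocol, assuming W (shape m × p, rows of unit norm) has been trained with Sanger's rule and shared with the receiver; the function names are illustrative.

```python
import numpy as np

def compress(W, x):
    """Sender side: encode a p-dimensional input as m < p coefficients, y = W x."""
    return W @ x

def decompress(W, y):
    """Receiver side: reconstruct using the shared weights, x' = W^T y."""
    return W.T @ y
```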

21 PCA for Classification (1)
Can PCA enhance classification? In general, no. PCA is good for reconstruction, not for feature discrimination or classification.

22 PCA for Classification (2)

23 PCA – Some Remarks
Practical uses of PCA:
- Data compression
- Cluster analysis
- Feature extraction
- Preprocessing for classification/recognition (e.g. preprocessing for MLP training)
Biological basis:
- It is unlikely that the processing performed by biological neurons in, say, perception involves PCA alone; more complex feature-extraction processes are involved.

24 Anti-Hebbian Learning
Modify the Hebbian rule by flipping the sign of the update:
Δw_ji(n) = – η x_i(n) y_j(n)
The anti-Hebbian rule finds the direction in the input space that has minimum variance; in other words, it is the complement of the Hebbian rule.
- Anti-Hebbian learning performs de-correlation: it de-correlates the output from the input.
The Hebbian rule is unstable because it tries to maximize the output variance. The anti-Hebbian rule, on the other hand, is stable and converges.
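For completeness, a minimal sketch of the sign-flipped update for a single linear neuron (illustrative names and learning rate):

```python
import numpy as np

def anti_hebbian_update(w, x, eta=0.01):
    """One anti-Hebbian step: Delta w_i = -eta * y * x_i."""
    y = float(w @ x)          # linear output
    return w - eta * y * x    # sign-flipped Hebbian update suppresses the output variance
```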

