Learning without a teacher (unsupervised learning): introduction. In unsupervised learning the ANN forms its output by itself, according to the information presented at its input. The main task is to define a suitable functional and then minimize it; it is according to this functional that the network is adjusted. In practice, adaptive networks encode the input information in the most compact way, subject to some predefined requirements.
Learning without a teacher: redundancy of data. The goal is to reduce the dimension of the data with minimal loss. The length of a data description has two components: the dimension of the data (the number of components of the input vector) and the capacity of the data (the number of bits defining the possible variety of all its values). Accordingly, there are two ways of coding (compressing) the information: reducing the dimension by finding independent features, and reducing the variety of the data by detecting prototypes (clustering and quantization).
Two ways to reduce the data. Reducing the dimension allows us to describe the data with fewer components. Clustering allows us to reduce the variety of the data, reducing the number of bits we need to describe it. The two types of algorithms can be combined: in Kohonen maps the prototypes are arranged in a space of low dimension. For example, the input data can be mapped onto a 2-dimensional grid of prototypes in such a way that the data can be visualized.
Main idea: neuron as indicator. The neuron has one output and is trained on d-dimensional data. Let us say the activation function is linear; the output is then a linear combination of the inputs, y = Σ_i w_i x_i. After training is finished, the amplitude of the output can serve as an indicator for the data, showing whether or not a new input resembles the training patterns.
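To make the indicator idea concrete, here is a minimal sketch (not from the original slides; the weight vector and the inputs are made up for the illustration) of a single linear neuron whose output amplitude is larger for inputs resembling the direction it was trained on:

import numpy as np

def neuron_output(w, x):
    # A single linear neuron: y = sum_i w_i * x_i
    return float(np.dot(w, x))

w = np.array([0.8, 0.6, 0.0])            # hypothetical trained weight vector (unit length)
x_typical  = np.array([1.6, 1.2, 0.1])   # input similar to the training patterns
x_atypical = np.array([0.1, -0.1, 2.0])  # input unlike the training patterns

print(neuron_output(w, x_typical))       # large amplitude: resembles the training data
print(neuron_output(w, x_atypical))      # small amplitude: does not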
Hebb training algorithm. According to Hebb, the weights change as Δw_i = η y x_i. Reformulating the task as an optimization problem shows the property of such a neuron and how to define the functional to be minimized: the Hebb rule is gradient descent on E = −½⟨y²⟩. NB! If we demand the true minimum of E, the output amplitude grows without bound (to infinity).
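A hedged sketch of the pure Hebb rule on toy data, illustrating the remark above: nothing limits the output amplitude, so the weight norm keeps growing (the data, learning rate and step count are arbitrary choices for the illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])  # toy data, most variance along axis 0

w = rng.normal(size=2) * 0.01
eta = 0.01
for x in X:
    y = w @ x
    w += eta * y * x              # pure Hebb rule: dw_i = eta * y * x_i
print(np.linalg.norm(w))          # the weight norm has blown up: nothing bounds the output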
Oja training rule. A decay term is added to the Hebb rule to stop the unlimited growth of the weights: Δw_i = η y (x_i − y w_i). The Oja rule maximizes the sensitivity of the output neuron while keeping the amplitude of the weights bounded. It is easy to verify: set the average change of the weights to zero and multiply the right-hand side of the equality by w; in equilibrium ⟨y²⟩(1 − |w|²) = 0, so |w| = 1. Thus the weights of the trained neuron lie on the unit hypersphere. Under Oja training the weight vector settles on the hypersphere in the direction that maximizes the projection of the input vectors.
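A hedged sketch of the Oja rule on the same kind of toy data; assuming a small enough learning rate, the weight vector ends up with unit norm and points along the direction of maximum variance:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2)) * np.array([3.0, 0.5])  # zero-mean data, principal direction = axis 0

w = rng.normal(size=2) * 0.1
eta = 0.005
for x in X:
    y = w @ x
    w += eta * y * (x - y * w)    # Oja rule: the -eta*y^2*w term stops unlimited growth

print(np.linalg.norm(w))          # close to 1: the weights settle on the unit hypersphere
print(w)                          # close to (+/-1, 0): the direction of maximum variance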
SUMMARY: the neuron tries to reproduce the value of its input from its known output; in other words, it maximizes the sensitivity of its output. Such neuron-indicators compress multidimensional input information. Oja training rule, NB! The outputs of an Oja layer are linear combinations of the principal components. If you want to obtain the principal components themselves, you should change the sum over outputs (as in Sanger's rule, where each neuron subtracts only the contributions of the preceding outputs).
Principal component analysis. Let us say we have d-dimensional data on which we are training m linear neurons. We want the amplitudes of all the output neurons to be independent indicators, together reflecting as fully as possible the information contained in the multidimensional data. THE TASK IS:
The requirement: the neurons must interact somehow (if we train them independently, all of them will converge to the same result). A simple case: take a perceptron with a linear hidden layer in which the number of inputs equals the number of outputs and the weights with the same indices in the two layers are shared. Let us train this ANN to reproduce its input at its output. The resulting training rule looks like the Oja training rule! (A sketch of such a tied-weight network follows below.)
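A minimal sketch, under the assumptions above (linear hidden layer, equal numbers of inputs and outputs, tied weights), of training such a network to reproduce its input; the data, sizes and learning rate are made up for the illustration. Keeping only the first term of the update gives exactly the Oja-like rule mentioned in the slide:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3)) @ np.diag([3.0, 1.5, 0.2])   # 3-D data lying mostly in a 2-D subspace

d, m = 3, 2                            # d inputs and outputs, m hidden linear neurons
W = rng.normal(size=(m, d)) * 0.1      # shared (tied) weights of both layers
eta = 0.005

for x in X:
    y = W @ x                          # hidden layer
    x_hat = W.T @ y                    # output layer reproduces the input
    e = x - x_hat                      # reconstruction error
    # Gradient step for the tied-weight linear reconstruction error |x - W.T W x|^2;
    # the first term alone, eta * outer(y, e), is the Oja-like rule.
    W += eta * (np.outer(y, e) + np.outer(W @ e, x))

# Reconstruction error is small: mostly the variance of the discarded third direction.
print(np.mean([np.linalg.norm(x - W.T @ (W @ x))**2 for x in X]))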
Self-organizing layer: in our formulation a separate neuron, while being trained, tries to reproduce its inputs from its own output. Generalizing this observation, it is logical to propose a rule in which the inputs are restored from the whole set of outputs. In this way we obtain the Oja training rule for a one-layer network: Δw_ij = η y_j (x_i − Σ_k y_k w_ik). The hidden layer of such an ANN, like the Oja layer, performs an optimal coding of the input data and retains the maximum variety of the data under the existing restrictions. (A sketch of this rule and of Sanger's variant is given below.)
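A hedged sketch of a self-organizing linear layer. It implements Sanger's (generalized Hebbian) variant, in which each neuron subtracts only the reconstruction of the preceding outputs, so the rows converge to individual principal components; replacing the truncated sum by the full sum gives the Oja layer rule described above. Data, sizes and learning rate are illustrative:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 4)) @ np.diag([3.0, 2.0, 1.0, 0.3])   # zero-mean 4-D toy data

d, m = 4, 2
W = rng.normal(size=(m, d)) * 0.1     # row j = weight vector of output neuron j
eta = 0.002

for x in X:
    y = W @ x
    # Sanger's rule in matrix form: dW = eta * (y x^T - lower_triangular(y y^T) W).
    # With the full matrix y y^T instead of its lower-triangular part this becomes
    # the Oja one-layer (subspace) rule, whose outputs span but mix the components.
    W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

print(W)   # rows close to (+/-1, 0, 0, 0) and (0, +/-1, 0, 0): the leading principal components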
Example: let us replace the activation function in the training rule by a sigmoid. This brings a new property (Oja et al., 1991): such an algorithm was used, in particular, for the decomposition of signals mixed in an unknown way (i.e. blind signal separation), for example when we want to separate a human voice from noise.
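A very rough sketch of the modification itself, not of a full blind-separation system: the linear output in the Oja-style update is replaced by a tanh (sigmoid-type) output. The data and constants are arbitrary, and no separation quality is claimed here; the point is only where the nonlinearity enters the rule:

import numpy as np

rng = np.random.default_rng(3)
X = rng.laplace(size=(5000, 2))        # super-Gaussian, "source-like" toy data
w = rng.normal(size=2) * 0.1
eta = 0.002

for x in X:
    y = np.tanh(w @ x)                 # sigmoid-type activation instead of a linear output
    w += eta * y * (x - y * w)         # same Oja-style form; now sensitive to higher-order statistics

print(w, np.linalg.norm(w))            # the weights stay bounded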
Competition of neurons: the winner takes all. Basic algorithm for training a competitive layer: the winner is the neuron with the maximum response to the current input, and only the winner is trained, its weight vector being pulled towards that input: Δw_winner = η (x − w_winner). (See the sketch below.)
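A minimal sketch of winner-take-all training on toy clustered data. For normalized vectors the neuron with the maximum response w·x is the one whose prototype is closest to the input, so the winner is picked here by distance; the data, seed and learning rate are illustrative choices:

import numpy as np

rng = np.random.default_rng(4)
centres = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
X = np.vstack([rng.normal(c, 0.3, size=(300, 2)) for c in centres])
rng.shuffle(X)

m = 3
W = X[rng.choice(len(X), m, replace=False)].copy()   # prototypes initialized from data points
eta = 0.05
for x in X:
    winner = np.argmin(np.linalg.norm(W - x, axis=1))  # neuron with the maximum response
    W[winner] += eta * (x - W[winner])                  # only the winner is pulled towards the input

print(W)   # prototypes move towards the dense regions (typically the cluster centres)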
The winner does not take all. One variant of the basic training rule for a competitive layer consists in training not only the winning neuron but also its "neighbours", though at a smaller rate. This approach of pulling up the neurons nearest to the winner is used in topographic Kohonen maps. The neighbourhood function equals one for the winning neuron and gradually falls off with distance from the winner. Kohonen training resembles stretching an elastic grid of prototypes over the training data (see the sketch below).
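A hedged sketch of Kohonen training with a Gaussian neighbourhood function; the grid size, learning rate and neighbourhood width are arbitrary illustrative choices:

import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, size=(3000, 2))          # data to be covered by the elastic grid

side = 10
grid = np.array([[i, j] for i in range(side) for j in range(side)], dtype=float)  # map coordinates
W = rng.uniform(0, 1, size=(side * side, 2))   # prototype vectors in data space

eta, sigma = 0.1, 2.0                          # learning rate and neighbourhood width
for x in X:
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    # Neighbourhood function: 1 for the winner, decaying with distance on the grid.
    h = np.exp(-np.linalg.norm(grid - grid[winner], axis=1) ** 2 / (2 * sigma ** 2))
    # Winner and its grid neighbours are all pulled towards the input,
    # which stretches the grid of prototypes over the data.
    W += eta * h[:, None] * (x - W)
    sigma = max(0.5, sigma * 0.999)            # slowly shrink the neighbourhood

print(W.min(axis=0), W.max(axis=0))            # the grid has spread out to cover the data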
Schematic representation of a self-organizing network. Methodology of self-organizing maps: the neurons of the output layer are ordered and correspond to the cells of a two-dimensional map, which can be coloured according to the affinity of attributes.
A convenient tool for visualizing data is the colouring of topographic maps, similar to what is done with ordinary geographic maps. Each attribute of the data generates its own colouring of the map cells, according to the average value of that attribute over the data that fell into a given cell; this visualizes the topographic map induced by the i-th component of the input data. Collecting together the maps of all the attributes of interest, we obtain a topographic atlas giving an integrated representation of the structure of the multivariate data (a colouring sketch follows below).
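A sketch of the colouring itself, assuming a SOM has already been trained; here the prototypes W are random placeholders and the data is synthetic, so only the mechanics of the atlas are shown:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
side, d = 8, 3
W = rng.uniform(size=(side * side, d))     # placeholder for trained prototype vectors
X = rng.uniform(size=(2000, d))            # data with d attributes

# Assign every data point to its cell (best-matching prototype).
cells = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)

# One map per attribute: colour each cell by the mean value of that attribute
# over the data that fell into the cell; together the maps form the "atlas".
fig, axes = plt.subplots(1, d, figsize=(3 * d, 3))
for k, ax in enumerate(axes):
    colours = np.full(side * side, np.nan)
    for c in range(side * side):
        hit = cells == c
        if hit.any():
            colours[c] = X[hit, k].mean()
    ax.imshow(colours.reshape(side, side))
    ax.set_title(f"attribute {k}")
plt.show()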
Classified SOM for the NASDAQ100 index for the period from 10-Nov-1997 to 27-Aug-2001 (methodology of self-organizing maps).
Complexity of the algorithm: when is it better to reduce the dimension, and when to quantize the input information? Reducing the dimension: for each of the training patterns the number of operations is of the order of the number of synaptic weights of a one-layer ANN with d inputs and m output neurons, i.e. ~ d·m, and the compression coefficient is K = d/m. Quantization: with b bits per component (the capacity of the data), the compression coefficient is K = d·b / log₂m, and the number of operations per pattern is again ~ d·m. With the same compression coefficient, quantization therefore requires m = 2^(d·b/K) prototypes, so its complexity grows exponentially with the dimension d (a numerical sketch is given below).
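A back-of-the-envelope sketch of the comparison; the values d = 64, b = 8 and K = 8 are chosen only for illustration:

import numpy as np

d, b = 64, 8        # dimension of the input vector; bits per component (capacity)
K = 8               # desired compression coefficient

m_pca = d // K                  # dimensionality reduction: keep m = d / K components
ops_pca = d * m_pca             # ~ number of weights of a 1-layer ANN, per pattern

m_vq = 2 ** (d * b / K)         # quantization: K = d*b / log2(m)  ->  m = 2**(d*b/K)
ops_vq = d * m_vq               # ~ d operations per prototype, per pattern

print(m_pca, ops_pca)                 # 8 components, ~512 operations per pattern
print(f"{m_vq:.2e} {ops_vq:.2e}")     # ~1.8e19 prototypes: exponential in d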
JPEG example: the image is divided into blocks of 8x8 pixels, which serve as the input vectors we want to compress; in our case d = 64. Let us suppose the image contains gradations of grey, with a given accuracy of the represented data. But if d = 64x64, then K > 10³.
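A small sketch of the block decomposition and of the quantization compression coefficient for such blocks; the image is a random stand-in and the codebook size m = 256 is a hypothetical choice:

import numpy as np

rng = np.random.default_rng(7)
image = rng.integers(0, 256, size=(256, 256))   # stand-in for an 8-bit grey-scale image

# Cut the image into 8x8 blocks; each block becomes one d = 64 input vector.
blocks = image.reshape(32, 8, 32, 8).transpose(0, 2, 1, 3).reshape(-1, 64)
print(blocks.shape)                             # (1024, 64): 1024 training vectors of dimension 64

d, b, m = 64, 8, 256                            # block dimension, bits per pixel, codebook size
K = d * b / np.log2(m)                          # compression if each block is replaced by the
print(K)                                        # index of its nearest prototype: K = 64.0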