Topic 3. Learning Rules of the Artificial Neural Networks.

Multilayer Perceptron.
The first layer is the input layer, and the last layer is the output layer. All other layers with no direct connections from or to the outside are called hidden layers.

The input is processed and relayed from one layer to the next, until the final result has been computed. This process represents the feedforward scheme.
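The feedforward scheme can be illustrated with a minimal sketch. It assumes a logistic sigmoid activation and omits bias terms; the function names, the 2-2-1 layout and the weight values are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(s):
    """Logistic sigmoid, assumed here as the activation function."""
    return 1.0 / (1.0 + math.exp(-s))

def feedforward(inputs, layers):
    """Relay the input through the network layer by layer.

    `layers` holds one weight matrix per computing layer; layers[k][h][t]
    is the weight from unit t of the previous layer to unit h of the
    current layer. Returns the outputs of every layer, input included.
    """
    outputs = [list(inputs)]
    for weights in layers:
        previous = outputs[-1]
        current = [sigmoid(sum(w * x for w, x in zip(row, previous)))
                   for row in weights]
        outputs.append(current)
    return outputs

# Hypothetical 2-2-1 network with hand-picked weights.
hidden_weights = [[0.5, -0.5], [0.25, 0.75]]
output_weights = [[1.0, -1.0]]
activations = feedforward([1.0, 0.0], [hidden_weights, output_weights])
print(activations[-1])
```

The returned list keeps the outputs of the hidden layer as well, since these are needed later when corrections to the weights are computed.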

The structural credit assignment problem: when an error is made at the output of a network, how is credit (or blame) to be assigned to neurons deep within the network? One of the most popular techniques for training the hidden neurons is error backpropagation, whereby the error of the output units is propagated back to yield estimates of how much a given hidden unit contributed to the output error.

The error function of the multilayer perceptron.
The best performance of the network corresponds to the minimum of the total squared error, and during network training we adjust the weights of the connections in order to reach that minimum.

The combination of weights, including those of the hidden neurons, that minimises the error function E is considered to be a solution of the multilayer perceptron learning problem.

The backpropagation algorithm looks for the minimum of the multi-variable error function E in the space of connection weights w using the method of gradient descent.
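A total squared error can be computed as in the following sketch. The factor 1/2 and the layout of the data (one list of output values per training pattern) are assumptions made here, as are all the numbers.

```python
def total_squared_error(desired, actual):
    """Half the sum of squared output errors over all patterns and units.

    desired[p][j] and actual[p][j] are the desired and the actual output
    of output unit j for training pattern p.
    """
    return 0.5 * sum((d - x) ** 2
                     for d_p, x_p in zip(desired, actual)
                     for d, x in zip(d_p, x_p))

# Hypothetical targets and network outputs for two patterns.
desired = [[1.0, 0.0], [0.0, 1.0]]
actual = [[0.8, 0.1], [0.3, 0.6]]
print(total_squared_error(desired, actual))
```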

Following calculus, at a local minimum of a function of two or more variables the gradient is equal to zero: $\nabla E = 0$, i.e. $\partial E / \partial w^{k}_{ht} = 0$ for every weight, where $\partial E / \partial w^{k}_{ht}$ is the partial derivative of the error function E with respect to the weight of the connection between the h-th unit in layer k and the t-th unit in the previous layer k-1.

We would like to go in the direction opposite to the gradient $\nabla E$ to minimise E most rapidly. Therefore, during the iterative process of gradient descent each connection weight, including the hidden ones, is updated as $w^{k}_{ht} \rightarrow w^{k}_{ht} + \Delta w^{k}_{ht}$, using the increment $\Delta w^{k}_{ht} = -C \, \partial E / \partial w^{k}_{ht}$, where C represents the learning rate.
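The update rule can be tried out on a toy error surface. The quadratic function below, its gradient and the chosen learning rate are hypothetical stand-ins for a real network error; the sketch only shows the mechanics of moving each weight against its partial derivative.

```python
def gradient_descent(grad, w, rate, steps):
    """Repeatedly add the increment -rate * dE/dw to every weight."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - rate * gi for wi, gi in zip(w, g)]
    return w

def grad_E(w):
    """Gradient of the toy surface E(w) = (w0 - 1)^2 + (w1 + 2)^2."""
    return [2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)]

# Start from the origin; the minimum of the toy surface is at (1, -2).
w_min = gradient_descent(grad_E, [0.0, 0.0], rate=0.1, steps=200)
print(w_min)
```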

Since calculus-based methods of minimisation rest on the taking of derivatives, their application to network training requires the error function E to be a differentiable function, which requires the network output Xjp to be differentiable, which in turn requires the activation functions f(S) to be differentiable. This provides a powerful motivation for using continuous and differentiable activation functions f(S).

To make a multilayer perceptron “able to learn”, a useful choice is the generic sigmoid activation function associated with a hidden or output neuron. The important thing about the generic sigmoid function is that it is differentiable, with a very simple and easy-to-compute derivative.

If all activation functions f(S) in the network are differentiable then, according to the chain rule of calculus, the partial derivative of the error function E with respect to the weight of the connection in consideration can be expressed through the derivatives of the activation functions.
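As an illustration, the sketch below assumes that the generic sigmoid is the standard logistic function f(S) = 1 / (1 + exp(-S)); the original formula is not reproduced in this text, so this specific form is an assumption. For the logistic function the derivative is f(S) (1 - f(S)), which the code compares with a finite-difference estimate.

```python
import math

def sigmoid(s):
    """Logistic sigmoid (assumed form of the generic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_derivative(s):
    """Derivative of the logistic sigmoid, expressed through its own value."""
    y = sigmoid(s)
    return y * (1.0 - y)

# Compare the analytic derivative with a central finite difference.
s = 0.3
h = 1e-6
numeric = (sigmoid(s + h) - sigmoid(s - h)) / (2 * h)
print(sigmoid_derivative(s), numeric)
```

Because the derivative is a function of the unit's output alone, it costs almost nothing to compute once the output is known.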

Thus, the correction to the hidden weight of the connection between the h-th unit in the k-th layer and the t-th unit in the previous (k-1)-th layer can be found by the multilayer perceptron learning rule.

Multilayer Perceptron Learning rule!!!
The correction is defined by the output layer errors ejp, the derivatives of the activation functions of all neurons in the upper layers (those with numbers greater than k), the derivative of the activation function of neuron h itself in layer k, and the activation function of the connected neuron t in the previous layer (k-1).

We can easily measure the output errors of the network, and it is up to us to define all the activation functions. If we also know the derivatives of the activation functions, then we can easily find the corrections to the connection weights of all neurons in the network, including the hidden ones, during the second run back through the network.
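For reference, the learning rule can be written out in the usual textbook form of error backpropagation. This is a sketch under stated assumptions rather than a reproduction of the original formula: it considers a single training pattern (the pattern index is dropped) and the squared error $E = \tfrac{1}{2} \sum_j e_j^2$ with $e_j = d_j - X_j$. The symbols $d_j$ (desired output), $S$ (weighted input sum of a unit), $X$ (unit output), $K$ (number of the output layer) and $\delta$ (auxiliary error signal) are introduced here for illustration.

```latex
\begin{align}
\Delta w^{k}_{ht} &= -C\,\frac{\partial E}{\partial w^{k}_{ht}}
                   = C\,\delta^{k}_{h}\,X^{k-1}_{t}, \\
\delta^{K}_{j} &= e_{j}\,f'(S^{K}_{j})
                   \quad \text{(output layer } K\text{)}, \\
\delta^{k}_{h} &= f'(S^{k}_{h}) \sum_{j} \delta^{k+1}_{j}\,w^{k+1}_{jh}
                   \quad \text{(hidden layer } k < K\text{)}.
\end{align}
```

Read from the bottom up, the recursion shows why the corrections are found on a run back through the network: each hidden error signal depends only on the error signals of the layer above it.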

Multilayer Perceptron Training.
The training process of a multilayer perceptron consists of two phases. Initial values of the connection weights are set randomly. Then, during the first, feedforward phase, starting from the input layer and proceeding layer by layer, the outputs of every unit in the network are computed together with the corresponding derivatives. In the second, feedback phase, corrections to all connection weights of all units, including the hidden ones, are computed using the outputs and derivatives computed during the feedforward phase.

Figure: Directions of two basic signal flows in a multilayer perceptron: forward propagation of function signals and back-propagation of error signals.
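The two phases can be sketched for a single training pattern as follows. The sketch assumes logistic sigmoid units, no bias terms, the error E = 1/2 * sum of squared output errors, and a hypothetical network with 2 inputs, 3 hidden units and 2 outputs; all names and numbers are illustrative.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, w_hidden, w_output):
    """Feedforward phase: outputs of the hidden and output layers."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in w_hidden]
    output = [sigmoid(sum(w * v for w, v in zip(row, hidden)))
              for row in w_output]
    return hidden, output

def backward(x, hidden, output, desired, w_output, rate):
    """Feedback phase: corrections computed from the stored outputs.

    For a sigmoid unit the derivative is output * (1 - output).
    """
    delta_out = [(d - y) * y * (1.0 - y) for d, y in zip(desired, output)]
    delta_hid = [h * (1.0 - h) * sum(dj * w_output[j][i]
                                     for j, dj in enumerate(delta_out))
                 for i, h in enumerate(hidden)]
    dw_output = [[rate * dj * h for h in hidden] for dj in delta_out]
    dw_hidden = [[rate * dh * v for v in x] for dh in delta_hid]
    return dw_hidden, dw_output

def apply_corrections(weights, corrections):
    return [[w + dw for w, dw in zip(w_row, dw_row)]
            for w_row, dw_row in zip(weights, corrections)]

def pattern_error(x, desired, w_hidden, w_output):
    _, output = forward(x, w_hidden, w_output)
    return 0.5 * sum((d - y) ** 2 for d, y in zip(desired, output))

# Random initial weights (seeded so the run is repeatable).
rng = random.Random(1)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
w_output = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
x, desired, rate = [1.0, 0.5], [1.0, 0.0], 0.1

before = pattern_error(x, desired, w_hidden, w_output)
hidden, output = forward(x, w_hidden, w_output)
dw_hidden, dw_output = backward(x, hidden, output, desired, w_output, rate)
w_hidden = apply_corrections(w_hidden, dw_hidden)
w_output = apply_corrections(w_output, dw_output)
after = pattern_error(x, desired, w_hidden, w_output)
print(before, after)
```

Note that the hidden-layer corrections are computed with the output weights as they were before the update, which is why both sets of corrections are found first and applied afterwards.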

To understand the second, error back-propagation phase of computing corrections to the weights, let us follow an example of a small three-layer perceptron with an input layer, a hidden layer (layer number 1) and an output layer (layer number 2). Suppose that we have found all outputs and the corresponding derivatives of the activation functions of all computing units in the network, including the hidden ones.

We shall mark the values of the layer in consideration and the values of the layer previous to the one in consideration.

The weight $w^{2}_{10}$ of the connection between unit number 1 (first lower index) in the output layer (layer number 2, shown as the upper index) and unit number 0 (second lower index) in the previous layer (number 1 = 2-1) would have a correction $\Delta w^{2}_{10}$ after presentation of a training pattern.

Analogously, corrections are obtained for all six weights of the connections between the output layer and the hidden layer.
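The six output-layer corrections can be computed as in the sketch below. It assumes sigmoid units (so the derivative equals output times one minus output), the error e = d - X, and a layout of three hidden-layer units feeding two output units, which gives six weights; the numerical values are hypothetical.

```python
def output_layer_corrections(prev_outputs, outputs, desired, rate):
    """Corrections for the weights feeding the output layer.

    corrections[j][t] belongs to the weight from unit t of the previous
    (hidden) layer to unit j of the output layer.
    """
    corrections = []
    for x_j, d_j in zip(outputs, desired):
        # Output error times the derivative of the sigmoid activation.
        delta_j = (d_j - x_j) * x_j * (1.0 - x_j)
        corrections.append([rate * delta_j * x_t for x_t in prev_outputs])
    return corrections

# Hypothetical values found during the feedforward phase.
hidden_outputs = [0.5, 0.6, 0.4]
network_outputs = [0.7, 0.2]
desired_outputs = [1.0, 0.0]
dw_output = output_layer_corrections(hidden_outputs, network_outputs,
                                     desired_outputs, rate=0.5)
for row in dw_output:
    print(row)
```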

Multilayer Perceptron Training. Corrections to hidden units connections.
We shall mark the values of the layer in consideration, the values of the layer previous to the one in consideration, and the values of the layers above the one in consideration.

The weight $w^{1}_{10}$ of the connection between unit number 1 (first lower index) in the hidden layer (layer number 1, shown as the upper index) and unit number 0 (second lower index) in the previous, input layer would have a correction $\Delta w^{1}_{10}$.

Analogously, corrections are obtained for all six weights of the connections between the hidden layer and the input layer.
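A sketch of the hidden-layer step follows. It assumes sigmoid units and that the output-layer error signals (output error times activation derivative) are already known; the layout of two input units feeding three hidden units is an assumption that yields six weights, and all numbers are hypothetical.

```python
def hidden_layer_corrections(inputs, hidden_outputs, output_deltas,
                             output_weights, rate):
    """Corrections for the weights between the input and hidden layers.

    output_weights[j][h] is the weight from hidden unit h to output
    unit j; corrections[h][t] belongs to the weight from input unit t
    to hidden unit h.
    """
    corrections = []
    for h, x_h in enumerate(hidden_outputs):
        # Error signal propagated back from the layer above.
        back = sum(delta_j * output_weights[j][h]
                   for j, delta_j in enumerate(output_deltas))
        # Multiply by the derivative of the hidden unit's own sigmoid.
        delta_h = x_h * (1.0 - x_h) * back
        corrections.append([rate * delta_h * x_t for x_t in inputs])
    return corrections

# Hypothetical values from the feedforward phase and the output layer.
inputs = [1.0, 0.5]
hidden_outputs = [0.5, 0.6, 0.4]
output_deltas = [0.063, -0.032]
output_weights = [[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]]
dw_hidden = hidden_layer_corrections(inputs, hidden_outputs, output_deltas,
                                     output_weights, rate=0.5)
for row in dw_hidden:
    print(row)
```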

In this way, going backwards through the network, one obtains the corrections to all weights and then updates the weights. After that, with the new weights, one goes forward to get new outputs, finds the new error, goes backwards again, and so on.

Hopefully, sooner or later the iterative procedure will arrive at the output with the minimum error, i.e. the absolute minimum of the error function E.

Unfortunately, as a function of many variables, the error function might have more than one minimum, and one may arrive not at the absolute minimum but at a relative (local) one. If this happens, the error function stops decreasing regardless of the number of iterations. Some measures must be taken to get out of the relative minimum, for example adding small random values, i.e. “noise”, to one or more of the weights. The iterative procedure then restarts from that new point, in the hope of eventually reaching the absolute minimum.
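Adding noise to the weights can be sketched as follows. The uniform distribution, the noise scale and the function name are assumptions made for illustration; the text does not prescribe a particular way of generating the random values.

```python
import random

def add_noise(weights, scale, rng):
    """Return a copy of a weight matrix with small uniform noise added."""
    return [[w + rng.uniform(-scale, scale) for w in row] for row in weights]

# Hypothetical weights at which training has stalled.
rng = random.Random(0)
stuck_weights = [[0.1, -0.2], [0.3, 0.4]]
restart_weights = add_noise(stuck_weights, scale=0.05, rng=rng)
print(restart_weights)
```

Training would then continue from `restart_weights`; there is no guarantee that a single perturbation is enough, so in practice the step may have to be repeated.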

Finally, after successful training, the perceptron is able to produce the desired responses to all input patterns of the training set. Then all the connection weights of the network are fixed, and the network is presented with inputs it must “recognise”, i.e. inputs that are not in the training set.

If an input in consideration produces an output similar to one of the training set, such an input is said to belong to the same type or cluster of inputs as the corresponding one of the training set. If the network produces an output not similar to any of the training set, then such an input is said not to have been recognised.
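One possible reading of "similar" is sketched below: the output for a new input is compared with the outputs the trained network gives for the training patterns, and the closest one is accepted if it lies within a tolerance. The Euclidean distance, the tolerance value and the labels are assumptions, not part of the original description.

```python
def recognise(output, training_outputs, tolerance):
    """Return the label of the closest training output, or None.

    `training_outputs` maps a label to the network output obtained for
    the corresponding training pattern. None means "not recognised".
    """
    best_label, best_dist = None, None
    for label, reference in training_outputs.items():
        dist = sum((a - b) ** 2 for a, b in zip(output, reference)) ** 0.5
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist is not None and best_dist <= tolerance:
        return best_label
    return None

# Hypothetical outputs for two training clusters.
training_outputs = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
print(recognise([0.9, 0.1], training_outputs, tolerance=0.3))
print(recognise([0.5, 0.5], training_outputs, tolerance=0.3))
```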

Multilayer Perceptron Training. Conclusion.
In 1969 Minsky and Papert not only found the solution to the XOR problem in the form of a multilayer perceptron, they also gave a very thorough mathematical analysis of the time it takes to train such networks. Minsky and Papert emphasised that training times increase very rapidly for certain problems as the number of input lines and connection weights increases.

The difficulties were seized upon by opponents of the subject. In particular, this was true of those working in the field of artificial intelligence (AI), who at that time did not want to concern themselves with the underlying “wetware” of the brain, but only with the functional aspects, regarded by them solely as logical processing. Due to the limitations of funding, the competition between the AI and neural network communities could have only one victor.

Neural networks then went into relative quietude for more than fifteen years, with only a few devotees still working on them. Then new vigour came from various sources. One was the increasing power of computers, allowing simulations of otherwise intractable problems. Finally, the backpropagation algorithm, established by the mid-1980s, solved the difficulty of training hidden neurons.

Nowadays, the perceptron is an effective tool for recognising protein and amino-acid sequences and for processing other complex biological data.
