 # NEURAL NETWORKS Backpropagation Algorithm


NEURAL NETWORKS Backpropagation Algorithm
Dear students, today's topic is the backpropagation algorithm. The steps and the critical points to be considered are summarized in this lesson. PROF. DR. YUSUF OYSAL

Backpropagation Algorithm
The backpropagation algorithm has two phases:
- Forward pass phase: computes the ‘functional signal’, i.e. the feed-forward propagation of the input pattern signals through the network.
- Backward pass phase: computes the ‘error signal’ and propagates the error backwards through the network, starting at the output units (where the error is the difference between the actual and the desired output values).

Backpropagation Algorithm
Here are the steps of the algorithm:
- Step 1: Determine the architecture: how many input and output neurons, which output encoding, and how many hidden neurons and layers.
- Step 2: Initialize all weights and biases to small random values, typically ∈ [-1, 1], and choose a learning rate η.
- Step 3: Repeat until the termination criteria are satisfied:
  - Present a training example and propagate it through the network (forward pass).
  - Calculate the actual output: the inputs are applied, multiplied by the weights and summed, then ‘squashed’ by the sigmoid activation function; finally, the output is passed to each neuron in the next layer.
  - Adapt the weights starting from the output layer and working backwards (backward pass).
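The three steps can be put together into a short training loop. The following is a minimal pure-Python sketch, not the lesson's own code: it assumes one hidden layer of sigmoid units, online (per-example) updates, random presentation order, and the mean squared error as the end-of-epoch stopping criterion. The function names `forward` and `train` and the default parameter values are illustrative only.

```python
import math
import random


def sigmoid(net):
    """Logistic activation: 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))


def forward(x, w_ih, b_h, w_ho, b_o):
    """Forward pass: weighted sum plus bias, squashed by the sigmoid, layer by layer."""
    h = [sigmoid(sum(x[p] * w_ih[p][j] for p in range(len(x))) + b_h[j])
         for j in range(len(b_h))]
    o = [sigmoid(sum(h[j] * w_ho[j][i] for j in range(len(h))) + b_o[i])
         for i in range(len(b_o))]
    return h, o


def train(examples, n_hidden, eta=0.1, threshold=0.01, max_epochs=10000, seed=0):
    """Online backpropagation for one hidden layer of sigmoid units.

    examples: list of (input_vector, target_vector) pairs.
    Returns ((w_ih, b_h, w_ho, b_o), epochs_run, final_mse).
    """
    rng = random.Random(seed)
    n_in, n_out = len(examples[0][0]), len(examples[0][1])
    # Step 2: small random weights and biases in [-1, 1]
    w_ih = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]
    b_h = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_ho = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hidden)]
    b_o = [rng.uniform(-1, 1) for _ in range(n_out)]

    # Step 3: repeat epochs until the termination criteria are satisfied
    for epoch in range(1, max_epochs + 1):
        order = list(range(len(examples)))
        rng.shuffle(order)  # present the training examples in random order
        for k in order:
            x, d = examples[k]
            h, o = forward(x, w_ih, b_h, w_ho, b_o)  # forward pass
            # Backward pass: output deltas, then hidden deltas (computed with the old w_ho)
            delta_o = [(d[i] - o[i]) * o[i] * (1 - o[i]) for i in range(n_out)]
            delta_h = [h[j] * (1 - h[j]) * sum(w_ho[j][i] * delta_o[i] for i in range(n_out))
                       for j in range(n_hidden)]
            # Weight change = eta * delta of the receiving node * output of the sending node
            for j in range(n_hidden):
                for i in range(n_out):
                    w_ho[j][i] += eta * delta_o[i] * h[j]
            for i in range(n_out):
                b_o[i] += eta * delta_o[i]  # a bias is a weight on a constant input of 1
            for p in range(n_in):
                for j in range(n_hidden):
                    w_ih[p][j] += eta * delta_h[j] * x[p]
            for j in range(n_hidden):
                b_h[j] += eta * delta_h[j]
        # End of epoch: mean squared error over all examples and output units
        total = 0.0
        for x, d in examples:
            _, o = forward(x, w_ih, b_h, w_ho, b_o)
            total += sum((d[i] - o[i]) ** 2 for i in range(n_out))
        mse = total / (len(examples) * n_out)
        if mse < threshold:
            break
    return (w_ih, b_h, w_ho, b_o), epoch, mse
```

As a usage example on a hypothetical toy data set (not from the slides), `train([([1.0, 0.0], [1, 0]), ([0.0, 1.0], [0, 1])], n_hidden=3, eta=0.5, max_epochs=20000)` returns the trained weights together with the number of epochs it needed. Updating after every example (rather than once per epoch) mirrors the per-example forward and backward passes described in Step 3.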

Parameter Update Rules
Here are the parameter update rules of the backpropagation algorithm, where $w_{pq}(t)$ is the weight from node $p$ to node $q$ at time $t$. Each weight is updated by the steepest descent formula:

$$w_{pq}(t+1) = w_{pq}(t) + \Delta w_{pq}(t), \qquad \Delta w_{pq}(t) = \eta\, \delta_q\, o_p$$

The weight change is a function of the error propagation term $\delta$. For an output neuron $i$ with desired output $d_i$ and actual output $o_i$:

$$\delta_i = (d_i - o_i)\, o_i\, (1 - o_i)$$

For a hidden neuron $j$ (the sum is over the nodes $i$ in the layer above node $j$):

$$\delta_j = o_j\, (1 - o_j) \sum_i w_{ji}\, \delta_i$$

The factor $o(1 - o)$ is the derivative of the sigmoid activation function. In the previous week's lesson the error propagation term was denoted by the symbol $\varepsilon$, but most books in the literature use $\delta$.

In the backpropagation algorithm the stopping criterion is checked at the end of each epoch: all training examples are propagated and the mean (absolute or squared) error is calculated. Training stops when this error is below a threshold, which is determined heuristically, or otherwise when a maximum number of epochs is reached. It typically takes hundreds or thousands of epochs for a neural network to converge.
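The update rules translate almost line by line into code. This is a minimal Python sketch assuming sigmoid units throughout; the function names are hypothetical, and the end-of-epoch check uses the mean squared error (the slide also allows the mean absolute error). The sample values come from the worked example of this lesson.

```python
def output_delta(d_i, o_i):
    """Error term of a sigmoid output neuron i: (d_i - o_i) * o_i * (1 - o_i)."""
    return (d_i - o_i) * o_i * (1.0 - o_i)


def hidden_delta(o_j, weights_to_above, deltas_above):
    """Error term of a sigmoid hidden neuron j: o_j * (1 - o_j) * sum_i w_ji * delta_i.

    weights_to_above[k] is the weight w_ji to the k-th node i of the layer above,
    and deltas_above[k] is that node's delta_i.
    """
    back = sum(w_ji * delta_i for w_ji, delta_i in zip(weights_to_above, deltas_above))
    return o_j * (1.0 - o_j) * back


def update_weight(w_pq, eta, delta_q, o_p):
    """Steepest descent step: w_pq(t+1) = w_pq(t) + eta * delta_q * o_p."""
    return w_pq + eta * delta_q * o_p


def should_stop(epoch_errors, threshold, epoch, max_epochs):
    """End-of-epoch check: mean squared error below threshold, or epoch budget used up."""
    mse = sum(e * e for e in epoch_errors) / len(epoch_errors)
    return mse < threshold or epoch >= max_epochs


print(round(output_delta(1, 0.53), 3))                           # 0.117
print(round(hidden_delta(0.53, [-0.4, 0.2], [0.12, -0.15]), 4))  # -0.0194
```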

Backpropagation - Example
Training set:
- p1 = [0.6 0.1]ᵀ, class 1 (banana)
- p2 = [ ]ᵀ, class 2 (orange)

Network architecture:
- How many inputs? How many hidden neurons? (Heuristic: n = (inputs + output_neurons)/2.) How many output neurons? What encoding of the outputs?
- The outputs are encoded as 10 for class 1 and 01 for class 2. The network used in this example has 2 inputs (nodes 1 and 2), 3 hidden neurons (nodes 3, 4 and 5) and 2 output neurons (nodes 6 and 7).

Initial weights and learning rate: let η = 0.1; the weights are set as in the figure.

In this example the training set contains two examples: one of class 1 (banana) and one of class 2 (orange). This slide illustrates the first step of the algorithm: the network architecture is determined heuristically.
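The output encoding and the hidden-layer heuristic can be written down directly. A minimal sketch; the function names are hypothetical, and decoding an output vector by picking the largest output is an assumption, since the slide only specifies the target codes 10 and 01. For this data set (2 inputs, 2 output neurons) the heuristic gives 2 hidden neurons, whereas the network in the example uses 3 (units 3, 4 and 5), so the heuristic is only a starting point.

```python
def encode(class_label, n_classes=2):
    """One-of-n target vector: class 1 -> [1, 0] ("10"), class 2 -> [0, 1] ("01")."""
    target = [0] * n_classes
    target[class_label - 1] = 1
    return target


def decode(outputs):
    """Assumed decision rule: the predicted class is the output unit with the largest value."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return best + 1


def hidden_neurons_heuristic(n_inputs, n_outputs):
    """Rule of thumb n = (inputs + output_neurons) / 2, rounded down here."""
    return (n_inputs + n_outputs) // 2


print(encode(1), encode(2))            # [1, 0] [0, 1]
print(hidden_neurons_heuristic(2, 2))  # 2
print(decode([0.53, 0.63]))            # 2
```

With this rule, the untrained outputs computed for p1 in the forward pass (o6 = 0.53, o7 = 0.63) would be read as class 2, which is why the weights have to be trained.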

Backpropagation - Example
1. Forward pass for example 1: calculate the outputs o6 and o7. The inputs are o1 = 0.6 and o2 = 0.1, and the target output is 1 0, i.e. class 1.

Activations of the hidden units:

$$net_3 = o_1 w_{13} + o_2 w_{23} + b_3 = 0.6 \cdot 0.1 + 0.1 \cdot (-0.2) + 0.1 = 0.14, \qquad o_3 = \frac{1}{1 + e^{-net_3}} = 0.53$$

$$net_4 = o_1 w_{14} + o_2 w_{24} + b_4 = 0.6 \cdot 0 + 0.1\, w_{24} + b_4 = 0.22, \qquad o_4 = \frac{1}{1 + e^{-net_4}} = 0.55$$

$$net_5 = o_1 w_{15} + o_2 w_{25} + b_5 = 0.6 \cdot 0.3 + 0.1 \cdot (-0.4) + 0.5 = 0.64, \qquad o_5 = \frac{1}{1 + e^{-net_5}} = 0.65$$

Activations of the output units:

$$net_6 = o_3 w_{36} + o_4 w_{46} + o_5 w_{56} + b_6 = 0.53 \cdot (-0.4) + 0.55\, w_{46} + 0.65\, w_{56} + b_6 = 0.13, \qquad o_6 = \frac{1}{1 + e^{-net_6}} = 0.53$$

$$net_7 = o_3 w_{37} + o_4 w_{47} + o_5 w_{57} + b_7 = 0.53 \cdot 0.2 + 0.55 \cdot (-0.1) + 0.65 \cdot (-0.2) + 0.6 = 0.52, \qquad o_7 = \frac{1}{1 + e^{-net_7}} = 0.63$$

Here are the forward pass calculations for the first example in the training set; at the end, the outputs of the neural network are obtained.
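The forward pass can be checked numerically. This is a minimal sketch in plain Python; the helper names `sigmoid` and `net_input` are illustrative. The weights w13 = 0.1, w15 = 0.3 and w37 = 0.2 are recovered from the arithmetic on the slide (for instance 0.6·w13 + 0.1·(-0.2) + 0.1 = 0.14 gives w13 = 0.1). The weights w24, w46, w56 and the biases b4, b6 cannot be recovered from the transcript, so the sketch uses the reported net4 = 0.22 and net6 = 0.13 directly.

```python
import math


def sigmoid(net):
    """Logistic activation o = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))


def net_input(inputs, weights, bias):
    """Weighted sum of the incoming outputs plus the bias."""
    return sum(o * w for o, w in zip(inputs, weights)) + bias


o1, o2 = 0.6, 0.1  # example 1 (class 1)

# Hidden layer
net3 = net_input([o1, o2], [0.1, -0.2], 0.1)  # w13, w23, b3
net4 = 0.22  # w24 and b4 are not recoverable; reported value used instead
net5 = net_input([o1, o2], [0.3, -0.4], 0.5)  # w15, w25, b5
o3, o4, o5 = sigmoid(net3), sigmoid(net4), sigmoid(net5)

# Output layer
o6 = sigmoid(0.13)  # w46, w56 and b6 are not recoverable; reported net6 used instead
net7 = net_input([o3, o4, o5], [0.2, -0.1, -0.2], 0.6)  # w37, w47, w57, b7
o7 = sigmoid(net7)

print(f"o3={o3:.2f} o4={o4:.2f} o5={o5:.2f} o6={o6:.2f} o7={o7:.2f}")
# o3=0.53 o4=0.55 o5=0.65 o6=0.53 o7=0.63
```

The sketch keeps full precision for o3, o4 and o5 when computing net7 (about 0.5206), while the slide uses the rounded values; both give o7 ≈ 0.63.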

Backpropagation - Example
2. Backward pass for example 1

Calculate the output errors δ6 and δ7 (note that d6 = 1 and d7 = 0 for class 1):

$$\delta_6 = (d_6 - o_6)\, o_6\, (1 - o_6) = (1 - 0.53) \cdot 0.53 \cdot (1 - 0.53) = 0.12$$

$$\delta_7 = (d_7 - o_7)\, o_7\, (1 - o_7) = (0 - 0.63) \cdot 0.63 \cdot (1 - 0.63) = -0.15$$

Calculate the new weights between the hidden and output units (η = 0.1):

$$\Delta w_{36} = \eta\, \delta_6\, o_3 = 0.1 \cdot 0.12 \cdot 0.53 = 0.006, \qquad w_{36}^{new} = w_{36}^{old} + \Delta w_{36} = -0.4 + 0.006 = -0.394$$

$$\Delta w_{37} = \eta\, \delta_7\, o_3 = 0.1 \cdot (-0.15) \cdot 0.53 = -0.008, \qquad w_{37}^{new} = w_{37}^{old} + \Delta w_{37} = 0.2 - 0.008 = 0.192$$

Similarly for w46new, w47new, w56new and w57new.

For the biases b6 and b7 (remember: biases are weights with input 1):

$$\Delta b_6 = \eta\, \delta_6 \cdot 1 = 0.1 \cdot 0.12 = 0.012, \qquad b_6^{new} = b_6^{old} + \Delta b_6 = b_6^{old} + 0.012$$

Similarly for b7.

Here are the backward pass calculations for the first example in the training set; at the end, the weights of the neural network are updated.
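The output-layer part of the backward pass, as a small sketch. It starts from the rounded forward-pass values o3 = 0.53, o6 = 0.53 and o7 = 0.63 and the weights w36 = -0.4 and w37 = 0.2 (the latter recovered from the net7 calculation). The deltas are not rounded before use, so intermediate values differ slightly from the slide, while the results agree to the precision shown there.

```python
eta = 0.1
o3, o6, o7 = 0.53, 0.53, 0.63  # rounded outputs from the forward pass
d6, d7 = 1, 0                  # target "10" for class 1

# Output errors for sigmoid units: (d - o) * o * (1 - o)
delta6 = (d6 - o6) * o6 * (1 - o6)
delta7 = (d7 - o7) * o7 * (1 - o7)

# Hidden-to-output weight updates: eta * delta(output node) * o(hidden node)
w36, w37 = -0.4, 0.2
dw36 = eta * delta6 * o3
dw37 = eta * delta7 * o3
w36_new = w36 + dw36
w37_new = w37 + dw37

# Bias update: a bias is a weight on a constant input of 1
db6 = eta * delta6 * 1

print(round(delta6, 2), round(delta7, 2))                       # 0.12 -0.15
print(round(w36_new, 3), round(w37_new, 3), round(db6, 3))     # -0.394 0.192 0.012
```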

Backpropagation - Example
Calculate the errors of the hidden units δ3, δ4 and δ5 (with the old weights w36 and w37):

$$\delta_3 = o_3\, (1 - o_3)\, (w_{36}\, \delta_6 + w_{37}\, \delta_7) = 0.53 \cdot (1 - 0.53) \cdot (-0.4 \cdot 0.12 + 0.2 \cdot (-0.15)) = -0.019$$

Similarly for δ4 and δ5.

Calculate the new weights between the input and hidden units (η = 0.1):

$$\Delta w_{13} = \eta\, \delta_3\, o_1 = 0.1 \cdot (-0.019) \cdot 0.6 = -0.00114, \qquad w_{13}^{new} = w_{13}^{old} + \Delta w_{13} = 0.1 - 0.00114 = 0.0989$$

Similarly for w23new, w14new, w24new, w15new and w25new, and for the biases b3new, b4new and b5new.

3. Repeat the same procedure for the other training examples:
- forward pass for example 2, backward pass for example 2, …
- note: it is better to apply the input examples in random order.

4. At the end of the epoch, check whether the stopping criterion is satisfied:
- if yes, stop training;
- if not, continue training: increase the epoch number by one (epoch++) and go to step 1.
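The hidden-layer step can be sketched the same way. The values below are the rounded ones from the slides (δ6 = 0.12, δ7 = -0.15, o1 = 0.6, o3 = 0.53), with w37 = 0.2 and w13 = 0.1 recovered from the example's arithmetic. Note that δ3 is computed with the old hidden-to-output weights, not with the values just updated.

```python
eta = 0.1
o1, o3 = 0.6, 0.53
delta6, delta7 = 0.12, -0.15  # output errors from the backward pass (rounded)
w36, w37 = -0.4, 0.2          # OLD hidden-to-output weights

# Hidden error: derivative of the sigmoid times the weighted sum of the deltas above
delta3 = o3 * (1 - o3) * (w36 * delta6 + w37 * delta7)

# Input-to-hidden weight update: eta * delta(hidden node) * o(input node)
w13 = 0.1
dw13 = eta * delta3 * o1
w13_new = w13 + dw13

print(round(delta3, 3), round(w13_new, 4))  # -0.019 0.0988
```

The sketch gives w13new ≈ 0.0988 rather than 0.0989 only because it does not round δ3 to -0.019 before computing Δw13.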

Backpropagation Algorithm
The backpropagation algorithm is not optimal: with a suitably small learning rate it is guaranteed to find a minimum, but that minimum might be a local one!
- Backpropagation's error surface has many local minima and one global minimum, so generalized gradient descent may not find the global minimum.
- If the algorithm converges to a local minimum (the trajectory is trapped in a valley away from the optimal solution), try different initializations.
- If the algorithm is slow to converge because there are flat surfaces along the path, increase the learning rate or smooth out the trajectory by averaging the updates to the parameters.
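One common way to "average the updates to the parameters" is a momentum term, which adds a fraction α of the previous update to the current gradient step; the slide does not name a specific method, so this is an assumption. The sketch below (hypothetical names `descend` and `flat_grad`) compares plain steepest descent with momentum on a deliberately flat one-parameter error surface, E(w) = 0.05·w².

```python
def descend(grad, w0, eta, alpha=0.0, steps=100):
    """Gradient descent on one parameter, with an optional momentum term.

    alpha = 0 gives plain steepest descent. With alpha > 0 each update is
    alpha * (previous update) - eta * gradient, i.e. an exponentially
    weighted average of the past gradient steps.
    """
    w, update = w0, 0.0
    for _ in range(steps):
        update = alpha * update - eta * grad(w)
        w += update
    return w


def flat_grad(w):
    """Gradient of a very flat error surface E(w) = 0.05 * w**2 (minimum at w = 0)."""
    return 0.1 * w


w_plain = descend(flat_grad, w0=1.0, eta=0.1)
w_momentum = descend(flat_grad, w0=1.0, eta=0.1, alpha=0.9)
print(f"after 100 steps: plain |w| = {abs(w_plain):.3f}, momentum |w| = {abs(w_momentum):.4f}")
```

On this surface plain descent shrinks w by only 1% per step, while the momentum term builds up speed across the flat region and ends much closer to the minimum. With α = 0.9 the trajectory oscillates around the minimum as it decays, so α trades speed against smoothness.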

Overtraining Problem
The network is trained on a limited set of training examples; based on them, it should be able to generalize what it has learned to the total population of examples.
- Overtraining (overfitting) is a problem of training neural networks: the error on the training set is very small, but when new data is presented to the network, the error is high. The network has memorized the training examples but has not learned to generalize to new situations!
- Reasons for overtraining:
  - the training examples are noisy;
  - the number of free parameters is bigger than the number of training examples.
- Preventing overtraining (depending on the situation):
  - use a network that is just large enough to provide an adequate fit; the network should not have more free parameters than there are training examples!
  - use an early stopping method;
  - K-fold cross validation may be used.
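Early stopping and K-fold cross validation both rely on data held out from training. A minimal sketch under stated assumptions: the training routine and the validation-error function are passed in as callables, and the names `train_with_early_stopping`, `k_fold_splits` and `patience` are hypothetical, not from the lesson. Training stops once the validation error has not improved for `patience` epochs, and the parameters from the best epoch are kept.

```python
def train_with_early_stopping(train_one_epoch, validation_error, max_epochs=1000, patience=10):
    """Keep the parameters with the lowest validation error; stop when it stops improving.

    train_one_epoch() runs one training epoch and returns the current parameters
    (return a copy if your parameters are modified in place);
    validation_error(params) returns the error on the held-out validation set.
    """
    best_error, best_epoch, best_params = float("inf"), 0, None
    for epoch in range(1, max_epochs + 1):
        params = train_one_epoch()
        error = validation_error(params)
        if error < best_error:
            best_error, best_epoch, best_params = error, epoch, params
        elif epoch - best_epoch >= patience:
            break  # validation error has stopped improving: the network starts to overfit
    return best_params, best_epoch, best_error


def k_fold_splits(n_examples, k):
    """Yield (train_indices, validation_indices) for k folds of nearly equal size."""
    indices = list(range(n_examples))
    start = 0
    for fold in range(k):
        size = n_examples // k + (1 if fold < n_examples % k else 0)
        yield indices[:start] + indices[start + size:], indices[start:start + size]
        start += size


# Hypothetical validation-error curve: it improves, then rises as the network overfits
curve = iter([0.50, 0.40, 0.31, 0.30, 0.32, 0.35, 0.39, 0.44])
epoch_counter = iter(range(1, 100))
params, best_epoch, best_err = train_with_early_stopping(
    train_one_epoch=lambda: next(epoch_counter),  # stand-in "parameters": the epoch number
    validation_error=lambda p: next(curve),
    patience=3,
)
print(best_epoch, best_err)  # 4 0.3
```

In K-fold cross validation each of the k folds serves once as the validation set while the network is trained on the other k - 1 folds, and the k validation errors are averaged.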