ERROR BACK-PROPAGATION LEARNING ALGORITHM


1 ERROR BACK-PROPAGATION LEARNING ALGORITHM
Zohreh B. Irannia

2 Single-Layer Perceptron
xi: input vector; t = c(x): the target value; o: the perceptron output; η: learning rate (a small positive constant; here assume η = 1).
Update rule: wi ← wi + Δwi, where Δwi = η (t − o) xi
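A minimal NumPy sketch of this update rule (not from the slides), assuming a bipolar threshold output and a bias input folded into x[0]; the names perceptron_update, w, x, t are illustrative only:

    import numpy as np

    def perceptron_update(w, x, t, eta=1.0):
        # One step of the rule above: w_i <- w_i + eta * (t - o) * x_i
        o = 1.0 if np.dot(w, x) >= 0 else -1.0   # bipolar threshold output
        return w + eta * (t - o) * x

    # Example: a single update on one training pattern.
    w = np.zeros(3)                  # weights; w[0] acts as the bias weight
    x = np.array([1.0, 0.5, -1.0])   # input vector; x[0] = 1 is the bias input
    t = -1.0                         # desired target c(x)
    w = perceptron_update(w, x, t)   # here o = +1, so the weights move by -2 * x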

3 Single-Layer Perceptron
Sigmoid function as activation function:
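For reference, a short sketch (assumed, not from the slides) of the logistic sigmoid and the derivative that the later delta-rule and back-propagation formulas rely on:

    import numpy as np

    def sigmoid(v):
        # Logistic activation: f(v) = 1 / (1 + exp(-v))
        return 1.0 / (1.0 + np.exp(-v))

    def sigmoid_prime(v):
        # Its derivative, f'(v) = f(v) * (1 - f(v)), reused heavily in back-propagation.
        y = sigmoid(v)
        return y * (1.0 - y)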

4 Delta Rule? Gradient Descent
The delta rule states the weight update, but why does it take this form?

5 Steepest Descent Method
(w1, w2) → (w1 + Δw1, w2 + Δw2)

6 Delta Rule?

7 Delta Rule
Define the error term δj; finally, the weight update Δwji follows.

8 Delta Rule
Neuron j: net input vj, activation f(vj), output yj, and the desired target value.

9 Delta Rule
So we have:
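A compact reconstruction of the result and its derivation (the transcript does not reproduce the original equations; symbols assumed: v_j net input, y_j = f(v_j) output, d_j desired target, η learning rate):

    E = \tfrac{1}{2} \sum_j (d_j - y_j)^2, \qquad y_j = f(v_j), \qquad v_j = \sum_i w_{ji} x_i

    \frac{\partial E}{\partial w_{ji}} = -(d_j - y_j)\, f'(v_j)\, x_i

    \Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} = \eta\, \delta_j\, x_i, \qquad \delta_j = (d_j - y_j)\, f'(v_j)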

10 Perceptron learning problem
Only suitable if the inputs are linearly separable. Consider the XOR problem:

11 Non-linearly separable problems

12 Solution: Multi-layer Networks
New problem: how can the weights of the different layers be trained in such networks?

13 Idea of Error Back-Propagation
Weights in the output layer are updated with the delta rule. The delta rule is not applicable to the hidden layers, because we do not know the desired values for the hidden nodes. Solution: propagate the errors at the output nodes back to the hidden nodes.

14 Intuition by Illustration
A three-layer network with 2 inputs and 1 output:

15 Intuition by Illustration
Each neuron is composed of two units: a weighted-sum unit followed by a non-linear activation unit.

16 Intuition by Illustration
Training starts with the propagation of signals through the input layer, computing y1; the same happens for y2 and y3.

17 Intuition by Illustration
Propagation of signals through the hidden layer, computing y4; the same happens for y5.

18 Intuition by Illustration
Propagation of signals through the output layer:

19 Intuition by Illustration
Error signal of the output-layer neuron: the network output is compared with the desired target value.

20 Intuition by Illustration
The error signal is propagated back to all hidden neurons.

21 Intuition by Illustration
If the propagated errors come from several neurons, they are summed; the same happens for neuron 2 and neuron 3.

22 Intuition by Illustration
Weight updating starts; the same happens for all neurons.

23 Intuition by Illustration
Weight updating terminates at the output neuron. A complete training step is sketched below.
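A minimal NumPy sketch of one such training step (assumed, not from the slides), on a simplified 2-3-1 network rather than the exact illustrated topology; all names are illustrative:

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    rng = np.random.default_rng(0)
    W1 = rng.uniform(-0.5, 0.5, size=(3, 2))   # input -> hidden weights
    W2 = rng.uniform(-0.5, 0.5, size=(1, 3))   # hidden -> output weights
    eta = 0.5

    x = np.array([1.0, 0.0])   # one training pattern
    z = np.array([1.0])        # its desired target

    # Forward pass: signals propagate layer by layer (slides 16-18).
    y_hidden = sigmoid(W1 @ x)
    y_out = sigmoid(W2 @ y_hidden)

    # Error signal at the output neuron (slide 19).
    delta_out = (z - y_out) * y_out * (1.0 - y_out)

    # Back-propagation: each hidden neuron collects the weighted output deltas (slides 20-21).
    delta_hidden = (W2.T @ delta_out) * y_hidden * (1.0 - y_hidden)

    # Weight update (slides 22-23): change = eta * delta * (signal entering that weight).
    W2 += eta * np.outer(delta_out, y_hidden)
    W1 += eta * np.outer(delta_hidden, x)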

24 Some Questions
How often to update: after each training case, or after a full sweep through the training data? How many epochs? How much to update: a fixed or a variable learning rate? Is the steepest-descent method the right choice? Does it necessarily converge to the global minimum? How long does it take to converge to some minimum? Etc.

25 Batch Mode Training
Batch mode of weight updates: weights are updated once per epoch (gradients accumulated over all P samples). This smooths out training-sample outliers and makes learning independent of the order of sample presentation, but it is usually slower than sequential (pattern) mode and sometimes more likely to get stuck in local minima. Both modes are sketched below.
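A minimal sketch of the two update schedules (assumed, not from the slides); grad_fn is a hypothetical placeholder that returns the error gradient for one sample:

    import numpy as np

    def batch_mode_epoch(w, X, T, grad_fn, eta=0.1):
        # Batch mode: accumulate the gradient over all P samples, update once per epoch.
        total_grad = np.zeros_like(w)
        for x, t in zip(X, T):
            total_grad += grad_fn(w, x, t)
        return w - eta * total_grad

    def pattern_mode_epoch(w, X, T, grad_fn, eta=0.1):
        # Pattern (sequential) mode: update the weights after every single sample.
        for x, t in zip(X, T):
            w = w - eta * grad_fn(w, x, t)
        return w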

26 Major Problems of EBP
Constant learning rate problems: a small η gives slow convergence; a large η overshoots the minimum.

27 Steepest-Descent’s Problems
Convergence to local minima (the figure shows an error surface with a local minimum and the global minimum).

28 Steepest-Descent’s Problems
Slow convergence (zigzag path). One solution: replace steepest descent with the conjugate-gradient method.

29 Modifications to EBP Learning

30 Modifications to EBP Learning

31 Speed It Up: Momentum
Momentum adds a fraction of the previous weight change to the current one: gradient descent with momentum.

32 Speed It Up: Momentum
The direction of the weight change is a combination of the current and the previous gradient. Advantage: it reduces the role of outliers (a smoother search), but it does not adjust the learning rate directly (an indirect method). Disadvantages: it may result in overshooting, and it does not always reduce the number of iterations. A sketch of the update appears below.
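A minimal sketch of the momentum update (assumed, not from the slides); the momentum coefficient alpha and the gradient values are illustrative:

    import numpy as np

    def momentum_step(w, prev_delta_w, grad, eta=0.1, alpha=0.9):
        # Gradient descent with momentum:
        #   delta_w(t) = -eta * grad + alpha * delta_w(t - 1)
        delta_w = -eta * grad + alpha * prev_delta_w
        return w + delta_w, delta_w

    # Usage: carry the previous weight change along with the weights.
    w = np.zeros(4)
    delta_w = np.zeros_like(w)
    grad = np.array([0.2, -0.1, 0.0, 0.3])   # hypothetical error gradient for one step
    w, delta_w = momentum_step(w, delta_w, grad)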

33 But some problems remain!
Remaining problem: equal learning rates for all weights!

34 Delta-Bar-Delta
Allows each weight to have its own learning rate and lets the learning rates vary with time. Two heuristics determine the appropriate changes: if the weight change keeps the same direction for several time steps, the learning rate for that weight should be increased; if the direction of the weight change alternates, the learning rate should be decreased. Note: these heuristics will not always improve performance.

35 Delta-Bar-Delta
Learning rates increase linearly and decrease exponentially, as in the sketch below.
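A minimal sketch of this rule (assumed, following Jacobs' delta-bar-delta formulation, to which the κ, β, ξ parameters on slide 58 appear to correspond; the exact variant used in the presentation is not shown in the transcript):

    import numpy as np

    def delta_bar_delta_step(w, eta, delta_bar, grad, kappa=0.2, beta=0.2, xi=0.1):
        # Per-weight learning rates: increase eta linearly (by kappa) while the
        # gradient keeps its sign, decrease it exponentially (factor 1 - beta)
        # when the sign alternates; otherwise leave it unchanged.
        same_sign = delta_bar * grad > 0
        sign_flip = delta_bar * grad < 0
        eta = np.where(same_sign, eta + kappa, eta)
        eta = np.where(sign_flip, eta * (1.0 - beta), eta)
        w = w - eta * grad                                # element-wise weight update
        delta_bar = (1.0 - xi) * grad + xi * delta_bar    # running average of past gradients
        return w, eta, delta_bar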

36 Training Samples
The quality and quantity of the training samples determine the quality of the learning results. Samples must represent the problem space well: random sampling, or proportional sampling (using prior knowledge of the problem). Number of training patterns needed: there is no theoretically ideal number. Baum and Haussler (1989) suggest P = W/e, where W is the total number of weights and e is the acceptable classification error rate. If the net can be trained to correctly classify (1 − e/2)P of the P training samples, then the classification accuracy of this net is 1 − e for input patterns drawn from the same sample space.
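For instance (a hypothetical illustration, not from the slides), a net with W = 100 weights and an acceptable error rate e = 0.1 would need roughly

    P = \frac{W}{e} = \frac{100}{0.1} = 1000 \text{ training samples.}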

37 Activation Functions
Sigmoid activation function: saturation regions. When some incoming weights become very large, the input to a node may fall into a saturation region during learning. Possible remedies: use non-saturating activation functions, or periodically normalize all weights.

38 Activation Functions
Another sigmoid function with a slower saturation rate: change the range of the logistic function from (0, 1) to (a, b).
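One common way to do this rescaling (a sketch under assumed defaults a = -1, b = 1; the slope parameter anticipates the next slide):

    import numpy as np

    def scaled_sigmoid(v, a=-1.0, b=1.0, slope=1.0):
        # Logistic function rescaled from (0, 1) to (a, b), with an adjustable slope.
        return a + (b - a) / (1.0 + np.exp(-slope * v))

    def scaled_sigmoid_prime(v, a=-1.0, b=1.0, slope=1.0):
        # Derivative written in terms of the output, as for the standard logistic:
        # f'(v) = slope * (y - a) * (b - y) / (b - a)
        y = scaled_sigmoid(v, a, b, slope)
        return slope * (y - a) * (b - y) / (b - a)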

39 Activation Functions
Change the slope of the logistic function. A larger slope moves into the saturation regions more quickly and converges faster; a smaller slope is slower to reach the saturation regions and allows more refined weight adjustment, but converges more slowly. Solution: an adaptive slope (each node has a learned slope).

40 Practical Considerations
Many parameters must be carefully selected to ensure good performance. Although the deficiencies of BP nets cannot be completely cured, some of them can be eased by practical means. Two important issues: hidden layers and hidden nodes, and the effect of the initial weights.

41 Hidden Layers & Hidden Nodes
Theoretically, one hidden layer (possibly with many hidden nodes) is sufficient for any function. There are no theoretical results on the minimum necessary number of hidden nodes. Practical rule of thumb (n = number of input nodes, m = number of hidden nodes): for binary/bipolar data, m = 2n; for real-valued data, m >> 2n. Multiple hidden layers with fewer nodes may be trained faster for similar quality in some applications.

42 Effect of Initial Weights (and Biases)
Fully random initialization, e.g. in [-0.05, 0.05], [-0.1, 0.1], or [-1, 1]. Problems: small values give slow learning; large values drive nodes into saturation (f′(x) ≈ 0), which also gives slow learning. Alternative: normalize the weights of the hidden layer (Widrow). Draw random initial weights for all hidden nodes in [-0.5, 0.5]; for each hidden node j, normalize its weight vector (m = number of input neurons, n = number of hidden nodes); for the bias, choose a random value. A sketch of this scheme follows.
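The normalization formula itself is not reproduced in the transcript; the sketch below follows the standard Nguyen-Widrow scheme, which this slide appears to describe (the scale factor 0.7 * n**(1/m) is that scheme's usual choice, not taken from the slide):

    import numpy as np

    def widrow_style_init(m, n, rng=None):
        # m: number of input neurons, n: number of hidden nodes (as on the slide).
        # Draw hidden weights in [-0.5, 0.5], rescale each hidden node's weight
        # vector to norm beta = 0.7 * n**(1/m), and pick each bias in [-beta, beta].
        rng = rng if rng is not None else np.random.default_rng()
        beta = 0.7 * n ** (1.0 / m)
        W = rng.uniform(-0.5, 0.5, size=(n, m))
        W = beta * W / np.linalg.norm(W, axis=1, keepdims=True)
        biases = rng.uniform(-beta, beta, size=n)
        return W, biases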

43 Effect of Initial Weights (and Biases)
Initialization of the output weights should not result in small weights. If they are small, the contribution of the hidden-layer neurons to the output error, and hence the effect of the hidden-layer weights, is not visible enough: the deltas of the hidden layer become very small, leading to only small changes in the hidden-layer weights.

44 Now: A Comparison of Different BP Variants

45 Comparison of different BP variants
Versions compared: the BP pattern-mode learning algorithm, the BP batch-mode learning algorithm, and the BP delta-bar-delta learning algorithm. Problem: breast cancer classification. 9 attributes; 699 examples (458 benign, 241 malignant); 16 instances with missing attribute values rejected; attributes normalized with respect to their highest value.

46 BP pattern mode results for different η

47 BP pattern mode results for different η

48 BP pattern mode results for different α

49 BP pattern mode results for different α

50 BP pattern mode results for different network structure

51 BP pattern mode results for different range values

52 BP pattern mode results for 9-2-1 net & range [-0.1,0.1]

53 BP pattern mode results for 9-2-1 net & range [-1,1]

54 BP batch mode results for different η and α

55 BP batch mode results for different network structure

56 BP batch mode results for different range values

57 BP batch mode results for 9-3-1 net, η=α=0.1, range[-1,1]

58 BP Delta-Bar-Delta results
For the network: α = ξ = 0.1, κ = β = 0.2, training epochs = 100. Range of random numbers for the values of the synaptic weights and thresholds: [-0.1, 0.1]. Range for the learning-rate parameters ηji of the synaptic weights and thresholds: [0, 0.2].

59 BP Delta-Bar-Delta results

60 Conclusions on the Error-Back-Propagation Learning Algorithm

61 Summary of BP Nets
Architecture: multi-layer feed-forward (full connection between nodes in adjacent layers, no connections within a layer), with one or more hidden layers using a non-linear activation function (most commonly sigmoid functions).

62 Summary of BP Nets
Back-propagation learning algorithm: supervised learning. Approach: gradient descent to reduce the total error (which is why it is also called the generalized delta rule), with error terms at the output nodes propagated back to error terms at the hidden nodes (which is why it is called error back-propagation). Ways to speed up the learning process: adding momentum terms, adaptive learning rates (delta-bar-delta), and Quickprop.

63 Conclusions

64 Conclusions
Strengths of EBP learning: wide practical applicability, easy to implement, good generalization power, great representation power, etc.

65 Conclusions
Problems of EBP learning: it often takes a long time to converge; the gradient-descent approach only guarantees a local minimum of the error; the learning parameters can only be selected by trial and error; network paralysis may occur (learning stalls when nodes saturate); and BP learning is non-incremental (to include new training samples, the network must be re-trained with all old and new samples).

66 References
Dilip Sarkar, "Methods to Speed Up Error Back-Propagation Learning Algorithm", ACM Computing Surveys, Vol. 27, No. 4, December 1995.
Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, 2nd Edition.
Laurene Fausett, Fundamentals of Neural Networks.
M. Jiang, G. Gielen, B. Zhang, and Z. Luo, "Fast Learning Algorithms for Feedforward Neural Networks", Applied Intelligence 18, 37-54, 2003.
Konstantinos Adamopoulos, "Application of Back Propagation Learning Algorithms on Multi-Layer Perceptrons", Final Year Project, Department of Computing, University of Bradford.
And many more related articles.

67 Any questions? Thanks for your attention.

