ERROR BACK-PROPAGATION LEARNING ALGORITHM


1 ERROR BACK-PROPAGATION LEARNING ALGORITHM
Zohreh B. Irannia

2 Single-Layer Perceptron
xi: input vector; t = c(x): the target value; o: the perceptron output; η: learning rate (a small positive constant; here assume η = 1).
Update rule: wi ← wi + Δwi, where Δwi = η (t − o) xi
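A minimal NumPy sketch of this update rule (not from the slides), assuming a bipolar threshold output and a bias input folded into x[0]; the names perceptron_update, w, x, t are illustrative only:

    import numpy as np

    def perceptron_update(w, x, t, eta=1.0):
        # One step of the rule above: w_i <- w_i + eta * (t - o) * x_i
        o = 1.0 if np.dot(w, x) >= 0 else -1.0   # bipolar threshold output
        return w + eta * (t - o) * x

    # Example: a single update on one training pattern.
    w = np.zeros(3)                  # weights; w[0] acts as the bias weight
    x = np.array([1.0, 0.5, -1.0])   # input vector; x[0] = 1 is the bias input
    t = -1.0                         # desired target c(x)
    w = perceptron_update(w, x, t)   # here o = +1, so the weights move by -2 * x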

3 Single-Layer Perceptron
Sigmoid function as activation function:
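For reference, a short sketch (assumed, not from the slides) of the logistic sigmoid and the derivative that the later delta-rule and back-propagation formulas rely on:

    import numpy as np

    def sigmoid(v):
        # Logistic activation: f(v) = 1 / (1 + exp(-v))
        return 1.0 / (1.0 + np.exp(-v))

    def sigmoid_prime(v):
        # Its derivative, f'(v) = f(v) * (1 - f(v)), reused heavily in back-propagation.
        y = sigmoid(v)
        return y * (1.0 - y)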

4 Delta Rule? Gradient Descent
The delta rule states the weight update, but why does it take this form?

5 Steepest Descent Method
(w1, w2) → (w1 + Δw1, w2 + Δw2)

6 Delta Rule?

7 Delta Rule
Define the error term δj; finally, the weight update Δwji follows.

8 Delta Rule
Neuron j: net input vj, activation f(vj), output yj, and the desired target value.

9 Delta Rule
So we have:
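A compact reconstruction of the result and its derivation (the transcript does not reproduce the original equations; symbols assumed: v_j net input, y_j = f(v_j) output, d_j desired target, η learning rate):

    E = \tfrac{1}{2} \sum_j (d_j - y_j)^2, \qquad y_j = f(v_j), \qquad v_j = \sum_i w_{ji} x_i

    \frac{\partial E}{\partial w_{ji}} = -(d_j - y_j)\, f'(v_j)\, x_i

    \Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} = \eta\, \delta_j\, x_i, \qquad \delta_j = (d_j - y_j)\, f'(v_j)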

10 Perceptron learning problem
Only suitable if the inputs are linearly separable. Consider the XOR problem:

11 Non-linearly separable problems

12 Solution: Multi-layer Networks
New problem: how can the weights of the different layers be trained in such networks?

13 Idea of Error Back-Propagation
Weights in the output layer are updated with the delta rule. The delta rule is not applicable to the hidden layers, because we do not know the desired values for the hidden nodes. Solution: propagate the errors at the output nodes back to the hidden nodes.

14 Intuition by Illustration
A three-layer network with 2 inputs and 1 output:

15 Intuition by Illustration
Each neuron is composed of two units: a weighted-sum unit followed by a non-linear activation unit.

16 Intuition by Illustration
Training starts with the propagation of signals through the input layer, computing y1; the same happens for y2 and y3.

17 Intuition by Illustration
Propagation of signals through the hidden layer, computing y4; the same happens for y5.

18 Intuition by Illustration
Propagation of signals through the output layer:

19 Intuition by Illustration
Error signal of the output-layer neuron: the network output is compared with the desired target value.

20 Intuition by Illustration
The error signal is propagated back to all hidden neurons.

21 Intuition by Illustration
If the propagated errors come from several neurons, they are summed; the same happens for neuron 2 and neuron 3.

22 Intuition by Illustration
Weight updating starts; the same happens for all neurons.

23 Intuition by Illustration
Weight updating terminates at the output neuron. A complete training step is sketched below.
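A minimal NumPy sketch of one such training step (assumed, not from the slides), on a simplified 2-3-1 network rather than the exact illustrated topology; all names are illustrative:

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    rng = np.random.default_rng(0)
    W1 = rng.uniform(-0.5, 0.5, size=(3, 2))   # input -> hidden weights
    W2 = rng.uniform(-0.5, 0.5, size=(1, 3))   # hidden -> output weights
    eta = 0.5

    x = np.array([1.0, 0.0])   # one training pattern
    z = np.array([1.0])        # its desired target

    # Forward pass: signals propagate layer by layer (slides 16-18).
    y_hidden = sigmoid(W1 @ x)
    y_out = sigmoid(W2 @ y_hidden)

    # Error signal at the output neuron (slide 19).
    delta_out = (z - y_out) * y_out * (1.0 - y_out)

    # Back-propagation: each hidden neuron collects the weighted output deltas (slides 20-21).
    delta_hidden = (W2.T @ delta_out) * y_hidden * (1.0 - y_hidden)

    # Weight update (slides 22-23): change = eta * delta * (signal entering that weight).
    W2 += eta * np.outer(delta_out, y_hidden)
    W1 += eta * np.outer(delta_hidden, x)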

24 Some Questions
How often to update: after each training case, or after a full sweep through the training data? How many epochs? How much to update: a fixed or a variable learning rate? Is the steepest-descent method the right choice? Does it necessarily converge to the global minimum? How long does it take to converge to some minimum? Etc.

25 Batch Mode Training
Batch mode of weight updates: weights are updated once per epoch (gradients accumulated over all P samples). This smooths out training-sample outliers and makes learning independent of the order of sample presentation, but it is usually slower than sequential (pattern) mode and sometimes more likely to get stuck in local minima. Both modes are sketched below.
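A minimal sketch of the two update schedules (assumed, not from the slides); grad_fn is a hypothetical placeholder that returns the error gradient for one sample:

    import numpy as np

    def batch_mode_epoch(w, X, T, grad_fn, eta=0.1):
        # Batch mode: accumulate the gradient over all P samples, update once per epoch.
        total_grad = np.zeros_like(w)
        for x, t in zip(X, T):
            total_grad += grad_fn(w, x, t)
        return w - eta * total_grad

    def pattern_mode_epoch(w, X, T, grad_fn, eta=0.1):
        # Pattern (sequential) mode: update the weights after every single sample.
        for x, t in zip(X, T):
            w = w - eta * grad_fn(w, x, t)
        return w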

26 Major Problems of EBP
Constant learning rate problems: a small η gives slow convergence; a large η overshoots the minimum.

27 Steepest-Descent’s Problems
Convergence to local minima (the figure shows an error surface with a local minimum and the global minimum).

28 Steepest-Descent’s Problems
Slow convergence (zigzag path). One solution: replace steepest descent with the conjugate-gradient method.

29 Modifications to EBP Learning

30 Modifications to EBP Learning

31 Speed It Up: Momentum
Momentum adds a fraction of the previous weight change to the current one: gradient descent with momentum.

32 Speed It Up: Momentum
The direction of the weight change is a combination of the current and the previous gradient. Advantage: it reduces the role of outliers (a smoother search), but it does not adjust the learning rate directly (an indirect method). Disadvantages: it may result in overshooting, and it does not always reduce the number of iterations. A sketch of the update appears below.
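A minimal sketch of the momentum update (assumed, not from the slides); the momentum coefficient alpha and the gradient values are illustrative:

    import numpy as np

    def momentum_step(w, prev_delta_w, grad, eta=0.1, alpha=0.9):
        # Gradient descent with momentum:
        #   delta_w(t) = -eta * grad + alpha * delta_w(t - 1)
        delta_w = -eta * grad + alpha * prev_delta_w
        return w + delta_w, delta_w

    # Usage: carry the previous weight change along with the weights.
    w = np.zeros(4)
    delta_w = np.zeros_like(w)
    grad = np.array([0.2, -0.1, 0.0, 0.3])   # hypothetical error gradient for one step
    w, delta_w = momentum_step(w, delta_w, grad)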

33 But some problems remain!
Remaining problem: equal learning rates for all weights!

34 Delta-Bar-Delta
Allows each weight to have its own learning rate and lets the learning rates vary with time. Two heuristics determine the appropriate changes: if the weight change keeps the same direction for several time steps, the learning rate for that weight should be increased; if the direction of the weight change alternates, the learning rate should be decreased. Note: these heuristics will not always improve performance.

35 Delta-Bar-Delta
Learning rates increase linearly and decrease exponentially, as in the sketch below.
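A minimal sketch of this rule (assumed, following Jacobs' delta-bar-delta formulation, to which the κ, β, ξ parameters on slide 58 appear to correspond; the exact variant used in the presentation is not shown in the transcript):

    import numpy as np

    def delta_bar_delta_step(w, eta, delta_bar, grad, kappa=0.2, beta=0.2, xi=0.1):
        # Per-weight learning rates: increase eta linearly (by kappa) while the
        # gradient keeps its sign, decrease it exponentially (factor 1 - beta)
        # when the sign alternates; otherwise leave it unchanged.
        same_sign = delta_bar * grad > 0
        sign_flip = delta_bar * grad < 0
        eta = np.where(same_sign, eta + kappa, eta)
        eta = np.where(sign_flip, eta * (1.0 - beta), eta)
        w = w - eta * grad                                # element-wise weight update
        delta_bar = (1.0 - xi) * grad + xi * delta_bar    # running average of past gradients
        return w, eta, delta_bar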

36 Training Samples
The quality and quantity of the training samples determine the quality of the learning results. Samples must represent the problem space well: random sampling, or proportional sampling (using prior knowledge of the problem). Number of training patterns needed: there is no theoretically ideal number. Baum and Haussler (1989) suggest P = W/e, where W is the total number of weights and e is the acceptable classification error rate. If the net can be trained to correctly classify (1 − e/2)P of the P training samples, then the classification accuracy of this net is 1 − e for input patterns drawn from the same sample space.
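For instance (a hypothetical illustration, not from the slides), a net with W = 100 weights and an acceptable error rate e = 0.1 would need roughly

    P = \frac{W}{e} = \frac{100}{0.1} = 1000 \text{ training samples.}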

37 Activation Functions
Sigmoid activation function: saturation regions. When some incoming weights become very large, the input to a node may fall into a saturation region during learning. Possible remedies: use non-saturating activation functions, or periodically normalize all weights.

38 Activation Functions
Another sigmoid function with a slower saturation rate: change the range of the logistic function from (0, 1) to (a, b).
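One common way to do this rescaling (a sketch under assumed defaults a = -1, b = 1; the slope parameter anticipates the next slide):

    import numpy as np

    def scaled_sigmoid(v, a=-1.0, b=1.0, slope=1.0):
        # Logistic function rescaled from (0, 1) to (a, b), with an adjustable slope.
        return a + (b - a) / (1.0 + np.exp(-slope * v))

    def scaled_sigmoid_prime(v, a=-1.0, b=1.0, slope=1.0):
        # Derivative written in terms of the output, as for the standard logistic:
        # f'(v) = slope * (y - a) * (b - y) / (b - a)
        y = scaled_sigmoid(v, a, b, slope)
        return slope * (y - a) * (b - y) / (b - a)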

39 Activation Functions
Change the slope of the logistic function. A larger slope moves into the saturation regions more quickly and converges faster; a smaller slope is slower to reach the saturation regions and allows more refined weight adjustment, but converges more slowly. Solution: an adaptive slope (each node has a learned slope).

40 Practical Considerations
Many parameters must be carefully selected to ensure good performance. Although the deficiencies of BP nets cannot be completely cured, some of them can be eased by practical means. Two important issues: hidden layers and hidden nodes, and the effect of the initial weights.

41 Hidden Layers & Hidden Nodes
Theoretically, one hidden layer (possibly with many hidden nodes) is sufficient for any function. There are no theoretical results on the minimum necessary number of hidden nodes. Practical rule of thumb (n = number of input nodes, m = number of hidden nodes): for binary/bipolar data, m = 2n; for real-valued data, m >> 2n. Multiple hidden layers with fewer nodes may be trained faster for similar quality in some applications.

42 Effect of Initial Weights (and Biases)
Fully random initialization, e.g. in [-0.05, 0.05], [-0.1, 0.1], or [-1, 1]. Problems: small values give slow learning; large values drive nodes into saturation (f′(x) ≈ 0), which also gives slow learning. Alternative: normalize the weights of the hidden layer (Widrow). Draw random initial weights for all hidden nodes in [-0.5, 0.5]; for each hidden node j, normalize its weight vector (m = number of input neurons, n = number of hidden nodes); for the bias, choose a random value. A sketch of this scheme follows.
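The normalization formula itself is not reproduced in the transcript; the sketch below follows the standard Nguyen-Widrow scheme, which this slide appears to describe (the scale factor 0.7 * n**(1/m) is that scheme's usual choice, not taken from the slide):

    import numpy as np

    def widrow_style_init(m, n, rng=None):
        # m: number of input neurons, n: number of hidden nodes (as on the slide).
        # Draw hidden weights in [-0.5, 0.5], rescale each hidden node's weight
        # vector to norm beta = 0.7 * n**(1/m), and pick each bias in [-beta, beta].
        rng = rng if rng is not None else np.random.default_rng()
        beta = 0.7 * n ** (1.0 / m)
        W = rng.uniform(-0.5, 0.5, size=(n, m))
        W = beta * W / np.linalg.norm(W, axis=1, keepdims=True)
        biases = rng.uniform(-beta, beta, size=n)
        return W, biases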

43 Effect of Initial Weights (and Biases)
Initialization of the output weights should not result in small weights. If they are small, the contribution of the hidden-layer neurons to the output error, and hence the effect of the hidden-layer weights, is not visible enough: the deltas of the hidden layer become very small, leading to only small changes in the hidden-layer weights.

44 Now: A Comparison of Different BP Variants

45 Comparison of different BP variants
Versions compared: the BP pattern-mode learning algorithm, the BP batch-mode learning algorithm, and the BP delta-bar-delta learning algorithm. Problem: breast cancer classification. 9 attributes; 699 examples (458 benign, 241 malignant); 16 instances with missing attribute values rejected; attributes normalized with respect to their highest value.

46 BP pattern mode results for different η

47 BP pattern mode results for different η

48 BP pattern mode results for different α

49 BP pattern mode results for different α

50 BP pattern mode results for different network structure

51 BP pattern mode results for different range values

52 BP pattern mode results for 9-2-1 net & range [-0.1,0.1]

53 BP pattern mode results for 9-2-1 net & range [-1,1]

54 BP batch mode results for different η and α

55 BP batch mode results for different network structure

56 BP batch mode results for different range values

57 BP batch mode results for 9-3-1 net, η=α=0.1, range[-1,1]

58 BP Delta-Bar-Delta results
For the network: α = ξ = 0.1, κ = β = 0.2, training epochs = 100. Range of random numbers for the values of the synaptic weights and thresholds: [-0.1, 0.1]. Range for the learning-rate parameters ηji of the synaptic weights and thresholds: [0, 0.2].

59 BP Delta-Bar-Delta results

60 Conclusions on the Error-Back-Propagation Learning Algorithm

61 Summary of BP Nets
Architecture: multi-layer feed-forward (full connection between nodes in adjacent layers, no connections within a layer), with one or more hidden layers using a non-linear activation function (most commonly sigmoid functions).

62 Summary of BP Nets
Back-propagation learning algorithm: supervised learning. Approach: gradient descent to reduce the total error (which is why it is also called the generalized delta rule), with error terms at the output nodes propagated back to error terms at the hidden nodes (which is why it is called error back-propagation). Ways to speed up the learning process: adding momentum terms, adaptive learning rates (delta-bar-delta), and Quickprop.

63 Conclusions

64 Conclusions
Strengths of EBP learning: wide practical applicability, easy to implement, good generalization power, great representation power, etc.

65 Conclusions
Problems of EBP learning: it often takes a long time to converge; the gradient-descent approach only guarantees a local minimum of the error; the learning parameters can only be selected by trial and error; network paralysis may occur (learning stalls when nodes saturate); and BP learning is non-incremental (to include new training samples, the network must be re-trained with all old and new samples).

66 References
Dilip Sarkar, "Methods to Speed Up Error Back-Propagation Learning Algorithm", ACM Computing Surveys, Vol. 27, No. 4, December 1995.
Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, 2nd Edition.
Laurene Fausett, Fundamentals of Neural Networks.
M. Jiang, G. Gielen, B. Zhang, and Z. Luo, "Fast Learning Algorithms for Feedforward Neural Networks", Applied Intelligence 18, 37-54, 2003.
Konstantinos Adamopoulos, "Application of Back Propagation Learning Algorithms on Multi-Layer Perceptrons", Final Year Project, Department of Computing, University of Bradford.
And many more related articles.

67 Any questions? Thanks for your attention.

