ERROR BACK-PROPAGATION LEARNING ALGORITHM

Presentation transcript:

ERROR BACK-PROPAGATION LEARNING ALGORITHM Zohreh B. Irannia

Single-Layer Perceptron
xᵢ: input vector component; t = c(x) is the target value; o is the perceptron output; η is the learning rate (a small constant; assume η = 1).
Perceptron training rule: wᵢ ← wᵢ + Δwᵢ, where Δwᵢ = η (t − o) xᵢ.
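
As an illustrative sketch (not code from the original slides), the perceptron training rule above can be applied in a few lines of Python; the data, the variable names, and the use of a threshold (step) output are assumptions made here:

```python
import numpy as np

def perceptron_update(weights, x, target, eta=1.0):
    """One application of the perceptron rule: w_i <- w_i + eta * (t - o) * x_i."""
    o = 1.0 if np.dot(weights, x) >= 0 else 0.0   # threshold (step) output
    return weights + eta * (target - o) * x

# Example: one update step on a single hypothetical training pattern.
w = np.array([-0.5, 0.0, 0.0])       # weights, including a bias weight w[0]
x = np.array([1.0, 0.5, -1.0])       # input with a constant 1.0 as the bias input
w = perceptron_update(w, x, target=1.0)
print(w)                             # weights move toward classifying x as 1
```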

Single-Layer Perceptron
Sigmoid function as activation function: o = σ(net) = 1 / (1 + e^(−net)).
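
A minimal sketch of the sigmoid activation and its derivative, assuming the standard logistic form 1 / (1 + e^(−net)) named on the slide; the helper names are my own:

```python
import numpy as np

def sigmoid(net):
    """Logistic activation: squashes any net input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_deriv(net):
    """Derivative expressed through the output: f'(net) = f(net) * (1 - f(net))."""
    o = sigmoid(net)
    return o * (1.0 - o)

print(sigmoid(0.0), sigmoid_deriv(0.0))   # 0.5 and 0.25 at net = 0
```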

Delta Rule?
Gradient descent. The delta rule says Δwᵢ = η (t − o) xᵢ. But why?

Steepest Descent Method
Take a step from (w₁, w₂) to (w₁ + Δw₁, w₂ + Δw₂) in the direction of the negative gradient of the error.

Delta Rule
Gradient descent on the squared error E = ½ Σⱼ (yⱼ − f(Vⱼ))² gives Δw = −η ∂E/∂w.

Delta Rule
Define the error term δⱼ = (yⱼ − f(Vⱼ)) f′(Vⱼ).

Delta Rule
Notation for neuron j: Vⱼ is its net input, f(Vⱼ) its output, and yⱼ the desired target.

Delta Rule
So we have Δwᵢⱼ = η δⱼ xᵢ: the weight change is the learning rate times the error term times the input.
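
The derivation can be sanity-checked with a small numerical sketch (an illustration assuming a single sigmoid unit and made-up data, not code from the slides); it repeatedly applies Δwᵢⱼ = η δⱼ xᵢ:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def delta_rule_step(w, x, y_desired, eta=0.5):
    """One gradient-descent step on E = 0.5 * (y - f(V))^2 for a single sigmoid unit."""
    V = np.dot(w, x)                                 # net input V_j
    out = sigmoid(V)                                 # f(V_j)
    delta = (y_desired - out) * out * (1.0 - out)    # delta_j = (y_j - f(V_j)) * f'(V_j)
    return w + eta * delta * x                       # w_ij <- w_ij + eta * delta_j * x_i

w = np.array([0.1, -0.2, 0.05])
x = np.array([1.0, 0.4, 0.7])                        # first component acts as a bias input
for _ in range(5):
    w = delta_rule_step(w, x, y_desired=1.0)
print(w)                                             # weights drift so the output approaches 1
```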

Perceptron learning problem
Only suitable if the inputs are linearly separable. Consider the XOR problem:

Non-linearly separable problems

Solution: Multi-layer Networks
New problem: how can the weights of the different layers be trained in such networks?

Idea of Error Back-Propagation
Weights in the output layer are updated with the delta rule. The delta rule is not directly applicable to hidden layers, because we do not know the desired values for the hidden nodes. Solution: propagate the errors at the output nodes back to the hidden nodes.

Intuition by Illustration
A three-layer network with 2 inputs and 1 output:

Intuition by Illustration
Each neuron is composed of 2 units.

Intuition by Illustration
Training starts with propagation of signals through the input layer; the same happens for y₂ and y₃.

Intuition by Illustration
Propagation of signals through the hidden layer; the same happens for y₅.

Intuition by Illustration
Propagation of signals through the output layer:

Intuition by Illustration
Error signal of the output-layer neuron:

Intuition by Illustration
The error signal is propagated back to all neurons.

Intuition by Illustration
If the propagated errors come from several neurons, they are added; the same happens for neuron 2 and neuron 3.

Intuition by Illustration
Weight updating starts; the same happens for all neurons.

Intuition by Illustration
Weight updating terminates at the last (output) neuron.
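
The whole walkthrough can be condensed into a hedged sketch of pattern-mode back-propagation for a small network trained on the XOR problem from earlier; the layer sizes, learning rate, seed, and epoch count are illustrative choices, not values from the slides:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])                # XOR targets

n_hidden = 3                                      # illustrative size, not from the slides
W1 = rng.uniform(-0.5, 0.5, size=(n_hidden, 2))   # input -> hidden weights
b1 = rng.uniform(-0.5, 0.5, size=n_hidden)
W2 = rng.uniform(-0.5, 0.5, size=n_hidden)        # hidden -> output weights
b2 = rng.uniform(-0.5, 0.5)
eta = 0.5

for epoch in range(5000):
    for x, t in zip(X, T):
        # Forward pass: propagate signals through the hidden and output layers.
        h = sigmoid(W1 @ x + b1)
        y = sigmoid(W2 @ h + b2)
        # Backward pass: error signal at the output, propagated back to hidden nodes.
        delta_out = (t - y) * y * (1.0 - y)
        delta_hid = delta_out * W2 * h * (1.0 - h)
        # Weight updates, layer by layer (delta rule with the back-propagated deltas).
        W2 += eta * delta_out * h
        b2 += eta * delta_out
        W1 += eta * np.outer(delta_hid, x)
        b1 += eta * delta_hid

# Outputs should approach [0, 1, 1, 0] after training.
print([round(float(sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)), 2) for x in X])
```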

Some Questions
How often to update? After each training case, or after a full sweep through the training data? How many epochs?
How much to update? Use a fixed or a variable learning rate?
Is it appropriate to use the steepest-descent method? Does it necessarily converge to the global minimum? How long does it take to converge to some minimum?
Etc.

Batch Mode Training
Batch mode of weight updates: the weights are updated once per epoch (accumulated over all P samples). It smooths out training-sample outliers, and learning is independent of the order of sample presentation. It is usually slower than sequential (pattern) mode and sometimes more likely to get stuck in local minima.
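
As a hedged sketch of the difference between the two modes, assuming a single sigmoid unit and made-up data: batch mode accumulates the gradient over all P samples and updates once per epoch, whereas pattern (sequential) mode updates after every sample:

```python
import numpy as np

def grad(w, x, t):
    """Per-sample error gradient for a single sigmoid unit, E = 0.5 * (t - o)^2."""
    o = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return -(t - o) * o * (1.0 - o) * x

def batch_epoch(w, X, T, eta=0.1):
    """Batch mode: accumulate the gradient over all P samples, update once per epoch."""
    g = np.zeros_like(w)
    for x, t in zip(X, T):
        g += grad(w, x, t)
    return w - eta * g

def pattern_epoch(w, X, T, eta=0.1):
    """Pattern (sequential) mode: update the weights after every sample presentation."""
    for x, t in zip(X, T):
        w = w - eta * grad(w, x, t)
    return w

X = np.array([[1.0, 0.2], [1.0, -0.7]])    # hypothetical samples (first column = bias input)
T = np.array([1.0, 0.0])
print(batch_epoch(np.zeros(2), X, T))
print(pattern_epoch(np.zeros(2), X, T))
```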

Major Problems of EBP
Constant learning-rate problems: a small η gives slow convergence; a large η overshoots the minimum.

Steepest Descent's Problems
Convergence to local minima. (Figure: error surface with a local minimum and the global minimum.)

Steepest Descent's Problems
Slow convergence (zigzag path). One solution: replace steepest descent with the conjugate-gradient method.

Modifications to EBP Learning

Speed It Up: Momentum
Momentum adds a percentage of the last weight movement to the current movement. Gradient descent with momentum: Δw(t) = −η ∇E(t) + α Δw(t−1).

Speed It Up: Momentum
The direction of the weight change is a combination of the current gradient and the previous gradient. Advantage: it reduces the role of outliers (a smoother search), but it does not adjust the learning rate directly (an indirect method). Disadvantages: it may result in over-shooting, and it does not always reduce the number of iterations.
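
A minimal sketch of the momentum modification, assuming the common form Δw(t) = −η ∇E(t) + α Δw(t−1); the toy error surface and the parameter names eta and alpha are my own choices:

```python
import numpy as np

def momentum_step(w, grad, prev_delta, eta=0.1, alpha=0.9):
    """Gradient descent with momentum: keep a fraction alpha of the previous step,
    so the search direction combines the current and previous gradients."""
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta

# Toy usage on E(w) = 0.5 * ||w||^2, whose gradient is simply w (a hypothetical error surface).
w = np.array([2.0, -3.0])
delta = np.zeros_like(w)
for _ in range(200):
    w, delta = momentum_step(w, grad=w, prev_delta=delta)
print(w)   # drifts toward the minimum at the origin
```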

But some problems remain!
Remaining problem: equal learning rates for all weights!

Delta-bar-Delta
Allows each weight to have its own learning rate, and lets learning rates vary with time. Two heuristics are used to determine the appropriate changes: if the weight change is in the same direction for several time steps, the learning rate for that weight should be increased; if the direction of the weight change alternates, the learning rate should be decreased. Note: these heuristics will not always improve performance.

Delta-bar-Delta
Learning rates increase linearly and decrease exponentially.
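
A hedged sketch of the delta-bar-delta heuristics described above; the parameter names kappa, beta, and theta, their values, and the toy error surface are assumptions, not values from the slides:

```python
import numpy as np

def delta_bar_delta_step(w, grad, lr, bar, kappa=0.05, beta=0.2, theta=0.7):
    """One delta-bar-delta update: each weight's learning rate grows linearly by kappa
    when the current gradient agrees in sign with the smoothed past gradient (bar),
    and shrinks exponentially by the factor (1 - beta) when the sign alternates."""
    lr = np.where(bar * grad > 0, lr + kappa, lr)         # linear increase
    lr = np.where(bar * grad < 0, lr * (1.0 - beta), lr)  # exponential decrease
    bar = (1.0 - theta) * grad + theta * bar              # exponentially smoothed gradient
    return w - lr * grad, lr, bar

# Toy usage on E(w) = 0.5 * ||w||^2 (gradient = w), purely illustrative.
w = np.array([1.5, -2.0])
lr = np.full_like(w, 0.1)
bar = np.zeros_like(w)
for _ in range(100):
    w, lr, bar = delta_bar_delta_step(w, w, lr, bar)
print(w, lr)   # each weight has adapted its own learning rate
```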

Training Samples
The quality and quantity of the training samples determine the quality of the learning results. Samples must represent the problem space well: random sampling, or proportional sampling (using prior knowledge of the problem).
Number of training patterns needed: there is no theoretically ideal number. Baum and Haussler (1989) suggest P = W/e, where W is the total number of weights and e the acceptable classification error rate. If the net can be trained to correctly classify (1 − e/2)·P of the P training samples, then the classification accuracy of this net is 1 − e for input patterns drawn from the same sample space.
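
As a rough worked example (the numbers here are chosen for illustration, not taken from the slides): a network with W = 200 weights and an acceptable error rate e = 0.1 would need about P = 200 / 0.1 = 2000 training samples, and training would aim to classify at least (1 − e/2)·P = 0.95 · 2000 = 1900 of them correctly.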

Activation Functions
Sigmoid activation function: saturation regions. When some incoming weights become very large, the input to a node may fall into a saturation region during learning. Possible remedies: use non-saturating activation functions, or periodically normalize all weights.

Activation Functions
Another sigmoid function with a slower saturation rate: change the range of the logistic function from (0, 1) to (a, b).

Activation Functions
Change the slope of the logistic function. A larger slope moves quickly into the saturation regions (faster convergence); a smaller slope is slower to reach the saturation regions and allows refined weight adjustment (slower convergence). Solution: an adaptive slope (each node has a learned slope).
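
A hedged sketch combining the two modifications from the last two slides: a logistic function stretched to an arbitrary range (a, b) and given an adjustable slope; the parameter names a, b, and lam are illustrative:

```python
import numpy as np

def general_sigmoid(net, a=-1.0, b=1.0, lam=1.0):
    """Logistic function rescaled from (0, 1) to the range (a, b);
    lam controls the slope (a larger lam saturates faster)."""
    return a + (b - a) / (1.0 + np.exp(-lam * net))

print(general_sigmoid(0.0))            # midpoint of (a, b): 0.0 for the range (-1, 1)
print(general_sigmoid(2.0, lam=4.0))   # a steeper slope pushes the output toward b
```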

Practical Considerations
Many parameters must be carefully selected to ensure good performance. Although the deficiencies of BP nets cannot be completely cured, some of them can be eased by practical means. Two important issues: hidden layers and hidden nodes, and the effect of the initial weights.

Hidden Layers & Hidden Nodes
Theoretically, one hidden layer (possibly with many hidden nodes) is sufficient to represent any function. There are no theoretical results on the minimum necessary number of hidden nodes. Practical rule of thumb, with n = # of input nodes and m = # of hidden nodes: for binary/bipolar data, m = 2n; for real-valued data, m >> 2n. Multiple hidden layers with fewer nodes may be trained faster for similar quality in some applications.

Effect of Initial Weights (and Biases)
Fully random, e.g. in [-0.05, 0.05], [-0.1, 0.1], or [-1, 1]. Problems: small values give slow learning; large values drive nodes into saturation (f′(x) ≈ 0), which also slows learning.
Normalize the weights of the hidden layer (Nguyen-Widrow): choose random initial weights for all hidden nodes in [-0.5, 0.5], then for each hidden node j normalize its weight vector to the scale factor β = 0.7·n^(1/m), where m is the # of input neurons and n the # of hidden nodes; for the bias, choose a random value in [-β, β].
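
A sketch of the normalization described above, in the spirit of the Nguyen-Widrow scheme; since the exact formula on the original slide was in a lost figure, the 0.7 · n^(1/m) scale factor and the function and parameter names here are assumptions:

```python
import numpy as np

def init_hidden_weights(m_inputs, n_hidden, seed=0):
    """Random hidden-layer weights in [-0.5, 0.5], then each hidden node's weight
    vector is rescaled to the norm beta = 0.7 * n_hidden ** (1 / m_inputs);
    biases are drawn uniformly from [-beta, beta]."""
    rng = np.random.default_rng(seed)
    beta = 0.7 * n_hidden ** (1.0 / m_inputs)
    W = rng.uniform(-0.5, 0.5, size=(n_hidden, m_inputs))
    W = beta * W / np.linalg.norm(W, axis=1, keepdims=True)
    b = rng.uniform(-beta, beta, size=n_hidden)
    return W, b

W, b = init_hidden_weights(m_inputs=9, n_hidden=13)   # e.g. a 9-13-1 network's hidden layer
print(np.linalg.norm(W, axis=1))                      # every row norm equals beta
```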

Effect of Initial Weights (and Biases)
Initialization of the output weights should not result in small weights. If they are small, the contribution of the hidden-layer neurons to the output error, and hence the effect of the hidden-layer weights, is not visible enough; the deltas of the hidden layer become very small, giving only small changes in the hidden-layer weights.

Now: A Comparison of Different BP Variants

Comparison of different BP variants
Versions compared: the BP pattern-mode learning algorithm, the BP batch-mode learning algorithm, and the BP delta-bar-delta learning algorithm.
Problem: classification of breast cancer. 9 attributes; 699 examples (458 benign, 241 malignant); 16 instances with missing attributes rejected; attributes normalized with respect to their highest value.

BP pattern-mode results for different η

BP pattern-mode results for different η (continued)

BP pattern-mode results for different α

BP pattern-mode results for different α (continued)

BP pattern-mode results for different network structures

BP pattern-mode results for different weight-initialization ranges

BP pattern-mode results for the 9-2-1 net with range [-0.1, 0.1]

BP pattern-mode results for the 9-2-1 net with range [-1, 1]

BP batch-mode results for different η and α

BP batch-mode results for different network structures

BP batch-mode results for different weight-initialization ranges

BP batch-mode results for the 9-3-1 net with η = α = 0.1 and range [-1, 1]

BP Delta-Bar-Delta results
For a 9-13-1 network: α = ξ = 0.1, κ = β = 0.2, training epochs = 100. Range of random numbers for the synaptic weights and thresholds: [-0.1, 0.1]. Range for the learning-rate parameters η_ji of the synaptic weights and thresholds: [0, 0.2].

BP Delta-Bar-Delta results (continued)

Conclusions on the Error-Back-Propagation Learning Algorithm

Summary of BP Nets
Architecture: multi-layer feed-forward network (full connection between nodes in adjacent layers, no connections within a layer), with one or more hidden layers using a non-linear activation function (most commonly sigmoid functions).

Summary of BP Nets
Back-propagation learning algorithm: supervised learning. Approach: gradient descent to reduce the total error (which is why it is also called the generalized delta rule), with error terms at the output nodes and error terms at the hidden nodes (which is why it is called error back-propagation). Ways to speed up the learning process: adding momentum terms, adaptive learning rates (delta-bar-delta), and Quickprop.

Conclusions

Conclusions
Strengths of EBP learning: wide practical applicability, easy to implement, good generalization power, great representation power, etc.

Conclusions
Problems of EBP learning: it often takes a long time to converge; the gradient-descent approach only guarantees a local minimum of the error; the learning parameters can only be selected by trial and error; network paralysis may occur (learning stops when nodes saturate); and BP learning is non-incremental (to include new training samples, the network must be re-trained with all old and new samples).

References
Dilip Sarkar, "Methods to Speed Up Error Back-Propagation Learning Algorithm", ACM Computing Surveys, Vol. 27, No. 4, December 1995.
Sergios Theodoridis and Konstantinos Koutroumbas, "Pattern Recognition", 2nd Edition.
Laurene Fausett, "Fundamentals of Neural Networks".
M. Jiang, G. Gielen, B. Zhang, and Z. Luo, "Fast Learning Algorithms for Feedforward Neural Networks", Applied Intelligence 18, 37-54, 2003.
Konstantinos Adamopoulos, "Application of Back Propagation Learning Algorithms on Multi-Layer Perceptrons", Final Year Project, Department of Computing, University of Bradford.
And many more related articles.

Any questions? Thanks for your attention.