NEURAL NETWORKS Backpropagation Algorithm

NEURAL NETWORKS – Backpropagation Algorithm
Dear students, today's topic is the Backpropagation Algorithm. The steps and the critical points to be considered are summarized in this lesson. PROF. DR. YUSUF OYSAL

Backpropagation Algorithm
The Backpropagation Algorithm has two phases:
Forward pass phase: computes the 'functional signal', the feed-forward propagation of input pattern signals through the network.
Backward pass phase: computes the 'error signal', propagating the error backwards through the network starting at the output units (where the error is the difference between actual and desired output values).
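To make the forward ('functional signal') computation of a single neuron concrete, here is a minimal Python sketch; the function names are illustrative and not part of the slides. The quantity o * (1 - o) returned by sigmoid_derivative is the sigmoid derivative that reappears in the error terms of the backward pass.

import math

def sigmoid(net):
    # squashing (activation) function used throughout this lesson
    return 1.0 / (1.0 + math.exp(-net))

def neuron_forward(inputs, weights, bias):
    # net input: weighted sum of the incoming signals plus the bias
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(net)

def sigmoid_derivative(o):
    # derivative of the sigmoid, written in terms of its own output o
    return o * (1.0 - o)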

Backpropagation Algorithm
Here are the steps of the algorithm:
Step 1: Determine the architecture: how many input and output neurons, what output encoding, and how many hidden neurons and layers.
Step 2: Initialize all weights and biases to small random values, typically ∈ [-1, 1], and choose a learning rate η.
Step 3: Repeat until the termination criteria are satisfied:
Present a training example and propagate it through the network (forward pass).
Calculate the actual output: the inputs are applied, multiplied by the weights and summed, then 'squashed' by the sigmoid activation function; the output is passed to each neuron in the next layer.
Adapt the weights starting from the output layer and working backwards (backward pass).
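Below is a minimal sketch of Steps 2 and 3 in Python for a single hidden layer with sigmoid activations, using the update rules given on the next slide. The function and parameter names (train, err_threshold, and so on) are illustrative choices, not part of the slides.

import math
import random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def train(examples, n_in, n_hid, n_out, eta=0.1, max_epochs=10000, err_threshold=0.01):
    # Step 2: initialize all weights and biases to small random values in [-1, 1]
    rnd = lambda: random.uniform(-1.0, 1.0)
    w_ih = [[rnd() for _ in range(n_hid)] for _ in range(n_in)]   # input -> hidden
    w_ho = [[rnd() for _ in range(n_out)] for _ in range(n_hid)]  # hidden -> output
    b_h = [rnd() for _ in range(n_hid)]
    b_o = [rnd() for _ in range(n_out)]
    for epoch in range(max_epochs):                    # Step 3: repeat until termination
        sq_err = 0.0
        for x, d in examples:                          # present one training example
            # forward pass: weighted sums 'squashed' by the sigmoid, layer by layer
            o_h = [sigmoid(sum(x[i] * w_ih[i][j] for i in range(n_in)) + b_h[j])
                   for j in range(n_hid)]
            o_o = [sigmoid(sum(o_h[j] * w_ho[j][k] for j in range(n_hid)) + b_o[k])
                   for k in range(n_out)]
            # backward pass: error terms for output units, then hidden units
            delta_o = [(d[k] - o_o[k]) * o_o[k] * (1 - o_o[k]) for k in range(n_out)]
            delta_h = [o_h[j] * (1 - o_h[j]) *
                       sum(w_ho[j][k] * delta_o[k] for k in range(n_out))
                       for j in range(n_hid)]
            # adapt weights, starting from the output layer and working backwards
            for j in range(n_hid):
                for k in range(n_out):
                    w_ho[j][k] += eta * delta_o[k] * o_h[j]
            for k in range(n_out):
                b_o[k] += eta * delta_o[k]
            for i in range(n_in):
                for j in range(n_hid):
                    w_ih[i][j] += eta * delta_h[j] * x[i]
            for j in range(n_hid):
                b_h[j] += eta * delta_h[j]
            sq_err += sum((d[k] - o_o[k]) ** 2 for k in range(n_out))
        if sq_err / len(examples) < err_threshold:     # mean square error check per epoch
            break
    return w_ih, b_h, w_ho, b_o

# usage with the banana/orange example from the following slides (class 1 -> 1 0, class 2 -> 0 1)
data = [([0.6, 0.1], [1, 0]), ([0.2, 0.3], [0, 1])]
params = train(data, n_in=2, n_hid=3, n_out=2)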

Parameter Update Rules
wpq(t) is the weight from node p to node q at time t. It is updated by the steepest-descent formula wpq(t+1) = wpq(t) + Δwpq(t), where the weight change is Δwpq(t) = η * δq * op.
Error propagation for output neuron i: δi = (di - oi) * oi * (1 - oi).
Error propagation for hidden neuron j: δj = oj * (1 - oj) * Σi (wji * δi), where the sum is over the i nodes in the layer above node j.
In the previous week the error propagation term was denoted by the symbol ε, but most books in the literature use the symbol δ.
In the backpropagation algorithm the stopping criterion is checked at the end of each epoch: all training examples are propagated, the mean (absolute or square) error is calculated, and training stops when this error falls below a threshold (determined heuristically) or when the maximum number of epochs is reached. It typically takes hundreds or thousands of epochs for an NN to converge.
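The three rules can be written directly as small Python helpers. This is a sketch with illustrative names: downstream_weights and downstream_deltas correspond to the wji and δi of the layer above node j.

def output_delta(d_i, o_i):
    # error propagation for an output neuron i: (d_i - o_i) * o_i * (1 - o_i)
    return (d_i - o_i) * o_i * (1 - o_i)

def hidden_delta(o_j, downstream_weights, downstream_deltas):
    # error propagation for a hidden neuron j: o_j * (1 - o_j) * sum_i w_ji * delta_i
    return o_j * (1 - o_j) * sum(w * d for w, d in zip(downstream_weights, downstream_deltas))

def weight_change(eta, delta_q, o_p):
    # steepest-descent weight change for the connection from node p to node q
    return eta * delta_q * o_p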

Backpropagation - Example
Training set: p1 = [0.6 0.1]T, class 1 (banana); p2 = [0.2 0.3]T, class 2 (orange).
Network architecture: How many inputs? How many hidden neurons? Heuristic: n = (inputs + output_neurons)/2. How many output neurons? What encoding of the outputs? 10 for class 1, 01 for class 2.
Initial weights and learning rate: let η = 0.1 and set the weights as in the figure.
In this example the training set contains two examples: one for class 1 (banana) and one for class 2 (orange). The first step of the algorithm is illustrated in this slide. The network architecture is determined heuristically. The classes, in other words the outputs, are encoded as 10 for class 1 and 01 for class 2. The initial weights are set as in the figure, and the learning rate is selected as 0.1.

Backpropagation - Example
1. Forward pass for example 1: calculate the outputs o6 and o7.
o1 = 0.6, o2 = 0.1, target output 1 0, i.e. class 1.
Activations of the hidden units:
net3 = o1*w13 + o2*w23 + b3 = 0.6*0.1 + 0.1*(-0.2) + 0.1 = 0.14, o3 = 1/(1+e^(-net3)) = 0.53
net4 = o1*w14 + o2*w24 + b4 = 0.6*0 + 0.1*0.2 + 0.2 = 0.22, o4 = 1/(1+e^(-net4)) = 0.55
net5 = o1*w15 + o2*w25 + b5 = 0.6*0.3 + 0.1*(-0.4) + 0.5 = 0.64, o5 = 1/(1+e^(-net5)) = 0.65
Activations of the output units:
net6 = o3*w36 + o4*w46 + o5*w56 + b6 = 0.53*(-0.4) + 0.55*0.1 + 0.65*0.6 - 0.1 = 0.13, o6 = 1/(1+e^(-net6)) = 0.53
net7 = o3*w37 + o4*w47 + o5*w57 + b7 = 0.53*0.2 + 0.55*(-0.1) + 0.65*(-0.2) + 0.6 = 0.52, o7 = 1/(1+e^(-net7)) = 0.63
These are the forward-pass calculations for the first example in the training set; they yield the neural network outputs.
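The forward-pass numbers above can be reproduced with a few lines of Python; the weight and bias values are the ones used in the calculation, taken from the figure.

import math
sigmoid = lambda net: 1.0 / (1.0 + math.exp(-net))
o1, o2 = 0.6, 0.1                                         # example 1 (class 1, banana)
# hidden layer
o3 = sigmoid(o1 * 0.1 + o2 * (-0.2) + 0.1)                # net3 = 0.14 -> o3 ~ 0.53
o4 = sigmoid(o1 * 0.0 + o2 * 0.2 + 0.2)                   # net4 = 0.22 -> o4 ~ 0.55
o5 = sigmoid(o1 * 0.3 + o2 * (-0.4) + 0.5)                # net5 = 0.64 -> o5 ~ 0.65
# output layer
o6 = sigmoid(o3 * (-0.4) + o4 * 0.1 + o5 * 0.6 - 0.1)     # net6 ~ 0.13 -> o6 ~ 0.53
o7 = sigmoid(o3 * 0.2 + o4 * (-0.1) + o5 * (-0.2) + 0.6)  # net7 ~ 0.52 -> o7 ~ 0.63
print(round(o6, 2), round(o7, 2))                         # 0.53 0.63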

Backpropagation - Example
2. Backward pass for example 1: calculate the output errors δ6 and δ7 (note that d6 = 1, d7 = 0 for class 1).
δ6 = (d6 - o6) * o6 * (1 - o6) = (1 - 0.53)*0.53*(1 - 0.53) = 0.12
δ7 = (d7 - o7) * o7 * (1 - o7) = (0 - 0.63)*0.63*(1 - 0.63) = -0.15
Calculate the new weights between the hidden and output units (η = 0.1):
Δw36 = η * δ6 * o3 = 0.1*0.12*0.53 = 0.006, w36new = w36old + Δw36 = -0.4 + 0.006 = -0.394
Δw37 = η * δ7 * o3 = 0.1*(-0.15)*0.53 = -0.008, w37new = w37old + Δw37 = 0.2 - 0.008 = 0.192
Similarly for w46new, w47new, w56new and w57new.
For the biases b6 and b7 (remember: biases are weights with input 1):
Δb6 = η * δ6 * 1 = 0.1*0.12 = 0.012, b6new = b6old + Δb6 = -0.1 + 0.012 = -0.088
Similarly for b7.
These are the backward-pass calculations for the first example in the training set; they update the neural network weights.
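Continuing in Python with the rounded activations from the slide, the output error terms and the first weight updates can be checked as follows:

eta = 0.1
o3, o6, o7 = 0.53, 0.53, 0.63          # rounded activations from the forward pass
d6, d7 = 1, 0                          # targets for class 1
delta6 = (d6 - o6) * o6 * (1 - o6)     # ~ 0.12
delta7 = (d7 - o7) * o7 * (1 - o7)     # ~ -0.15
w36_new = -0.4 + eta * delta6 * o3     # ~ -0.394
w37_new = 0.2 + eta * delta7 * o3      # ~ 0.192
b6_new = -0.1 + eta * delta6           # ~ -0.088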

Backpropagation - Example
Calculate the errors of the hidden units δ3, δ4 and δ5:
δ3 = o3 * (1 - o3) * (w36*δ6 + w37*δ7) = 0.53*(1 - 0.53)*(-0.4*0.12 + 0.2*(-0.15)) = -0.019
Similarly for δ4 and δ5.
Calculate the new weights between the input and hidden units (η = 0.1):
Δw13 = η * δ3 * o1 = 0.1*(-0.019)*0.6 = -0.0011, w13new = w13old + Δw13 = 0.1 - 0.0011 = 0.0989
Similarly for w23new, w14new, w24new, w15new and w25new; b3new, b4new and b5new.
3. Repeat the same procedure for the other training examples: forward pass for example 2, backward pass for example 2, and so on. Note that it is better to apply the input examples in random order.
4. At the end of the epoch, check whether the stopping criterion is satisfied: if yes, stop training; if not, continue training (increase the epoch number by one and go to step 1).
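The hidden-unit error term δ3 and the first input-to-hidden weight update can be checked the same way; note that the old weights w36 and w37 are used, not the freshly updated ones.

eta = 0.1
o1, o3 = 0.6, 0.53
delta6, delta7 = 0.12, -0.15                             # rounded output error terms
w36, w37 = -0.4, 0.2                                     # weights before the update
delta3 = o3 * (1 - o3) * (w36 * delta6 + w37 * delta7)   # ~ -0.019
w13_new = 0.1 + eta * delta3 * o1                        # ~ 0.0989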

Backpropagation Algorithm
The backpropagation algorithm is not optimal: it is guaranteed to find a minimum, but it might be a local minimum!
Backpropagation's error space has many local minima and one global minimum, so the generalized gradient descent may not find the global minimum.
If the algorithm converges to a local minimum, that is, the trajectory is trapped in a valley and diverges from the optimal solution, then try different initializations.
If the algorithm is slow to converge because there are flat surfaces along the path, then increase the learning rate or smooth out the trajectory by averaging the updates to the parameters.
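One common way to smooth out the trajectory by averaging the updates is a momentum term; the sketch below is only an assumption of how this could look, and the momentum coefficient alpha is not given in the slides.

def momentum_update(w_pq, delta_q, o_p, prev_change, eta=0.1, alpha=0.9):
    # blend the current steepest-descent change with the previous change (momentum)
    change = eta * delta_q * o_p + alpha * prev_change
    return w_pq + change, change

# usage: keep the previous change for every weight across iterations, e.g.
# w36, ch36 = momentum_update(w36, delta6, o3, ch36)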

Overtraining Problem
Based on the training examples, the network should be able to generalize what it has learned to the total population of examples.
Overtraining (overfitting) is a problem of training neural networks: the error on the training set is very small, but when new data is presented to the network the error is high. The network has memorized the training examples but has not learned to generalize to new situations!
Reasons for overtraining: the training examples are noisy, or the number of free parameters is bigger than the number of training examples.
To prevent overtraining, the following can be applied depending on the situation:
Use a network that is just large enough to provide an adequate fit; the network should not have more free parameters than there are training examples!
Use an early stopping method.
K-fold cross validation may be used.
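A minimal early-stopping sketch in Python, assuming a held-out validation set and a patience counter; both choices, and the function names, are illustrative rather than specified in the slides.

def train_with_early_stopping(train_one_epoch, validation_error, max_epochs=10000, patience=20):
    # train_one_epoch() runs one pass over the training set;
    # validation_error() returns the error on held-out data
    best_error = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        err = validation_error()
        if err < best_error:
            best_error = err
            epochs_without_improvement = 0       # validation error still improving
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                            # stop before the network overfits
    return best_error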