Multilayer Perceptrons 1

Overview  Recap of neural network theory  The multi-layered perceptron  Back-propagation  Introduction to training  Uses

Recap

Linear separability  When a neuron learns, it is positioning a line so that all points on or above the line give an output of 1 and all points below the line give an output of 0  When there are more than two inputs, the pattern space is multi-dimensional, and is divided by a multi-dimensional surface (or hyperplane) rather than a line
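To make this concrete, here is a minimal sketch (not from the slides) of a single hard-limiting neuron: the dividing line is simply the set of points where the weighted sum is zero, and the weights here are arbitrary illustration values.

```python
def neuron_output(weights, bias, inputs):
    """Return 1 if the weighted sum reaches the threshold, otherwise 0."""
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= 0 else 0

# Illustrative weights only: the dividing line here is x1 + x2 = 1.5
print(neuron_output([1.0, 1.0], -1.5, [1, 1]))  # point on/above the line -> 1
print(neuron_output([1.0, 1.0], -1.5, [0, 0]))  # point below the line -> 0
```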

Pattern space - linearly separable  (Figure: pattern space with axes X1 and X2, divided by a straight line)

Non-linearly separable problems  If a problem is not linearly separable, then a single line (or hyperplane) cannot divide the pattern space into the two required regions  A network of neurons is needed

Pattern space - non-linearly separable  (Figure: pattern space with axes X1 and X2 and a curved decision surface)

The multi-layered perceptron (MLP)

(Figure: network diagram showing an input layer, a hidden layer and an output layer)

Complex decision surface  The MLP can approximate any continuous function using one hidden layer of sigmoid neurons and a linear output layer  A 3-layered network can therefore produce any complex decision surface  However, there is no formula for the number of neurons needed in the hidden layer

Network architecture  All neurons in one layer are connected to all neurons in the next layer  The network is a feedforward network, so all data flows from the input to the output  The architecture of the network shown is described as 3:4:2  All neurons in the hidden and output layers have a bias connection
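As a concrete illustration of this architecture, here is a minimal sketch of a feedforward pass through a 3:4:2 network; the weights are random placeholders (the slides do not give any), and sigmoid outputs are assumed for both the hidden and the output layer.

```python
import math
import random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def layer(inputs, neuron_weights):
    # neuron_weights: one list per neuron, in the form [bias, w1, w2, ...]
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs)))
            for w in neuron_weights]

random.seed(0)
hidden = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]  # 4 hidden neurons: bias + 3 input weights each
output = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(2)]  # 2 output neurons: bias + 4 hidden weights each

x = [0.2, 0.7, 0.1]   # the input layer does no processing, it just passes the inputs on
h = layer(x, hidden)  # hidden layer: sigmoid of each weighted sum (plus bias)
y = layer(h, output)  # output layer: sigmoid here, but it could also be linear
print(y)
```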

Input layer  Receives all of the inputs  Number of neurons equals the number of inputs  Does no processing  Connects to all the neurons in the hidden layer

Hidden layer  Could be more than one layer, but theory says that only one layer is necessary  The number of neurons is found by experiment  Processes the inputs  Connects to all neurons in the output layer  The output is a sigmoid function

Output layer  Produces the final outputs  Processes the outputs from the hidden layer  The number of neurons equals the number of outputs  The output could be linear or sigmoid

Problems with networks  Originally the neurons had a hard-limiter on the output  Although an error could be found between the desired output and the actual output, which could be used to adjust the weights in the output layer, there was no way of knowing how to adjust the weights in the hidden layer

The invention of back-propagation  By introducing a smoothly changing output function, it was possible to calculate an error that could be used to adjust the weights in the hidden layer(s)

Output function  (Figure: the sigmoid function, plotted as y against net)

Sigmoid function  The sigmoid function goes smoothly from 0 to 1 as net increases  The value of y when net=0 is 0.5  When net is negative, y is between 0 and 0.5  When net is positive, y is between 0.5 and 1.0
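The slides describe the shape of the sigmoid but not its formula; the standard logistic function y = 1 / (1 + e^-net) has exactly these properties, as this small check shows.

```python
import math

def sigmoid(net):
    # the standard logistic sigmoid: y = 1 / (1 + e^(-net))
    return 1.0 / (1.0 + math.exp(-net))

print(sigmoid(0.0))   # 0.5 when net = 0
print(sigmoid(-3.0))  # between 0 and 0.5 for negative net (about 0.047)
print(sigmoid(3.0))   # between 0.5 and 1.0 for positive net (about 0.953)
```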

Back-propagation  The method of training is called the back- propagation of errors  The algorithm is an extension of the delta rule, called the generalised delta rule

Generalised delta rule  The equation for the generalised delta rule is ΔWi = ηXiδ  δ is defined according to which layer is being considered.  For the output layer, δ is y(1-y)(d-y).  For the hidden layer, δ is more complex.
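Written out for both cases (the hidden-layer formula here is the standard generalised delta rule, consistent with the worked example later in these slides): ΔWi = ηXiδ, where δ = y(1-y)(d-y) for an output neuron, and δ = y(1-y) Σk w_k δ_k for a hidden neuron, the sum running over the output-layer neurons that it feeds. With a single output neuron this reduces to the δ2 expression used later.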

Training a network  Example: a problem that could not be solved by a single-layer network because it is not linearly separable  A 3-layer MLP with 2 neurons in the hidden layer was tried - it trained successfully  With 1 neuron in the hidden layer it failed to train

The hidden neurons

The weights  The weights for the 2 neurons in the hidden layer are -9, 3.6 and 0.1 and 6.1, 2.2 and -7.8  These weights can be shown in the pattern space as two lines  The lines divide the space into 4 regions

Training and Testing

 Starting with a data set, the first step is to divide the data into a training set and a test set  Use the training set to adjust the weights until the error is acceptably low  Test the network using the test set, and see how many it gets right

A better approach  Critics of this standard approach have pointed out that training to a low error can sometimes cause “overfitting”, where the network performs well on the training data but poorly on the test data  The alternative is to divide the data into three sets, the extra one being the validation set

Validation set  During training, the training data is used to adjust the weights  At each iteration, the validation data is also passed through the network and its error recorded, but the weights are not adjusted  The training stops when the error on the validation set starts to increase
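A minimal sketch of this early-stopping loop is shown below; train_one_epoch and validation_error are hypothetical stand-ins for the real weight-update and error-measurement code, and the "10 epochs without improvement" rule is just one way of deciding that the validation error has started to rise.

```python
def early_stopping_train(train_one_epoch, validation_error, max_epochs=1000, patience=10):
    """Train until the validation error stops improving for `patience` epochs."""
    best_error = float("inf")
    best_epoch = 0
    for epoch in range(max_epochs):
        train_one_epoch()            # weights are adjusted on the training data only
        err = validation_error()     # validation data is only passed through; no weight change
        if err < best_error:
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch > patience:
            break                    # validation error has been rising: stop
    return best_epoch, best_error

# Toy usage with a simulated validation-error curve that falls and then rises:
errs = iter([0.9, 0.5, 0.3, 0.25, 0.24, 0.26, 0.30, 0.35] + [0.4] * 20)
print(early_stopping_train(lambda: None, lambda: next(errs)))  # -> (4, 0.24)
```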

Stopping criteria  (Figure: error plotted against time for the training set and the validation set; training stops at the point where the validation error starts to rise)

The multi-layered perceptron (MLP) and back-propagation

Architecture  (Figure: network diagram showing an input layer, a hidden layer and an output layer)

Back-propagation  The method of training is called the back- propagation of errors  The algorithm is an extension of the delta rule, called the generalised delta rule

Generalised delta rule  The equation for the generalised delta rule is ΔWi = ηXiδ  δ is defined according to which layer is being considered.  For the output layer, δ is y(1-y)(d-y).  For the hidden layer, δ is more complex.

Hidden Layer  We have to deal with the error from the output layer being fed back to the hidden layer.  Let's look at an example: the weight w2(1,2)  This is the weight connecting neuron 1 in the input layer with neuron 2 in the hidden layer.

Δw2(1,2)=ηX1(1)δ2(2)  Where  X1(1) is the output of neuron 1 in the input layer.  δ2(2) is the error on the output of neuron 2 in the hidden layer.  δ2(2)=X2(2)[1-X2(2)]w3(2,1)δ3(1)

δ3(1) = y(1-y)(d-y) = x3(1)[1-x3(1)][d-x3(1)]  So we start with the error at the output and use this result, rippling backwards, to alter the weights.
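The following sketch puts the whole procedure together for a 2:2:1 network, using the same naming scheme as these slides (w2 for input-to-hidden weights, w3 for hidden-to-output weights, index 0 for the bias). The starting weights are small arbitrary values chosen only for illustration, since the slide's own numbers are not reproduced here.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# w2[j] = [bias, weight from input 1, weight from input 2] for hidden neuron j
w2 = {1: [0.1, 0.2, -0.1], 2: [-0.2, 0.3, 0.1]}
# w3[1] = [bias, weight from hidden 1, weight from hidden 2] for the output neuron
w3 = {1: [0.05, -0.3, 0.2]}

eta = 0.5
x1 = [1, 0, 0]   # x1[0] is the bias input (always 1); x1[1], x1[2] are the pattern inputs
d = 0            # desired output for this pattern

# Feedforward: hidden layer then output layer
x2 = [1] + [sigmoid(sum(w2[j][i] * x1[i] for i in range(3))) for j in (1, 2)]
x3 = sigmoid(sum(w3[1][i] * x2[i] for i in range(3)))

# Errors, starting at the output and rippling backwards
delta3 = x3 * (1 - x3) * (d - x3)
delta2 = {j: x2[j] * (1 - x2[j]) * w3[1][j] * delta3 for j in (1, 2)}

# Weight changes: delta_w = eta * (input to the weight) * (delta of the receiving neuron)
for i in range(3):
    w3[1][i] += eta * x2[i] * delta3
    for j in (1, 2):
        w2[j][i] += eta * x1[i] * delta2[j]

print("output x3(1):", round(x3, 4), "  delta3(1):", round(delta3, 4))
```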

Example  Exclusive OR using the network shown earlier: 2:2:1 network  Initial weights  W2(0,1)= , W2(1,1)= , W2(2,1)=  W2(0,2)= , w2(1,2)= , w2(2,2)=  W3(0,1)= , w3(1,1)= , w3(2,1)=

Feedforward – hidden layer (neuron 1)  So if  X1(0)=1 (the bias)  X1(1)=0  X1(2)=0  The output of weighted sum inside neuron 1 in the hidden layer=  Then using sigmoid function  X2(1)=

Feedforward – hidden layer (neuron 2)  So if  X1(0)=1 (the bias)  X1(1)=0  X1(2)=0  The output of weighted sum inside neuron 2 in the hidden layer=  Then using sigmoid function  X2(2)=

Feedforward – output layer  So if  X2(0)=1 (the bias)  X2(1)=  X2(2)=  The weighted sum inside neuron 1 in the output layer=  Then using the sigmoid function  X3(1)=  Desired output=0

 δ3(1)=x3(1)[1-x3(1)][d-x3(1)] =  δ2(1)=X2(1)[1-X2(1)]w3(1,1) δ3(1)=  δ2(2)=X2(2)[1-X2(2)]w3(2,1) δ3(1)=  Now we can use the delta rule to calculate the change in the weights  ΔWi = ηXiδ

Examples  If we set η=0.5  ΔW2(0,1) = ηX1(0)δ2(1) =0.5 x 1 x =  ΔW3(2,1) = ηX2(1)δ3(1) =0.5 x x – =

 What would be the results of the following?  ΔW2(2,1) = ηX1(2)δ2(1)  ΔW2(2,2) = ηX1(2)δ2(2)

 ΔW2(2,1) = ηX1(2)δ2(1) =0.5x0x =0  ΔW2(2,2) = ηX1(2)δ2(2) =0.5 x 0 x – =0

 New weights  W2(0,1)= W2(1,1)= W2(2,1)=  W2(0,2)= w2(1,2)= w2(2,2)=  W3(0,1)= w3(1,1)= w3(2,1)=