BACKPROPAGATION (CONTINUED) The hidden unit transfer function is usually a sigmoid (s-shaped), a smooth curve that limits the unit's output (activation) to the range 0..1.

Take the net input to a unit and pass it through the transfer function: this gives the output of the unit. The most common choice is the logistic function: output = 1 / (1 + e^(-net)).

Transfer function Often the same function is used for the output units when building a classifier (i.e. classification). For regression problems (real-valued outputs) the output transfer function used is linear (the output is just the sum of the net inputs).
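To make these two slides concrete, here is a minimal NumPy sketch of a forward pass through one hidden layer (the weight matrices and function names are illustrative, not from the slides): hidden units use the logistic function, and the output units switch between logistic (classification) and linear (regression).

```python
import numpy as np

def logistic(net):
    """Logistic (sigmoid) transfer function: squashes net input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, W_hidden, W_output, regression=False):
    """Forward pass through a single-hidden-layer MLP (no bias units yet).
    Hidden units use the logistic function; output units use the logistic
    function for classification or a linear (identity) transfer for regression."""
    hidden_out = logistic(W_hidden @ x)   # sigmoid hidden activations
    output_net = W_output @ hidden_out    # net input to output units
    return output_net if regression else logistic(output_net)
```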

Local minima Backpropagation is a gradient descent process: each change in the weights brings the network closer to a minimum of the error in weight space. Because only 'downhill' steps can be taken, the network is easily trapped in a local minimum. This is sometimes remedied by restarting with different random weights (starting from a different point on the error surface).
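The random-restart remedy amounts to wrapping training in a loop and keeping the best result. A minimal sketch, assuming a hypothetical train_fn that initialises the weights from the supplied random generator, runs backpropagation, and returns the trained weights with their final error:

```python
import numpy as np

def train_with_restarts(train_fn, n_restarts=5, seed=0):
    """Train several times from different random initial weights and keep
    the network that reached the lowest final error (best local minimum found)."""
    rng = np.random.default_rng(seed)
    best_weights, best_error = None, float("inf")
    for _ in range(n_restarts):
        weights, error = train_fn(rng)   # new random starting point each time
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights, best_error
```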

Critical parameters Cumulative versus incremental weight changes. Should the weights be adjusted after the presentation of one pattern (incremental) or after all patterns (epoch)? Epoch updating is faster, but is more likely to fall into a local minimum and requires more memory.
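To make the distinction concrete, a minimal sketch (grad_fn is a hypothetical function that returns the error gradient for a single pattern):

```python
import numpy as np

def incremental_update(weights, patterns, grad_fn, lr=0.1):
    """Incremental mode: adjust the weights after every training pattern."""
    for x, target in patterns:
        weights = weights - lr * grad_fn(weights, x, target)
    return weights

def epoch_update(weights, patterns, grad_fn, lr=0.1):
    """Epoch (cumulative) mode: sum the gradients over all patterns, then adjust once."""
    total_grad = np.zeros_like(weights)
    for x, target in patterns:
        total_grad += grad_fn(weights, x, target)   # extra memory: accumulated sum
    return weights - lr * total_grad
```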

Critical parameters Size of the learning constant. Too high and the network won't learn anything (the weights oscillate); too low and it will take ages to find a solution.
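A tiny illustrative experiment (not from the slides): gradient descent on the one-dimensional error surface E(w) = w^2, whose gradient is 2w, shows both failure modes.

```python
def descend(lr, steps=5, w=1.0):
    """Gradient descent on E(w) = w**2 (gradient 2*w), starting from w = 1."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(descend(lr=1.1))    # too high: |w| grows each step (oscillates and diverges)
print(descend(lr=0.001))  # too low: w barely moves towards the minimum at 0
print(descend(lr=0.3))    # moderate: w shrinks rapidly towards 0
```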

Critical parameters Momentum method. Supplement the current weight adjustment with a fraction of the most recent weight adjustment.

Standard back-propagation weight adjustment: w(t+1) = w(t) + η·Δw(t), i.e. the weight at time t+1 equals the weight at time t plus (learning rate × calculated weight change).

Back-propagation with momentum included, where α (alpha) is the momentum constant: w(t+1) = w(t) + η·Δw(t) + α·[w(t) − w(t−1)], i.e. we have included some of the weight change from the previous cycle of back-propagation. This can significantly speed up learning.
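In code, the momentum rule only requires remembering the previous cycle's weight change. A minimal sketch (the parameter names are illustrative):

```python
import numpy as np

def momentum_step(weights, grad, prev_delta, lr=0.1, alpha=0.9):
    """One weight update with momentum: the new change is the usual
    learning-rate-scaled step (downhill, hence -grad) plus a fraction
    alpha of the previous cycle's weight change."""
    delta = -lr * grad + alpha * prev_delta
    return weights + delta, delta   # return the new weights and the new change

# Usage across cycles: keep prev_delta between calls, starting from zeros:
#   weights, prev_delta = momentum_step(weights, grad, prev_delta)
```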

Critical parameters Number of hidden units. Too few and the network won't learn the problem; too many and it generalises poorly to new data (it overfits the training data).
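In practice the number of hidden units is often chosen by trying a few sizes and comparing error on held-out data. A minimal sketch, assuming hypothetical train_fn and validation_error_fn helpers:

```python
def choose_hidden_units(candidate_sizes, train_fn, validation_error_fn):
    """Train a network for each candidate number of hidden units and keep
    the size that gives the lowest error on a held-out validation set."""
    best_size, best_error, best_net = None, float("inf"), None
    for n_hidden in candidate_sizes:
        net = train_fn(n_hidden)
        error = validation_error_fn(net)
        if error < best_error:
            best_size, best_error, best_net = n_hidden, error, net
    return best_size, best_net

# e.g. choose_hidden_units([2, 4, 8, 16], train_fn, validation_error_fn)
```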

Addition of bias units A bias is a unit which always has an output of 1. It sometimes helps the weights converge to a solution by providing an extra degree of freedom in weight space.

The bias contributes a term θ (theta), the bias weight multiplied by 1, to the net input of each unit: net_j = Σ_i w_ij·o_i + θ_j·1.

Bias error derivatives (analogous to the weight error derivatives in normal backprop) are calculated as: bias error derivative = δ_j × 1 = δ_j, i.e. the bias error derivative for an output unit is the delta of that output unit multiplied by the output of the bias unit it connects to (i.e. 1).

In a similar way, the bias error derivatives for the hidden units are found from the hidden unit deltas: bias error derivative = δ_j × 1 = δ_j.

The bias weights are then changed in the same way as the other weights in the network, using the bias error derivatives and a bias change rate parameter β (beta), analogous to η, the learning rate parameter for normal weights: θ_j(t+1) = θ_j(t) + β × (bias error derivative) for the output units.

In the same way, the bias weights for the hidden units are changed: θ_j(t+1) = θ_j(t) + β × (bias error derivative).
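Pulling the last few slides together, a minimal sketch of the bias update (the array names are illustrative):

```python
import numpy as np

def update_biases(theta_output, theta_hidden, delta_output, delta_hidden, beta=0.1):
    """Update bias weights from the bias error derivatives. Because the bias
    unit's output is always 1, the bias error derivative for a unit is just
    that unit's delta, so each bias weight moves by beta times its delta."""
    theta_output = theta_output + beta * delta_output * 1.0   # output-unit biases
    theta_hidden = theta_hidden + beta * delta_hidden * 1.0   # hidden-unit biases
    return theta_output, theta_hidden
```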

Bias units Bias units are not mandatory, but as they involve relatively little extra computation, most neural networks include them by default.

Overtraining Overtraining is a type of overfitting: it is possible to train a network too much. The network becomes very good at classifying the training set, but poor at classifying a test set it has not encountered before, i.e. it does not generalise well (it overfits the training data).

Overtraining Avoid it by periodically presenting a validation set, recording the error, and storing the weights. The set of weights giving the minimum error on the validation set is retrieved when training is finished. Some neural network packages do this for you, in a way that is hidden from the user.
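This procedure is usually called early stopping. A minimal sketch, assuming a hypothetical net object with get_weights/set_weights, a train_epoch_fn that runs one epoch of backpropagation, and a validation_error_fn that measures error on the validation set:

```python
def train_with_early_stopping(net, train_epoch_fn, validation_error_fn,
                              max_epochs=1000, check_every=10):
    """Periodically evaluate on the validation set, remember the weights that
    gave the lowest validation error, and restore them once training ends."""
    best_error = float("inf")
    best_weights = net.get_weights()
    for epoch in range(max_epochs):
        train_epoch_fn(net)                      # one epoch of backpropagation
        if epoch % check_every == 0:
            error = validation_error_fn(net)
            if error < best_error:               # store the best weights so far
                best_error, best_weights = error, net.get_weights()
    net.set_weights(best_weights)                # retrieve them when training finishes
    return net
```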

Data selection Selection of training data is critical. When training a network to perform in a noisy environment, noisy input patterns must be included in the training set. An MLP is good at interpolation but not at extrapolation. The number of members of each class must be balanced in the training and validation sets.

Catastrophic forgetting It is unwise to train the network completely on patterns selected from one class and then switch to training on patterns from another class, as the network will forget the original training. The solution is to mix both classes in the same training set.
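In practice the mixing is just an interleave-and-shuffle step before training; a minimal sketch:

```python
import random

def mixed_training_set(class_a_patterns, class_b_patterns, seed=0):
    """Combine patterns from both classes into one shuffled training set,
    rather than presenting all of class A and then all of class B."""
    patterns = list(class_a_patterns) + list(class_b_patterns)
    random.Random(seed).shuffle(patterns)
    return patterns
```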