
Improved BP algorithms (first-order gradient methods)
1. BP with momentum
2. Delta-bar-delta
3. Decoupled momentum
4. RProp
5. Adaptive BP
6. Trinary BP
7. BP with adaptive gain
8. Extended BP

BP with momentum (BPM)
The basic improvement to BP (Rumelhart, 1986). The momentum factor alpha is selected between zero and one. Adding momentum improves the convergence speed and helps keep the network from being trapped in a local minimum.
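To make the rule concrete, here is a minimal sketch of one momentum update (not code from the presentation; the names `lr` and `alpha` and their default values are illustrative assumptions):

```python
import numpy as np

def bpm_update(w, grad, velocity, lr=0.1, alpha=0.9):
    """One BP-with-momentum weight update.

    w        : current weight vector
    grad     : gradient of the error with respect to w
    velocity : previous weight change
    lr       : learning rate (illustrative value)
    alpha    : momentum factor, chosen between zero and one
    """
    velocity = alpha * velocity - lr * grad   # blend the previous step with the new gradient
    w = w + velocity                          # apply the combined step
    return w, velocity
```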

Modification form: proposed by Nagata (1990). Beta is a constant value chosen by the user. Nagata claimed that the beta term reduces the possibility of the network being trapped in a local minimum. However, the beta term appears to repeat the role of the momentum factor alpha, and the distinction is not clear.

Delta-bar-delta (DBD)
Uses an adaptive learning rate to speed up convergence. The rate adaptation is based on local optimization. The search direction is still gradient descent, but an individual step size is used for each weight.
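As an illustration of the per-weight adaptive rates (a sketch with assumed hyperparameter names `kappa`, `phi`, and `theta`, not the presentation's own equations):

```python
import numpy as np

def dbd_update(w, grad, rates, delta_bar, kappa=0.01, phi=0.1, theta=0.7):
    """One delta-bar-delta step with an individual learning rate per weight.

    rates     : per-weight learning rates
    delta_bar : exponentially averaged past gradients (the "delta bar")
    """
    agree = grad * delta_bar > 0                             # gradient agrees with its average
    disagree = grad * delta_bar < 0
    rates = np.where(agree, rates + kappa, rates)            # grow the rate additively
    rates = np.where(disagree, rates * (1.0 - phi), rates)   # shrink it multiplicatively
    delta_bar = (1.0 - theta) * grad + theta * delta_bar     # update the averaged gradient
    w = w - rates * grad                                     # descent with individual step sizes
    return w, rates, delta_bar
```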

RProp (resilient propagation)
Jervis and Fitzgerald (1993). Uses only the sign of the gradient and limits the size of the step.
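A simplified sketch of the RProp step-size rule (sign-based update with a bounded per-weight step; the numeric defaults are commonly quoted values and are assumptions here):

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One simplified RProp update: only the sign of the gradient is used,
    and each weight's step size is kept within [step_min, step_max]."""
    sign_change = grad * prev_grad
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)   # same sign: grow step
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)  # sign flip: shrink step
    w = w - np.sign(grad) * step   # move each weight by its bounded step, downhill
    return w, step, grad
```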

Second order gradient methods
Newton
Gauss-Newton
Levenberg-Marquardt
Quickprop
Conjugate gradient descent
Broyden-Fletcher-Goldfarb-Shanno (BFGS)

Performance Surfaces: Taylor Series Expansion
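For reference, the second-order Taylor series expansion of the performance index F about the point x_k (standard form, supplied here because the slide's equation was not captured):

```latex
F(\mathbf{x}) \approx F(\mathbf{x}_k)
  + \nabla F(\mathbf{x})^{T}\big|_{\mathbf{x}=\mathbf{x}_k}\,(\mathbf{x}-\mathbf{x}_k)
  + \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_k)^{T}\,
    \nabla^{2} F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}\,(\mathbf{x}-\mathbf{x}_k)
```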

Example

Plot of Approximations

Vector Case

Matrix Form

Performance Optimization: Basic Optimization Algorithm and Steepest Descent
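A minimal sketch of the basic optimization loop with a steepest-descent search direction (the fixed learning rate and stopping rule are illustrative assumptions):

```python
import numpy as np

def steepest_descent(grad_fn, x0, lr=0.01, tol=1e-6, max_iter=10000):
    """Iterate x_{k+1} = x_k - lr * grad(x_k) until the gradient is small.

    grad_fn : callable returning the gradient of the performance index at x
    x0      : starting point
    lr      : learning rate (the step size alpha_k, held fixed for simplicity)
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_fn(x)
        if np.linalg.norm(g) < tol:   # stop when the slope is essentially zero
            break
        x = x - lr * g                # step along the negative gradient
    return x
```

For example, `steepest_descent(lambda x: 2 * x, [3.0, -1.0])` converges to the minimum of F(x) = xᵀx at the origin.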

Examples

Minimizing Along a Line

Example

Plot

Newton’s Method
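Newton's method replaces the fixed learning rate with the inverse Hessian; in standard notation (added here because the slide equations were not captured):

```latex
\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{A}_k^{-1}\,\mathbf{g}_k,
\qquad
\mathbf{A}_k \equiv \nabla^{2} F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k},
\qquad
\mathbf{g}_k \equiv \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}
```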

Example

Plot

Non-Quadratic Example

Different Initial Conditions

Difficulties: inverting the Hessian fails when it is singular, and the inversion is computationally expensive.

Newton’s Method

Matrix Form

Hessian

Gauss-Newton Method
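For a sum-of-squared-errors index F(x) = e(x)ᵀe(x), the Gauss-Newton method keeps only the Jacobian term of the Hessian; the standard relations (stated here for completeness) are:

```latex
\nabla F(\mathbf{x}) = 2\,\mathbf{J}^{T}(\mathbf{x})\,\mathbf{e}(\mathbf{x}),
\qquad
\nabla^{2} F(\mathbf{x}) \approx 2\,\mathbf{J}^{T}(\mathbf{x})\,\mathbf{J}(\mathbf{x}),
\qquad
\Delta\mathbf{x}_k = -\left[\mathbf{J}^{T}(\mathbf{x}_k)\,\mathbf{J}(\mathbf{x}_k)\right]^{-1}
                      \mathbf{J}^{T}(\mathbf{x}_k)\,\mathbf{e}(\mathbf{x}_k)
```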

Levenberg-Marquardt
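Levenberg-Marquardt adds a μk I term so that the matrix to be inverted is always nonsingular, interpolating between Gauss-Newton (small μk) and steepest descent (large μk); in standard notation:

```latex
\Delta\mathbf{x}_k = -\left[\mathbf{J}^{T}(\mathbf{x}_k)\,\mathbf{J}(\mathbf{x}_k)
                     + \mu_k \mathbf{I}\right]^{-1}
                     \mathbf{J}^{T}(\mathbf{x}_k)\,\mathbf{e}(\mathbf{x}_k)
```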

Adjustment of μk

Application to Multilayer Network

Jacobian Matrix

Computing the Jacobian

Marquardt Sensitivity

Computing the Sensitivities

LMBP
1. Present all inputs to the network; compute the corresponding network outputs and the errors, and compute the sum of squared errors over all inputs.
2. Compute the Jacobian matrix: calculate the sensitivities with the backpropagation algorithm, after initializing; augment the individual matrices into the Marquardt sensitivities; compute the elements of the Jacobian matrix.
3. Solve to obtain the change in the weights.
4. Recompute the sum of squared errors with the new weights. If this new sum of squares is smaller than that computed in step 1, then divide μk by ϑ, update the weights, and go back to step 1. If the sum of squares is not reduced, then multiply μk by ϑ and go back to step 3.
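A compact sketch of this loop, assuming two helper callables `forward_errors(w)` (error vector over all inputs) and `jacobian(w)` (Jacobian of the errors); the helpers, the initial μ, and the ϑ value are illustrative assumptions:

```python
import numpy as np

def lmbp_train(weights, forward_errors, jacobian, mu=0.01, theta=10.0, n_epochs=100):
    """Levenberg-Marquardt backpropagation loop (sketch of the steps above)."""
    e = forward_errors(weights)
    sse = float(e @ e)                          # step 1: sum of squared errors
    for _ in range(n_epochs):
        J = jacobian(weights)                   # step 2: Jacobian (via the sensitivities)
        while True:
            A = J.T @ J + mu * np.eye(J.shape[1])
            dw = -np.linalg.solve(A, J.T @ e)   # step 3: solve for the weight change
            e_new = forward_errors(weights + dw)
            sse_new = float(e_new @ e_new)      # step 4: recompute the squared error
            if sse_new < sse:                   # error reduced: accept, divide mu by theta
                mu /= theta
                weights = weights + dw
                e, sse = e_new, sse_new
                break
            mu *= theta                         # error not reduced: multiply mu by theta, retry
            if mu > 1e10:                       # safeguard: stop if mu grows without improvement
                return weights
    return weights
```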

Example LMBP Step

LMBP Trajectory

Conjugate Gradient

LRLS method
A method based on recursive least squares; it does not need gradient descent.
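The slides do not give the LRLS equations themselves; as a rough illustration of the underlying recursive-least-squares idea, here is the standard RLS update for a single linear unit (the names and the forgetting factor are assumptions, not the LRLS method itself):

```python
import numpy as np

def rls_step(w, P, x, d, lam=0.99):
    """One standard recursive-least-squares update.

    w   : weight vector being estimated
    P   : inverse input-correlation matrix estimate
    x   : input vector for this sample
    d   : desired (target) output
    lam : forgetting factor (illustrative value)
    """
    y = w @ x                              # current prediction
    k = P @ x / (lam + x @ P @ x)          # gain vector
    w = w + k * (d - y)                    # correct the weights by the prediction error
    P = (P - np.outer(k, x @ P)) / lam     # update the inverse correlation matrix
    return w, P
```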

LRLS method cont.

Example comparing the methods: average number of time steps required by BPM, DBD, LM, and LRLS to reach error cut-offs of 0.15, 0.1, and 0.05 (numeric values not preserved in this transcript).

IGLS (Integration of the Gradient and Least Square)
The LRLS algorithm converges faster than the BPM algorithm, but it is computationally more complex and is not ideal for very large networks. The learning strategy described here combines ideas from both algorithms to keep the complexity manageable for large networks while achieving reasonably fast convergence: the output-layer weights are updated with the least-squares (LRLS) method, and the hidden-layer weights are updated with the BPM method.