Published by Brandon Dobson
On Simple Adaptive Momentum
Presented at CIS 2008. © Dr Richard Mitchell 2008
Dr Richard Mitchell
Cybernetics Intelligence Research Group, Cybernetics, School of Systems Engineering, University of Reading, UK
R.J.Mitchell@reading.ac.uk

Overview
Simple Adaptive Momentum (SAM) speeds up the training of multilayer perceptrons (MLPs). It adapts the normal momentum term depending on the angle between the current and previous changes in the weights of the MLP.
In the original paper, the weight changes of the whole network are used to determine this angle. This paper considers adapting the momentum term using certain subsets of these weights.
It is inspired by the author's object-oriented approach to programming MLPs, which has been used successfully in teaching.
It is concluded that the angle is best determined using the weight changes in each layer separately.

Nomenclature in Multi Layer Net
$x_r(i)$ is the output of node $i$ in layer $r$; $w_r(i,j)$ is weight $i$ of the link to node $j$ in layer $r$.
[Figure: network with inputs $x_1(1)$, $x_1(2)$; hidden nodes $x_2(1)$ to $x_2(3)$, reached through weights $w_2(0,j)$ to $w_2(2,j)$; and outputs $x_3(1)$, $x_3(2)$, reached through weights $w_3(0,j)$ to $w_3(3,j)$.]
Change weights: $\Delta_t w_r(i,j) = \eta\,\delta_r(j)\,x_{r-1}(i) + \alpha\,\Delta_{t-1} w_r(i,j)$
$\delta$ is a function of the error; it varies with the activation function $f(z)$, and the error also varies.
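
As a minimal sketch of the weight-change rule above, the Python below computes the changes for the weights into a single node j. The function name `momentum_update` and the numeric values are assumptions chosen for illustration; they are not taken from the presentation.

```python
def momentum_update(eta, alpha, delta, inputs, prev_changes):
    """Weight changes for the links into one node j.

    eta          : learning rate
    alpha        : momentum constant
    delta        : delta_r(j) for the node
    inputs       : x_{r-1}(i), the outputs of the previous layer
    prev_changes : the weight changes made at the previous step
    """
    return [eta * delta * x + alpha * dp
            for x, dp in zip(inputs, prev_changes)]


# Illustrative values only (hypothetical, not from the slides).
changes = momentum_update(eta=0.5, alpha=0.5, delta=0.5,
                          inputs=[1.0, 0.5], prev_changes=[0.25, -0.5])
print(changes)  # [0.375, -0.125]
```

The second weight shows momentum opposing the gradient step: its previous change was negative, so the new change is pulled below the plain step of 0.125.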

Simple Adaptive Momentum
Swanston, Bishop, & Mitchell, R.J. (1994), "Simple adaptive momentum: new algorithm for training multilayer perceptrons", Elect. Lett, Vol 30, No 18, pp 1498-1500.
Concept: adapt the momentum term depending on whether the weight change this time is in the same direction as last time.
Direction? The weight changes are held in an array, so they form a vector. There are two vectors, for the current and previous changes: $\Delta w_c$ and $\Delta w_p$. The angle $\theta$ between the two vectors can then be seen.
[Figure: 2D example with axes $w_1$ and $w_2$, showing $\Delta w_p$ with components $\Delta w_{p1}$ and $\Delta w_{p2}$, and the angle $\theta$ between $\Delta w_p$ and $\Delta w_c$.]

Implementing SAM
The simple idea is to replace the momentum constant $\alpha$ by $\alpha(1+\cos\theta)$, where $\theta$ is the angle between the vectors of current and previous deltaWeights, $\Delta w_c$ and $\Delta w_p$.
In the original paper, the $\Delta w$s apply to all weights in the network. In this paper, we consider adapting $\alpha$ at the network level, layer level and neuron level.
This is inspired by object-oriented programming of the MLP, which gives students a good example of, and practice in, the properties of OOP, albeit on an old ANN.
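
The adapted momentum can be sketched in a few lines of Python. The function name `sam_momentum` is hypothetical. Returning the plain momentum constant when either vector is all zeros, where the angle is undefined, is an assumption of this sketch and is not stated in the presentation.

```python
import math


def sam_momentum(alpha, current, previous):
    """Return alpha * (1 + cos(theta)), where theta is the angle between
    the current and previous weight-change vectors."""
    dot = sum(c * p for c, p in zip(current, previous))
    norm_c = math.sqrt(sum(c * c for c in current))
    norm_p = math.sqrt(sum(p * p for p in previous))
    if norm_c == 0.0 or norm_p == 0.0:
        # Angle undefined (e.g. on the first step): assumed fallback.
        return alpha
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_c * norm_p)))
    return alpha * (1.0 + cos_theta)


print(sam_momentum(0.5, [1.0, 0.0], [2.0, 0.0]))   # same direction: 1.0
print(sam_momentum(0.5, [1.0, 0.0], [0.0, 1.0]))   # perpendicular: 0.5
print(sam_momentum(0.5, [1.0, 0.0], [-3.0, 0.0]))  # opposite: 0.0
```

The effective momentum therefore ranges from 0, when the change reverses direction, to twice the constant, when the change continues in the same direction.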

OO Approach – Network Layers
An MLP can be programmed with an object for each neuron. But as each neuron needs inputs from the previous layer and deltas from the next, many pointers are needed, which is problematic for students.
So it is easier to have an object for a layer of neurons (all with the same inputs), which gets its inputs and weighted deltas in an array.
The base object is a layer of linearly activated neurons, LinActLayer: a single-layer network of neurons with f(z) = z.
Neurons with sigmoidal activation need only two different functions, for calculating the output and the delta. So SigActLayer is an object inheriting from LinActLayer: it uses the existing members and adds 2 different ones.

Network For Hidden Layers
Hidden layers need an enhanced SigActLayer with its own calculate-error function (using the weighted deltas in the next layer).
The existing objects are each a whole network. So SigActHidLayer is a multiple-layer network: it inherits from SigActLayer but also has a pointer to the next layer.
Most functions have 2 lines: process own layer and the next.
[Figure: class hierarchy. Base: LinActLayer; SigActLayer inherits from LinActLayer; SigActHidLayer inherits from SigActLayer.]
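
The hierarchy described above might be sketched in Python as follows. Only the three class names come from the presentation. The method names, the attribute names, and the convention that the first weight of each neuron is a bias are assumptions of this sketch, and weight updating is omitted for brevity.

```python
import math


class LinActLayer:
    """Single layer of linearly activated neurons, f(z) = z."""

    def __init__(self, weights):
        # weights[j] are the weights into neuron j; weights[j][0] is
        # assumed here to be the bias weight.
        self.weights = weights
        self.inputs = []
        self.outputs = []
        self.deltas = []

    def activate(self, z):
        return z

    def delta(self, error, output):
        return error  # derivative of f(z) = z is 1

    def compute_outputs(self, inputs):
        self.inputs = [1.0] + list(inputs)
        self.outputs = [
            self.activate(sum(w * x for w, x in zip(row, self.inputs)))
            for row in self.weights
        ]
        return self.outputs

    def compute_deltas(self, errors):
        self.deltas = [self.delta(e, o)
                       for e, o in zip(errors, self.outputs)]

    def weighted_deltas(self):
        # Error passed back to each (non-bias) input of this layer.
        n_inputs = len(self.inputs) - 1
        return [sum(d * row[i + 1]
                    for d, row in zip(self.deltas, self.weights))
                for i in range(n_inputs)]


class SigActLayer(LinActLayer):
    """Overrides only the two functions that differ: output and delta."""

    def activate(self, z):
        return 1.0 / (1.0 + math.exp(-z))

    def delta(self, error, output):
        return error * output * (1.0 - output)


class SigActHidLayer(SigActLayer):
    """Hidden layer holding a reference to the next layer, so the object
    is a whole multi-layer network."""

    def __init__(self, weights, next_layer):
        super().__init__(weights)
        self.next_layer = next_layer

    def compute_outputs(self, inputs):
        hidden = super().compute_outputs(inputs)        # own layer
        return self.next_layer.compute_outputs(hidden)  # then next

    def compute_deltas(self, errors):
        self.next_layer.compute_deltas(errors)          # next layer first
        super().compute_deltas(self.next_layer.weighted_deltas())


net = SigActHidLayer([[0.0, 1.0, -1.0]], LinActLayer([[0.5, 2.0]]))
print(net.compute_outputs([0.3, 0.3]))  # [1.5]
```

Each method of `SigActHidLayer` is two lines, one for its own layer and one for the next, which mirrors the remark on the slide.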

SAM and Hierarchy
Given this approach, momentum can be adjusted using weight changes
a) over the whole network
b) separately by layer
c) separately for each neuron
For a), calculate η * delta * inputs for all layers, then globally set α(1 + cos θ).
For b), calculate η * delta * inputs for each layer and set α(1 + cos θ) for each layer separately.
For c), do the same, but for each neuron in each layer.
This works easily in the hierarchy.
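
The three options differ only in which weights are grouped when the angle is measured. The sketch below makes that explicit, assuming the η * delta * inputs terms and the previous weight changes are held as nested lists indexed `[layer][neuron][weight]`. That layout, the function names and the `mode` strings are all hypothetical, as is the fallback to plain α when a vector is all zeros.

```python
import math


def sam_factor(alpha, current, previous):
    """alpha * (1 + cos(theta)) for two flat vectors."""
    dot = sum(c * p for c, p in zip(current, previous))
    norms = (math.sqrt(sum(c * c for c in current))
             * math.sqrt(sum(p * p for p in previous)))
    if norms == 0.0:
        return alpha  # angle undefined: assumed fallback
    return alpha * (1.0 + max(-1.0, min(1.0, dot / norms)))


def flat(neurons):
    """Flatten a list of per-neuron weight lists into one vector."""
    return [w for neuron in neurons for w in neuron]


def sam_weight_changes(steps, prev, alpha, mode):
    """steps[l][j][i] is eta * delta * input; prev has the same shape and
    holds the previous weight changes. Returns the new weight changes."""
    if mode == "network":
        a = sam_factor(alpha, [w for layer in steps for w in flat(layer)],
                       [w for layer in prev for w in flat(layer)])
        alphas = [[a] * len(layer) for layer in steps]
    elif mode == "layer":
        alphas = [[sam_factor(alpha, flat(layer), flat(p_layer))] * len(layer)
                  for layer, p_layer in zip(steps, prev)]
    elif mode == "neuron":
        alphas = [[sam_factor(alpha, n, p_n) for n, p_n in zip(layer, p_layer)]
                  for layer, p_layer in zip(steps, prev)]
    else:
        raise ValueError("mode must be 'network', 'layer' or 'neuron'")
    return [[[s + a * p for s, p in zip(n, p_n)]
             for n, p_n, a in zip(layer, p_layer, a_layer)]
            for layer, p_layer, a_layer in zip(steps, prev, alphas)]


# One layer of two neurons: neuron 0 keeps its direction, neuron 1 reverses.
steps = [[[1.0, 0.0], [0.0, 1.0]]]
prev = [[[1.0, 0.0], [0.0, -1.0]]]
print(sam_weight_changes(steps, prev, 0.5, "layer"))
print(sam_weight_changes(steps, prev, 0.5, "neuron"))
```

In this toy case the layer-level angle is 90 degrees, so both neurons get the plain α, while the neuron-level version doubles the momentum for neuron 0 and removes it for neuron 1.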

Experimentation
3 problems, each with training, validation and unseen data sets.
Training stops when the error on the validation set rises.
Each problem is run 6 times with different initial weights.
Problem 1: 2 inputs, 10 nodes in hidden layer, 1 output

SAM mode           None  Neuron  Layer  Network
Mean epochs taken  867   227     202    257

SAM mode  Train SSE  Valid SSE  Unseen SSE
None      0.0081985  0.0065965  0.0092535
Neuron    0.0100445  0.0084395  0.0107985
Layer     0.0103265  0.0086805  0.0106505
Network   0.0077125  0.0071095  0.0084845
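
The stopping rule used in these experiments (stop when the validation error rises) could be sketched as below. The function and parameter names are hypothetical. Treating a single rise over the previous epoch as the trigger is an assumption, since the presentation does not say how a rise is detected.

```python
def train_until_validation_rises(train_one_epoch, validation_sse,
                                 max_epochs=10000):
    """Train epoch by epoch and return the number of epochs run.

    train_one_epoch : callable that performs one training epoch
    validation_sse  : callable returning the current validation SSE
    Stops at the first epoch whose validation SSE exceeds that of the
    previous epoch (assumed criterion), or after max_epochs.
    """
    previous = validation_sse()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        current = validation_sse()
        if current > previous:
            return epoch
        previous = current
    return max_epochs


# Stand-in callables with made-up validation errors, for illustration only.
errors = iter([1.0, 0.5, 0.25, 0.3, 0.1])
print(train_until_validation_rises(lambda: None, lambda: next(errors)))  # 3
```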

Problem 2
5 inputs, 15 nodes in hidden layer and 1 output

SAM mode     None  Neuron  Layer  Network
Mean epochs  1712  315     262    312

SAM mode  Train SSE  Valid SSE  Unseen SSE
None      0.0004725  0.0005625  0.0006665
Neuron    0.0006585  0.0007635  0.0009525
Layer     0.0007685  0.0008745  0.0011055
Network   0.0006215  0.0007655  0.0009505

With SAM the network trained much more quickly, but the SSE was worse.
There is very little difference between one layer and the whole network, so..

Problem 3
5 inputs, 15 nodes in hidden layer and 3 outputs

SAM mode     None  Neuron  Layer  Network
Mean epochs  1133  497     638    977

SAM mode  Train SSE  Valid SSE  Unseen SSE
None      0.0044735  0.0043835  0.0054605
Neuron    0.0048205  0.0045685  0.0057955
Layer     0.0045675  0.0044105  0.0053225
Network   0.0045465  0.0044055  0.0053445

SSEs are averaged over the 3 outputs. Here Layer is best.

Conclusions and Further Work
The object-oriented hierarchy works neatly here.
SAM clearly reduces the number of epochs taken to learn, with little extra overhead per epoch.
In one example it increased the sum squared errors. This needs investigating.
It needs to be tested on other problems, but it looks as if SAM at the layer level may be best (particularly with multiple outputs).
Momentum is used in other learning problems; SAM could be investigated for these.