Published by Brandon Dobson
On Simple Adaptive Momentum
Dr Richard Mitchell
Cybernetics Intelligence Research Group, Cybernetics, School of Systems Engineering, University of Reading, UK
Presented at CIS 2008. © Dr Richard Mitchell 2008
Overview
Simple Adaptive Momentum (SAM) speeds up the training of multilayer perceptrons (MLPs). It adapts the normal momentum term depending on the angle between the current and previous changes in the weights of the MLP. In the original paper, the weight changes of the whole network are used to determine this angle. This paper considers adapting the momentum term using certain subsets of these weights. It is inspired by the author's object-oriented approach to programming MLPs, which has been used successfully in teaching. It is concluded that the angle is best determined using the weight changes in each layer separately.
Nomenclature in Multi Layer Net
$x_r(i)$ is the output of node $i$ in layer $r$; $w_r(i,j)$ is weight $i$ of the link to node $j$ in layer $r$.
[Figure: network with inputs $x_1(1)$, $x_1(2)$, second-layer nodes $x_2(1)$ to $x_2(3)$ and outputs $x_3(1)$, $x_3(2)$, with links labelled $w_2(0..2, 1..3)$ and $w_3(0..3, 1..2)$.]
Change weights:
$$\Delta_t w_r(i,j) = \eta\,\delta_r(j)\,x_{r-1}(i) + \alpha\,\Delta_{t-1} w_r(i,j)$$
$\delta$ is a function of the error and varies with the activation function $f(z)$; the error also varies.
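The weight-change rule above can be sketched for a single layer as follows. This is only an illustration, not the author's code: the function name `weight_changes`, the nested-list layout `changes[i][j]` (weight i of the link to node j) and the numbers in the usage note are assumptions made for the example.

```python
def weight_changes(eta, alpha, deltas, inputs, prev_changes):
    """Momentum update for one layer (illustrative sketch).

    deltas[j]          -- delta of node j in this layer
    inputs[i]          -- output i of the previous layer
    prev_changes[i][j] -- change applied to weight (i, j) last time
    Returns the new changes in the same [i][j] layout (assumed layout).
    """
    return [
        [eta * deltas[j] * inputs[i] + alpha * prev_changes[i][j]
         for j in range(len(deltas))]
        for i in range(len(inputs))
    ]
```

For example, with a learning rate of 0.5 and momentum of 0.8, `weight_changes(0.5, 0.8, [0.2, -0.4], [1.0, 2.0], [[0.1, 0.0], [0.0, -0.5]])` adds the gradient term and 0.8 times the previous change for each weight.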
Simple Adaptive Momentum
Swanston, Bishop, & Mitchell, R.J. (1994), "Simple adaptive momentum: new algorithm for training multilayer perceptrons", Elect. Lett, Vol 30, No 18, pp
Concept: adapt the momentum term depending on whether the weight change this time is in the same direction as last time.
Direction? The weight changes are held in an array, so they form a vector. There are two vectors, for the current and previous changes: Δw_c and Δw_p. The angle θ between the vectors can be seen, e.g. in 2D.
[Figure: 2D example with axes w1 and w2 showing the vectors Δw_p (components Δw_p1, Δw_p2) and Δw_c, and the angle θ between them.]
Implementing SAM
The simple idea is to replace the momentum constant α by α(1 + cos θ), where θ is the angle between the vectors of current and previous deltaWeights, Δw_c and Δw_p.
In the original paper, the Δws apply to all weights in the network.
In this paper, we consider adapting α at the network level, layer level and neuron level.
This is inspired by object-oriented programming of the MLP, which provides a good example and practice for students of the properties of OOP, albeit on an old ANN.
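A minimal sketch of the adapted momentum coefficient, assuming the two weight-change vectors are given as flat lists. The function name `sam_momentum` is hypothetical, and treating cos θ as 0 when either vector has zero length is an assumption of this sketch, not something the slides specify.

```python
import math


def sam_momentum(alpha, current, previous):
    """Return alpha * (1 + cos(theta)) for two weight-change vectors."""
    dot = sum(c * p for c, p in zip(current, previous))
    norm = (math.sqrt(sum(c * c for c in current))
            * math.sqrt(sum(p * p for p in previous)))
    # Assumption: with a zero-length vector the angle is undefined,
    # so fall back to cos(theta) = 0, i.e. plain momentum alpha.
    cos_theta = dot / norm if norm > 0.0 else 0.0
    return alpha * (1.0 + cos_theta)
```

The coefficient therefore ranges from 0 (changes in opposite directions) through α (orthogonal changes) to 2α (changes in the same direction).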
OO Approach: Network Layers
An MLP can be programmed with an object for each neuron. But each neuron needs inputs from the previous layer and deltas from the next, which needs many pointers and is problematic for students.
So it is easier to have an object for a layer of neurons (all with the same inputs), which gets its inputs and weighted deltas in an array.
The base object is a layer of linearly activated neurons, LinActLayer: a single-layer network of neurons with f(z) = z.
Neurons with sigmoidal activation need only two different functions, for calculating the output and the delta.
So SigActLayer is an object inheriting from LinActLayer: it uses the existing members and adds the 2 different ones.
Network For Hidden Layers
A hidden layer needs an enhanced SigActLayer with its own error calculation function (using the weighted deltas in the next layer).
The existing objects are each a whole net. So SigActHidLayer is a multiple-layer network: it inherits from SigActLayer but also has a pointer to the next layer.
Most functions have 2 lines: process own layer, then process the next.
[Figure: class hierarchy. Base class LinActLayer; SigActLayer inherits from LinActLayer; SigActHidLayer inherits from SigActLayer.]
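The hierarchy described above might look roughly like this. Only the three class names come from the slides; every method name, the per-neuron weight layout `[bias, w1, w2, ...]`, and the use of small `activate`/`slope` hooks in place of whole overridden output and delta functions are assumptions of this sketch.

```python
import math


class LinActLayer:
    """A layer of linearly activated neurons, f(z) = z."""

    def __init__(self, weights):
        # Assumed layout: weights[j] = [bias, w1, w2, ...] for neuron j.
        self.weights = weights
        self.outputs = []
        self.deltas = []

    def activate(self, z):
        return z

    def slope(self, out):
        return 1.0  # derivative of f(z) = z

    def compute_outputs(self, inputs):
        self.outputs = [
            self.activate(w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs)))
            for w in self.weights
        ]
        return self.outputs

    def find_deltas(self, errors):
        self.deltas = [e * self.slope(o) for e, o in zip(errors, self.outputs)]
        return self.deltas

    def weighted_deltas(self):
        # For each input i: sum over neurons j of delta_j * weight (i, j).
        n_inputs = len(self.weights[0]) - 1
        return [sum(d * w[i + 1] for d, w in zip(self.deltas, self.weights))
                for i in range(n_inputs)]


class SigActLayer(LinActLayer):
    """Same layer, but with sigmoidal output and delta calculations."""

    def activate(self, z):
        return 1.0 / (1.0 + math.exp(-z))

    def slope(self, out):
        return out * (1.0 - out)


class SigActHidLayer(SigActLayer):
    """A hidden layer that also holds a reference to the next layer."""

    def __init__(self, weights, next_layer):
        super().__init__(weights)
        self.next_layer = next_layer

    def compute_outputs(self, inputs):
        hidden = super().compute_outputs(inputs)        # process own layer
        return self.next_layer.compute_outputs(hidden)  # then the next

    def find_deltas(self, errors):
        self.next_layer.find_deltas(errors)             # process next layer
        return super().find_deltas(self.next_layer.weighted_deltas())
```

A two-layer network is then built as `SigActHidLayer(hidden_weights, LinActLayer(output_weights))`, and calls on the hidden-layer object process the whole network.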
SAM and Hierarchy
Given this approach, momentum can be adjusted using weight changes
a) over the whole network
b) separately by layer
c) separately for each neuron
For a), calculate η * delta * inputs for all layers, then set α(1 + cos θ) globally.
For b), calculate η * delta * inputs for each layer and set α(1 + cos θ) for each layer separately.
For c), do the same, but for each neuron in each layer.
This works easily in the hierarchy.
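The three options differ only in which weights are gathered into the vectors whose angle is measured. The sketch below illustrates this under assumed conventions: weight changes are stored as nested lists indexed `[layer][neuron][weight]`, the function names are hypothetical, and a zero-length vector is treated as giving cos θ = 0.

```python
import math


def cos_angle(u, v):
    """Cosine of the angle between two flat vectors (0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def flatten(nested):
    """Flatten arbitrarily nested lists of numbers into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def sam_alphas(alpha, current, previous, mode):
    """Return the adapted momentum for every neuron as [layer][neuron].

    current and previous hold weight changes as [layer][neuron][weight].
    mode is "network", "layer" or "neuron".
    """
    def adapted(cur, prev):
        return alpha * (1.0 + cos_angle(flatten(cur), flatten(prev)))

    if mode == "network":   # a) one angle over all weights
        a = adapted(current, previous)
        return [[a for _ in layer] for layer in current]
    if mode == "layer":     # b) one angle per layer
        return [[adapted(c, p)] * len(c) for c, p in zip(current, previous)]
    if mode == "neuron":    # c) one angle per neuron
        return [[adapted(cn, pn) for cn, pn in zip(c, p)]
                for c, p in zip(current, previous)]
    raise ValueError("mode must be 'network', 'layer' or 'neuron'")
```

Returning one value per neuron in every mode keeps the calling code identical for the three options; only the grouping used for the angle changes.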
Experimentation
3 problems, each with training, validation and unseen data.
Stop training when the error on the validation set rises.
Run 6 times per problem with different initial weights.
Problem 1: 2 inputs, 10 nodes in hidden layer, 1 output
[Table: mean epochs taken for each SAM mode (None, Neuron, Layer, Network); values missing.]
[Table: Train SSE, Valid SSE and Unseen SSE for each SAM mode (None, Neuron, Layer, Network); values missing.]
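The stopping rule used in these experiments (stop when the validation error rises) can be sketched as a small loop. The function name, the two callables and the `max_epochs` cap are assumptions for illustration; the slides do not say how the rule was coded.

```python
def train_until_validation_rises(train_one_epoch, validation_sse,
                                 max_epochs=1000):
    """Train until the validation SSE rises; return (epochs, best SSE).

    train_one_epoch -- callable that runs one pass over the training set
    validation_sse  -- callable that returns the current validation SSE
    Both callables are hypothetical hooks supplied by the caller.
    """
    best = validation_sse()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        sse = validation_sse()
        if sse > best:          # validation error has risen: stop here
            return epoch, best
        best = sse
    return max_epochs, best
```

The returned epoch count is the kind of figure that would be averaged over the 6 runs for each SAM mode.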
Problem 2
5 inputs, 15 nodes in hidden layer and 1 output
[Table: mean epochs for each SAM mode (None, Neuron, Layer, Network); values missing.]
[Table: Train SSE, Valid SSE and Unseen SSE for each SAM mode (None, Neuron, Layer, Network); values missing.]
Trained much more quickly, but SSE worse.
Very little difference between layer and whole network, so..
Problem 3
5 inputs, 15 nodes in hidden layer and 3 outputs
[Table: mean epochs for each SAM mode (None, Neuron, Layer, Network); values missing.]
[Table: Train SSE, Valid SSE and Unseen SSE for each SAM mode (None, Neuron, Layer, Network); values missing.]
SSEs averaged over 3 outputs: here Layer is best.
Conclusions and Further Work
The object-oriented hierarchy works neatly here.
SAM clearly reduces the number of epochs taken to learn, with little extra overhead per epoch.
In one example it increased the sum squared errors. This needs investigating.
It needs to be tested on other problems, but it looks as if SAM at the layer level may be best (particularly with multiple outputs).
Momentum is used in other learning problems; SAM could be investigated for these.