Slide 1: EE 690 Design of Embodied Intelligence
Least-squares-based multilayer perceptron training with weighted adaptation
Software simulation project
Slide 2: Outline
- Multilayer perceptron
- Least-squares based learning algorithm
- Weighted adaptation in training
- Signal-to-noise ratio figure and overfitting
- Software simulation project
Slide 3: Multilayer perceptron (MLP)
- Feedforward network (no recurrent connections) with units arranged in layers
- Inputs x, outputs z

A typical two-layer MLP contains an input layer, a hidden layer, and an output layer. The inputs enter the network and pass through the connections between the input layer and the hidden layer. Each hidden neuron computes a weighted summation of its inputs; the hidden neurons usually apply nonlinear transfer functions, which give the network its nonlinearity. The outputs of the hidden neurons then pass through the connections between the hidden layer and the output layer, and their weighted summation forms the output of the network. The weights are trainable, so the network can implement functions.
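To make the data flow concrete, a minimal forward-pass sketch in MATLAB, using the slide notation where y denotes a weighted summation and z = f(y) its transfer-function output. The tanh transfer function and the function name mlp_forward are assumptions for illustration, not the course package code.

    % Hypothetical forward pass for a 2-layer MLP. x is M x N (one column per
    % sample); W1 (H x M), b1 (H x 1), W2 (K x H), b2 (K x 1); f = tanh assumed.
    function [z1, y1, z2, y2] = mlp_forward(x, W1, b1, W2, b2)
        y1 = W1*x + b1;    % weighted summation entering the hidden neurons
        z1 = tanh(y1);     % nonlinear transfer function of the hidden layer
        y2 = W2*z1 + b2;   % weighted summation of the hidden-neuron outputs
        z2 = tanh(y2);     % network output
    end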
Slide 4: Multilayer perceptron (MLP)
- Efficient mapping from inputs to outputs
- Powerful universal function approximation
- Number of inputs and outputs determined by the data
- Number of hidden neurons
- Number of hidden layers

The weights are trainable to map the relation between the input and output data. The MLP is a powerful tool for function approximation in the sense that it finds the weights that combine nonlinear basis functions. In function approximation, the number of basis functions affects the learning ability of the approximator. The numbers of input and output neurons are fixed by the dimensions of the data, so only the number of hidden neurons is left to the user. Because it sets the number of basis functions used in the approximation, the number of hidden neurons directly affects the fitting accuracy; in that sense it is a critical parameter for neural networks.
Slide 5: Multilayer perceptron learning
Back-propagation (BP) training algorithm: determine how much each weight is responsible for the error signal. BP has two phases:
- Forward pass: feedforward propagation of the input signals through the network
- Backward pass: propagation of the error backwards through the network
(Diagram: input layer, hidden layer, output layer.)
Slide 6: Multilayer perceptron learning
Backward pass: we want to know how to modify the weights in order to decrease the error E, so we use gradient descent. Two drawbacks:
- Gradient-based adjustment can get stuck in local minima
- Training is time-consuming because of the large number of learning steps, and the step size has to be configured by hand
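For concreteness, a hedged sketch of one gradient-descent update for the two-layer network above, assuming E = 1/2*sum((z2 - d).^2), tanh units, and a hand-tuned step size eta; the variable names follow the earlier forward-pass sketch and are not from the course package.

    % One back-propagation step (sketch). d is the K x N desired output;
    % z1, z2 come from the forward pass; eta is the step size to be configured.
    delta2 = (z2 - d) .* (1 - z2.^2);        % output error term, using tanh' = 1 - z^2
    delta1 = (W2' * delta2) .* (1 - z1.^2);  % error propagated backwards through W2
    W2 = W2 - eta * (delta2 * z1');  b2 = b2 - eta * sum(delta2, 2);
    W1 = W1 - eta * (delta1 * x');   b1 = b1 - eta * sum(delta1, 2);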
Slide 7: Least-squares based learning algorithm
- Least-squares fit (LSF): obtain the minimum sum of squared errors (SSE)
- For an overdetermined problem, LSF finds the solution with the minimum SSE
- For an underdetermined problem, the pseudo-inverse finds the solution with the minimum norm
- Can be applied to optimize either the weights or the signals on the layers (optimized weights / optimized signals)
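A minimal sketch of the basic LSF with MATLAB's pinv, absorbing the bias into the weight matrix; the names X, Y and the layout are illustrative assumptions.

    % Solve W*X + b = Y in the least-squares sense (overdetermined case) or
    % with minimum norm (underdetermined case) via the pseudo-inverse.
    Xa = [X; ones(1, size(X, 2))];   % append a row of ones to absorb the bias b
    Wb = Y * pinv(Xa);               % least-squares / minimum-norm solution
    W  = Wb(:, 1:end-1);             % recovered weights
    b  = Wb(:, end);                 % recovered bias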
Slide 8: Least-squares based learning algorithm (I)
I. Start with desired-output back-propagation and signal optimization: propagate the desired outputs back through the layers and optimize the weights between the layers.
(1) y2 = f^-1(d), after scaling d into (-1, 1).
(2) Based on W2, b2, solve W2*z1 = y2 - b2 for z1.
(3) y1 = f^-1(z1), after scaling z1 into (-1, 1).
(4) Optimize W1, b1 to satisfy W1*x + b1 = y1.
(5) Evaluate y1, z1 using the new W1 and bias b1.
(6) Optimize W2, b2 to satisfy W2*z1 + b2 = y2.
(7) Evaluate y2, z2 using the new W2 and bias b2.
(8) Evaluate the MSE.
(Network diagram: x -> W1, b1 -> y1 -> z1 -> W2, b2 -> y2 -> z2; d is the desired output.)
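The eight steps can be read as the following hedged MATLAB sketch. f = tanh and f^-1 = atanh are assumed, and the scale helper that rescales values into (-1, 1) is our assumption about the unspecified scaling step.

    % One pass of Algorithm I (sketch, not the package code). x: inputs,
    % d: desired outputs; W1, b1, W2, b2 are the current network parameters.
    scale = @(v) 0.99 * v ./ max(abs(v(:)) + eps);    % assumed (-1,1) rescaling
    y2 = atanh(scale(d));                             % (1) invert the output nonlinearity
    z1 = pinv(W2) * (y2 - b2);                        % (2) back through the output weights
    y1 = atanh(scale(z1));                            % (3) invert the hidden nonlinearity
    Xa = [x; ones(1, size(x,2))];                     % (4) LSF: W1*x + b1 = y1
    Wb = y1 * pinv(Xa);  W1 = Wb(:,1:end-1);  b1 = Wb(:,end);
    y1 = W1*x + b1;  z1 = tanh(y1);                   % (5) re-evaluate hidden signals
    Za = [z1; ones(1, size(z1,2))];                   % (6) LSF: W2*z1 + b2 = y2
    Wb = y2 * pinv(Za);  W2 = Wb(:,1:end-1);  b2 = Wb(:,end);
    y2 = W2*z1 + b2;  z2 = tanh(y2);                  % (7) re-evaluate outputs
    mse = mean((z2(:) - d(:)).^2);                    % (8) training error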
Slide 9: Least-squares based learning algorithm (I)
Weights optimization with weighted LSF: the location of the operating point x on the transfer function determines its effect on the output signal of the layer, so the slope dy/dx is used as the weighting term in the LSF. Samples on the steep part of the curve (large Δy for a given Δx) matter more than samples in saturation (small Δy for the same Δx). Optimize W1, b1 to satisfy W1*x + b1 = y1 with this weighted LSF.
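A hedged sketch of the weighted LSF for one layer, solving each hidden neuron's row separately with the transfer-function slope as the per-sample weight; this is one plausible rendering of the slide, not the package code.

    % Weighted LSF for W1, b1 (sketch). y1 holds the target pre-activations;
    % samples where tanh is steep (large dy/dx) get larger weights, samples in
    % saturation get smaller ones. Weighted LS reduces to ordinary LS after
    % scaling both sides of each equation by sqrt(w).
    Xa = [x; ones(1, size(x, 2))];
    for k = 1:size(y1, 1)                        % one LSF per hidden neuron
        w  = 1 - tanh(y1(k,:)).^2;               % slope of tanh at the operating points
        sw = sqrt(w);
        Wb = (y1(k,:) .* sw) * pinv(Xa .* sw);   % row k of [W1 b1], weighted
        W1(k,:) = Wb(1:end-1);  b1(k) = Wb(end);
    end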
Slide 10: Least-squares based learning algorithm (II)
II. Weights optimization with iterative fitting: W1 can be further adjusted based on the output error. Each hidden neuron acts as a basis function. Start with the 1st hidden neuron and continue through the other neurons for as long as the output error e_out persists.
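One plausible, hedged reading of the iterative fitting as code: refit each hidden neuron's input weights against the residual output error, one neuron at a time. The scale helper is the same assumed (-1, 1) rescaling as before, and the attribution of the residual to a single neuron via pinv is our assumption.

    % Iterative neuron-by-neuron fitting (sketch). Each hidden neuron is a basis
    % function; row k of W1 is refit so that neuron k absorbs the residual e_out.
    Xa = [x; ones(1, size(x, 2))];
    e_out = y2 - (W2*z1 + b2);                 % output error at the summation
    for k = 1:size(z1, 1)                      % start with the 1st hidden neuron
        t  = W2(:,k)*z1(k,:) + e_out;          % contribution neuron k should provide
        zk = scale(pinv(W2(:,k)) * t);         % desired activation of neuron k
        Wb = atanh(zk) * pinv(Xa);             % LSF for row k of [W1 b1]
        W1(k,:) = Wb(1:end-1);  b1(k) = Wb(end);
        z1(k,:) = tanh(W1(k,:)*x + b1(k));     % re-evaluate neuron k
        e_out = t - W2(:,k)*z1(k,:);           % residual left for the next neuron
    end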
Slide 11: Least-squares based learning algorithm (III)
III. Start with input feedforward and weights optimization: propagate the inputs forward through the layers and optimize both the weights between the layers and the signals on the layers.
(1) Evaluate y1, z1 using the initial W1 and bias b1.
(2) y2 = f^-1(d).
(3) Optimize W2, b2 to satisfy W2*z1 + b2 = y2.
(4) Based on W2, b2, optimize z1 to satisfy W2*z1 + b2 = y2.
(5) y1 = f^-1(z1).
(6) Optimize W1, b1 to satisfy W1*x + b1 = y1.
(7) Evaluate y1, z1, y2, z2 using the new W1, W2 and biases b1, b2.
(8) Evaluate the MSE.
(Network diagram: x -> W1, b1 -> y1 -> z1 -> W2, b2 -> y2 -> z2; d is the desired output.)
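The distinctive step here is (4), where the hidden signal z1 itself is optimized. A hedged sketch of the core steps, with the same tanh/atanh and scale assumptions as before:

    % Core of Algorithm III (sketch): fit the output layer, then treat the hidden
    % signal z1 as the unknown, then refit the hidden layer to produce it.
    y2 = atanh(scale(d));                      % (2) invert the output nonlinearity
    Za = [z1; ones(1, size(z1, 2))];           % (3) LSF: W2*z1 + b2 = y2
    Wb = y2 * pinv(Za);  W2 = Wb(:,1:end-1);  b2 = Wb(:,end);
    z1 = pinv(W2) * (y2 - b2);                 % (4) optimize the signal z1 itself
    y1 = atanh(scale(z1));                     % (5) invert the hidden nonlinearity
    Xa = [x; ones(1, size(x, 2))];             % (6) LSF: W1*x + b1 = y1
    Wb = y1 * pinv(Xa);  W1 = Wb(:,1:end-1);  b1 = Wb(:,end);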
Slide 12: Least-squares based learning algorithm (III)
Signal optimization with weighted adaptation: the location of the operating point x on the transfer function determines how much the signal can be changed. A neuron operating in the saturated region of the curve can move its output far less than one operating on the steep part, so the adaptation of the signal is weighted accordingly.
Slide 13: Overfitting problem
- A learning algorithm can adapt the MLP to fit the training data; for noisy training data, how closely should we fit it? Overfitting.
- The number of hidden neurons and the number of layers affect the training accuracy and are set by the user: a critical choice.
- Optimized approximation algorithm: the SNRF criterion.
Slide 14: Signal-to-noise ratio figure (SNRF)
- Sampled data: function value + noise
- Error signal: approximation error component + noise component
- The useful signal should be reduced; the noise part should not be learned
- Assumption: continuous function, white Gaussian noise (WGN)
- Signal-to-noise ratio figure (SNRF): signal energy / noise energy
- Compare SNRF_e and SNRF_WGN. Should learning stop? Continue if useful signal is left unlearned; stop if noise dominates the error signal.

Without a priori knowledge of the noise characteristics of the training data, it is assumed that the training data may come with WGN at an unknown level. The error signal is the difference between the approximating function value and the training data. To obtain a clear indication of overfitting, we examine this error signal. It contains two components: the approximation error, due to the limited learning ability or the inaccuracy of the approximation with the given number of hidden neurons, and the WGN in the training data. The approximation error shows how much useful signal is left unlearned and should be reduced; the WGN part is noise that we should not learn. The question of whether learning should stop, i.e., whether the approximation is good enough, becomes: is there still useful signal information left to be learned in the error signal, or does the noise part dominate it? Based on the assumptions that the function we approximate is continuous and that the noise is WGN, we can estimate the signal and noise levels in the error signal. The ratio of the signal energy to the noise energy is defined as the SNRF. The SNRF of pure WGN can be pre-calculated, and comparing the SNRF of the error signal with that of WGN determines whether WGN dominates. If it does, there is little useful information left in the error signal, and the approximation error cannot be reduced any further.
Slide 15: Signal-to-noise ratio figure (SNRF)
(Figure: training data and the approximating function on the left; the error signal, their difference, on the right.)
The method for estimating the signal level and the noise level of the error signal, and hence its SNRF, is explained first for a one-dimensional case. The left figure shows the training data and an existing fit with a quadratic polynomial; the right figure shows the error signal, which clearly does not have the properties of WGN. The error signal e contains an approximation error component, the useful signal left unlearned, plus a noise component whose level is unknown. How can we measure the signal level and the noise level?
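A hedged sketch of a one-dimensional SNRF estimate follows. The key assumption, ours and labeled as such: with the error samples ordered along the input axis, a continuous leftover signal makes neighboring samples correlated while WGN does not, so the lag-one correlation of e estimates its signal energy and the remainder its noise energy. The WGN reference can be pre-calculated by Monte Carlo, as the slide notes.

    % snrf_1d.m -- sketch of a 1-D SNRF estimate (assumed estimator, see text).
    function snrf = snrf_1d(e)
        e  = e(:) - mean(e(:));             % centered error signal, ordered by input
        Es = sum(e(1:end-1) .* e(2:end));   % signal energy estimate: lag-1 correlation
        En = sum(e.^2) - Es;                % remaining energy attributed to noise
        snrf = Es / En;                     % signal-to-noise ratio figure
    end

    % Pre-calculating a threshold from the SNRF of pure WGN (Monte Carlo):
    %   snrfs = arrayfun(@(k) snrf_1d(randn(N, 1)), 1:1000);
    %   s = sort(snrfs);  th = s(round(0.95 * numel(s)));  % e.g. 95th percentile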
Slide 16: Optimization using SNRF
Procedure: start with a small network (a small number of neurons or layers), train the MLP, obtain the training error e_train, and compare SNRF_e with SNRF_WGN. While SNRF_e is at or above the threshold, add hidden neurons and retrain; once SNRF_e < threshold, noise dominates the error signal, little information is left unlearned, and learning should stop.
Stopping criterion: SNRF_e < threshold derived from SNRF_WGN.

Using the SNRF, we estimate the signal and noise levels of the error signal and quantitatively determine how much useful signal information is left unlearned; the noise characteristics simply serve as a reference for the stopping criterion. When SNRF_e falls below the threshold set by SNRF_WGN, noise dominates the error signal, there is little information left unlearned, and the learning process with the current number of hidden neurons can stop. Otherwise, more hidden neurons have to be used to improve the learning.
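As a hedged sketch, the growth loop might look as follows; train_mlp is an assumed training routine (any of algorithms I-III or BP), and th is the pre-calculated WGN threshold from the previous sketch.

    % Grow the hidden layer until noise dominates the training error (sketch).
    for H = 1:H_max                                      % start with a small network
        [W1, b1, W2, b2, e_train] = train_mlp(x, d, H);  % assumed helper: train with H neurons
        if snrf_1d(e_train) < th                         % noise dominates the error signal:
            break;                                       % little left unlearned, stop here
        end                                              % otherwise add hidden neurons
    end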
Slide 17: Optimization using SNRF
The same criterion applies to optimizing the number of iterations in back-propagation training, to avoid overfitting (overtraining): set the structure of the MLP, train it with BP iterations, obtain e_train, and compare SNRF_e with SNRF_WGN after each iteration. Keep training with more iterations while SNRF_e remains above the threshold, and stop once it drops below.
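A short hedged variant of the same loop over BP iterations; bp_step is an assumed helper performing one training iteration.

    % Early stopping of BP training by the SNRF criterion (sketch).
    for it = 1:max_iter
        [W1, b1, W2, b2, e_train] = bp_step(x, d, W1, b1, W2, b2, eta);  % one BP iteration
        if snrf_1d(e_train) < th, break; end  % noise dominates: stop before overtraining
    end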
Slide 18: Software simulation project
Prepare the data:
- Samples along the columns: N samples
- Features along the rows: M features
- "Features": M x N matrix; desired outputs in "Values": 1 x N row vector
- Save "Features" and "Values" in a training MAT file (see the example below)
How to run:
- Run "main_MLP_LS.m"
- Specify the MAT file path and name and the MLP parameters in the command window
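A hedged example of building such a MAT file with toy data; the variable names "Features" and "Values" follow the slide, while the file name and the toy target function are illustrative assumptions.

    % Prepare a training MAT file in the expected layout (illustrative data).
    Features = rand(4, 500);              % M = 4 features x N = 500 samples
    Values   = sum(Features.^2, 1);       % 1 x 500 desired outputs (toy function)
    save('my_training_data.mat', 'Features', 'Values');
    % Then run main_MLP_LS.m and enter this file's path and name at the prompts.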
Slide 19: Software simulation project
Example command-window dialog:
    Input the path where data file can be found (C:*): E:\Research\MLP_LSInitial_desired\MLP_LS_package\
    Input the name of data file (*.mat): mackey_glass_data.mat
    There are overall 732 samples. How would you like to divide them into training and testing sets?
    Number of training samples: 500
    Number of testing samples: 232
    How many layers does the MLP have? 3:2:7
    How many neurons are there on each hidden layer? 3:1:10
    What kind of transfer function would you like on the hidden neurons?
    0. Linear transfer function
    1. Tangent sigmoid
    2. Logarithmic sigmoid
    2
Slide 20: Software simulation project
    There are 4 types of training algorithms you can choose from. Which type would you like to use?
    1. Least-squares based training (I)
    2. Least-squares based training with iterative neuron fitting (II)
    3. Least-squares based training with weighted signal adaptation (III)
    4. Back-propagation training (BP)
    1
    How many iterations would you like to have in the training? 3
    How many Monte-Carlo runs would you like to have for the training? 2
Slide 21: Software simulation project
Results:
- J_train(num_layer, num_neuron): training error for each configuration
- J_test(num_layer, num_neuron): testing error for each configuration
- SNRF(num_layer, num_neuron): SNRF of the training error for each configuration
Present the training and testing errors for the various configurations of the MLP, present the optimum configuration found by the SNRF criterion, and present a comparison of the results, including errors and network structure.
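A hedged plotting sketch for presenting these results; the neuron range 3:10 matches the dialog on slide 19, and the row index L picking one hidden-layer setting is an assumption about how the result arrays are laid out.

    % Plot training/testing error and SNRF versus the number of hidden neurons.
    L = 1;                                     % one hidden-layer configuration (assumed index)
    neurons = 3:10;                            % range entered as 3:1:10 in the dialog
    plot(neurons, J_train(L,:), '-o', neurons, J_test(L,:), '-s');
    xlabel('hidden neurons'); ylabel('MSE'); legend('J\_train', 'J\_test');
    figure;
    plot(neurons, SNRF(L,:), '-x');
    xlabel('hidden neurons'); ylabel('SNRF of the training error');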
Slide 22: Software simulation project
Typical databases and literature survey. Function approximation and classification datasets:
- "IEEE Neural Networks Council Standards Committee Working Group on Data Modeling Benchmarks"
- "Neural Network Databases and Learning Data"
- "UCI Machine Learning Repository"
Notes:
- Data are normalized
- Multiple inputs with a single output; for multiple-output data, use separate MLPs
- Compare with results from the literature that use the same dataset (*)