
1
**EE 690 Design of Embodied Intelligence**

Least-squares-based multilayer perceptron training with weighted adaptation: software simulation project

2
**Outline**

- Multilayer Perceptron
- Least-squares based Learning Algorithm
- Weighted Adaptation in Training
- Signal-to-Noise Ratio Figure and Overfitting
- Software simulation project

3
**Multilayer perceptron (MLP)**

Feedforward network (no recurrent connections) with units arranged in layers; inputs $x$, outputs $z$.

A typical two-layer multilayer perceptron contains an input layer, a hidden layer and an output layer. The inputs pass through the connections between the input layer and the hidden layer, and each hidden neuron receives the weighted sum of these signals. The hidden neurons usually implement nonlinear transfer functions, which give the network its nonlinearity. The hidden-neuron outputs then pass through the connections between the hidden layer and the output layer, and the weighted sum of these signals forms the output of the network. The weights are trained so that the network implements the desired function.
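The forward pass described above can be sketched as follows. This is a minimal, hypothetical Python sketch and not the project's MATLAB code. It assumes the notation of the later least-squares slides: $y_1 = W_1 x + b_1$, $z_1 = f(y_1)$, $y_2 = W_2 z_1 + b_2$, with network output $z_2$. The output is linear by default, matching the description here. Pass `f_out=math.tanh` to model an output transfer function instead.

```python
import math


def forward(x, W1, b1, W2, b2, f=math.tanh, f_out=lambda v: v):
    """Forward pass of a two-layer MLP for one input vector x.

    Hidden layer: y1 = W1 x + b1, z1 = f(y1)
    Output layer: y2 = W2 z1 + b2, z2 = f_out(y2)  (linear output by default)
    Weight matrices are lists of rows, one row per neuron.
    """
    # Weighted summation of the inputs at each hidden neuron
    y1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    # Nonlinear transfer function of the hidden neurons
    z1 = [f(v) for v in y1]
    # Weighted summation of the hidden signals at each output neuron
    y2 = [sum(w * zi for w, zi in zip(row, z1)) + b for row, b in zip(W2, b2)]
    z2 = [f_out(v) for v in y2]
    return y1, z1, y2, z2


y1, z1, y2, z2 = forward([1.0, 1.0], [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], [[1.0, 1.0]], [0.0])
print(y1, z2)
```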

4
**Multilayer perceptron (MLP)**

- Efficient mapping from inputs to outputs
- Powerful universal function approximation
- Number of inputs and outputs: determined by the data
- Number of hidden neurons and number of hidden layers: chosen by the user

[Figure: MLP mapping inputs to outputs]

The weights are trained to map the relation between the input and output data. The MLP is a powerful function approximator because training finds the weights that combine nonlinear basis functions. In function approximation, the number of basis functions affects the learning ability of the approximator. The numbers of input and output neurons are fixed by the dimensions of the data, so only the number of hidden neurons is left for the user to decide. The number of hidden neurons sets the number of basis functions used in the approximation, so it affects the fitting accuracy. This makes it a critical parameter of the network.

5
**Multilayer Perceptron Learning**

Back-propagation (BP) training algorithm: determines how much each weight is responsible for the error signal. BP has two phases:
- Forward pass phase: feedforward propagation of the input signals through the network
- Backward pass phase: propagation of the error backwards through the network

[Figure: input layer, hidden layer and output layer]

6
**Multilayer Perceptron Learning**

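Backward pass: we want to know how to modify the weights in order to decrease the error $E$. Use gradient descent, $w \leftarrow w - \eta \, \partial E / \partial w$, with step size $\eta$.
- Gradient-based adjustment can get stuck in local minima.
- Training is time-consuming because of the large number of learning steps, and the step size needs to be configured.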
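The two BP phases can be sketched with plain gradient descent. This minimal sketch makes several assumptions: a single hidden layer of tanh neurons, one linear output, per-sample (online) updates, and the error $E = \tfrac{1}{2}(z_2 - d)^2$. The function name `train_bp`, the step size `eta`, the initialization range and the epoch count are all illustrative choices. Even on this tiny problem, hundreds of passes are needed, and `eta` has to be chosen by hand.

```python
import math
import random


def train_bp(X, d, hidden=4, eta=0.05, epochs=500, seed=0):
    """Back-propagation with gradient descent for a 1-hidden-layer MLP.

    Hidden: z1 = tanh(W1 x + b1); output (linear): z2 = w2 . z1 + b2.
    Per-sample error E = 0.5 * (z2 - d)^2; each weight moves by -eta * dE/dw.
    Returns the trained parameters and the mean squared error of every epoch.
    """
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        sse = 0.0
        for x, target in zip(X, d):
            # Forward pass phase
            z1 = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(W1, b1)]
            z2 = sum(w * z for w, z in zip(w2, z1)) + b2
            e = z2 - target
            sse += e * e
            # Backward pass phase: error signal of each hidden neuron,
            # computed with the output weights before they are updated
            delta1 = [e * w2[j] * (1.0 - z1[j] ** 2) for j in range(hidden)]
            # Gradient-descent updates
            for j in range(hidden):
                w2[j] -= eta * e * z1[j]
                b1[j] -= eta * delta1[j]
                for i in range(n_in):
                    W1[j][i] -= eta * delta1[j] * x[i]
            b2 -= eta * e
        history.append(sse / len(X))
    return (W1, b1, w2, b2), history


X = [[i / 10] for i in range(-10, 11)]
d = [x[0] ** 2 for x in X]
params, history = train_bp(X, d)
print(history[0], history[-1])
```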

7
**Least-squares based Learning Algorithm**

Least-squares fit (LSF): obtains the minimum sum of squared errors (SSE).
- For an overdetermined problem (more equations than unknowns), LSF finds the solution with the minimum SSE.
- For an underdetermined problem (fewer equations than unknowns), the pseudo-inverse finds the exact solution with the minimum norm.
- LSF can be applied to optimize either the weights or the signals on the layers.

[Figure: optimized weights; optimized signals]
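The two cases can be illustrated with a small, dependency-free sketch. It solves the normal equations $(A^T A)\,x = A^T b$ for the overdetermined case. For the underdetermined case it uses the minimum-norm formula $x = A^T (A A^T)^{-1} b$, which is what the pseudo-inverse returns when $A$ has full row rank. All helper names are hypothetical. The sketch assumes $A^T A$ or $A A^T$ is nonsingular.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def least_squares(A, b):
    """Overdetermined A x = b: minimum-SSE solution from (A^T A) x = A^T b."""
    At = transpose(A)
    return solve(matmul(At, A), matvec(At, b))


def min_norm(A, b):
    """Underdetermined A x = b: exact solution with the smallest ||x||."""
    At = transpose(A)
    return matvec(At, solve(matmul(A, At), b))


print(least_squares([[0, 1], [1, 1], [2, 1]], [1, 3, 5]))  # data lie on y = 2x + 1
print(min_norm([[1, 1]], [2]))  # every x with x1 + x2 = 2 fits; the shortest is chosen
```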

8
**Least-squares based Learning Algorithm (I)**

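Start with the desired output signal: back-propagate the signals, then optimize the weights.
- Propagation of the desired outputs back through the layers
- Optimization of the weights between layers

1. Set $z_2 = d$, scaled into $(-1, 1)$, and compute $y_2 = f^{-1}(z_2)$.
2. Based on $W_2, b_2$, solve $W_2 z_1 = y_2 - b_2$ for $z_1$.
3. Scale $z_1$ into $(-1, 1)$ and compute $y_1 = f^{-1}(z_1)$.
4. Optimize $W_1, b_1$ to satisfy $W_1 x + b_1 = y_1$.
5. Evaluate $y_1, z_1$ using the new $W_1$ and bias $b_1$.
6. Optimize $W_2, b_2$ to satisfy $W_2 z_1 + b_2 = y_2$.
7. Evaluate $y_2, z_2$ using the new $W_2$ and bias $b_2$.
8. Evaluate the MSE between $z_2$ and $d$.

[Figure: two-layer network with input $x$, weights $W_1, W_2$, biases $b_1, b_2$, signals $y_1, z_1, y_2, z_2$ and desired output $d$]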
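The eight steps of Algorithm (I) can be sketched for a single-output MLP with tanh on both layers. This is a hypothetical pure-Python illustration, not the project's `main_MLP_LS.m`. Three choices in it are assumptions: the minimum-norm solution used in step (2), the factor 0.9 used to keep $f^{-1}$ finite, and the tiny ridge term added to the normal equations.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a square system A x = b."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lstsq(rows, targets, ridge=1e-9):
    """theta minimising sum_n (rows[n] . theta - targets[n])^2 (normal equations + tiny ridge)."""
    m = len(rows[0])
    AtA = [[sum(r[i] * r[k] for r in rows) + (ridge if i == k else 0.0) for k in range(m)]
           for i in range(m)]
    Atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(m)]
    return solve(AtA, Atb)


def shrink(values, limit=0.9):
    """Rescale values so that every |v| <= limit, keeping atanh finite (only shrinks)."""
    peak = max(abs(v) for v in values)
    return list(values) if peak <= limit else [v * limit / peak for v in values]


def ls_step_I(X, d, W1, b1, w2, b2):
    """One pass of Algorithm (I): back-propagate desired signals, then fit weights."""
    N, H = len(X), len(w2)
    W1, b1 = [list(r) for r in W1], list(b1)
    z2 = shrink(d)                                   # (1) z2 = d scaled into (-1, 1)
    y2 = [math.atanh(v) for v in z2]                 #     y2 = f^-1(z2)
    norm2 = sum(w * w for w in w2)
    z1 = [[w * (t - b2) / norm2 for w in w2] for t in y2]   # (2) min-norm z1: w2.z1 = y2 - b2
    flat = shrink([v for row in z1 for v in row])    # (3) scale z1 into (-1, 1), then
    y1 = [[math.atanh(v) for v in flat[n * H:(n + 1) * H]] for n in range(N)]  # y1 = f^-1(z1)
    Xa = [list(x) + [1.0] for x in X]                # (4) fit W1 x + b1 = y1, neuron by neuron
    for j in range(H):
        theta = lstsq(Xa, [row[j] for row in y1])
        W1[j], b1[j] = theta[:-1], theta[-1]
    z1 = [[math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(H)]
          for x in X]                                # (5) re-evaluate y1, z1
    theta = lstsq([row + [1.0] for row in z1], y2)   # (6) fit w2 . z1 + b2 = y2
    w2, b2 = theta[:-1], theta[-1]
    out = [math.tanh(sum(w * z for w, z in zip(w2, row)) + b2) for row in z1]  # (7)
    mse = sum((o - t) ** 2 for o, t in zip(out, z2)) / N   # (8) MSE vs scaled target
    return W1, b1, w2, b2, mse


X = [[i / 10] for i in range(-10, 11)]
d = [0.5 * x[0] for x in X]
*_, mse = ls_step_I(X, d, [[0.1], [0.2], [-0.1]], [0.0] * 3, [0.5, -0.3, 0.2], 0.0)
print(mse)
```

In this pass the initial $W_1, b_1$ only fix the layer shape. They are overwritten in step (4), while the initial $W_2, b_2$ drive the back-propagated targets.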

9
**Least-squares based Learning Algorithm (I)**

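Weights optimization with weighted LSF:
- The location of $x$ on the transfer function determines its effect on the output signal of this layer. A change $\Delta x$ at the input of the transfer function produces $\Delta y \approx (dy/dx)\,\Delta x$ at its output.
- $dy/dx$ is therefore used as a weighting term in the LSF.
- Optimize $W_1, b_1$ to satisfy $W_1 x = y_1 - b_1$ using weighted LSF.

[Figure: equal changes $\Delta x$ produce different changes $\Delta y$ at different locations on the transfer function]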
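Here is a minimal sketch of a weighted LSF for one hidden neuron, assuming a tanh transfer function. In the sketch, $y$ is the input and $z$ the output of the transfer function; the slide's figure calls them $x$ and $y$. Each residual is scaled by the slope $dz/dy$ at its target, so the fit approximates the error in the neuron output. The resulting squared-error weights are $(1 - \tanh^2 y)^2$. The slide does not say whether the slope or its square is used, so that choice is an assumption, as are the helper names.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a square system A x = b."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def weighted_lstsq(rows, targets, weights, ridge=1e-9):
    """theta minimising sum_n weights[n] * (rows[n] . theta - targets[n])^2."""
    m = len(rows[0])
    A = [[sum(w * r[i] * r[k] for r, w in zip(rows, weights)) + (ridge if i == k else 0.0)
          for k in range(m)] for i in range(m)]
    b = [sum(w * r[i] * t for r, t, w in zip(rows, targets, weights)) for i in range(m)]
    return solve(A, b)


def fit_neuron_weighted(X, y1_target):
    """Fit one hidden neuron's weights w and bias b so that w . x + b ~ y1_target.

    Sample weight (1 - tanh(y)^2)^2: where tanh saturates, a residual in y barely
    changes the neuron output z, so that sample counts less in the fit.
    """
    rows = [list(x) + [1.0] for x in X]
    weights = [(1.0 - math.tanh(t) ** 2) ** 2 for t in y1_target]
    theta = weighted_lstsq(rows, y1_target, weights)
    return theta[:-1], theta[-1]


w, b = fit_neuron_weighted([[0.0], [1.0], [2.0]], [0.1, 0.3, 0.5])
print(w, b)
```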

10
**Least-squares based Learning Algorithm (II)**

II. Weights optimization with iterative fitting:
- $W_1$ can be further adjusted based on the output error $e_{out}$.
- Each hidden neuron acts as a basis function.
- Start with the 1st hidden neuron, and continue to the other neurons as long as $e_{out}$ remains.
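The slide leaves the per-neuron update open, so the following is only one possible reading, written as a hypothetical pure-Python sketch. It assumes a single output, tanh hidden neurons, and a desired output already mapped to the output pre-activation `y2`, for example $y_2 = f^{-1}(d)$. Neuron $j$ receives the target output that would cancel $e_{out}$ through its output weight. That target is clipped to $\pm 0.95$ (an assumed limit), the neuron is refitted by least squares, and the refit is kept only if $e_{out}$ shrinks.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a square system A x = b."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lstsq(rows, targets, ridge=1e-9):
    m = len(rows[0])
    AtA = [[sum(r[i] * r[k] for r in rows) + (ridge if i == k else 0.0) for k in range(m)]
           for i in range(m)]
    Atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(m)]
    return solve(AtA, Atb)


def mse(e):
    return sum(v * v for v in e) / len(e)


def refit_neurons(X, y2, W1, b1, w2, b2, tol=1e-8):
    """Refit hidden neurons one at a time, starting with the first, to absorb e_out."""
    H = len(w2)
    W1, b1 = [list(r) for r in W1], list(b1)
    Xa = [list(x) + [1.0] for x in X]

    def neuron_out(w, b):                      # one hidden neuron = one basis function
        return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in X]

    Z = [neuron_out(W1[j], b1[j]) for j in range(H)]

    def output_error():                        # e_out in the output pre-activation space
        return [t - b2 - sum(w2[j] * Z[j][n] for j in range(H)) for n, t in enumerate(y2)]

    e = output_error()
    for j in range(H):
        if mse(e) < tol:                       # no output error left: stop
            break
        if w2[j] == 0.0:
            continue
        # Neuron output that would cancel e through this neuron's output weight
        target = [max(-0.95, min(0.95, z + v / w2[j])) for z, v in zip(Z[j], e)]
        theta = lstsq(Xa, [math.atanh(t) for t in target])
        saved = (W1[j], b1[j], Z[j])
        W1[j], b1[j] = theta[:-1], theta[-1]
        Z[j] = neuron_out(W1[j], b1[j])
        e_new = output_error()
        if mse(e_new) < mse(e):
            e = e_new
        else:                                  # refit did not help: keep the old neuron
            W1[j], b1[j], Z[j] = saved
    return W1, b1, mse(e)


X = [[i / 10] for i in range(-10, 11)]
y2 = [0.4 * x[0] for x in X]
W1, b1, err = refit_neurons(X, y2, [[0.1], [0.3]], [0.0, 0.0], [1.0, 0.5], 0.0)
print(err)
```

Accepting a refit only when it lowers the error makes the output error non-increasing, which the slide does not require but which keeps the sketch safe.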

11
**Least-squares based Learning Algorithm (III)**

III. Start with the inputs: feedforward, then optimize the weights.
- Propagation of the inputs forward through the layers
- Optimization of the weights between layers and of the signals on the layers

1. Evaluate $y_1, z_1$ using the initial $W_1$ and bias $b_1$.
2. $y_2 = f^{-1}(d)$.
3. Optimize $W_2, b_2$ to satisfy $W_2 z_1 + b_2 = y_2$.
4. Based on $W_2, b_2$, optimize $z_1$ to satisfy $W_2 z_1 + b_2 = y_2$.
5. $y_1 = f^{-1}(z_1)$.
6. Optimize $W_1, b_1$ to satisfy $W_1 x + b_1 = y_1$.
7. Evaluate $y_1, z_1, y_2, z_2$ using the new $W_1, W_2$ and biases $b_1, b_2$.
8. Evaluate the MSE.

[Figure: two-layer network with input $x$, weights $W_1, W_2$, biases $b_1, b_2$, signals $y_1, z_1, y_2, z_2$ and desired output $d$]
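Algorithm (III) can be sketched as a hypothetical pure-Python illustration, assuming a single-output MLP with tanh on both layers; it is not the package's code. For step (4), the sketch takes the smallest change to $z_1$ that satisfies $W_2 z_1 + b_2 = y_2$, then clips $z_1$ into $(-1, 1)$ before step (5). The scaling factor 0.9, the clipping limit 0.95 and the ridge term are further assumptions.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a square system A x = b."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lstsq(rows, targets, ridge=1e-9):
    m = len(rows[0])
    AtA = [[sum(r[i] * r[k] for r in rows) + (ridge if i == k else 0.0) for k in range(m)]
           for i in range(m)]
    Atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(m)]
    return solve(AtA, Atb)


def hidden(X, W1, b1):
    """z1 = tanh(W1 x + b1) for every sample (one row per sample)."""
    return [[math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
             for j in range(len(W1))] for x in X]


def ls_step_III(X, d, W1, b1, limit=0.95):
    """One pass of Algorithm (III) for a single-output MLP (hypothetical sketch)."""
    H = len(W1)
    z1 = hidden(X, W1, b1)                                  # (1) initial W1, b1
    peak = max(abs(v) for v in d)
    z2 = [v * 0.9 / peak for v in d] if peak > 0.9 else list(d)
    y2 = [math.atanh(v) for v in z2]                        # (2) y2 = f^-1(d)
    theta = lstsq([row + [1.0] for row in z1], y2)          # (3) w2 . z1 + b2 = y2
    w2, b2 = theta[:-1], theta[-1]
    norm2 = sum(w * w for w in w2)
    y1 = []
    for row, t in zip(z1, y2):
        # (4) smallest change to z1 giving w2 . z1 + b2 = y2, kept inside (-1, 1)
        r = t - b2 - sum(w * z for w, z in zip(w2, row))
        adjusted = [max(-limit, min(limit, z + w * r / norm2)) for z, w in zip(row, w2)]
        y1.append([math.atanh(v) for v in adjusted])        # (5) y1 = f^-1(z1)
    Xa = [list(x) + [1.0] for x in X]
    W1, b1 = [list(w) for w in W1], list(b1)
    for j in range(H):                                      # (6) W1 x + b1 = y1
        theta = lstsq(Xa, [row[j] for row in y1])
        W1[j], b1[j] = theta[:-1], theta[-1]
    z1 = hidden(X, W1, b1)                                  # (7) re-evaluate signals
    out = [math.tanh(sum(w * z for w, z in zip(w2, row)) + b2) for row in z1]
    mse = sum((o - t) ** 2 for o, t in zip(out, z2)) / len(X)   # (8)
    return W1, b1, w2, b2, mse


X = [[i / 10] for i in range(-10, 11)]
d = [0.5 * x[0] for x in X]
*_, mse = ls_step_III(X, d, [[0.5], [1.0], [-0.7]], [0.1, -0.2, 0.0])
print(mse)
```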

12
**Least-squares based Learning Algorithm (III)**

Signal optimization with weighted adaptation: the location of $x$ on the transfer function determines how much the signal can be changed.

[Figure: transfer function $y = f(x)$]
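Here is a minimal sketch of weighted signal adaptation for one sample of a single-output network with tanh hidden neurons. It chooses the change $\Delta z_1$ that satisfies $W_2 (z_1 + \Delta z_1) + b_2 = y_2$. Among all such changes, it minimizes $\sum_j \Delta z_{1,j}^2 / s_j$, with $s_j = 1 - z_{1,j}^2$ (the tanh slope). The closed form is $\Delta z_{1,j} = s_j w_j r / \sum_k s_k w_k^2$, where $r$ is the output residual. Using the slope as the weight is our reading of the slide: neurons near saturation, whose signal can barely change, are moved the least. The function name is hypothetical.

```python
def weighted_signal_update(z1, w2, b2, y2, eps=1e-12):
    """Adapt hidden signals z1 so that w2 . z1 + b2 = y2, moving each signal in
    proportion to its tanh slope s_j = 1 - z_j^2 (near-saturated neurons move least).

    Solves: minimise sum_j dz_j^2 / s_j  subject to  w2 . (z1 + dz) + b2 = y2,
    whose solution is dz_j = s_j * w2_j * r / sum_k s_k * w2_k^2.
    """
    s = [1.0 - z * z for z in z1]
    r = y2 - b2 - sum(w * z for w, z in zip(w2, z1))        # output residual
    denom = sum(sj * w * w for sj, w in zip(s, w2))
    if denom < eps:                                         # every neuron saturated
        return list(z1)
    lam = r / denom
    return [z + lam * sj * w for z, sj, w in zip(z1, s, w2)]


new = weighted_signal_update([0.0, 0.9], [1.0, 1.0], 0.0, 1.0)
print(new)  # the neuron in the linear region moves more than the one near saturation
```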

13
**Overfitting problem**

- A learning algorithm can adapt the MLP to fit the training data. For noisy training data, how closely should we learn the data? Fitting too closely leads to overfitting.
- The number of hidden neurons and the number of layers affect the training accuracy. They are determined by the user and are therefore critical.
- Optimized approximation algorithm: SNRF criterion

14
**Signal-to-noise ratio figure (SNRF)**

- Sampled data: function value + noise
- Error signal: approximation error component + noise component
  - Noise part: should not be learned
  - Useful signal: should be reduced
- Assumption: continuous function and WGN as noise
- Signal-to-noise ratio figure (SNRF): signal energy / noise energy
- Compare $\mathrm{SNRF}_e$ and $\mathrm{SNRF}_{WGN}$ to decide whether learning should stop: is there useful signal left unlearned, or does noise dominate the error signal?

Without a priori knowledge of the noise characteristics of the training data, we assume that the training data may contain white Gaussian noise (WGN) at an unknown level. The error signal is the difference between the approximating function and the training data, and examining it gives a clear indication of overfitting. It contains two components. The first is the approximation error, caused by the limited learning ability of a network with the given number of hidden neurons. The second is the WGN in the training data. The approximation error is useful signal left unlearned, which should be reduced; the WGN is noise, which should not be learned. Deciding whether learning should stop, or whether the approximation is good enough, therefore becomes a question of whether useful signal is still left in the error signal or whether noise dominates it. Assuming that the function being approximated is continuous and that the noise is WGN, we can estimate the signal level and the noise level in the error signal. The ratio of the signal energy to the noise energy is defined as the SNRF. The SNRF of WGN can be pre-calculated. Comparing the SNRF of the error signal with that of WGN shows whether WGN dominates the error signal. If it does, little useful information is left in the error signal, and the approximation error cannot be reduced further.

15
**Signal-to-noise ratio figure (SNRF)**

[Figure: left, training data and approximating function; right, error signal (approximation error component + noise component)]

The method for estimating the signal level and noise level in the error signal, and hence the SNRF, is explained first for the one-dimensional case. The left figure shows the training data and an existing fit using a quadratic polynomial. The right figure shows their difference, the error signal. The error signal clearly does not behave like WGN. It contains an approximation error component, which is useful signal left unlearned, and a noise component of unknown level. How can we measure the signal level and the noise level?
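One way to estimate the two levels in the one-dimensional case is sketched below. It is an assumption, not a formula given on this slide. The error samples are sorted by input. For a continuous function, neighbouring error samples are correlated, whereas WGN samples are not. The neighbour correlation $\sum_i e_i e_{i+1}$ therefore estimates the signal energy, and the remainder of $\sum_i e_i^2$ estimates the noise energy. For pure WGN this ratio has a mean near 0 and a spread on the order of $1/\sqrt{N}$. The constant `c = 1.7` in the threshold $c/\sqrt{N}$ is an assumed value.

```python
import math
import random


def snrf_1d(x, e):
    """Estimated SNRF of an error signal e sampled at 1-D inputs x."""
    es = [v for _, v in sorted(zip(x, e))]              # order the errors by input
    total = sum(v * v for v in es)                       # signal + noise energy
    signal = sum(a * b for a, b in zip(es, es[1:]))      # neighbour correlation
    noise = total - signal
    return math.inf if noise <= 0 else signal / noise


def snrf_threshold(n, c=1.7):
    """Assumed WGN reference threshold c / sqrt(N)."""
    return c / math.sqrt(n)


x = [i / 100 for i in range(100)]
smooth = [math.sin(2 * math.pi * v) for v in x]          # unlearned function component
rng = random.Random(0)
wgn = [rng.gauss(0.0, 1.0) for _ in x]                   # pure white Gaussian noise
print(snrf_1d(x, smooth) > snrf_threshold(len(x)))       # useful signal left: keep learning
print(snrf_1d(x, wgn), snrf_threshold(len(x)))
```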

16
**Optimization using SNRF**

- Start with a small network (small number of neurons or layers).
- Train the MLP and obtain the training error signal $e_{train}$.
- Compare $\mathrm{SNRF}_e$ with the threshold derived from $\mathrm{SNRF}_{WGN}$.
- If $\mathrm{SNRF}_e$ is above the threshold, add hidden neurons and train again.
- Otherwise, noise dominates the error signal, little information is left unlearned, and learning should stop.

Using SNRF, we can estimate the signal level and the noise level of the error signal and then quantitatively determine how much useful signal information is left unlearned. The noise characteristics serve only as a reference for the stopping criterion. When $\mathrm{SNRF}_e$ is smaller than the threshold derived from $\mathrm{SNRF}_{WGN}$, noise dominates the error signal and little information is left unlearned. Learning with the current number of hidden neurons can then stop. Otherwise, more hidden neurons are needed to improve the learning. To optimize the number of hidden neurons, start with a network with a small number of hidden neurons, examine the training error signal, and compute its SNRF. If the SNRF is higher than the threshold, add hidden neurons and repeat until the SNRF falls below the threshold. At that point, additional neurons would start to fit the noise (overfitting).

Stopping criterion: $\mathrm{SNRF}_e < \text{threshold}$ derived from $\mathrm{SNRF}_{WGN}$.
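The stopping criterion can be sketched as a loop over network sizes. `train_fn` and `snrf_fn` are hypothetical callbacks standing in for the project's MATLAB routines. `train_fn` trains an MLP with `h` hidden neurons and returns its training error signal; `snrf_fn` estimates the SNRF of that error. The threshold form $c/\sqrt{N}$ with `c = 1.7` is an assumption. The same loop works for the number of back-propagation iterations if `h` counts training epochs instead of neurons.

```python
import math


def grow_until_noise_dominates(train_fn, snrf_fn, n_samples, max_hidden=30, c=1.7):
    """Add hidden neurons until SNRF_e drops below the WGN threshold.

    Returns the selected number of hidden neurons and the SNRF_e history.
    """
    threshold = c / math.sqrt(n_samples)
    history = []
    for h in range(1, max_hidden + 1):          # start with a small network
        error_signal = train_fn(h)              # train the MLP, get e_train
        snrf_e = snrf_fn(error_signal)
        history.append(snrf_e)
        if snrf_e < threshold:                  # noise dominates: stop learning
            return h, history
    return max_hidden, history                  # criterion not met within the budget


# Toy stand-ins: pretend SNRF_e falls as 1/h with network size h
best, hist = grow_until_noise_dominates(lambda h: h, lambda err: 1.0 / err, n_samples=100)
print(best, hist)
```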

17
**Optimization using SNRF**

SNRF can also be applied to optimize the number of iterations in back-propagation training, to avoid overfitting (overtraining):
- Set the structure of the MLP.
- Train the MLP with back-propagation iterations and obtain the training error signal $e_{train}$.
- Compare $\mathrm{SNRF}_e$ with the threshold derived from $\mathrm{SNRF}_{WGN}$.
- If $\mathrm{SNRF}_e$ is above the threshold, keep training with more iterations; otherwise, stop training.

18
**Software simulation project**

Prepare the data:
- Features: an M x N matrix, with one row per feature (M features) and one column per sample (N samples)
- Desired output: a 1 x N row vector (N values)
- Save “features” and “values” in a training MAT file

How to call the function:
- Run “main_MLP_LS.m”.
- Specify the MAT file path and name, and the MLP parameters, in the command window.

[Figure: M x N matrix “Features”; 1 x N vector “Values”]
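For readers preparing data outside MATLAB, here is a hypothetical Python helper that builds the required layout. It returns `features` as an M x N matrix (one column per sample) and `values` as a 1 x N row vector. Saving relies on SciPy's `scipy.io.savemat`, which is an assumption about the tooling. MATLAB variable names are case-sensitive, so check whether `main_MLP_LS.m` expects lowercase `features`/`values` or the capitalized names shown in the figure.

```python
def to_mat_layout(samples, targets):
    """samples: N feature vectors of length M; targets: N desired outputs.

    Returns (features, values): features is M x N (row m holds feature m of every
    sample) and values is 1 x N.
    """
    M = len(samples[0])
    if any(len(s) != M for s in samples) or len(samples) != len(targets):
        raise ValueError("every sample needs M features and one target")
    features = [[s[m] for s in samples] for m in range(M)]
    values = [list(targets)]
    return features, values


def save_training_mat(path, features, values):
    """Write the training MAT file (requires SciPy)."""
    from scipy.io import savemat
    savemat(path, {"features": features, "values": values})


features, values = to_mat_layout([[1, 2, 3], [4, 5, 6]], [7, 8])
print(features, values)  # two samples with three features each
```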

19
**Software simulation project**

Example session (Mackey-Glass data):
- Path of the data file (C:*): E:\Research\MLP_LSInitial_desired\MLP_LS_package\
- Name of the data file (*.mat): mackey_glass_data.mat
- The program reports 732 samples overall and asks how to divide them into training and testing sets: 500 training samples, 232 testing samples.
- Number of layers of the MLP: 3:2:7
- Number of neurons on each hidden layer: 3:1:10
- Transfer function of the hidden neurons (0: linear, 1: tangent sigmoid, 2: logarithmic sigmoid): 2
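The answers “3:2:7” and “3:1:10” use MATLAB's colon notation `start:step:stop`. This session therefore sweeps 3, 5 and 7 layers and 3 to 10 hidden neurons, i.e. 3 x 8 = 24 configurations. A small hypothetical Python helper that expands the notation (integers only) makes the grid explicit:

```python
def colon_range(spec):
    """Expand integer MATLAB colon notation 'start:stop' or 'start:step:stop'."""
    parts = [int(p) for p in spec.split(":")]
    if len(parts) == 1:
        return parts
    if len(parts) == 2:
        start, step, stop = parts[0], 1, parts[1]
    elif len(parts) == 3:
        start, step, stop = parts
    else:
        raise ValueError("expected start:stop or start:step:stop")
    if step == 0:
        return []                       # MATLAB gives an empty range for step 0
    end = stop + 1 if step > 0 else stop - 1
    return list(range(start, end, step))


layers = colon_range("3:2:7")
neurons = colon_range("3:1:10")
grid = [(n_layers, n_hidden) for n_layers in layers for n_hidden in neurons]
print(layers, neurons, len(grid))
```

How the package counts layers (for example, whether input and output layers are included) is not stated here.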

20
**Software simulation project**

[Figure: two-layer network with input $x$, weights $W_1, W_2$, biases $b_1, b_2$, signals $y_1, z_1, y_2, z_2$ and desired output $d$]

- Training algorithm: 1, chosen from four types:
  1. Least-squares based training (I)
  2. Least-squares based training with iterative neuron fitting (II)
  3. Least-squares based training with weighted signal adaptation (III)
  4. Back-propagation training (BP)
- Number of training iterations: 3
- Number of Monte-Carlo runs for the training: 2

21
**Software simulation project**

Results:
- J_train(num_layer, num_neuron): training error
- J_test(num_layer, num_neuron): testing error
- SNRF(num_layer, num_neuron)

Present:
- Training and testing errors for various configurations of the MLP
- The optimum configuration found by SNRF
- A comparison of the results, including errors and network structure

22
**Software simulation project**

Typical databases and literature survey:
- Function approximation and classification datasets:
  - “IEEE Neural Networks Council Standards Committee Working Group on Data modeling Benchmarks”
  - “Neural Network Databases and Learning Data”
  - “UCI Machine Learning Repository”
- Data are normalized.
- Multiple inputs, single output. For data with multiple outputs, use separate MLPs.
- Compare with results from the literature that use the same dataset (*).
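The slide does not say how the data are normalized. A common choice, assumed here, is min-max scaling of each feature row into $[-1, 1]$. This range matches the $(-1, 1)$ interval the least-squares algorithms work in. The function below is a hypothetical helper for the M x N layout. It reuses the training-set ranges for the test set so that both sets share one mapping.

```python
def minmax_rows(features, lo=-1.0, hi=1.0, ranges=None):
    """Scale each feature row into [lo, hi].

    ranges: optional list of (min, max) per row; pass the training-set ranges when
    normalizing test data. Returns (scaled, ranges). Constant rows map to the midpoint.
    """
    if ranges is None:
        ranges = [(min(row), max(row)) for row in features]
    scaled = []
    for row, (mn, mx) in zip(features, ranges):
        if mx == mn:
            scaled.append([(lo + hi) / 2.0] * len(row))
        else:
            scaled.append([lo + (hi - lo) * (v - mn) / (mx - mn) for v in row])
    return scaled, ranges


train, r = minmax_rows([[0, 5, 10], [3, 3, 3]])
test, _ = minmax_rows([[2.5], [3]], ranges=r)
print(train, test)
```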
