A Robust and Optimally Pruned Extreme Learning Machine


1 A Robust and Optimally Pruned Extreme Learning Machine
Ananda L. Freire and Ajalmar R. Rocha Neto, Postgraduate Program in Computer Science, Department of Teleinformatics, Federal Institute of Ceará. ISDA, December 2016.

2 Outline
- Introduction
- Issues related to the ELM network
  - Robustness
- Our proposals
- Algorithms
  - Optimally Pruned Extreme Learning Machine (OP-ELM)
  - Outlier-Robust Extreme Learning Machine (ORELM)
  - Robust and Optimally Pruned ELM (ROP-ELM)
  - Batch Intrinsic Plasticity (BIP)
  - Robust and Optimally Pruned with Intrinsic Plasticity ELM (ROPP-ELM)
- Methodology
- Results
- Conclusions

3 Introduction
The Extreme Learning Machine (ELM) [1] is:
- a single-hidden-layer feedforward neural network;
- its hidden-layer weights are drawn at random;
- its output weights are computed as a linear model;
- it has a fast training phase and good generalization performance.

[1] Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: Theory and applications. Neurocomputing 70(1-3), 489–501 (2006)
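The bullets above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: the function names and the uniform weight range are our choices.

```python
import numpy as np

def elm_train(X, D, q=30, seed=0):
    """Basic ELM: random hidden-layer weights, output weights by least squares.

    X: (N, p) inputs, D: (N, c) targets, q: number of hidden neurons.
    The U(-5, 5) weight range is an illustrative choice, not from the slides.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(-5.0, 5.0, size=(X.shape[1] + 1, q))   # random weights + biases
    H = np.tanh(np.column_stack([X, np.ones(len(X))]) @ W)  # hidden activations
    B, *_ = np.linalg.lstsq(H, D, rcond=None)               # linear output model (OLS)
    return W, B

def elm_predict(X, W, B):
    return np.tanh(np.column_stack([X, np.ones(len(X))]) @ W) @ B
```

Training reduces to one random draw and one least-squares solve, which is what gives the ELM its fast learning phase.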

4 Issues related to the ELM network
- Choosing the number of hidden neurons q.
- The random nature of the hidden weights, which may lead to:
  - an ill-conditioned matrix H of hidden-neuron activations;
  - a numerically unstable solution of the linear output system;
  - a high norm of the resulting output weight vector;
  - a network that is sensitive to data perturbations.
- Robustness: regression and classification problems are often contaminated by noise, which may affect the estimated parameters and the modeling accuracy.
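A tiny numerical illustration of the conditioning issue (the data and the ridge term are ours, for demonstration only): two nearly collinear hidden activations make the OLS output weights large and unstable, while a small l2 penalty keeps their norm in check.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(50, 1))
# Two almost identical hidden neurons -> nearly collinear columns of H.
H = np.hstack([h, h + 1e-6 * rng.normal(size=(50, 1))])
d = h[:, 0] + 0.01 * rng.normal(size=50)        # target with a little noise

w_ols = np.linalg.lstsq(H, d, rcond=None)[0]    # unregularized solution
w_ridge = np.linalg.solve(H.T @ H + 1e-3 * np.eye(2),  # small l2 penalty
                          H.T @ d)
print(np.linalg.cond(H), np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The condition number of H is huge, so the OLS weights absorb the target noise along the near-null direction; the penalized solution stays small.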

5 Issues related to the ELM network: robustness
Two aspects influence the robustness properties of an ELM network:
- Computational robustness (generally ignored, with the emphasis placed on the accuracy of solutions):
  - an ill-conditioned H makes the solution sensitive to data perturbations;
  - the size of the output-layer weights is relevant for the generalization capability.
- Outlier robustness (few proposals have explored it in recent years):
  - the idea is to use estimation methods less sensitive to outliers than Ordinary Least Squares (OLS), e.g. M-estimators [1-4], the rank-based Wilcoxon approach [5], and the l1-norm loss function [6-7].

6 Our proposals
Considering that the l1-norm is more robust to outliers and produces sparser models than the l2-norm [8], and that the study of robust ELM architecture design is still in its infancy, we make two proposals:
- ROP-ELM: adopts the l1-norm loss function and combines it with a known pruning method, OP-ELM.
- ROPP-ELM: to also improve the numerical stability of the solutions, it adds Batch Intrinsic Plasticity (BIP), a method that adapts the activation function's parameters (slope and bias) so that the outputs of the hidden neurons follow an exponential distribution.

[8] Balasundaram, S., Gupta, D., Kapil: 1-norm extreme learning machine for regression and multiclass classification using Newton method. Neurocomputing 128, 4–14 (2014)

7 Outlier-Robust Extreme Learning Machine (ORELM)
ORELM is an ELM that adopts an l1-norm loss function for problems contaminated by outliers. To solve the resulting optimization problem, the OLS algorithm is replaced by the Augmented Lagrange Multipliers (ALM) method. The ALM algorithm estimates the optimal solution (E, W) and the Lagrange multiplier λ by iteratively minimizing the augmented Lagrangian function, and each step admits an explicit solution. The most computationally expensive part of the method is solving the inverse (H^T H + (2/(Cµ))I)^-1; nevertheless, it is pre-computed once before the iterations begin.
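Under the usual ORELM formulation, min ||e||_1 + (1/C)||w||^2 subject to e = y - Hw, the ALM updates described above look roughly like this. The update order, the choice of µ, and all names are our reading of the method, so treat this as a sketch.

```python
import numpy as np

def shrink(x, t):
    """Soft-thresholding: the closed-form minimizer of the l1 term."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def orelm_solve(H, y, C=1e6, n_iter=100):
    """ALM sketch for min ||e||_1 + (1/C)||w||^2  s.t.  e = y - H w."""
    N, q = H.shape
    mu = 2.0 * N / np.linalg.norm(y, 1)
    # The expensive inverse is pre-computed once, before the iterations begin.
    A = np.linalg.inv(H.T @ H + (2.0 / (C * mu)) * np.eye(q)) @ H.T
    e = np.zeros(N)
    lam = np.zeros(N)
    for _ in range(n_iter):
        w = A @ (y - e + lam / mu)            # ridge-like w update
        r = y - H @ w
        e = shrink(r + lam / mu, 1.0 / mu)    # explicit l1 (soft-threshold) e update
        lam += mu * (r - e)                   # multiplier update
    return w
```

The outliers end up absorbed into the slack variable e, so they barely bias the output weights w, which is exactly the behavior the l1 loss buys.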

8 ELM improvement: batch intrinsic plasticity (BIP)
Inspired by a biological mechanism, BIP adapts the parameters of the hidden neurons' activation function, the bias b_i and the slope a_i, in order to:
- drive the neurons into more suitable operating regimes;
- maximize information transmission;
- act as a feature regularizer.
Algorithm: see the BIP pseudocode in [10].

[10] Neumann, K.: Reliability of Extreme Learning Machines. PhD thesis, Faculty of Technology, Bielefeld University, October 2013.
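A sketch of the BIP fit for one batch, following the description above: for each neuron, rank-match its sorted synaptic inputs to sorted samples of the desired exponential distribution and solve a small regularized regression for (a_i, b_i). The clipping bounds and variable names are our assumptions.

```python
import numpy as np

def bip_adapt(X, W, mu_exp=0.2, gamma=1e-2, seed=0):
    """Fit slope a_i and bias b_i per hidden neuron so that
    tanh(a_i * s + b_i) roughly follows an Exp(mu_exp) target distribution."""
    rng = np.random.default_rng(seed)
    N, q = X.shape[0], W.shape[1]
    S = X @ W                                   # synaptic inputs, one column per neuron
    a, b = np.empty(q), np.empty(q)
    for i in range(q):
        # Desired outputs: sorted exponential samples, clipped into tanh's range.
        t = np.clip(np.sort(rng.exponential(mu_exp, size=N)), 1e-6, 1 - 1e-6)
        s = np.sort(S[:, i])                    # inputs matched rank-by-rank
        Phi = np.column_stack([s, np.ones(N)])
        # Regularized least squares for a_i * s + b_i = atanh(t).
        sol = np.linalg.solve(Phi.T @ Phi + gamma * np.eye(2), Phi.T @ np.arctanh(t))
        a[i], b[i] = sol
    return a, b
```

Because both sequences are sorted, the fitted slopes come out positive and the hidden outputs concentrate around the desired mean µ_exp.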

9 Pruning networks
Figure: Main steps of the Optimally Pruned Extreme Learning Machine (OP-ELM) [9].
Figure: Main steps of the Robust and Optimally Pruned Extreme Learning Machine (ROP-ELM).
Figure: Main steps of the Robust and Optimally Pruned with Intrinsic Plasticity Extreme Learning Machine (ROPP-ELM).

[9] Miche, Y., Sorjamaa, A., Bas, P., Simula, O., Jutten, C., Lendasse, A.: OP-ELM: Optimally pruned extreme learning machine. IEEE Transactions on Neural Networks 21(1) (2010)
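The OP-ELM steps in the figures (rank the hidden neurons, then cut by leave-one-out error) can be sketched as follows. The original ranks neurons with MRSR; here a simple correlation-with-target ranking stands in for it, so this is only an approximation of the method.

```python
import numpy as np

def press_rmse(H, y):
    """Leave-one-out (PRESS) RMSE of the least-squares fit y ~ H."""
    P = H @ np.linalg.pinv(H)                # hat matrix of the linear model
    r = y - P @ y
    loo = r / (1.0 - np.diag(P))             # closed-form LOO residuals
    return np.sqrt(np.mean(loo ** 2))

def op_elm_prune(H, y):
    """OP-ELM-style pruning sketch: rank neurons, keep the prefix with
    the lowest LOO error (correlation ranking substitutes for MRSR)."""
    corr = np.abs(H.T @ (y - y.mean()))
    order = np.argsort(-corr)                # most relevant neurons first
    best_k, best_err = 1, np.inf
    for k in range(1, H.shape[1] + 1):
        err = press_rmse(H[:, order[:k]], y)
        if err < best_err:
            best_k, best_err = k, err
    return order[:best_k], best_err
```

The PRESS statistic makes the selection cheap: no explicit retraining per left-out sample is needed, only one hat matrix per candidate size.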

10 Methodology: training and test
Figure: A 10-fold and 5-fold cross-validation scheme combined with grid search.
- ELM and ORELM use inner and outer cross-validation loops.
- OP-ELM, ROP-ELM and ROPP-ELM use only the outer cross-validation loop.
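The scheme above, in a generic sketch: an outer 5-fold loop estimates test error and an inner 10-fold grid search picks the hyper-parameter. In the usage below, ridge regression stands in for the ELM variants; all names and the exact wiring of the folds are our reading of the figure.

```python
import numpy as np

def kfold(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k folds."""
    return np.array_split(rng.permutation(n), k)

def nested_cv(X, y, fit, score, grid, outer=5, inner=10, seed=0):
    """Outer k-fold estimates test error; inner k-fold grid search on each
    outer-training split selects the hyper-parameter."""
    rng = np.random.default_rng(seed)
    outer_scores = []
    for test in kfold(len(y), outer, rng):
        train = np.setdiff1d(np.arange(len(y)), test)
        best_p, best_s = None, np.inf
        for p in grid:                        # inner grid search
            errs = []
            for va in kfold(len(train), inner, rng):
                tr = np.setdiff1d(np.arange(len(train)), va)
                model = fit(X[train[tr]], y[train[tr]], p)
                errs.append(score(model, X[train[va]], y[train[va]]))
            if np.mean(errs) < best_s:
                best_p, best_s = p, np.mean(errs)
        model = fit(X[train], y[train], best_p)   # refit on the full outer split
        outer_scores.append(score(model, X[test], y[test]))
    return float(np.mean(outer_scores))
```

The test folds never touch the hyper-parameter selection, which is what keeps the outer estimate honest.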

11 Methodology: outlier contamination and settings
Contamination scenario (following Horata's methodology):
- the training data targets are contaminated at random;
- contamination rates: 10%, 20% and 30%;
- a subset K ⊂ {1, ..., N} of row indices of D is drawn;
- noise from a uniform distribution: ∆k ∼ U[−1, 1], ∀k ∈ K;
- resulting contaminated sample: dk = dk + ∆k.
Configuration:
- hyperbolic tangent as activation function;
- BIP: regularization parameter γ = 10^−2; desired exponential-distribution mean µ_exp = 0.2;
- ORELM: maximum of 20 iterations; regularization parameter C = 2^−40; ORELM Toolbox v2.02;
- OP-ELM: starts with 100 hidden neurons; no internal normalization; OP-ELM Matlab Toolbox v1.1.
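The contamination procedure above, sketched directly (the function name is ours):

```python
import numpy as np

def contaminate_targets(d, rate, seed=0):
    """Add U[-1, 1] noise to a random subset K of the targets, |K| = rate * N,
    as in the Horata-style contamination scenario described above."""
    rng = np.random.default_rng(seed)
    d = np.asarray(d, dtype=float).copy()
    K = rng.choice(len(d), size=int(rate * len(d)), replace=False)  # subset K of row indices
    d[K] += rng.uniform(-1.0, 1.0, size=len(K))                     # d_k = d_k + delta_k
    return d, K
```

Only the targets in K are perturbed; the remaining rows of the training data are left untouched.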

12 Results: ranking evaluation

13 Results

14 Results

15 Results

16 Conclusions
We combined a robust ELM, ORELM, with a pruning method in order to achieve robust solutions with smaller and more stable networks: ROP-ELM and ROPP-ELM.
- The non-robust networks (ELM and OP-ELM) have their performance worsened as the training data is increasingly contaminated by outliers.
- Although our contributions have a test RMSE slightly larger than ORELM's, they offer solutions with 20% to 85% fewer hidden neurons, while maintaining a smaller output-weight norm and less training time.
- ROPP-ELM offers solutions especially suitable for environments with low computational resources.
Acknowledgments: The authors would like to thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for the financial support.

