Recursively Adapted Radial Basis Function Networks and its Relationship to Resource Allocating Networks and Online Kernel Learning
Weifeng Liu, Puskal Pokharel, Jose Principe
CNEL, University of Florida
weifeng@cnel.ufl.edu
Acknowledgment: This work was partially supported by NSF grants ECS-0300340 and ECS-0601271.

Outline
- One framework
- Two algorithms: RA-RBF-1 and RA-RBF-2 (kernel least-mean-square)
- Convergence analysis
- Well-posedness analysis
- Experiments

Learning problem
- Desired signal: D
- Input signal: U
- Problem statement: find a function f in a hypothesis space H (a reproducing kernel Hilbert space) such that the following empirical risk is minimized.
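
The risk formula itself is not in the transcript; with the squared loss over N training pairs (u_i, d_i), the empirical risk presumably reads

$$ \min_{f \in H} \; \sum_{i=1}^{N} \big( d_i - f(u_i) \big)^2 . $$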

Radial Basis Function Network
By regularization theory, the well-known solution is a kernel expansion over the training points, where the coefficients satisfy a linear equation involving the Gram matrix G. The RBF network therefore boils down to a matrix inversion problem.
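
The displayed equations are missing from the transcript; in the standard regularization-network form, with regularization parameter λ (a symbol assumed here), they are

$$ f(u) = \sum_{i=1}^{N} a_i \, \kappa(u, u_i), \qquad (G + \lambda I)\, a = d, \qquad G_{ij} = \kappa(u_i, u_j). $$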

A general learning model
Loop over the following iteration:
1. start from the previous estimate f_{t-1}
2. test this estimate on the training data to obtain the deviation measures {e_i}
3. form the improved estimate f_t by combining the previous estimate f_{t-1} with the deviations {e_i}
End the loop when | f_{t-1} - f_t | < ε.

Two algorithms: RA-RBF-1
Algorithm 1 (RA-RBF-1)
Initialization
Learning step: loop until convergence {
  1. evaluate the network output at every training point
  2. compute the error at every training point
  3. update the estimate
}
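
The transcript omits the actual update equations. The sketch below is a minimal reconstruction assuming that, at each iteration, the scaled global errors directly become the network coefficients (consistent with the "errors compose the network" remark and the regularized fixed point claimed later); the paper's exact recursion may differ. The Gaussian kernel, step size eta, tolerance, and iteration cap are illustrative choices.

```python
import numpy as np

def ra_rbf_1(U, d, eta=0.1, width=1.0, tol=1e-6, max_iter=500):
    """RA-RBF-1 sketch: batch recursion over all training points.

    Assumed update (not spelled out in the transcript): the network at
    iteration t places a center at every training point and its coefficient
    vector is a_t = eta * (d - G a_{t-1}), i.e. the scaled global errors.
    The fixed point of this recursion solves (G + I/eta) a = d.
    """
    U = np.asarray(U, dtype=float)
    if U.ndim == 1:
        U = U[:, None]
    d = np.asarray(d, dtype=float)
    # Gram matrix of the Gaussian kernel over the training inputs
    sq = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1)
    G = np.exp(-sq / (2.0 * width ** 2))
    a = np.zeros_like(d)                      # start from the zero network
    for _ in range(max_iter):
        e = d - G @ a                         # global errors at all training points
        a_new = eta * e                       # errors directly compose the network
        if np.linalg.norm(a_new - a) < tol:   # |f_t - f_{t-1}| small: stop
            a = a_new
            break
        a = a_new
    return a, G
```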

Two algorithms: RA-RBF-2
Algorithm 2 (RA-RBF-2)
Initialization
Learning step: loop over the input-output pairs (u_t, y_t) {
  1. evaluate the network output at the present point
  2. compute the present error
  3. improve the estimate
}
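
Since RA-RBF-2 is identified below as the kernel least-mean-square (KLMS) algorithm, a minimal online sketch is given here: every incoming sample becomes a new center whose coefficient is the step size times the a priori error at that sample. The Gaussian kernel and the parameter values are illustrative.

```python
import numpy as np

def gaussian_kernel(u, v, width=1.0):
    """Gaussian (RBF) kernel between two input vectors."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * width ** 2))

def ra_rbf_2(U, d, eta=0.1, width=1.0):
    """RA-RBF-2 / KLMS: a single online pass over the input-output pairs."""
    centers, coeffs = [], []
    apriori_errors = []
    for u_t, d_t in zip(U, d):
        # 1. evaluate the current network at the present input
        y_t = sum(a * gaussian_kernel(u_t, c, width)
                  for a, c in zip(coeffs, centers))
        # 2. a priori error at the present sample
        e_t = d_t - y_t
        apriori_errors.append(e_t)
        # 3. grow the network: new center u_t with coefficient eta * e_t
        centers.append(np.asarray(u_t, dtype=float))
        coeffs.append(eta * e_t)
    return centers, coeffs, apriori_errors
```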

Similarities and differences
- Both have a recursive RBF network structure and use the errors directly to compose the network.
- RA-RBF-2 is online, whereas RA-RBF-1 is not.
- RA-RBF-2 uses the a priori error, whereas RA-RBF-1 uses the global error information.

Convergence of RA-RBF-1
Theorem 1: The necessary and sufficient condition for RA-RBF-1 to converge is a bound on the step size in terms of λ_max, the largest eigenvalue of the Gram matrix G.
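
The bound itself is not in the transcript. Under the update assumed in the RA-RBF-1 sketch above, a_t = η(d − G a_{t−1}), the fixed-point error obeys a_t − a_* = −ηG(a_{t−1} − a_*), so that recursion converges if and only if

$$ \eta \, \lambda_{\max}(G) < 1 , $$

which is the kind of condition the theorem states; the exact constant in the paper may differ.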

Convergence of RA-RBF-2
RA-RBF-2 is the least-mean-square algorithm in the RKHS, so it is also called the kernel LMS (KLMS).
By Mercer's theorem, the kernel can be written as an inner product κ(u, u') = ⟨φ(u), φ(u')⟩, where φ is a nonlinear mapping and φ(u) is the transformed feature vector lying in the feature space F.

Convergence of RA-RBF-2 (cont'd)
Denote the weight vector in the feature space F by Ω, so that the network output is the inner product ⟨Ω, φ(u)⟩.
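
The update equations on this slide are not in the transcript; written as the standard LMS recursion on the transformed pairs (φ(u_t), d_t), which is how KLMS is usually presented, they read

$$ e_t = d_t - \langle \Omega_{t-1}, \varphi(u_t)\rangle, \qquad \Omega_t = \Omega_{t-1} + \eta\, e_t\, \varphi(u_t), $$

so that, starting from Ω_0 = 0,

$$ \Omega_t = \eta \sum_{i=1}^{t} e_i\, \varphi(u_i) \quad\Longrightarrow\quad f_t(u) = \langle \Omega_t, \varphi(u)\rangle = \eta \sum_{i=1}^{t} e_i\, \kappa(u, u_i), $$

recovering the growing RBF network of Algorithm 2.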

Convergence of RA-RBF-2 (cont'd)
Theorem 2: By the small-step-size theory, RA-RBF-2 (KLMS) converges if the step size is bounded in terms of λ_max, the largest eigenvalue of the autocorrelation matrix of the transformed data.
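
Assuming the theorem states the classical LMS mean-convergence bound (the transcript omits the formula), the condition is

$$ 0 < \eta < \frac{2}{\lambda_{\max}}, \qquad \lambda_{\max} = \lambda_{\max}\big( R_\varphi \big), \quad R_\varphi = E\big[ \varphi(u)\, \varphi(u)^{\mathsf T} \big]. $$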

Well-posedness of RA-RBF-1
Theorem 3: RA-RBF-1 converges uniquely to a regularized RBF solution, with the reciprocal of the step size serving as the regularization parameter.
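
The displayed solution is missing from the transcript. Under the same assumed recursion as before, a_t = η(d − G a_{t−1}), the unique fixed point satisfies

$$ \Big( G + \tfrac{1}{\eta} I \Big)\, a = d , $$

i.e. the regularized RBF normal equations with λ = 1/η, which matches the stated role of the reciprocal step size.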

Well-posedness of RA-RBF-2
Theorem 4: Under the H∞-stability condition, the norm of the a priori errors in RA-RBF-2, and hence the norm of the solution, is upper-bounded.
Assume the transformed data in the feature space satisfy the following multiple linear regression model.
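
The model equation is not in the transcript; the usual form of such a multiple linear regression model in the feature space (notation assumed here) is

$$ d_i = \langle \Omega_\ast, \varphi(u_i) \rangle + v_i , $$

with Ω_* an unknown weight vector and v_i a disturbance term.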

Well-posedness of RA-RBF-2 (cont'd)
Furthermore, the norm of the solution is upper-bounded by a quantity involving the largest eigenvalue of the Gram matrix G. The significance of an upper bound on the solution norm is well studied by Poggio and Girosi in the context of regularization network theory.

Relation to resource allocating networks (RAN) and online kernel learning (OKL)
- RAN and OKL are variants of the proposed learning model; RA-RBF-2 is a special case of both RAN and OKL.
- OKL employs explicit regularization.
- The understanding of the well-posedness of RA-RBF-2 obtained here brings new insights into these two existing algorithms.

Simulation: chaotic signal prediction
- Mackey-Glass chaotic time series with delay parameter τ = 30
- time embedding: 10
- 500 training points, 100 test points
- additive Gaussian noise: zero mean, 0.1 variance
- kernel width: 1
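
A sketch of how such a prediction task can be set up from a precomputed Mackey-Glass series (generating the series itself is assumed to be done elsewhere; whether the noise corrupts only the targets or the whole series is also an assumption here):

```python
import numpy as np

def make_prediction_data(series, embedding=10, n_train=500, n_test=100,
                         noise_var=0.1, seed=0):
    """Short-term prediction task: predict the next sample from the
    previous `embedding` samples of a (noisy) Mackey-Glass series."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    # time-embedded inputs: sliding windows of length `embedding`
    X = np.array([series[i:i + embedding]
                  for i in range(len(series) - embedding)])
    # targets: the next sample, corrupted by zero-mean Gaussian noise
    y = series[embedding:] + rng.normal(0.0, np.sqrt(noise_var),
                                        len(series) - embedding)
    return (X[:n_train], y[:n_train],
            X[n_train:n_train + n_test], y[n_train:n_train + n_test])
```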

Learning curve of RA-RBF-1

Learning curves of RA-RBF-2, LMS, OKL

Results

Novelty criterion
- The novelty criterion used in RAN can be employed in RA-RBF-2 (KLMS).
- Advantages: a sparser network, better generalization, simple computation.
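
A sketch of KLMS combined with a RAN-style novelty criterion (ε, δ): a new center is allocated only when the input is far from the existing centers and the a priori error is large. Skipping the sample otherwise is one simple variant assumed here; RAN itself instead adjusts the existing units.

```python
import numpy as np

def klms_novelty(U, d, eta=0.1, width=1.0, delta=0.1, eps=0.05):
    """KLMS with a RAN-style novelty criterion (sketch).

    A new center is added only if the distance to the nearest existing
    center exceeds delta AND the magnitude of the a priori error exceeds eps.
    """
    centers, coeffs = [], []
    for u_t, d_t in zip(U, d):
        u_t = np.asarray(u_t, dtype=float)
        # current network output and a priori error
        y_t = sum(a * np.exp(-np.sum((u_t - c) ** 2) / (2.0 * width ** 2))
                  for a, c in zip(coeffs, centers))
        e_t = d_t - y_t
        # distance to the nearest existing center
        dist = min((np.linalg.norm(u_t - c) for c in centers), default=np.inf)
        if dist > delta and abs(e_t) > eps:
            centers.append(u_t)
            coeffs.append(eta * e_t)
    return centers, coeffs
```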

Performance using novelty criterion
TABLE II: Prediction performance of KLMS with the novelty criterion (ε, δ)

Setting:        KLMS    (0.2, 0.7)   (0.1, 0.5)   (0.08, 0.3)   (0.05, 0.1)
Training MSE:   0.018   0.057        0.037        0.020         0.019
Test MSE:       0.049, 0.034, 0.021 (only three of the five entries survive in the transcript)
Network size:   500     19           81           290           324

Conclusions
- Proposed two recursively adapted RBF networks.
- Theoretically characterized the convergence properties of the recursively adapted RBF networks.
- Theoretically established their well-posedness.
- Established connections among the resource allocating network, online kernel learning, and the least-mean-square algorithm.