Recursively Adapted Radial Basis Function Networks and its Relationship to Resource Allocating Networks and Online Kernel Learning

Weifeng Liu, Puskal Pokharel, Jose Principe
CNEL, University of Florida
weifeng@cnel.ufl.edu

Acknowledgment: This work was partially supported by NSF grants ECS-0300340 and ECS-0601271.
Outline
- One framework
- Two algorithms
  - RA-RBF-1
  - RA-RBF-2 (kernel least-mean-square)
- Convergence analysis
- Well-posedness analysis
- Experiments
Learning problem
- Desired signal: D = {d_1, ..., d_N}
- Input signal: U = {u_1, ..., u_N}
- Problem statement: find a function f in a hypothesis space H (a reproducing kernel Hilbert space) that minimizes the empirical risk

  min over f in H of  sum_{i=1}^N (d_i - f(u_i))^2
Radial Basis Function Network
- By regularization theory, the well-known solution is f(u) = sum_{i=1}^N a_i k(u, u_i), where the coefficients satisfy the linear equation G a = d, and G is the Gram matrix with G_ij = k(u_i, u_j).
- The RBF network thus boils down to a matrix inversion problem.
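A minimal numpy sketch of this batch solution with a Gaussian kernel; the function names are illustrative, and the regularization parameter `lam` is optional (set it to 0 for the plain interpolation solve):

```python
import numpy as np

def gram(U, sigma=1.0):
    """Gram matrix G[i, j] = exp(-||u_i - u_j||^2 / (2 sigma^2))."""
    sq = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def rbf_fit(U, d, lam=0.0, sigma=1.0):
    """Solve (G + lam * I) a = d for the expansion coefficients a."""
    G = gram(U, sigma)
    return np.linalg.solve(G + lam * np.eye(len(U)), d)

def rbf_predict(a, U, Uq, sigma=1.0):
    """Evaluate f(u) = sum_i a_i k(u, u_i) at the query points Uq."""
    sq = np.sum((Uq[:, None, :] - U[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2)) @ a
```

With lam = 0 the fitted network interpolates the training targets exactly (the Gaussian Gram matrix is positive definite for distinct points), which is precisely the ill-posedness that the later slides address.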
A general learning model
Loop over the following iteration:
1. Start from the previous estimate f_{t-1}.
2. Test this estimate on the training data to obtain the deviation measures {e_i, i = 1, ..., N}.
3. Improve the estimate to f_t by combining the previous estimate f_{t-1} with the deviations {e_i}.
End the loop when |f_{t-1} - f_t| < ε.
Two algorithms: RA-RBF-1
Algorithm 1: RA-RBF-1
Initialization: f_0 = 0
Learning step: loop until convergence {
  1. evaluate the network output f_{t-1}(u_i) at every training point
  2. compute the errors e_i = d_i - f_{t-1}(u_i)
  3. update the estimate: f_t = η sum_i e_i k(., u_i)
}
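A minimal numpy sketch of this recursion on the expansion coefficients, under the reading (suggested by the later slides) that the network is composed directly of the scaled global errors, i.e. a_t = η (d - G a_{t-1}); the function name is illustrative:

```python
import numpy as np

def ra_rbf_1(G, d, eta, n_iters=500):
    """RA-RBF-1 sketch: batch recursion on the expansion coefficients.

    Each pass evaluates the current network at every training point
    (f = G a), forms the global error e = d - G a, and composes the
    next network directly from the scaled errors: a <- eta * e.
    """
    a = np.zeros_like(d)
    for _ in range(n_iters):
        e = d - G @ a   # global error at every training point
        a = eta * e     # the errors directly compose the network
    return a
```

The fixed point of this recursion satisfies a = η (d - G a), i.e. (G + (1/η) I) a = d, which is where the regularized solution of Theorem 3 comes from; convergence requires η below the reciprocal of the largest eigenvalue of G.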
Two algorithms: RA-RBF-2
Algorithm 2: RA-RBF-2
Initialization: f_0 = 0
Learning step: loop over the input-output pairs (u_t, d_t) {
  1. evaluate the network output at the present point: f_{t-1}(u_t)
  2. compute the present error: e_t = d_t - f_{t-1}(u_t)
  3. improve the estimate: f_t = f_{t-1} + η e_t k(., u_t)
}
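A minimal numpy sketch of this online recursion with a Gaussian kernel; variable names are illustrative:

```python
import numpy as np

def klms(U, d, eta=0.2, sigma=1.0):
    """RA-RBF-2 / KLMS sketch: one pass over the pairs (u_t, d_t).

    The a priori error at each sample becomes the coefficient of a
    new Gaussian unit, so the network grows by one center per sample.
    """
    centers, coeffs, errors = [], [], []
    for u, y in zip(U, d):
        # 1. evaluate the current network output at the present point
        f = sum(c * np.exp(-np.sum((u - uc) ** 2) / (2 * sigma ** 2))
                for c, uc in zip(coeffs, centers))
        # 2. compute the present (a priori) error
        e = y - f
        # 3. improve the estimate: add a new center with weight eta * e
        centers.append(u)
        coeffs.append(eta * e)
        errors.append(e)
    return centers, coeffs, errors
```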
Similarities and differences
- Both share the recursive RBF network structure.
- Both use the error directly to compose the network.
- RA-RBF-2 is online, whereas RA-RBF-1 is not.
- RA-RBF-2 uses the 'a priori' error, whereas RA-RBF-1 uses the 'global' error information.
Convergence of RA-RBF-1
Theorem 1: The necessary and sufficient condition for RA-RBF-1 to converge is 0 < η < 1/ς_1, where ς_1 is the largest eigenvalue of G.
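The condition can be checked numerically; the sketch below assumes the coefficient recursion a_t = η (d - G a_{t-1}) and compares a step size just below and just above the bound (the data setup is illustrative):

```python
import numpy as np

# Illustrative setup: random inputs, Gaussian-kernel Gram matrix.
rng = np.random.RandomState(0)
U = rng.randn(15, 2)
G = np.exp(-np.sum((U[:, None] - U[None, :]) ** 2, axis=-1) / 2.0)
d = np.sin(U[:, 0])
lam_max = np.linalg.eigvalsh(G).max()  # largest eigenvalue of G

def iterate(eta, n=300):
    """Run the assumed RA-RBF-1 recursion a_t = eta * (d - G a_{t-1})."""
    a = np.zeros_like(d)
    for _ in range(n):
        a = eta * (d - G @ a)
    return np.linalg.norm(a)

norm_stable = iterate(0.9 / lam_max)    # eta below 1/lam_max: converges
norm_unstable = iterate(1.5 / lam_max)  # eta above 1/lam_max: diverges
```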
Convergence of RA-RBF-2
- RA-RBF-2 is the least-mean-square algorithm in the RKHS, so it is also named kernel LMS (KLMS).
- By Mercer's theorem, k(u, u') = φ(u)^T φ(u'), where φ is a nonlinear mapping and φ(u) is the transformed feature vector lying in the feature space F.
Convergence of RA-RBF-2 (cont'd)
Denote the weight vector in F by Ω. The RA-RBF-2 recursion then becomes the standard LMS update in F: Ω_t = Ω_{t-1} + η e_t φ(u_t).
Convergence of RA-RBF-2 (cont'd)
Theorem 2: By the small-step-size theory, RA-RBF-2 (KLMS) converges if 0 < η < 2/ς_1, where ς_1 is the largest eigenvalue of the autocorrelation matrix R = E[φ(u) φ(u)^T].
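Although R lives in the (possibly infinite-dimensional) feature space, the nonzero eigenvalues of the sample autocorrelation (1/N) Σ_i φ(u_i) φ(u_i)^T coincide with those of G/N, so a step-size bound can be estimated from the Gram matrix alone. A small numpy sketch with an illustrative setup:

```python
import numpy as np

# Illustrative setup: N samples and their Gaussian-kernel Gram matrix.
rng = np.random.RandomState(0)
U = rng.randn(200, 10)
G = np.exp(-np.sum((U[:, None] - U[None, :]) ** 2, axis=-1) / 2.0)
N = len(U)

# Sample autocorrelation in feature space, R_hat = (1/N) Phi^T Phi,
# shares its nonzero eigenvalues with (1/N) Phi Phi^T = G / N,
# so the step-size bound needs only the Gram matrix.
lam_max_R = np.linalg.eigvalsh(G).max() / N
eta_bound = 2.0 / lam_max_R  # step sizes below this satisfy Theorem 2
```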
Well-posedness of RA-RBF-1
Theorem 3: RA-RBF-1 converges uniquely to the regularized RBF solution whose coefficients satisfy (G + (1/η) I) a = d.
The reciprocal of the step size serves as the regularization parameter.
Well-posedness of RA-RBF-2
Theorem 4: Under the H∞-stability condition, the norm of the a priori errors in RA-RBF-2, and consequently the norm of the solution, are upper-bounded.
Assume the transformed data in the feature space satisfy the multiple linear regression model d_i = Ω_*^T φ(u_i) + v_i, where Ω_* is the unknown weight vector and v_i is the modeling uncertainty.
Well-posedness of RA-RBF-2 (cont'd)
Further, the norm of the solution is upper-bounded in terms of ς_1, the largest eigenvalue of the Gram matrix G.
The significance of an upper bound on the solution norm is well studied by Poggio and Girosi in the context of regularization network theory.
Relation to resource allocating network (RAN) and online kernel learning (OKL)
- RAN and OKL are variants of the proposed learning model.
- RA-RBF-2 is a special case of both RAN and OKL.
- OKL employs explicit regularization.
- The well-posedness analysis of RA-RBF-2 presented here brings new insights into these two existing algorithms.
Simulation: chaotic signal prediction
- Mackey-Glass chaotic time series with delay parameter τ = 30
- Time-embedding dimension: 10
- 500 training points, 100 test points
- Additive Gaussian noise: zero mean, variance 0.1
- Kernel width: 1
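A possible data-preparation sketch for this setup; the Euler discretization and the constants 0.2, 0.1, and the exponent 10 are the common choices for the Mackey-Glass equation, not taken from the slides:

```python
import numpy as np

def mackey_glass(n, tau=30, dt=1.0, x0=1.2):
    """Euler-discretized Mackey-Glass series:
    dx/dt = 0.2 x(t - tau) / (1 + x(t - tau)^10) - 0.1 x(t)."""
    hist = int(tau / dt)
    x = np.full(n + hist, x0)
    for t in range(hist, n + hist - 1):
        xd = x[t - hist]
        x[t + 1] = x[t] + dt * (0.2 * xd / (1 + xd ** 10) - 0.1 * x[t])
    return x[hist:]

def embed(x, dim=10):
    """Time embedding: predict x[t] from the previous `dim` samples."""
    U = np.array([x[t - dim:t] for t in range(dim, len(x))])
    return U, x[dim:]

rng = np.random.RandomState(0)
x = mackey_glass(700, tau=30)
x_noisy = x + rng.normal(0.0, np.sqrt(0.1), size=x.shape)  # variance 0.1
U, d = embed(x_noisy, dim=10)
U_train, d_train = U[:500], d[:500]    # 500 training points
U_test, d_test = U[500:600], d[500:600]  # 100 test points
```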
Learning curve of RA-RBF-1
Learning curves of RA-RBF-2, LMS, OKL
Results
Novelty criterion
- The novelty criterion used in RAN can also be employed in RA-RBF-2 (KLMS).
- Advantages: a sparser network, better generalization, and simple computation.
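A sketch of KLMS with Platt's novelty criterion: a sample becomes a new center only if it is farther than δ from every existing center AND its a priori error exceeds ε. What to do with rejected samples is a design choice; this sketch simply skips them. The function name and defaults are illustrative:

```python
import numpy as np

def klms_nc(U, d, eta=0.2, sigma=1.0, eps=0.1, delta=0.5):
    """KLMS with the novelty criterion (eps, delta).

    A sample (u, y) is allocated as a new Gaussian center only if its
    distance to the nearest existing center exceeds delta and the
    magnitude of its a priori error exceeds eps.
    """
    centers, coeffs = [], []
    for u, y in zip(U, d):
        if centers:
            C = np.asarray(centers)
            dists = np.sqrt(np.sum((C - u) ** 2, axis=1))
            f = np.sum(np.asarray(coeffs) *
                       np.exp(-dists ** 2 / (2 * sigma ** 2)))
        else:
            dists, f = np.array([np.inf]), 0.0
        e = y - f
        if dists.min() > delta and abs(e) > eps:
            centers.append(np.asarray(u, dtype=float))
            coeffs.append(eta * e)
    return centers, coeffs
```

Because accepted centers are mutually more than δ apart, the network size is bounded by a packing argument, which is the source of the sparsity reported in Table II.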
Performance using novelty criterion

TABLE II: Prediction performance for KLMS with the novelty criterion (ε, δ)

                 KLMS    (0.2, 0.7)   (0.1, 0.5)   (0.08, 0.3)   (0.05, 0.1)
Training MSE     0.018   0.057        0.037        0.020         0.019
Test MSE         0.049   0.034        0.021
Network size     500     19           81           290           324
Conclusions
- Proposed two recursively adapted RBF networks.
- Theoretically explained the convergence properties of the recursively adapted RBF networks.
- Theoretically explained their well-posedness.
- Established the connections of the least-mean-square algorithm with the resource allocating network and online kernel learning.