Kernel adaptive filtering Lecture slides for EEL6502 Spring 2011 Sohan Seth.


1 Kernel adaptive filtering Lecture slides for EEL6502 Spring 2011 Sohan Seth

2 The big picture Adaptive filters are linear. How do we learn (continuous) nonlinear structures?

3 A particular approach Assume a parametric nonlinear model, e.g. a neural network. Universality: the parametric model should be able to approximate any continuous function, and universal approximation holds for a sufficiently large network. The idea: nonlinearly map the signal to a higher-dimensional space, and then apply a linear filter.

4 Its difficulty The performance surface becomes nonlinear. Can we learn nonlinear structure using our knowledge of linear adaptive filtering? A different approach: fix the nonlinear mapping and use linear filtering on top of it. How do we choose the mapping? We need to guarantee universal approximation, and the filter order is set by the dimension of the mapped space.
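The "fixed nonlinear mapping plus linear filter" idea on this slide can be sketched with a hypothetical mapping; the monomial features and the target function below are illustrative choices of mine, not from the slides:

```python
import numpy as np

def poly_features(x, order=3):
    """Fixed nonlinear mapping chosen in advance: monomials
    [1, x, x^2, x^3]. Only the linear filter on top is learned."""
    return np.array([x ** k for k in range(order + 1)], float)

# Learn y = x^2 with an ordinary linear least-squares filter
# applied to the mapped inputs.
xs = np.linspace(-1.0, 1.0, 50)
Phi = np.vstack([poly_features(x) for x in xs])  # mapped inputs
y = xs ** 2
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)     # linear filter weights
pred = Phi @ w
```

Because the target lies in the span of the fixed features, the linear filter recovers it exactly; a mapping chosen poorly would not, which is exactly the "how do we choose the mapping?" problem the slide raises.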

5 A ‘trick’y solution Top-down design: the optimal filter exists in the span of the (mapped) input data, and the output is a projection onto that span. Consequently only the inner product between mapped samples matters, not the mapping itself, even when the mapping is infinite dimensional.

6 Inner product and pd kernel are equivalent An inner product satisfies: 1. symmetry, 2. linearity, 3. positive definiteness. A positive definite (pd) kernel is an inner product in some space (a linear space equipped with an inner product), e.g. the Gaussian or the polynomial kernel. We can therefore use a pd kernel to implicitly construct the nonlinear mapping.
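As a concrete sketch of a pd kernel, here is the Gaussian kernel with a numerical check of positive definiteness; the sample data and the kernel width are arbitrary illustrative choices:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    x, y = np.atleast_1d(x), np.atleast_1d(y)
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

# Positive definiteness in practice: the Gram matrix of any finite
# sample must have no (significantly) negative eigenvalues.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
K = np.array([[gaussian_kernel(a, b) for b in X] for a in X])
eigvals = np.linalg.eigvalsh(K)
```

Symmetry and k(x, x) = 1 follow directly from the formula; the eigenvalue check below confirms the Gram matrix is (numerically) positive semidefinite.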

7 How do things work? Bottom-up design: take a positive definite kernel and consider its Mercer decomposition, a generalization of the eigenvalue decomposition to functional spaces. The induced mapping can then have infinitely many parameters to learn; the nonlinearity is implicit in the choice of kernel.
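A finite-sample analogue of the Mercer decomposition can be illustrated by eigendecomposing a Gram matrix; the Gaussian kernel and random data here are my illustrative assumptions:

```python
import numpy as np

# On a finite sample, the Gram matrix factors as
# K = sum_i lambda_i u_i u_i^T with lambda_i >= 0, so
# Phi = U * sqrt(Lambda) gives explicit feature vectors (one per row)
# with K = Phi Phi^T -- the finite-dimensional shadow of Mercer's theorem.
rng = np.random.default_rng(1)
X = rng.standard_normal((8, 2))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
lam, U = np.linalg.eigh(K)
lam = np.clip(lam, 0.0, None)   # clip tiny negative round-off
Phi = U * np.sqrt(lam)          # empirical Mercer features
```

For a truly infinite-dimensional kernel such as the Gaussian, the exact decomposition has infinitely many terms, which is why the mapping is never evaluated explicitly.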

8 Functional view We never explicitly evaluate the mapping to feature space; it is applied implicitly through the kernel function, and universality is guaranteed through the kernel. The cost: we need to remember all the input data and the corresponding coefficients.

9 Ridge regression Problem: how do we find the weights when doing so requires inverting an infinite-dimensional matrix? Solution: regularization, which yields a solution expressed entirely through the (finite) Gram matrix of the data.
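The regularized solution in terms of the Gram matrix can be sketched as follows; this is a minimal implementation, and the function names are mine, not the course's:

```python
import numpy as np

def kernel_ridge_fit(X, y, kernel, lam=1e-2):
    """Solve alpha = (K + lam*I)^{-1} y: regularization makes the
    (possibly ill-conditioned) Gram matrix safely invertible."""
    n = len(X)
    K = np.array([[kernel(a, b) for b in X] for a in X])
    return np.linalg.solve(K + lam * np.eye(n), y)

def kernel_ridge_predict(X_train, alpha, kernel, x):
    """Prediction is a kernel expansion over the training inputs."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X_train))
```

Note that the infinite-dimensional feature-space weights never appear: both fitting and prediction use only kernel evaluations, which is the point of the slide.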

10 Online learning The LMS update rule, and the same update rule applied in feature space. How do we compute the feature-space quantities? Set the initial weight to 0; the weight then stays in the span of the mapped inputs, so only inner products (kernel evaluations) are ever needed.
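For contrast with the feature-space version on the next slide, here is a minimal plain LMS in input space; the synthetic system-identification setup is illustrative:

```python
import numpy as np

def lms(x_seq, d_seq, eta=0.1):
    """Plain LMS: for each sample, e = d - w.x, then w <- w + eta*e*x."""
    w = np.zeros(len(x_seq[0]))
    errors = []
    for x, d in zip(x_seq, d_seq):
        x = np.asarray(x, float)
        e = d - w @ x          # instantaneous prediction error
        w = w + eta * e * x    # stochastic gradient step
        errors.append(e)
    return w, errors
```

The feature-space version replaces x with its (implicit) nonlinear image, which is exactly what Kernel-LMS makes computable.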

11 Kernel-LMS Initialize, then iterate over the samples; the step-size bound involves the largest eigenvalue of the Gram matrix. Unknowns and caveats: 1. need to choose a kernel; 2. need to select a step size; 3. need to store all past inputs and coefficients; 4. no explicit regularization; 5. the time complexity of each iteration grows with the number of stored samples.
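The Kernel-LMS iteration can be sketched as follows; the Gaussian kernel, the step size, and the class layout are my assumptions, and the growing list of centers illustrates caveats 3 and 5 above:

```python
import numpy as np

def gaussian(x, y, sigma=1.0):
    return float(np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2)
                        / (2.0 * sigma ** 2)))

class KLMS:
    """Kernel LMS: the learned function is a growing kernel expansion
    f(x) = sum_i eta * e_i * k(x_i, x)."""
    def __init__(self, eta=0.5, sigma=1.0):
        self.eta, self.sigma = eta, sigma
        self.centers, self.coeffs = [], []   # all past inputs are stored

    def predict(self, x):
        return sum(c * gaussian(xi, x, self.sigma)
                   for c, xi in zip(self.coeffs, self.centers))

    def update(self, x, d):
        e = d - self.predict(x)              # prediction error
        self.centers.append(np.asarray(x, float))
        self.coeffs.append(self.eta * e)     # store eta*error as coefficient
        return e
```

Each `predict` call sums over every stored center, so the per-iteration cost grows linearly with the number of samples seen, which motivates the network-size limiting discussed on slide 13.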

12 Functional approximation The kernel should be universal, e.g. the Gaussian kernel. How do we choose its parameter?

13 Implementation details Choosing the best value of the kernel width (too large oversmooths, too small overfits): 1. cross validation, accurate but time consuming; 2. thumb rules, fast but not accurate. Limiting network size: 1. importance estimation, since close centers are redundant.
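The "close centers are redundant" idea for limiting network size can be sketched as a simple novelty test; the threshold name `delta` and the distance-based rule are my illustrative assumptions:

```python
import numpy as np

def is_novel(centers, x, delta=0.1):
    """Admit x as a new center only if it is farther than delta from
    every stored center; nearby centers add little and are skipped,
    which caps the growth of the kernel network."""
    if not centers:
        return True
    d = min(np.linalg.norm(np.asarray(x, float) - c) for c in centers)
    return d > delta
```

In a KLMS loop, one would call this before appending a center and, when the sample is not novel, either discard it or fold its error into the nearest existing coefficient.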

14 Self-regularization Over-fitting: too many parameters fitted to too few samples. How do we remove it, and how does KLMS do it?

15 Ill-posed-ness Ill-posed-ness appears because small singular values of the autocorrelation matrix blow up when taking the inverse. How to remove it? Tikhonov regularization: solve a penalized problem, which down-weights the inverse of the small singular values.
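Tikhonov's re-weighting of the small singular values can be sketched via the SVD; this is a minimal illustration of the filter factor s/(s^2 + lam), not the course's code:

```python
import numpy as np

def tikhonov_solve(R, p, lam=1e-2):
    """Regularized solve of R w = p. The plain inverse uses 1/s for
    each singular value s and blows up when s is tiny; Tikhonov
    replaces it with s / (s^2 + lam), which -> 1/s for large s but
    -> 0 for small s instead of diverging."""
    U, s, Vt = np.linalg.svd(R)
    filt = s / (s ** 2 + lam)          # regularized inverse spectrum
    return Vt.T @ (filt * (U.T @ p))
```

For a well-conditioned matrix and small lam the result matches the ordinary solution; for a near-singular matrix the small-singular-value directions are suppressed rather than amplified.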

16 Self-regularization: well-posed-ness How does KLMS do it? The step size acts as a regularizer on the expected solution; however, large singular values might also be suppressed. More information on the course website!

