Kernel methods - overview

1 Kernel methods - overview
Kernel smoothers; local regression; kernel density estimation; radial basis functions.
Data Mining and Statistical Learning

2 Data Mining and Statistical Learning - 2008
Introduction: Kernel methods are regression techniques used to estimate a response function from noisy data.
Properties: A different model is fitted at each query point, and only the observations close to that point are used to fit it. The resulting function is smooth. The models require only a minimum of training.

3 A simple one-dimensional kernel smoother
[Formulas not preserved in the transcript.]

4 Kernel methods, splines and ordinary least squares regression (OLS)
OLS: A single model is fitted to all data.
Splines: Different models are fitted to different subintervals (cuboids) of the input domain.
Kernel methods: A different model is fitted at each query point.

5 Kernel-weighted averages and moving averages
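The Nadaraya-Watson kernel-weighted average:
$$\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}, \qquad K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{\lambda}\right),$$
where $\lambda$ indicates the window size and the function $D$ shows how the weights change with distance within this window. The estimated function is smooth!
K-nearest neighbours (moving average): The estimated function is piecewise constant!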
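The contrast between the two estimators can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's code: the function names, the toy data, and the choice of the Epanechnikov kernel for D are assumptions made for the example.

```python
def epanechnikov(t):
    """Assumed choice of D: 3/4 * (1 - t^2) for |t| <= 1, zero outside."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def nadaraya_watson(x0, xs, ys, lam):
    """Kernel-weighted average of ys at x0 with window size lam."""
    weights = [epanechnikov(abs(x - x0) / lam) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations inside the window around x0")
    return sum(w * y for w, y in zip(weights, ys)) / total


def knn_average(x0, xs, ys, k):
    """Unweighted mean of the k responses whose inputs are closest to x0."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical toy data: y = x^2 on an even grid.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
nw_fit = nadaraya_watson(2.0, xs, ys, lam=2.0)   # weights decay with distance
knn_fit = knn_average(2.0, xs, ys, k=3)          # all three neighbours count equally
```

Because the kernel weights fall to zero continuously at the edge of the window, points enter and leave the Nadaraya-Watson average gradually as x0 moves; in the k-nearest-neighbour average a point jumps from weight 0 to weight 1/k, which is what makes that estimate piecewise constant.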

6 Examples of one-dimensional kernel smoothers
Epanechnikov kernel; tri-cube kernel.
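For reference, the two kernels named on this slide are usually written as functions D(t) of the scaled distance t = |x - x0| / λ. The sketch below uses the standard textbook definitions; the helper name `kernel_weight` is made up for this example.

```python
def epanechnikov(t):
    """Epanechnikov kernel: D(t) = 3/4 * (1 - t^2) if |t| <= 1, else 0."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def tricube(t):
    """Tri-cube kernel: D(t) = (1 - |t|^3)^3 if |t| <= 1, else 0."""
    return (1.0 - abs(t) ** 3) ** 3 if abs(t) <= 1.0 else 0.0


def kernel_weight(d, x0, x, lam):
    """Weight given to an observation at x when estimating at x0 (hypothetical helper)."""
    return d(abs(x - x0) / lam)


# Both kernels have compact support: points further than lam from x0 get weight 0.
w_epan = kernel_weight(epanechnikov, 0.0, 0.5, lam=1.0)
w_tri = kernel_weight(tricube, 0.0, 0.5, lam=1.0)
```

The tri-cube kernel is flatter near zero and goes to zero more smoothly at the edge of its support than the Epanechnikov kernel.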

7 Issues in kernel smoothing
The smoothing parameter λ has to be chosen.
When there are ties at $x_i$: compute an average y value and introduce weights representing the number of points.
Boundary issues.
Varying density of observations: the bias is constant, and the variance is inversely proportional to the density.

8 Boundary effects of one-dimensional kernel smoothers
Locally weighted averages can be badly biased on the boundaries if the response function has a significant slope. Remedy: apply local linear regression.

9 Local linear regression
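Find the intercept and slope parameters solving
$$\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, \bigl[y_i - \alpha(x_0) - \beta(x_0)\, x_i\bigr]^2 .$$
The solution is a linear combination of the $y_i$:
$$\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\, x_0 = \sum_{i=1}^{N} l_i(x_0)\, y_i .$$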
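A small sketch of this fit, using the closed-form solution of weighted simple linear regression. The tri-cube kernel, the function names, and the toy data are assumptions for illustration only. A locally weighted average is included to show the boundary bias that local linear regression removes.

```python
def tricube(t):
    """Tri-cube kernel D(t) = (1 - |t|^3)^3 on [-1, 1], zero outside."""
    return (1.0 - abs(t) ** 3) ** 3 if abs(t) <= 1.0 else 0.0


def kernel_average(x0, xs, ys, lam):
    """Locally weighted average (local constant fit) at x0."""
    w = [tricube(abs(x - x0) / lam) for x in xs]
    if sum(w) == 0.0:
        raise ValueError("no observations inside the window around x0")
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def local_linear(x0, xs, ys, lam):
    """Kernel-weighted least-squares line around x0, evaluated at x0."""
    w = [tricube(abs(x - x0) / lam) for x in xs]
    sw = sum(w)
    if sw == 0.0:
        raise ValueError("no observations inside the window around x0")
    xbar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs))
    if sxx == 0.0:
        raise ValueError("need at least two distinct inputs inside the window")
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xs, ys))
    beta = sxy / sxx              # local slope
    alpha = ybar - beta * xbar    # local intercept
    return alpha + beta * x0


# Hypothetical noiseless data on the line y = 2x + 1; x0 = 0 is the left boundary.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x + 1.0 for x in xs]
avg_at_edge = kernel_average(0.0, xs, ys, lam=1.5)   # pulled upward by interior points
line_at_edge = local_linear(0.0, xs, ys, lam=1.5)    # recovers f(0) = 1
```

At the boundary all neighbours lie on one side of x0, so the weighted average is dragged towards their larger responses, while the local line extrapolates correctly.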

10 Kernel smoothing vs local linear regression
Kernel smoothing: solve the kernel-weighted least-squares problem for a local constant.
Local linear regression: solve it for a local intercept and slope.

11 Properties of local linear regression
Local linear regression automatically modifies the kernel weights to correct for bias.
The bias depends only on the terms of order higher than one in the Taylor expansion of f.
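The second property can be made explicit with a short derivation. It assumes the estimate is written as a linear combination $\hat{f}(x_0) = \sum_i l_i(x_0)\, y_i$ and that $f$ is expanded around $x_0$; the remainder term $R$ collects third- and higher-order terms.

```latex
\begin{aligned}
\mathrm{E}\,\hat{f}(x_0) &= \sum_{i=1}^{N} l_i(x_0)\, f(x_i) \\
 &= f(x_0) \sum_{i=1}^{N} l_i(x_0)
  + f'(x_0) \sum_{i=1}^{N} (x_i - x_0)\, l_i(x_0)
  + \frac{f''(x_0)}{2} \sum_{i=1}^{N} (x_i - x_0)^2\, l_i(x_0) + R .
\end{aligned}
% For local linear regression the weights satisfy
%   \sum_i l_i(x_0) = 1  and  \sum_i (x_i - x_0)\, l_i(x_0) = 0,
% so the zeroth- and first-order terms drop out of the bias:
\mathrm{E}\,\hat{f}(x_0) - f(x_0)
  = \frac{f''(x_0)}{2} \sum_{i=1}^{N} (x_i - x_0)^2\, l_i(x_0) + R .
```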

12 Local polynomial regression
Fitting local polynomials instead of straight lines.
Behavior of the estimated response function: [figure not preserved in the transcript]

13 Polynomial vs local linear regression
Advantages: Reduces the “trimming of hills and filling of valleys”.
Disadvantages: Higher variance (tails are more wiggly).

14 Selecting the width of the kernel
Bias-variance tradeoff: Selecting a narrow window leads to high variance and low bias, whilst selecting a wide window leads to high bias and low variance.

15 Selecting the width of the kernel
Automatic selection (cross-validation).
Fixing the degrees of freedom.
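Automatic selection by leave-one-out cross-validation could look roughly like the sketch below. Everything in it is illustrative: the Epanechnikov-weighted average as the smoother, the candidate widths, and the synthetic data (a sine curve with a deterministic ±0.1 perturbation standing in for noise) are assumptions, not part of the slides.

```python
import math


def epanechnikov(t):
    """D(t) = 3/4 * (1 - t^2) for |t| <= 1, zero outside."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def loocv_error(xs, ys, lam):
    """Mean squared leave-one-out prediction error of the kernel average."""
    total = 0.0
    for i in range(len(xs)):
        # Weights of all observations except the i-th, centred at x_i.
        w = [epanechnikov(abs(x - xs[i]) / lam) if j != i else 0.0
             for j, x in enumerate(xs)]
        sw = sum(w)
        if sw == 0.0:
            return math.inf   # window too narrow: no neighbours left to predict from
        pred = sum(wj * yj for wj, yj in zip(w, ys)) / sw
        total += (ys[i] - pred) ** 2
    return total / len(xs)


def select_lambda(xs, ys, candidates):
    """Pick the candidate window size with the smallest leave-one-out error."""
    return min(candidates, key=lambda lam: loocv_error(xs, ys, lam))


# Hypothetical data on a grid with spacing 0.1.
xs = [i / 10.0 for i in range(21)]
ys = [math.sin(3.0 * x) + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]
candidates = [0.05, 0.15, 0.3, 0.6, 1.5]
best = select_lambda(xs, ys, candidates)
```

A window narrower than the grid spacing leaves no neighbours once a point is held out, so it is scored as infinitely bad; very wide windows are penalised through their bias.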

16 Local regression in $\mathbb{R}^p$
The one-dimensional approach is easily extended to p dimensions by using the Euclidean norm as a measure of distance in the kernel and modifying the polynomial.

17 Local regression in $\mathbb{R}^p$
“The curse of dimensionality”: The fraction of points close to the boundary of the input domain increases with its dimension. Observed data do not cover the whole input domain.
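A quick calculation illustrates the boundary effect. It assumes, purely for illustration, inputs spread uniformly over the unit cube and calls a point "close to the boundary" when some coordinate lies within 0.05 of an edge; both choices are hypothetical.

```python
def boundary_fraction(p, margin=0.05):
    """Fraction of the unit cube [0, 1]^p lying within `margin` of its boundary.

    The interior cube has side 1 - 2 * margin, hence volume (1 - 2 * margin)^p.
    """
    return 1.0 - (1.0 - 2.0 * margin) ** p


fractions = {p: boundary_fraction(p) for p in (1, 2, 10, 50)}
```

Under these assumptions about 10% of the points are near the boundary for p = 1, about 65% for p = 10, and more than 99% for p = 50.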

18 Structured local regression models
Structured kernels (standardize each variable).
Note: A is positive semidefinite.
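The kernel referred to here is presumably the usual structured form, in which a matrix $A$ weighs the coordinates inside the distance. The notation below is an assumption carried over from the one-dimensional kernel $D$ and window size $\lambda$:

```latex
K_{\lambda, A}(x_0, x) = D\!\left( \frac{(x - x_0)^{T} A\, (x - x_0)}{\lambda} \right)
% A = I gives an isotropic kernel based on the Euclidean norm;
% a diagonal A rescales individual variables, which includes standardizing each one.
```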

19 Structured local regression models
Structured regression functions:
ANOVA decompositions (e.g., additive models). Backfitting algorithms can be used.
Varying coefficient models (partition X). INSERT FORMULA 6.17

20 Structured local regression models
Varying coefficient models (example)

21 Local methods
Assumption: the model is locally linear, so the log-likelihood is maximized locally at $x_0$.
Autoregressive time series: $y_t = \beta_0 + \beta_1 y_{t-1} + \dots + \beta_k y_{t-k} + e_t$, i.e. $y_t = z_t^T \beta + e_t$. Fit by local least squares with kernel $K(z_0, z_t)$.

22 Kernel density estimation
Straightforward estimates of the density are bumpy. Instead, Parzen’s smooth estimate is preferred. Normally, Gaussian kernels are used.
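With a Gaussian kernel, the Parzen estimate at a point x0 is the average of normal densities with standard deviation λ centred at the observations. A minimal one-dimensional sketch follows; the function name and the sample values are made up for the example.

```python
import math


def gaussian_kde(x0, xs, lam):
    """Parzen estimate at x0: (1 / (N * lam)) * sum_i phi((x0 - x_i) / lam),

    where phi is the standard normal density and lam is the bandwidth.
    """
    norm = 1.0 / (len(xs) * lam * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x0 - x) / lam) ** 2) for x in xs)


# Hypothetical sample of three observations.
sample = [-1.0, 0.0, 2.0]
density_at_zero = gaussian_kde(0.0, sample, lam=0.5)
```

Each observation contributes a smooth bump of total mass 1/N, so the estimate is itself a smooth density that integrates to one.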

23 Radial basis functions and kernels
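Using the idea of basis expansion, we treat kernel functions as basis functions:
$$f(x) = \sum_{j=1}^{M} K_{\lambda_j}(\xi_j, x)\, \beta_j = \sum_{j=1}^{M} D\!\left(\frac{\lVert x - \xi_j \rVert}{\lambda_j}\right) \beta_j ,$$
where $\xi_j$ is a prototype parameter and $\lambda_j$ is a scale parameter.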

24 Radial basis functions and kernels
Choosing the parameters: Estimate $\{\lambda_j, \xi_j\}$ separately from $\beta_j$ (often by using the distribution of X alone), then solve least squares for $\beta_j$.
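The second stage of this recipe, solving least squares for the coefficients once prototypes and scales are fixed, can be sketched as follows. The Gaussian basis, the hand-picked prototypes and scales, the helper names, and the noiseless toy data are all assumptions for illustration; the first stage (choosing prototypes and scales from the distribution of X) is not shown.

```python
import math


def gaussian_rbf(x, xi, lam):
    """Gaussian basis function centred at prototype xi with scale lam."""
    return math.exp(-0.5 * ((x - xi) / lam) ** 2)


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_rbf(xs, ys, prototypes, scales):
    """Least-squares coefficients via the normal equations (H^T H) beta = H^T y."""
    h = [[gaussian_rbf(x, xi, lam) for xi, lam in zip(prototypes, scales)] for x in xs]
    k = len(prototypes)
    gram = [[sum(row[i] * row[j] for row in h) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
    return solve(gram, rhs)


def predict_rbf(x, betas, prototypes, scales):
    """Evaluate the fitted basis expansion at x."""
    return sum(b * gaussian_rbf(x, xi, lam)
               for b, xi, lam in zip(betas, prototypes, scales))


# Hypothetical setup: two fixed prototypes, data generated from known coefficients.
prototypes, scales = [0.0, 1.0], [0.5, 0.5]
true_betas = [2.0, -1.0]
xs = [i / 4.0 for i in range(-4, 9)]
ys = [predict_rbf(x, true_betas, prototypes, scales) for x in xs]
betas = fit_rbf(xs, ys, prototypes, scales)
```

Fixing the prototypes and scales first makes the model linear in the coefficients, which is why an ordinary least-squares solve suffices; the normal equations are used here only to keep the example dependency-free.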

