Download presentation

Presentation is loading. Please wait.

Published byQuinn Rutt Modified over 2 years ago

1
Introduction to Non Parametric Statistics Kernel Density Estimation

2
Fewer restrictive assumptions about data and underlying probability distributions. Population distributions may be skewed and multi-modal. Nonparametric Statistics

3
Kernel Density Estimation (KDE) is a non-parametric technique for density estimation in which a known density function (the kernel) is averaged across the observed data points to create a smooth approximation. Kernel Density Estimation (KDE)

4
Density Estimation and Histograms Let b denote the bin-width then the histogram estimation at a point x from a random sample of size n is given by, Two choices have to be made when constructing a histogram: Positioning of the bin edges Bin-width

5
KDE – Smoothing the Histogram Let be a random sample taken from a continuous, univariate density f. The kernel density estimator is given by, K is a function satisfying The function K is referred to as the kernel. h is a positive number, usually called the bandwidth or window width.

6
Kernels Gaussian Epanechnikov Rectangular Triangular Biweight Uniform Cosine Wand M.P. and M.C. Jones (1995), Kernel Smoothing, Monographs on Statistics and Applied Probability 60, Chapman and Hall/CRC, 212 pp. Refer to Table 2.1 Wand and Jones, page 31. … most unimodal densities perform about the same as each other when used as a kernel. Typically K is chosen to be a unimodal PDF. Use the Gaussian kernel.

7
KDE – Based on Five Observations x=c(3, 4.5, 5.0, 8, 9) Kernel density estimate constructed using five observations with the kernel chosen to be the N(0,1) density.

8
hist(x,right=T,freq=F), R-default (a,b] right closed (left-open) hist(x,right=F,freq=F) [a,b) left closed (right-open) x=c(3, 4.5, 5.0, 8, 9) Histogram - Positioning of Bin Edges Area=1

9
Histogram - Bin Width hist(x,breaks=5,right=F,prob=T)hist(x,breaks=2,right=F,prob=T) Area=1

10
"kde" <- function(x,h) { npt=100 r <- max(x) - min(x); xmax <- max(x) + 0.1*r; xmin <- min(x) - 0.1*r n <- length(x) xgrid <- seq(from=xmin, to=xmax, length=npt) f = vector() for (i in 1:npt){ tmp=vector() for (ii in 1:n){ z=(xgrid[i] - x[ii])/h density=dnorm(z) tmp[ii]=density } f[i]=sum(tmp) } f=f/(n*h) lines(xgrid,f,col="grey") } #end function KDE – Numerical Implementation Variable description x = xgrid X = x

11
Bandwidth Estimators Optimal Smoothing Normal Optimal Smoothing Cross-validation Plug-in bandwidths

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google