Download presentation

Presentation is loading. Please wait.

Published byQuinn Rutt Modified over 3 years ago

1
Introduction to Non Parametric Statistics Kernel Density Estimation

2
Fewer restrictive assumptions about data and underlying probability distributions. Population distributions may be skewed and multi-modal. Nonparametric Statistics

3
Kernel Density Estimation (KDE) is a non-parametric technique for density estimation in which a known density function (the kernel) is averaged across the observed data points to create a smooth approximation. Kernel Density Estimation (KDE)

4
Density Estimation and Histograms Let b denote the bin-width then the histogram estimation at a point x from a random sample of size n is given by, Two choices have to be made when constructing a histogram: Positioning of the bin edges Bin-width

5
KDE – Smoothing the Histogram Let be a random sample taken from a continuous, univariate density f. The kernel density estimator is given by, K is a function satisfying The function K is referred to as the kernel. h is a positive number, usually called the bandwidth or window width.

6
Kernels Gaussian Epanechnikov Rectangular Triangular Biweight Uniform Cosine Wand M.P. and M.C. Jones (1995), Kernel Smoothing, Monographs on Statistics and Applied Probability 60, Chapman and Hall/CRC, 212 pp. Refer to Table 2.1 Wand and Jones, page 31. … most unimodal densities perform about the same as each other when used as a kernel. Typically K is chosen to be a unimodal PDF. Use the Gaussian kernel.

7
KDE – Based on Five Observations x=c(3, 4.5, 5.0, 8, 9) Kernel density estimate constructed using five observations with the kernel chosen to be the N(0,1) density.

8
hist(x,right=T,freq=F), R-default (a,b] right closed (left-open) hist(x,right=F,freq=F) [a,b) left closed (right-open) x=c(3, 4.5, 5.0, 8, 9) Histogram - Positioning of Bin Edges Area=1

9
Histogram - Bin Width hist(x,breaks=5,right=F,prob=T)hist(x,breaks=2,right=F,prob=T) Area=1

10
"kde" <- function(x,h) { npt=100 r <- max(x) - min(x); xmax <- max(x) + 0.1*r; xmin <- min(x) - 0.1*r n <- length(x) xgrid <- seq(from=xmin, to=xmax, length=npt) f = vector() for (i in 1:npt){ tmp=vector() for (ii in 1:n){ z=(xgrid[i] - x[ii])/h density=dnorm(z) tmp[ii]=density } f[i]=sum(tmp) } f=f/(n*h) lines(xgrid,f,col="grey") } #end function KDE – Numerical Implementation Variable description x = xgrid X = x

11
Bandwidth Estimators Optimal Smoothing Normal Optimal Smoothing Cross-validation Plug-in bandwidths

Similar presentations

OK

Ch 2. Probability Distribution Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Update by B.-H. Kim Summarized by M.H. Kim Biointelligence.

Ch 2. Probability Distribution Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Update by B.-H. Kim Summarized by M.H. Kim Biointelligence.

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Ppt on earth movements and major landforms in hawaii Ppt on soft skills and personality development Ppt on power line carrier communication Ppt on 9/11 conspiracy theory Ppt on isobars and isotopes definition Ppt on congruent triangles for class 9 Ppt on pakistan air force Ppt on building information modeling system Ppt on indian history quiz Download ppt on properties of triangles