Ch8: Nonparametric Methods


1 Ch8: Nonparametric Methods
8.1 Introduction Nonparametric methods do not assume any a priori parametric form. Nonparametric methods (or memory-based, case-based, or instance-based learning algorithms): Store the training examples. Later, when given an x, find a small number of closest training examples and interpolate from these. The result is locally affected only by nearby examples.

2 8.2 Nonparametric Density Estimation
Univariate case: Given a training set $X = \{x^t\}_{t=1}^{N}$ drawn iid from an unknown probability density $p(x)$, we want to estimate $p(x)$. Cumulative distribution function (cdf) estimate: $\hat{F}(x) = \dfrac{\#\{x^t \le x\}}{N}$.

3 Density function (df): the derivative of the cdf, $\hat{p}(x) = \dfrac{1}{h}\left[\hat{F}(x+h) - \hat{F}(x)\right]$
8.2.1 Histogram Estimator Histogram: divide the input range into equal-sized bins of width h; $\hat{p}(x) = \dfrac{\#\{x^t \text{ in the same bin as } x\}}{Nh}$.

4 [Figure: histogram estimates of the same data for different bin widths h; a large h gives a smooth estimate, a small h gives a spiky one.]

5 Naive estimator: $\hat{p}(x) = \dfrac{\#\{x - h < x^t \le x + h\}}{2Nh}$, where the interval $(x-h,\, x+h]$ is the region of influence of x.
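The following is a minimal sketch (not from the slides) of the histogram and naive estimators above, assuming a 1-D NumPy array of training points; the function names and parameters are my own illustration.

```python
import numpy as np

def histogram_estimate(x, data, origin=0.0, h=1.0):
    """p_hat(x) = #{x^t in the same bin as x} / (N * h)."""
    data = np.asarray(data)
    bin_idx = np.floor((x - origin) / h)           # index of the bin containing x
    in_bin = np.floor((data - origin) / h) == bin_idx
    return in_bin.sum() / (len(data) * h)

def naive_estimate(x, data, h=1.0):
    """p_hat(x) = #{x - h < x^t <= x + h} / (2 * N * h)."""
    data = np.asarray(data)
    in_window = (data > x - h) & (data <= x + h)   # region of influence (x-h, x+h]
    return in_window.sum() / (2 * len(data) * h)

# Example: estimate the density of a standard normal sample at x = 0.
sample = np.random.default_rng(0).standard_normal(1000)
print(histogram_estimate(0.0, sample, h=0.5), naive_estimate(0.0, sample, h=0.5))
```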

6 Histogram estimator needs to know the origin and h.
The naive estimator needs to know only h. It can be written as $\hat{p}(x) = \dfrac{1}{Nh}\sum_{t=1}^{N} w\!\left(\dfrac{x - x^t}{h}\right)$ with the weight function $w(u) = 1/2$ if $|u| < 1$ and 0 otherwise; because $w(u)$ is discrete (discontinuous), so is $\hat{p}(x)$. 8.2.2 Kernel Estimator Kernel function: a smooth weight function, e.g., the Gaussian kernel $K(u) = \dfrac{1}{\sqrt{2\pi}} e^{-u^2/2}$. Kernel estimator (Parzen windows): $\hat{p}(x) = \dfrac{1}{Nh}\sum_{t=1}^{N} K\!\left(\dfrac{x - x^t}{h}\right)$.

7 A larger h gives a smoother estimate. Adaptive methods tailor h as a function of the density around x.

8 8.2.3 k-Nearest Neighbor Estimator
Instead of fixing the bin width h, fix the number of nearby examples k: $\hat{p}(x) = \dfrac{k}{2N\,d_k(x)}$, where $d_k(x)$ is the distance from x to its kth closest training example. Where the density is high, bins are small; where the density is low, bins are large. The k-nn estimator is not continuous; to get a smooth estimator, a kernel function is used: $\hat{p}(x) = \dfrac{1}{N\,d_k(x)}\sum_{t=1}^{N} K\!\left(\dfrac{x - x^t}{d_k(x)}\right)$.
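A minimal sketch (not from the slides) of the univariate k-NN density estimate $\hat{p}(x) = k / (2N\,d_k(x))$; the function name and the value of k are my own illustration.

```python
import numpy as np

def knn_density(x, data, k=10):
    data = np.asarray(data)
    d = np.sort(np.abs(data - x))      # distances from x to all training points
    dk = d[k - 1]                      # d_k(x): distance to the k-th nearest neighbor
    return k / (2 * len(data) * dk)

sample = np.random.default_rng(0).standard_normal(1000)
print(knn_density(0.0, sample, k=25))
```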


10 8.3 Generalization to Multivariate Data
Kernel density estimator: $\hat{p}(\mathbf{x}) = \dfrac{1}{Nh^d}\sum_{t=1}^{N} K\!\left(\dfrac{\mathbf{x} - \mathbf{x}^t}{h}\right)$. Multivariate Gaussian kernel, spheric (Euclidean form): $K(\mathbf{u}) = \left(\dfrac{1}{\sqrt{2\pi}}\right)^{d} \exp\!\left(-\dfrac{\|\mathbf{u}\|^2}{2}\right)$; ellipsoid (Mahalanobis form): $K(\mathbf{u}) = \dfrac{1}{(2\pi)^{d/2}|\mathbf{S}|^{1/2}} \exp\!\left(-\dfrac{1}{2}\mathbf{u}^\top \mathbf{S}^{-1}\mathbf{u}\right)$, which takes 1. the different scales of the dimensions and 2. the correlations into account.
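A minimal sketch (not from the slides) of the multivariate kernel estimator with the Mahalanobis-form kernel, using the sample covariance S; the function name and h are my own illustration.

```python
import numpy as np

def multivariate_kernel_estimate(x, data, h=1.0):
    data = np.asarray(data)                 # shape (N, d)
    N, d = data.shape
    S = np.cov(data, rowvar=False)          # d x d sample covariance
    S_inv = np.linalg.inv(S)
    det_S = np.linalg.det(S)
    diff = (x - data) / h                   # scaled differences u = (x - x^t)/h, shape (N, d)
    m = np.einsum('nd,de,ne->n', diff, S_inv, diff)   # u^T S^-1 u for each training point
    K = np.exp(-0.5 * m) / ((2 * np.pi) ** (d / 2) * np.sqrt(det_S))
    return K.sum() / (N * h ** d)

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 2.0]], size=2000)
print(multivariate_kernel_estimate(np.zeros(2), X, h=0.5))
```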

11 8.4 Nonparametric Classification
Objective: estimate the class-conditional densities. Kernel estimator: $\hat{p}(\mathbf{x} \mid C_i) = \dfrac{1}{N_i h^d}\sum_{t=1}^{N} K\!\left(\dfrac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t$, where $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and 0 otherwise, and $N_i = \sum_t r_i^t$. The ML estimate of the prior density: $\hat{P}(C_i) = N_i / N$. The discriminant function: $g_i(\mathbf{x}) = \hat{p}(\mathbf{x} \mid C_i)\,\hat{P}(C_i) = \dfrac{1}{N h^d}\sum_{t=1}^{N} K\!\left(\dfrac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t$.

12 k-NN estimator: $\hat{p}(\mathbf{x} \mid C_i) = \dfrac{k_i}{N_i\,V^k(\mathbf{x})}$, where $k_i$ is the number of neighbors, out of the k nearest, that belong to $C_i$. $V^k(\mathbf{x})$ is the volume of the hypersphere centered at $\mathbf{x}$ with radius $r = \|\mathbf{x} - \mathbf{x}_{(k)}\|$, where $\mathbf{x}_{(k)}$ is the kth nearest neighbor of $\mathbf{x}$: $V^k = r^d c_d$, with $c_d$ the volume of the unit sphere in d dimensions, e.g., $c_1 = 2$, $c_2 = \pi$, $c_3 = 4\pi/3$. From Bayes' rule: $\hat{P}(C_i \mid \mathbf{x}) = \dfrac{\hat{p}(\mathbf{x} \mid C_i)\hat{P}(C_i)}{\sum_j \hat{p}(\mathbf{x} \mid C_j)\hat{P}(C_j)} = \dfrac{k_i}{k}$.

13 The k-NN classifier assigns the input to the class having the most examples among the k neighbors of the input. Example: the 1-NN classifier, i.e., the nearest neighbor classifier, which divides the space in the form of a Voronoi tessellation.
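A minimal sketch (not from the slides) of the k-NN classifier by majority vote among the k nearest training examples; the function name and toy data are my own illustration.

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=3):
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    dist = np.linalg.norm(X_train - x, axis=1)        # Euclidean distances to all examples
    nearest = np.argsort(dist)[:k]                    # indices of the k nearest examples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # class with the most votes

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.2, 0.1]), X, y, k=3))  # -> 0
```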

14 8.5 Condensed Nearest Neighbor
The time/space complexity of k-NN is O(N). Idea: find a subset Z of X that is small and is accurate in classifying X. The CNN algorithm starts from a small Z and passes over X, adding to Z any example that the current Z misclassifies with the 1-NN rule, until a full pass adds no more examples (see the sketch below). The algorithm depends on the order in which the examples are examined; different subsets may be found.
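A minimal sketch (not from the slides) of condensed nearest neighbor under the description above; the function name, seed handling, and toy data are my own illustration.

```python
import numpy as np

def condensed_nn(X, y, seed=0):
    """Grow a subset Z of X that classifies X consistently with the 1-NN rule."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    order = np.random.default_rng(seed).permutation(len(X))
    Z_idx = [order[0]]                                # start from one arbitrary example
    changed = True
    while changed:                                    # repeat until a full pass adds nothing
        changed = False
        for i in order:
            d = np.linalg.norm(X[Z_idx] - X[i], axis=1)
            nearest = Z_idx[int(np.argmin(d))]        # 1-NN prediction from the current Z
            if y[nearest] != y[i]:                    # misclassified -> keep this example
                Z_idx.append(i)
                changed = True
    return X[Z_idx], y[Z_idx]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
Z, y_Z = condensed_nn(X, y)
print(len(Z), "of", len(X), "examples kept")
```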

15 The black line segments show the Voronoi tessellation, and the red line segments form the class discriminant. In CNN, the examples that do not participate in defining the discriminant are removed.

16 8.6 Distance-based Classification
Find a distance function $D(\mathbf{x}^r, \mathbf{x}^s)$ such that if $\mathbf{x}^r$ and $\mathbf{x}^s$ belong to the same class the distance is small, and if they belong to different classes the distance is large. Consider the nearest mean classifier: assign $\mathbf{x}$ to the class whose mean $\mathbf{m}_i$ is closest. Parametric approach with Euclidean distance: $D(\mathbf{x}, \mathbf{m}_i) = \|\mathbf{x} - \mathbf{m}_i\|$; with Mahalanobis distance: $D(\mathbf{x}, \mathbf{m}_i) = \sqrt{(\mathbf{x} - \mathbf{m}_i)^\top \mathbf{S}_i^{-1} (\mathbf{x} - \mathbf{m}_i)}$.
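A minimal sketch (not from the slides) of the nearest-mean classifier with either Euclidean or Mahalanobis distance to each class mean; names and toy data are my own illustration.

```python
import numpy as np

def nearest_mean_classify(x, X_train, y_train, mahalanobis=False):
    X_train, y_train = np.asarray(X_train, dtype=float), np.asarray(y_train)
    best_label, best_dist = None, np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        m = Xc.mean(axis=0)                          # class mean m_i
        if mahalanobis:
            S_inv = np.linalg.inv(np.cov(Xc, rowvar=False))
            d = (x - m) @ S_inv @ (x - m)            # (x - m_i)^T S_i^{-1} (x - m_i)
        else:
            d = np.sum((x - m) ** 2)                 # squared Euclidean distance
        if d < best_dist:
            best_label, best_dist = c, d
    return best_label

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
print(nearest_mean_classify(np.array([2.5, 2.5]), X, y, mahalanobis=True))
```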

17 Semiparametric approach: each class is written as a mixture of densities.
Idea of distance learning: assume a parametric model for the distance and learn its parameters. Example: Mahalanobis distance $D(\mathbf{x}^r, \mathbf{x}^s \mid \mathbf{M}) = (\mathbf{x}^r - \mathbf{x}^s)^\top \mathbf{M}\, (\mathbf{x}^r - \mathbf{x}^s)$, where $\mathbf{M}$ is the parameter matrix to be estimated.

18 If the input dimensionality d is high, factor M as $\mathbf{M} = \mathbf{L}^\top \mathbf{L}$, where $\mathbf{M}$ is d x d and $\mathbf{L}$ is k x d with k < d, and learn L instead of M. The distance is then the squared Euclidean distance in the k-dimensional projected space: $D(\mathbf{x}^r, \mathbf{x}^s) = \|\mathbf{L}\mathbf{x}^r - \mathbf{L}\mathbf{x}^s\|^2$.

19 8.7 Outlier Detection
Outliers may indicate (a) abnormal behavior of a system or (b) recording errors. Difficulties of outlier detection: outliers are (a) very few, (b) of many types, and (c) seldom labeled. Nonparametric approaches are often used. Idea: find instances that lie far away from other instances. Example: Local Outlier Factor (LOF), $\mathrm{LOF}(\mathbf{x}) = \dfrac{d_k(\mathbf{x})}{\frac{1}{|N(\mathbf{x})|}\sum_{\mathbf{s} \in N(\mathbf{x})} d_k(\mathbf{s})}$, where

20 $d_k(\mathbf{x})$ is the distance between $\mathbf{x}$ and its kth
nearest neighbor, and $N(\mathbf{x})$ is the set of examples that are in the neighborhood of $\mathbf{x}$. If $\mathrm{LOF}(\mathbf{x}) \approx 1$, $\mathbf{x}$ is about as isolated as its neighbors and is unlikely to be an outlier; the larger $\mathrm{LOF}(\mathbf{x})$ becomes, the more likely $\mathbf{x}$ is an outlier.
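A minimal sketch (not from the slides) of the LOF score in the form given above, i.e., the ratio of $d_k(\mathbf{x})$ to the average $d_k$ of its neighbors; the function name, k, and toy data are my own illustration.

```python
import numpy as np

def lof(X, k=5):
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    np.fill_diagonal(D, np.inf)                                  # exclude self-distances
    neighbors = np.argsort(D, axis=1)[:, :k]                     # k nearest neighbors of each x
    dk = np.sort(D, axis=1)[:, k - 1]                            # d_k(x) for each x
    return dk / dk[neighbors].mean(axis=1)                       # LOF score per example

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), [[8.0, 8.0]]])        # one obvious outlier
scores = lof(X, k=5)
print(scores[-1], scores[:5])   # the outlier's score is much larger than 1
```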

21 8.8 Nonparametric Regression
Given a training sample $X = \{x^t, r^t\}_{t=1}^{N}$, assume $r^t = g(x^t) + \varepsilon$. Nonparametric regression: given x, i) find its neighborhood N(x) and ii) average the r values in N(x) to calculate $\hat{g}(x)$. There are various ways of defining the neighborhood and of averaging in the neighborhood. Consider univariate x first.

22 8.8.1 Running Mean Smoother Regressogram: define an origin and a bin width h, and average the r values of the training points in each bin: $\hat{g}(x) = \dfrac{\sum_{t} b(x, x^t)\, r^t}{\sum_{t} b(x, x^t)}$, where $b(x, x^t) = 1$ if $x^t$ is in the same bin as x, and 0 otherwise.


24 Running mean smoother: define a bin symmetric around x and average in there: $\hat{g}(x) = \dfrac{\sum_{t} w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t} w\!\left(\frac{x - x^t}{h}\right)}$, where $w(u) = 1$ if $|u| < 1$ and 0 otherwise.
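A minimal sketch (not from the slides) of the running mean smoother: for each query x, average the r values of the training points inside the window $|x - x^t| < h$; names and toy data are my own illustration.

```python
import numpy as np

def running_mean_smoother(x_query, x_train, r_train, h=0.5):
    x_train, r_train = np.asarray(x_train), np.asarray(r_train)
    in_bin = np.abs(x_query - x_train) < h     # symmetric bin around the query point
    if not in_bin.any():
        return np.nan                          # empty neighborhood
    return r_train[in_bin].mean()

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 100))
r = np.sin(x) + rng.normal(0, 0.2, 100)        # noisy samples of g(x) = sin(x)
print(running_mean_smoother(np.pi / 2, x, r, h=0.3))   # roughly 1.0
```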


26 8.8.2 Kernel Smoother -- Give less weight to points farther away:
$\hat{g}(x) = \dfrac{\sum_{t} K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t} K\!\left(\frac{x - x^t}{h}\right)}$, where $K(\cdot)$ is typically the Gaussian kernel. In the multivariate case, additive models fit a separate univariate smoother to each input dimension and sum them.
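A minimal sketch (not from the slides) of the kernel smoother with a Gaussian kernel; names and toy data are my own illustration.

```python
import numpy as np

def kernel_smoother(x_query, x_train, r_train, h=0.3):
    # g_hat(x) = sum_t K((x - x^t)/h) * r^t / sum_t K((x - x^t)/h)
    x_train, r_train = np.asarray(x_train), np.asarray(r_train)
    w = np.exp(-0.5 * ((x_query - x_train) / h) ** 2)   # Gaussian weights
    return np.sum(w * r_train) / np.sum(w)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
r = np.sin(x) + rng.normal(0, 0.2, 200)
print(kernel_smoother(np.pi / 2, x, r, h=0.3))   # roughly 1.0
```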


28 k-nn smoother: instead of fixing the bin width h, fix the number of neighbors k and average over the k nearest.
8.8.3 Running Line Smoother -- Use the data points in the neighborhood to fit a local line rather than a constant. Locally weighted running line smoother (loess): use kernel weighting so that distant points have less effect on the error.


30 8.9 How to Choose the Smoothing Parameters
h: the bin width or kernel spread; k: the number of neighbors. When k or h is small, single instances matter; bias is small and variance is large (undersmoothing). As k or h increases, we average over more instances; bias increases but variance decreases (oversmoothing). Smoothing splines make this trade-off explicit by minimizing a penalized error, $\sum_{t}\left[r^t - \hat{g}(x^t)\right]^2 + \lambda \int \left[\hat{g}''(x)\right]^2 dx$, where $\lambda$ controls the smoothness.

31 Cross-validation is used to fine-tune k or h, as in the sketch below.
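A minimal sketch (not from the slides) of choosing h by leave-one-out cross-validation for the kernel smoother of Section 8.8.2; names, the candidate h values, and toy data are my own illustration.

```python
import numpy as np

def loo_error(x, r, h):
    """Leave-one-out squared error of the Gaussian kernel smoother with bandwidth h."""
    x, r = np.asarray(x), np.asarray(r)
    err = 0.0
    for t in range(len(x)):
        mask = np.arange(len(x)) != t                       # leave example t out
        w = np.exp(-0.5 * ((x[t] - x[mask]) / h) ** 2)
        pred = np.sum(w * r[mask]) / np.sum(w)
        err += (r[t] - pred) ** 2
    return err / len(x)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
r = np.sin(x) + rng.normal(0, 0.2, 200)
for h in (0.05, 0.2, 0.5, 1.0):
    print(h, loo_error(x, r, h))    # the best h balances bias and variance
```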


