Ch8: Nonparametric Methods

Ch8: Nonparametric Methods 8.1 Introduction Nonparametric methods do not assume any a priori parametric form. Nonparametric methods (also called memory-based, case-based, or instance-based learning algorithms) store the training examples; later, when given a query x, they find a small number of the closest training examples and interpolate from them. The result is affected only by nearby examples, i.e., the estimate is local.

8.2 Nonparametric Density Estimation Univariate case: Given a training set X = {x^t}_{t=1}^N drawn iid from an unknown probability density p(x), the goal is to estimate p(x), i.e., to construct an estimator \hat{p}(x). Cumulative distribution function (cdf): F(x) = P(X ≤ x), estimated by \hat{F}(x) = #{x^t ≤ x} / N.

Density function: the derivative of the cdf, so \hat{p}(x) = [\hat{F}(x + h) - \hat{F}(x)] / h for a small h. 8.2.1 Histogram Estimator Histogram: divide the input range into bins of width h starting from an origin x_0; then \hat{p}(x) = #{x^t in the same bin as x} / (N h).
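
A minimal sketch of the histogram estimator in Python/NumPy; the function name, the origin x0, and the bin width h are illustrative choices, not from the text:

```python
import numpy as np

def histogram_estimate(x, sample, x0=0.0, h=1.0):
    """Histogram density estimate: p_hat(x) = #{x^t in the same bin as x} / (N h)."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    # Index of the bin [x0 + m*h, x0 + (m+1)*h) that contains the query point.
    m = np.floor((x - x0) / h)
    in_bin = np.floor((sample - x0) / h) == m
    return in_bin.sum() / (n * h)

# Example: estimate the density of a standard normal sample near x = 0.
rng = np.random.default_rng(0)
data = rng.standard_normal(1000)
print(histogram_estimate(0.0, data, x0=-5.0, h=0.5))
```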

[Figure: histogram estimates of the same sample for different bin widths h -- a large h gives a smooth estimate, a small h a spiky one.]

Naive estimator: \hat{p}(x) = #{x - h < x^t ≤ x + h} / (2Nh), equivalently \hat{p}(x) = (1/(Nh)) Σ_t w((x - x^t)/h) with the weight function w(u) = 1/2 if |u| < 1 and 0 otherwise. The interval [x - h, x + h] is the region of influence: only examples within h of x affect the estimate at x.

The histogram estimator needs to know both the origin and h; the naive estimator needs only h. The weight w(u) is discontinuous, so \hat{p}(x) is discontinuous as well. 8.2.2 Kernel Estimator Kernel function: a smooth weight function, e.g., the Gaussian kernel K(u) = (1/√(2π)) exp(-u²/2). Kernel estimator (Parzen windows): \hat{p}(x) = (1/(Nh)) Σ_t K((x - x^t)/h).
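
A minimal NumPy sketch of the Gaussian kernel (Parzen window) estimator above; the function name and the default h are illustrative:

```python
import numpy as np

def kernel_estimate(x, sample, h=0.5):
    """Parzen window estimate: p_hat(x) = (1/(N h)) * sum_t K((x - x^t)/h),
    with the Gaussian kernel K(u) = exp(-u^2/2) / sqrt(2*pi)."""
    sample = np.asarray(sample, dtype=float)
    u = (x - sample) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return k.sum() / (len(sample) * h)

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)
print(kernel_estimate(0.0, data, h=0.3))  # should be close to 1/sqrt(2*pi) ~ 0.399
```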

The larger h is, the smoother the estimate. Adaptive methods tailor h as a function of the density around x.

8.2.3 k-Nearest Neighbor Estimator Instead of fixing h, fix the number of nearby examples k: \hat{p}(x) = k / (2N d_k(x)), where d_k(x) is the distance from x to its kth closest training example. Where the density is high the bins are small; where the density is low the bins are large. The k-nn estimate is not continuous; to get a smooth estimator, a kernel function whose effect decreases with distance is used: \hat{p}(x) = (1/(N d_k(x))) Σ_t K((x - x^t)/d_k(x)).
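
A sketch of the basic k-NN density estimate \hat{p}(x) = k / (2 N d_k(x)) in NumPy; the function name and the values of k are illustrative:

```python
import numpy as np

def knn_density_estimate(x, sample, k=10):
    """k-NN estimate: p_hat(x) = k / (2 N d_k(x)),
    where d_k(x) is the distance to the k-th closest training example."""
    sample = np.asarray(sample, dtype=float)
    d = np.sort(np.abs(sample - x))
    d_k = d[k - 1]                      # distance to the k-th nearest neighbour
    return k / (2.0 * len(sample) * d_k)

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)
print(knn_density_estimate(0.0, data, k=25))
```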

8.3 Generalization to Multivariate Data Kernel density estimator: \hat{p}(x) = (1/(N h^d)) Σ_t K((x - x^t)/h). Multivariate Gaussian kernel, spheric (Euclidean form): K(u) = (1/(2π)^{d/2}) exp(-||u||² / 2); ellipsoid (Mahalanobis form): K(u) = (1/((2π)^{d/2} |S|^{1/2})) exp(-(1/2) u^T S^{-1} u), which takes (1) the different scales of the dimensions and (2) correlations into account.
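
A sketch of the multivariate kernel estimator with the ellipsoidal (Mahalanobis-form) Gaussian kernel; using the sample covariance for S, and the function names, are assumptions made for illustration:

```python
import numpy as np

def multivariate_kde(x, sample, h=1.0):
    """p_hat(x) = (1/(N h^d)) * sum_t K((x - x^t)/h) with the ellipsoidal Gaussian
    kernel K(u) = exp(-0.5 u^T S^{-1} u) / ((2 pi)^{d/2} |S|^{1/2})."""
    sample = np.asarray(sample, dtype=float)
    n, d = sample.shape
    S = np.cov(sample, rowvar=False)            # kernel shape taken from the sample covariance
    S_inv = np.linalg.inv(S)
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(S))
    u = (x - sample) / h                        # (N, d) matrix of scaled differences
    mahala = np.einsum('ij,jk,ik->i', u, S_inv, u)   # u_t^T S^{-1} u_t for every t
    k = np.exp(-0.5 * mahala) / norm
    return k.sum() / (n * h**d)

rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 2]], size=500)
print(multivariate_kde(np.zeros(2), data, h=0.5))
```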

8.4 Nonparametric Classification Objective: estimate the class-conditional densities. The kernel estimator gives \hat{p}(x|C_i) = (1/(N_i h^d)) Σ_t K((x - x^t)/h) r_i^t, where r_i^t = 1 if x^t ∈ C_i and 0 otherwise, and N_i = Σ_t r_i^t. The ML estimate of the prior: \hat{P}(C_i) = N_i / N. The discriminant function: g_i(x) = \hat{p}(x|C_i) \hat{P}(C_i) = (1/(N h^d)) Σ_t K((x - x^t)/h) r_i^t.
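
A sketch of this kernel-based discriminant with a spheric Gaussian kernel; the function name, the bandwidth h, and the toy data are illustrative:

```python
import numpy as np

def kernel_discriminant(x, X, y, h=0.5):
    """g_i(x) = (1/(N h^d)) sum_t K((x - x^t)/h) r_i^t with a spheric Gaussian kernel;
    the class with the largest g_i(x) is chosen."""
    X, y = np.asarray(X, float), np.asarray(y)
    N, d = X.shape
    u = (x - X) / h
    K = np.exp(-0.5 * (u**2).sum(axis=1)) / (2.0 * np.pi) ** (d / 2.0)
    g = {c: K[y == c].sum() / (N * h**d) for c in np.unique(y)}
    return max(g, key=g.get), g

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(kernel_discriminant(np.array([1.5, 1.5]), X, y, h=0.5))
```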

k-NN estimator: \hat{p}(x|C_i) = k_i / (N_i V^k(x)), where k_i is the number of neighbors out of the k nearest that belong to C_i, and V^k(x) is the volume of the hypersphere centered at x with radius r = ||x - x_(k)||, x_(k) being the kth nearest neighbor of x: V^k(x) = r^d c_d, with c_d the volume of the unit sphere in d dimensions, e.g., c_1 = 2, c_2 = π, c_3 = 4π/3. From Bayes' rule: \hat{P}(C_i|x) = \hat{p}(x|C_i) \hat{P}(C_i) / \hat{p}(x) = k_i / k.
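
A sketch of the resulting rule \hat{P}(C_i|x) = k_i / k; Euclidean distance, the function names, and the toy data are illustrative assumptions:

```python
import numpy as np
from collections import Counter

def knn_posteriors(x, X, y, k=5):
    """Posterior estimates P_hat(C_i | x) = k_i / k,
    where k_i is how many of the k nearest neighbours belong to class i."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X - x, axis=1)         # Euclidean distances to all training points
    nearest = np.argsort(dist)[:k]               # indices of the k nearest neighbours
    counts = Counter(np.asarray(y)[nearest])
    return {c: counts.get(c, 0) / k for c in set(y)}

def knn_classify(x, X, y, k=5):
    post = knn_posteriors(x, X, y, k)
    return max(post, key=post.get)               # class with the most neighbours among the k

# Tiny example with two Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(knn_posteriors(np.array([2.5, 2.5]), X, y, k=7))
print(knn_classify(np.array([2.5, 2.5]), X, y, k=7))
```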

The k-NN classifier assigns the input to the class having the most examples among the k neighbors of the input. Special case: the 1-NN (nearest neighbor) classifier, which divides the input space in the form of a Voronoi tessellation.

8.5 Condensed Nearest Neighbor Time/space complexity of k-NN is O(N). Idea: find a subset Z of X that is small and still classifies X accurately. Algorithm of CNN (incremental): start with an empty Z; pass over the examples of X in random order and add an example to Z whenever the 1-NN rule using the current Z misclassifies it; repeat such passes until no example is added. The algorithm depends on the order in which the examples are examined; different subsets may be found.
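
A sketch of this greedy condensing procedure; the 1-NN subroutine, names, and toy data are illustrative. Because the examples are visited in a random order, different runs may keep different subsets Z, as noted above:

```python
import numpy as np

def nearest_label(x, Z_X, Z_y):
    """Label of the 1-nearest neighbour of x among the stored examples Z."""
    d = np.linalg.norm(np.asarray(Z_X) - x, axis=1)
    return Z_y[int(np.argmin(d))]

def condensed_nn(X, y, rng=np.random.default_rng(0)):
    """Greedy condensing: keep only examples misclassified by the current subset Z."""
    X, y = np.asarray(X, float), np.asarray(y)
    order = rng.permutation(len(X))
    Z_X, Z_y = [X[order[0]]], [y[order[0]]]      # start with one stored example
    changed = True
    while changed:                                # repeat passes until Z no longer grows
        changed = False
        for i in order:
            if nearest_label(X[i], Z_X, Z_y) != y[i]:
                Z_X.append(X[i]); Z_y.append(y[i])
                changed = True
    return np.array(Z_X), np.array(Z_y)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
Z_X, Z_y = condensed_nn(X, y)
print(len(Z_X), "examples kept out of", len(X))
```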

The black line segments form the Voronoi tessellation; the red line segments form the class discriminant. In CNN, the examples that do not participate in defining the discriminant are removed.

8.6 Distance-based Classification Find a distance function such that if two examples belong to the same class the distance is small, and if they belong to different classes the distance is large. Consider the nearest mean classifier: assign x to the class with the closest mean, i.e., choose C_i if D(x, m_i) = min_j D(x, m_j). Parametric approach with Euclidean distance: D(x, m_j) = ||x - m_j||; with Mahalanobis distance: D(x, m_j) = (x - m_j)^T S_j^{-1} (x - m_j).
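
A sketch of the nearest mean classifier with either distance; using the per-class sample covariance for S_j, and the names and toy data, are illustrative assumptions:

```python
import numpy as np

def nearest_mean_classify(x, X, y, mahalanobis=False):
    """Assign x to the class whose mean m_i is closest,
    using squared Euclidean or per-class Mahalanobis distance."""
    X, y = np.asarray(X, float), np.asarray(y)
    best_c, best_d = None, np.inf
    for c in np.unique(y):
        Xc = X[y == c]
        m = Xc.mean(axis=0)
        diff = x - m
        if mahalanobis:
            S_inv = np.linalg.inv(np.cov(Xc, rowvar=False))   # class covariance S_c
            d = diff @ S_inv @ diff
        else:
            d = diff @ diff                                    # squared Euclidean distance
        if d < best_d:
            best_c, best_d = c, d
    return best_c

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(nearest_mean_classify(np.array([2.5, 2.5]), X, y, mahalanobis=True))
```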

Semiparametric approach: each class is written as a mixture of densities, and x is assigned to the class of the closest mixture component. Idea of distance learning: assume a parametric model for the distance and learn its parameters from data. Example: Mahalanobis distance D(x, x') = (x - x')^T M (x - x'), where M is the parameter matrix to be estimated.

If the input dimensionality d is high, factor M as M = L^T L, where M is d x d and L is k x d with k < d, and learn L instead of M.
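
A sketch of the factored Mahalanobis distance D(x, x') = (x - x')^T L^T L (x - x'); here L is just a random matrix to illustrate the computation, whereas in distance learning it would be fitted to labelled data:

```python
import numpy as np

def mahalanobis_factored(x1, x2, L):
    """D(x1, x2) = (x1 - x2)^T L^T L (x1 - x2) = ||L (x1 - x2)||^2.
    L has shape (k, d) with k < d, so it also maps the data to k dimensions."""
    z = L @ (np.asarray(x1) - np.asarray(x2))
    return float(z @ z)

d, k = 5, 2
rng = np.random.default_rng(0)
L = rng.standard_normal((k, d))       # in practice L would be learned, not random
x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
print(mahalanobis_factored(x1, x2, L))
```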

8.7 Outlier Detection Outliers may indicate (a) abnormal behavior of a system or (b) recording errors. Difficulties of outlier detection: outliers are (a) very few, (b) of many types, and (c) seldom labeled. Nonparametric approaches are often used; the idea is to find instances that lie far away from other instances. Example: the Local Outlier Factor, LOF(x) = d_k(x) / ( Σ_{s ∈ N(x)} d_k(s) / |N(x)| ), where

d_k(x) is the distance between x and its kth nearest neighbor, and N(x) is the set of examples in the neighborhood of x. If LOF(x) ≈ 1, the probability that x is not an outlier increases; the larger LOF(x) is, the more likely x is an outlier.
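
A sketch of this simplified LOF (d_k(x) divided by the average d_k of its k nearest neighbours); the full LOF of Breunig et al. uses reachability distances, so treat this as the simplified version described above, with illustrative names and data:

```python
import numpy as np

def d_k(x, X, k):
    """Distance from x (a member of X) to its k-th nearest neighbour in X."""
    d = np.sort(np.linalg.norm(X - x, axis=1))
    return d[k]                                     # d[0] is x itself (distance 0)

def lof(i, X, k=5):
    """Simplified LOF(x) = d_k(x) / mean of d_k over the k nearest neighbours of x."""
    X = np.asarray(X, float)
    dist = np.linalg.norm(X - X[i], axis=1)
    neighbours = np.argsort(dist)[1:k + 1]          # N(x): the k nearest neighbours
    dk_x = dist[neighbours[-1]]                     # d_k(x)
    dk_neigh = np.mean([d_k(X[j], X, k) for j in neighbours])
    return dk_x / dk_neigh

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), [[8.0, 8.0]]])  # one obvious outlier
print(lof(0, X, k=5), lof(100, X, k=5))                     # ~1 for inliers, >> 1 for the outlier
```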

8.8 Nonparametric Regression Given a training sample X = {x^t, r^t}_{t=1}^N, assume r^t = g(x^t) + ε. Nonparametric regression (smoothing): given x, (i) find its neighborhood N(x) and (ii) average the r^t values in N(x) to calculate \hat{g}(x). There are various ways of defining the neighborhood and of averaging within it. Consider univariate x first.

8.8.1 Running Mean Smoother Regressogram: define an origin and a bin width h and average the r^t values in each bin: \hat{g}(x) = Σ_t b(x, x^t) r^t / Σ_t b(x, x^t), where b(x, x^t) = 1 if x^t is in the same bin as x and 0 otherwise.

Running mean smoother: define a bin symmetric around x, [x - h, x + h], and average the r^t values falling in it: \hat{g}(x) = Σ_t w((x - x^t)/h) r^t / Σ_t w((x - x^t)/h), where w(u) = 1 if |u| ≤ 1 and 0 otherwise.
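
A sketch of the running mean smoother; the function name, the bandwidth, and the noisy-sine toy data are illustrative:

```python
import numpy as np

def running_mean_smoother(x, X, r, h=0.5):
    """g_hat(x): average of the r^t whose x^t fall in the symmetric bin [x - h, x + h]."""
    X, r = np.asarray(X, float), np.asarray(r, float)
    in_bin = np.abs(X - x) <= h
    if not in_bin.any():
        return np.nan                      # empty neighbourhood: no estimate
    return r[in_bin].mean()

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, 200))
r = np.sin(X) + rng.normal(0, 0.2, 200)    # noisy observations of g(x) = sin(x)
print(running_mean_smoother(np.pi / 2, X, r, h=0.3))   # should be close to 1
```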

8.8.2 Kernel Smoother -- Give less weight to points farther away: \hat{g}(x) = Σ_t K((x - x^t)/h) r^t / Σ_t K((x - x^t)/h), where K(·) is, e.g., a Gaussian kernel. (Additive models extend this idea to multivariate x by summing univariate smoothers, one per input dimension.)
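
A sketch of this kernel smoother with a Gaussian kernel; the function name, bandwidth, and toy data are illustrative:

```python
import numpy as np

def kernel_smoother(x, X, r, h=0.5):
    """g_hat(x) = sum_t K((x - x^t)/h) r^t / sum_t K((x - x^t)/h), Gaussian K."""
    X, r = np.asarray(X, float), np.asarray(r, float)
    u = (x - X) / h
    w = np.exp(-0.5 * u**2)                # unnormalised Gaussian weights
    return (w * r).sum() / w.sum()

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 200)
r = np.sin(X) + rng.normal(0, 0.2, 200)
print(kernel_smoother(np.pi / 2, X, r, h=0.3))   # should be close to sin(pi/2) = 1
```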

K-nn smoother: fix the number of neighbors k and average the r^t of the k nearest x^t. 8.8.3 Running Line Smoother -- Use the data points in the neighborhood to fit a local line. Locally weighted running line smoother (loess): use kernel weighting so that distant points have less effect on the error.
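
A sketch of a locally weighted running line (loess-style) estimate: fit a weighted least-squares line around the query point and evaluate it there. The function name, kernel weighting choice, and toy data are illustrative:

```python
import numpy as np

def loess_estimate(x, X, r, h=0.5):
    """Fit the line a + b*x by weighted least squares, with Gaussian kernel weights
    so that distant points have less effect on the error, then evaluate it at x."""
    X, r = np.asarray(X, float), np.asarray(r, float)
    w = np.exp(-0.5 * ((x - X) / h) ** 2)          # kernel weights
    A = np.column_stack([np.ones_like(X), X])      # design matrix for the local line
    W = np.diag(w)
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ r)   # weighted least squares
    return beta[0] + beta[1] * x

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 200)
r = np.sin(X) + rng.normal(0, 0.2, 200)
print(loess_estimate(np.pi / 2, X, r, h=0.4))      # should be close to 1
```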

8.9 How to Choose the Smoothing Parameters The smoothing parameters are h (bin width, kernel spread) and k (the number of neighbors). When k or h is small, single instances matter: bias is small but variance is large (undersmoothing). As k or h increases, we average over more instances: variance decreases but bias increases (oversmoothing). Smoothing splines make this trade-off explicit by minimizing a penalized error, Σ_t [r^t - \hat{g}(x^t)]² + λ ∫ [\hat{g}''(x)]² dx, where λ controls the smoothness.

Cross-validation is used to fine-tune k or h.
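
One way to tune h by cross-validation, sketched here as leave-one-out squared error of the kernel smoother over a grid of candidate h values; the grid, function names, and toy data are illustrative:

```python
import numpy as np

def kernel_smoother(x, X, r, h):
    """Gaussian kernel smoother g_hat(x), as in the sketch above."""
    u = (x - X) / h
    w = np.exp(-0.5 * u**2)
    return (w * r).sum() / w.sum()

def loo_error(X, r, h):
    """Leave-one-out mean squared error of the kernel smoother for a given h."""
    err = 0.0
    for t in range(len(X)):
        mask = np.arange(len(X)) != t                  # leave example t out
        pred = kernel_smoother(X[t], X[mask], r[mask], h)
        err += (r[t] - pred) ** 2
    return err / len(X)

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 200)
r = np.sin(X) + rng.normal(0, 0.2, 200)
candidates = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]           # grid of h values to compare
errors = {h: loo_error(X, r, h) for h in candidates}
best_h = min(errors, key=errors.get)
print(errors, "-> best h:", best_h)
```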