CHAPTER 8: Nonparametric Methods
Alpaydin transparencies, significantly modified and extended by Ch. Eick. Last updated: March 4, 2011.

Non-Parametric Density Estimation
Goal is to obtain a density function from the sample.
Parametric: a single global model. Semiparametric: a small number of local models.
Nonparametric: similar inputs have similar outputs. Keep the training data; "let the data speak for itself." Given x, find a small number of closest training instances and interpolate from these. Aka lazy / memory-based / case-based / instance-based learning.

Histograms
A histogram usually shows the distribution of values of a single variable:
- Divide the values into bins and show a bar plot of the number of objects in each bin.
- The height of each bar indicates the number of objects in that bin.
- The shape of the histogram depends on the number of bins.
Example: Petal Width (10 and 20 bins, respectively).
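A minimal sketch of bin counting and normalization, assuming NumPy; the sample is synthetic and merely stands in for the slide's Petal Width attribute:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.2, scale=0.4, size=150)   # hypothetical "petal width" sample

for n_bins in (10, 20):
    counts, edges = np.histogram(x, bins=n_bins)
    h = edges[1] - edges[0]                 # bin width
    density = counts / (len(x) * h)         # normalize so the bars integrate to 1
    print(f"{n_bins} bins: width {h:.3f}, peak density {density.max():.3f}")
```

With more bins the width h shrinks and the histogram gets spikier, which is exactly the bias/variance trade-off discussed at the end of the chapter.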

Density Estimation
Given the training set $X = \{x^t\}_t$ drawn iid from $p(x)$. Divide the data into bins of size h, starting from an origin $x_0$.
Histogram: $\hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh}$
Naive estimator: $\hat{p}(x) = \frac{\#\{x - h/2 < x^t \le x + h/2\}}{Nh}$, or equivalently $\hat{p}(x) = \frac{1}{Nh}\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right)$ with $w(u) = 1$ if $|u| < 1/2$ and $0$ otherwise.
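A minimal sketch of the naive estimator as reconstructed above; the sample values are hypothetical, chosen so that two of N = 8 points fall in the window around x = 2, reproducing the 0.125 from the worked example below:

```python
import numpy as np

def naive_estimate(x, sample, h):
    """Naive estimator: p_hat(x) = #{x - h/2 < x^t <= x + h/2} / (N h)."""
    sample = np.asarray(sample)
    count = np.sum((sample > x - h / 2) & (sample <= x + h / 2))
    return count / (len(sample) * h)

sample = [0.2, 0.5, 1.5, 2.5, 3.5, 4.0, 4.5, 5.0]   # hypothetical data
print(naive_estimate(2.0, sample, h=2.0))            # 2 / (2 * 8) = 0.125
```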

[Figure: histogram estimates on a small sample with origin 0; e.g. $\hat{p}(1) = 4/16$ and $\hat{p}(1.25) = 1/8$.]

[Figure: naive estimator example with N = 8 and h = 2: $\hat{p}(2) = 2/(2 \cdot 8) = 0.125$.]

Gaussian Kernel Estimator
Kernel function, e.g., the Gaussian kernel: $K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right)$
Kernel estimator (Parzen windows): $\hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)$
Gaussian influence functions in general: the kernel measures the influence of $x^t$ on the query point $x$; $h$ determines how quickly the influence decreases as the distance between $x^t$ and $x$ increases. $h$ is called the "width" of the kernel.
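A minimal sketch of the Parzen-window estimator with a Gaussian kernel, directly following the two formulas above (the function names are mine):

```python
import numpy as np

def gaussian_kernel(u):
    """K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return np.exp(-0.5 * np.asarray(u) ** 2) / np.sqrt(2.0 * np.pi)

def parzen_estimate(x, sample, h):
    """p_hat(x) = (1 / (N h)) * sum_t K((x - x^t) / h)."""
    sample = np.asarray(sample)
    return gaussian_kernel((x - sample) / h).sum() / (len(sample) * h)

print(parzen_estimate(2.0, [0.2, 0.5, 1.5, 2.5, 3.5], h=1.0))  # hypothetical data
```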

Example: Kernel Density Estimation
$D = \{x_1, x_2, x_3, x_4\}$
$f_D^{Gaussian}(x) = \text{influence}(x_1, x) + \text{influence}(x_2, x) + \text{influence}(x_3, x) + \text{influence}(x_4, x) = 0.78$
[Figure: the four points $x_1, \ldots, x_4$ on a line, with two query points x and y.]
Remark: the density value at y would be larger than the one at x.

[Figure: density functions for different values of $h/\sigma$.]

k-Nearest Neighbor Estimator
Instead of fixing the bin width h and counting the number of instances, fix the number of instances (neighbors) k and use the bin width $d_k(x)$, the distance to the k-th closest instance to x:
$\hat{p}(x) = \frac{k}{2N d_k(x)}$
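A minimal sketch of this estimator in the univariate case (function name is mine):

```python
import numpy as np

def knn_density(x, sample, k):
    """p_hat(x) = k / (2 N d_k(x)), with d_k(x) the k-th smallest distance."""
    sample = np.asarray(sample)
    d = np.sort(np.abs(sample - x))        # distances to x, ascending
    return k / (2.0 * len(sample) * d[k - 1])
```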

Multivariate Data
Kernel density estimator: $\hat{p}(x) = \frac{1}{Nh^d} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)$
Multivariate Gaussian kernel:
- spheric: $K(u) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} \exp\!\left(-\frac{\|u\|^2}{2}\right)$
- ellipsoid: $K(u) = \frac{1}{(2\pi)^{d/2} |S|^{1/2}} \exp\!\left(-\frac{1}{2} u^\top S^{-1} u\right)$, with S the sample covariance matrix
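A minimal sketch of the d-dimensional estimator with the spheric Gaussian kernel, as reconstructed above:

```python
import numpy as np

def multivariate_parzen(x, sample, h):
    """p_hat(x) = (1 / (N h^d)) * sum_t K((x - x^t) / h), spheric Gaussian K."""
    x, sample = np.asarray(x), np.asarray(sample)   # x: (d,), sample: (N, d)
    N, d = sample.shape
    u = (x - sample) / h                             # (N, d) scaled differences
    K = np.exp(-0.5 * (u ** 2).sum(axis=1)) / (2 * np.pi) ** (d / 2)
    return K.sum() / (N * h ** d)
```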

Nonparametric Classification
Idea: use a density function for each class. Estimate $p(x|C_i)$ and use Bayes' rule.
Kernel estimator: $\hat{p}(x|C_i) = \frac{1}{N_i h^d} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r_i^t$ with $\hat{P}(C_i) = \frac{N_i}{N}$, giving the discriminant $g_i(x) = \hat{p}(x|C_i)\,\hat{P}(C_i)$.
k-NN estimator: with $k_i$ of the k nearest neighbors belonging to $C_i$, $\hat{P}(C_i|x) = \frac{k_i}{k}$.
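A minimal sketch of the k-NN classification rule: pick the class with the largest $k_i$, i.e. the maximizer of $\hat{P}(C_i|x) = k_i/k$ (function name is mine):

```python
import numpy as np

def knn_classify(x, X, y, k):
    """X: (N, d) training inputs, y: (N,) class labels."""
    dist = np.linalg.norm(np.asarray(X) - np.asarray(x), axis=1)
    nearest = np.argsort(dist)[:k]                  # indices of k closest points
    labels, votes = np.unique(np.asarray(y)[nearest], return_counts=True)
    return labels[np.argmax(votes)]                 # class with the largest k_i
```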

Condensed Nearest Neighbor (skip)
The time/space complexity of k-NN is O(N). Find a subset Z of X that is small and accurate in classifying X (Hart, 1968).

Condensed Nearest Neighbor (skip)
Incremental algorithm: add an instance to Z only if it is needed, i.e., if the current subset misclassifies it. A sketch follows.
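A minimal sketch of Hart's incremental rule under that description (function name is mine; ties in distance are broken by sort order):

```python
import numpy as np

def condense(X, y):
    """Keep an instance only if the current subset Z, used as a 1-NN
    classifier, misclassifies it; repeat passes until Z stops growing."""
    X, y = np.asarray(X), np.asarray(y)
    Z = [0]                                      # start with the first instance
    changed = True
    while changed:
        changed = False
        for t in range(len(X)):
            dist = np.linalg.norm(X[Z] - X[t], axis=1)
            if y[Z][np.argmin(dist)] != y[t]:    # 1-NN on Z gets x^t wrong
                Z.append(t)
                changed = True
    return X[Z], y[Z]
```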

Nonparametric Regression
Aka smoothing models. Idea: use the average of the output variable in a neighborhood of x.
Regressogram: $\hat{g}(x) = \frac{\sum_{t=1}^{N} b(x, x^t)\, r^t}{\sum_{t=1}^{N} b(x, x^t)}$, where $b(x, x^t) = 1$ if $x^t$ is in the same bin as $x$, and 0 otherwise.

[Figure: regressogram fits; uses bins to define neighborhoods.]
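A minimal sketch of the regressogram formula above (function name and the default origin are mine):

```python
import numpy as np

def regressogram(x, X, r, h, origin=0.0):
    """g_hat(x): mean of r^t over training points x^t in the same bin as x."""
    X, r = np.asarray(X, float), np.asarray(r, float)
    same_bin = np.floor((X - origin) / h) == np.floor((x - origin) / h)
    return r[same_bin].mean() if same_bin.any() else float("nan")
```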

Running Mean/Kernel Smoother
Running mean smoother: $\hat{g}(x) = \frac{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right)}$, with $w(u) = 1$ if $|u| < 1$ and 0 otherwise.
Kernel smoother: $\hat{g}(x) = \frac{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)}$, where K(·) is Gaussian. Idea: weight examples inversely by their distance to x.
Running line smoother: fit a local line instead of a local average. See also additive models (Hastie and Tibshirani, 1990).

Example: Computing ĝ(2)
[Table of training pairs $(x^t, r^t)$ with, per pair, the running mean smoother weight for x = 2, h = 2.1 and the kernel smoother weight computed using influence(x, x^t).]
Running mean smoother prediction: ĝ(2) = (6 + 11 + 7)/3 = 8
Kernel smoother prediction: ĝ(2) = (6·0.2 + 11·0.2 + 7·0.1 + 3·0.05)/0.55 = 4.25/0.55 ≈ 7.73
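A minimal sketch of both smoothers from the formulas above. The slide's $(x^t, r^t)$ table did not survive extraction, so any data you pass in is your own; with the slide's weights the kernel prediction would be 4.25/0.55 ≈ 7.73:

```python
import numpy as np

def running_mean(x, X, r, h):
    """Average r^t over points with |x - x^t| < h (box window w)."""
    X, r = np.asarray(X, float), np.asarray(r, float)
    w = np.abs(x - X) < h
    return r[w].mean()

def kernel_smoother(x, X, r, h):
    """Weight each r^t by a Gaussian kernel of the scaled distance."""
    X, r = np.asarray(X, float), np.asarray(r, float)
    K = np.exp(-0.5 * ((x - X) / h) ** 2)
    return (K * r).sum() / K.sum()
```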

[Figure: kernel smoother fits; uses the influence function to determine the weights.]

Smoothness/Smooth Functions
Smoothness depends on a function's discontinuities and on how quickly its first/second/third/... derivatives change. A smooth function is a continuous function whose derivatives change quite slowly.
Smooth functions are frequently preferred as models of systems and decision making: small changes in input result in small changes in output.
Design: smooth surfaces are more appealing, e.g. in car or sofa design.

Choosing h/k
When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): high complexity.
As k or h increases, we average over more instances; variance decreases but bias increases (oversmoothing): low complexity.
h/k large: very few hills; smooth. h/k small: a lot of hills; a lot of changes/discontinuities in the first/second/third/... derivatives.
Cross-validation is used to fine-tune k or h, as in the sketch below.
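A minimal sketch of choosing h by cross-validation, here scoring each candidate width by the held-out log-likelihood of a Gaussian Parzen estimate (function names, fold count, and the scoring criterion are my choices, not the slides'):

```python
import numpy as np

def parzen_loglik(points, train, h):
    """Held-out log-likelihood under a Gaussian Parzen estimate."""
    u = (points[:, None] - train[None, :]) / h      # pairwise scaled differences
    p = (np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)).sum(axis=1) / (len(train) * h)
    return np.log(p + 1e-300).sum()                 # guard against log(0)

def cv_bandwidth(sample, candidates, n_folds=5, seed=0):
    """Pick the h with the highest cross-validated log-likelihood."""
    sample = np.random.default_rng(seed).permutation(np.asarray(sample, float))
    folds = np.array_split(sample, n_folds)
    scores = []
    for h in candidates:
        ll = sum(
            parzen_loglik(folds[i], np.concatenate(folds[:i] + folds[i + 1:]), h)
            for i in range(n_folds)
        )
        scores.append(ll)
    return candidates[int(np.argmax(scores))]
```

Small h wins on spiky multimodal data, large h on smooth data, matching the under/oversmoothing trade-off above.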