
1 CHAPTER 8: Nonparametric Methods
Alpaydin transparencies significantly modified and extended by Ch. Eick. Last updated: March 4, 2011.

2 Non-Parametric Density Estimation
Goal: obtain a density function (http://en.wikipedia.org/wiki/Probability_density_function).
Parametric: a single global model. Semiparametric: a small number of local models.
Nonparametric: similar inputs have similar outputs; keep the training data and "let the data speak for itself".
Given x, find a small number of the closest training instances and interpolate from these.
Also known as lazy / memory-based / case-based / instance-based learning.

3 Histograms
A histogram usually shows the distribution of values of a single variable:
- Divide the values into bins and show a bar plot of the number of objects in each bin.
- The height of each bar indicates the number of objects in that bin.
- The shape of the histogram depends on the number of bins.
Example: petal width (10 and 20 bins, respectively). [Figure]
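To see the bin-count effect concretely, here is a minimal sketch using matplotlib and the iris petal-width column; the libraries and the exact data are assumptions, since the slide only shows the resulting plots.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Petal width is the fourth feature column of the iris data set
# (assumed stand-in for the slide's example data).
petal_width = load_iris().data[:, 3]

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, bins in zip(axes, (10, 20)):
    ax.hist(petal_width, bins=bins)   # bar height = object count per bin
    ax.set_title(f"Petal width, {bins} bins")
plt.tight_layout()
plt.show()
```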

4 Density Estimation
Given a training set X = {x^t}_t drawn iid from p(x), divide the data into bins of size h, starting from an origin x_0.
Histogram: $\hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh}$
Naive estimator: $\hat{p}(x) = \frac{\#\{x - h/2 < x^t \le x + h/2\}}{Nh}$, or equivalently $\hat{p}(x) = \frac{1}{Nh}\sum_t w\left(\frac{x - x^t}{h}\right)$ with $w(u) = 1$ if $|u| < 1/2$ and $0$ otherwise.
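A minimal NumPy sketch of both estimators as defined above; the function names are illustrative, not from the slides.

```python
import numpy as np

def histogram_estimator(x, X, h, x0=0.0):
    """p_hat(x) = #{x^t in the same bin as x} / (N h), bins anchored at x0."""
    X = np.asarray(X, dtype=float)
    same_bin = np.floor((X - x0) / h) == np.floor((x - x0) / h)
    return same_bin.sum() / (len(X) * h)

def naive_estimator(x, X, h):
    """p_hat(x) = #{|x - x^t| < h/2} / (N h); a bin always centered at x."""
    X = np.asarray(X, dtype=float)
    return np.sum(np.abs(x - X) < h / 2) / (len(X) * h)
```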

5 [Figure: histogram estimates with origin 0; e.g., $\hat{p}(1) = 4/16$ and $\hat{p}(1.25) = 1/8$]

6 [Figure: naive estimator example; $\hat{p}(2) = 2/(2 \cdot 8) = 0.125$]

7 Gaussian Kernel Estimator
Kernel function, e.g., the Gaussian kernel: $K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$
Kernel estimator (Parzen windows): $\hat{p}(x) = \frac{1}{Nh}\sum_{t=1}^{N} K\left(\frac{x - x^t}{h}\right)$
Gaussian influence functions in general: $K\left(\frac{x - x^t}{h}\right)$ is the influence of $x^t$ on the query point $x$; $h$ determines how quickly the influence decreases as the distance between $x^t$ and $x$ increases, and is called the "width" of the kernel.
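A minimal NumPy sketch of the Parzen-window estimator with the Gaussian kernel above (the function name is illustrative):

```python
import numpy as np

def kernel_estimator(x, X, h):
    """Parzen windows: p_hat(x) = (1 / (N h)) * sum_t K((x - x^t) / h)."""
    u = (x - np.asarray(X, dtype=float)) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    return K.sum() / (len(X) * h)
```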

8 Example: Kernel Density Estimation
D = {x1, x2, x3, x4}
$f_D^{\text{Gaussian}}(x)$ = influence(x1, x) + influence(x2, x) + influence(x3, x) + influence(x4, x) = 0.04 + 0.06 + 0.08 + 0.6 = 0.78
[Figure: four training points x1..x4, a query point x, and a second query point y]
Remark: the density value at y would be larger than the one at x.
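The slide's computation is just the sum of the four influence values read off the figure; a quick check (values assumed from the slide):

```python
influences = [0.04, 0.06, 0.08, 0.6]   # influence(x1..x4, x) from the figure
print(sum(influences))                 # 0.78
```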

10 Density Functions for Different Values of h/σ
[Figure: kernel density estimates of the same data for several kernel widths; larger widths yield smoother estimates]

11 k-Nearest Neighbor Estimator
Instead of fixing the bin width h and counting the number of instances, fix the number of instances (neighbors) k and use the bin width $d_k(x)$, the distance to the k-th closest training instance to x:
$\hat{p}(x) = \frac{k}{2 N d_k(x)}$
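A minimal NumPy sketch of the one-dimensional k-NN density estimate defined above (function name illustrative):

```python
import numpy as np

def knn_density(x, X, k):
    """p_hat(x) = k / (2 N d_k(x)), where d_k(x) is the distance from x
    to its k-th nearest training instance."""
    d = np.sort(np.abs(x - np.asarray(X, dtype=float)))
    return k / (2 * len(X) * d[k - 1])
```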

13 Multivariate Data
Kernel density estimator: $\hat{p}(x) = \frac{1}{Nh^d}\sum_{t=1}^{N} K\left(\frac{x - x^t}{h}\right)$
Multivariate Gaussian kernel:
- spheric: $K(u) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} \exp\left(-\frac{\lVert u \rVert^2}{2}\right)$
- ellipsoid: $K(u) = \frac{1}{(2\pi)^{d/2} |S|^{1/2}} \exp\left(-\frac{1}{2}\, u^{\top} S^{-1} u\right)$, with covariance matrix S
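A minimal sketch of the multivariate estimator with the spheric Gaussian kernel (names illustrative; the ellipsoid variant would additionally need a covariance matrix S):

```python
import numpy as np

def multivariate_kde(x, X, h):
    """p_hat(x) = (1 / (N h^d)) * sum_t K((x - x^t) / h), spheric Gaussian K."""
    X = np.asarray(X, dtype=float)            # shape (N, d)
    N, d = X.shape
    u = (np.asarray(x, dtype=float) - X) / h
    K = (2 * np.pi) ** (-d / 2) * np.exp(-0.5 * np.sum(u**2, axis=1))
    return K.sum() / (N * h**d)
```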

14 Nonparametric Classification
Estimate $p(x|C_i)$ and use Bayes' rule.
Kernel estimator: $\hat{p}(x|C_i) = \frac{1}{N_i h^d}\sum_t K\left(\frac{x - x^t}{h}\right) r_i^t$, with priors $\hat{P}(C_i) = N_i / N$ and discriminant $g_i(x) = \hat{p}(x|C_i)\, \hat{P}(C_i)$
k-NN estimator: $\hat{p}(x|C_i) = \frac{k_i}{N_i\, V^k(x)}$, which gives the posterior $\hat{P}(C_i|x) = k_i / k$, i.e., assign the query to the majority class among its k nearest neighbors.
Skip: the idea is to use a density function for each class.
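A minimal sketch of the resulting k-NN decision rule (majority vote over the k nearest training instances; function name illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X, y, k):
    """Since P_hat(C_i|x) = k_i / k, predict the majority class among
    the k nearest neighbors of x (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    return Counter(np.asarray(y)[nearest]).most_common(1)[0][0]
```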

15 Condensed Nearest Neighbor
Time/space complexity of k-NN is O(N).
Find a subset Z of X that is small and accurate in classifying X (Hart, 1968): minimize $E'(Z|X) = E(X|Z) + \lambda |Z|$, the error of classifying X using 1-NN on Z plus a penalty on the size of Z. (skip)

16 Condensed Nearest Neighbor
Incremental algorithm: add an instance to Z only if the current Z misclassifies it; repeat passes over X until nothing changes. (skip)
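A greedy sketch of this incremental procedure, assuming 1-NN with Euclidean distance and random pass order (details the slide's pseudocode image does not pin down):

```python
import random
import numpy as np

def condensed_nn(X, y, seed=0):
    """Hart's CNN sketch: keep instances that 1-NN on the current subset Z
    misclassifies; repeat full passes until Z stops growing."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = random.Random(seed)
    order = list(range(len(X)))
    Z = [0]                                   # start with an arbitrary instance
    changed = True
    while changed:
        changed = False
        rng.shuffle(order)
        for i in order:
            d = np.linalg.norm(X[Z] - X[i], axis=1)
            if y[Z[int(np.argmin(d))]] != y[i]:   # misclassified: store it
                Z.append(i)
                changed = True
    return Z                                  # indices of the condensed subset
```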

17 Nonparametric Regression
Also known as smoothing models.
Regressogram: $\hat{g}(x) = \frac{\sum_t b(x, x^t)\, r^t}{\sum_t b(x, x^t)}$, where $b(x, x^t) = 1$ if $x^t$ is in the same bin as $x$ and $0$ otherwise.
Idea: use the average of the output variable in a neighborhood of x.
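A minimal NumPy sketch of the regressogram as defined above (function name illustrative):

```python
import numpy as np

def regressogram(x, X, r, h, x0=0.0):
    """Average of the outputs r^t whose inputs x^t share x's bin of width h."""
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    same_bin = np.floor((X - x0) / h) == np.floor((x - x0) / h)
    return r[same_bin].mean() if same_bin.any() else float("nan")
```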

18 [Figure: regressogram example; uses bins to define neighborhoods]

19 Running Mean / Kernel Smoother
Running mean smoother: $\hat{g}(x) = \frac{\sum_t w\left(\frac{x - x^t}{h}\right) r^t}{\sum_t w\left(\frac{x - x^t}{h}\right)}$ with $w(u) = 1$ if $|u| < 1$ and $0$ otherwise
Kernel smoother: $\hat{g}(x) = \frac{\sum_t K\left(\frac{x - x^t}{h}\right) r^t}{\sum_t K\left(\frac{x - x^t}{h}\right)}$, where $K(\cdot)$ is Gaussian
Running line smoother: fit a local regression line in the neighborhood of x instead of a constant
Additive models (Hastie and Tibshirani, 1990)
Idea: the kernel smoother weights examples inversely by their distance to x.
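A minimal NumPy sketch of both smoothers as defined above (function names illustrative); the example slide that follows reuses these.

```python
import numpy as np

def running_mean_smoother(x, X, r, h):
    """Box window w(u) = 1 if |u| < 1: plain average of outputs within h of x."""
    X, r = np.asarray(X, dtype=float), np.asarray(r, dtype=float)
    w = (np.abs((x - X) / h) < 1).astype(float)
    return np.sum(w * r) / np.sum(w)

def kernel_smoother(x, X, r, h):
    """Gaussian-weighted average of the outputs (Nadaraya-Watson form)."""
    X, r = np.asarray(X, dtype=float), np.asarray(r, dtype=float)
    K = np.exp(-0.5 * ((x - X) / h) ** 2)
    return np.sum(K * r) / np.sum(K)
```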

20 Example: Computing ĝ(2)

x^t | r^t | running mean weight (x = 2, h = 2.1) | kernel weight influence(x, x^t)
----|-----|--------------------------------------|--------------------------------
 1  |  6  | 1                                    | 0.2
 3  | 11  | 1                                    | 0.2
 4  |  7  | 1                                    | 0.1
 5  |  3  | 0                                    | 0.05

Running mean smoother prediction: ĝ(2) = (6 + 11 + 7) / 3 = 8
Kernel smoother prediction: ĝ(2) = (6·0.2 + 11·0.2 + 7·0.1 + 3·0.05) / 0.55 ≈ 7.73
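A quick check of the slide's arithmetic, reusing running_mean_smoother from the sketch above; the kernel weights are taken directly from the table rather than recomputed, since the slide's influence values are rounded figure readings.

```python
X, r = [1, 3, 4, 5], [6, 11, 7, 3]
print(running_mean_smoother(2, X, r, 2.1))            # (6 + 11 + 7) / 3 = 8.0

w = [0.2, 0.2, 0.1, 0.05]                             # weights from the table
print(sum(wi * ri for wi, ri in zip(w, r)) / sum(w))  # 4.25 / 0.55 ≈ 7.73
```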

22 [Figure: kernel smoother fit; idea: uses the influence function to determine the weights]

23 Smoothness / Smooth Functions
Smoothness depends on a function's discontinuities and on how quickly its first/second/third/... derivatives change.
A smooth function is a continuous function whose derivatives change quite slowly.
Smooth functions are frequently preferred as models of systems and decision making: small changes in input result in small changes in output.
Design: smooth surfaces are more appealing, e.g., in car or sofa design.

24 Choosing h/k
When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): high complexity.
As k or h increases, we average over more instances; variance decreases but bias increases (oversmoothing): low complexity.
h/k large: very few hills; smooth.
h/k small: many hills; many changes/discontinuities in the first/second/third/... derivatives.
Cross-validation is used to fine-tune k or h, as sketched below.
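One concrete way to cross-validate h, sketched here as a leave-one-out log-likelihood score for a Gaussian kernel density estimate; the slide does not specify the criterion, so this particular score and all names are assumptions.

```python
import numpy as np

def loo_log_likelihood(X, h):
    """Leave-one-out log-likelihood of a Gaussian kernel density estimate:
    score each point by the density the remaining N-1 points assign to it."""
    X = np.asarray(X, dtype=float)
    total = 0.0
    for i in range(len(X)):
        rest = np.delete(X, i)
        u = (X[i] - rest) / h
        p = np.mean(np.exp(-0.5 * u**2)) / (np.sqrt(2 * np.pi) * h)
        total += np.log(p + 1e-300)       # guard against log(0)
    return total

# Hypothetical usage: pick the width with the best LOO score on data.
# best_h = max(np.linspace(0.1, 2.0, 20), key=lambda h: loo_log_likelihood(data, h))
```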

