
1 Lecture Slides for E. Alpaydın, Introduction to Machine Learning, 2e © The MIT Press (V1.0)
ETHEM ALPAYDIN © The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

2 CHAPTER 8: Nonparametric Methods

3 Parametric Estimation
Parametric (single global model)
Advantage: It reduces the problem of estimating a probability density function (pdf), discriminant, or regression function to estimating the values of a small number of parameters.
Disadvantage: This assumption does not always hold, and we may incur a large error if it does not.
Semiparametric (small number of local models)
Mixture model (Chapter 7): the density is written as a weighted sum of a small number of parametric models.
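For reference, the mixture density from Chapter 7 has the form (a standard statement of the model; the formula is not in the original slide text):

\hat{p}(x) = \sum_{i=1}^{k} P(\mathcal{G}_i)\, p(x \mid \mathcal{G}_i)

where the \mathcal{G}_i are the k mixture components and the P(\mathcal{G}_i) are their mixture proportions.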

4 Nonparametric Estimation
Assumption: Similar inputs have similar outputs.
Functions (pdf, discriminant, regression) change smoothly.
Keep the training data; "let the data speak for itself."
Given x, find a small number of closest training instances and interpolate from these.
Nonparametric methods are also called memory-based or instance-based learning algorithms.

5 Density Estimation
Given the training set X = {x^t}_t drawn iid (independent and identically distributed) from p(x).
Divide the data into bins of size h.
Histogram estimator (Figure 8.1).
Note: In probability theory and statistics, a collection of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability distribution as the others and all are mutually independent. (http://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables)
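The histogram estimator formula appears as an image in the original slide; as given in the textbook, it is

\hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{N h}

where N is the number of training instances and h is the bin width.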

6 Histogram estimator (Figure 8.1)

7 Density Estimation
Given the training set X = {x^t}_t drawn iid from p(x).
x is always at the center of a bin of size h.
Naive estimator (Figure 8.2).
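The naive estimator formula is an image in the original slide; the textbook form is

\hat{p}(x) = \frac{\#\{x - h < x^t \le x + h\}}{2 N h}

or, equivalently, with the weight function w(u) = 1/2 if |u| < 1 and 0 otherwise,

\hat{p}(x) = \frac{1}{N h} \sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right).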

8 Naive estimator for h = 0.5, 1, 2 (Figure 8.2)

9 Kernel Estimator
Kernel function, e.g., Gaussian kernel.
Kernel estimator (Parzen windows): Figure 8.3.
If K is Gaussian, then \hat{p}(x) will be smooth, having all the derivatives.
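The kernel formulas are images in the original slide; in the textbook they are the Gaussian kernel

K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right)

and the kernel (Parzen window) estimator

\hat{p}(x) = \frac{1}{N h} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right).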

10 Kernel estimator (Figure 8.3)

11 k-Nearest Neighbor Estimator
Instead of fixing the bin width h and counting the number of instances, fix the number of nearest instances (neighbors) k and compute the bin width.
d_k(x): distance to the kth closest instance to x.
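The k-NN density estimate itself is an image in the original slide; the textbook form is

\hat{p}(x) = \frac{k}{2 N d_k(x)}

so the bin width adapts to the data: it is wide where instances are sparse and narrow where they are dense.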

12 k-Nearest Neighbor Estimator

13 Generalization to Multivariate Data
Kernel density estimator, with the requirement that the kernel integrate to 1.
Multivariate Gaussian kernel: spheric or ellipsoid.
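These formulas are images in the original slide; the textbook versions are the multivariate kernel estimator

\hat{p}(x) = \frac{1}{N h^d} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right), \qquad \int_{\mathbb{R}^d} K(x)\, dx = 1,

the spheric (radially symmetric) Gaussian kernel

K(u) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} \exp\!\left(-\frac{\|u\|^2}{2}\right),

and the ellipsoid version, which uses a covariance matrix S estimated from the data:

K(u) = \frac{1}{(2\pi)^{d/2} |S|^{1/2}} \exp\!\left(-\frac{1}{2}\, u^{\top} S^{-1} u\right).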

14 Nonparametric Classification
Estimate p(x|C_i) and use Bayes' rule.
Kernel estimator.
The discriminant function.
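The class-conditional kernel estimate and the resulting discriminant are images in the original slide; the textbook forms are

\hat{p}(x \mid C_i) = \frac{1}{N_i h^d} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r_i^t, \qquad \hat{P}(C_i) = \frac{N_i}{N},

where r_i^t = 1 if x^t \in C_i and 0 otherwise, and N_i = \sum_t r_i^t. The discriminant is

g_i(x) = \hat{p}(x \mid C_i)\, \hat{P}(C_i) = \frac{1}{N h^d} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r_i^t.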

15 Nonparametric Classification
k-NN estimator
For the special case of the k-NN estimator:
k_i: the number of neighbors out of the k nearest that belong to C_i.
V^k(x): the volume of the d-dimensional hypersphere centered at x, with radius equal to the distance to the kth nearest neighbor.
c_d: the volume of the unit sphere in d dimensions (for example, c_1 = 2, c_2 = \pi, c_3 = 4\pi/3).
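The class-conditional k-NN density estimate is an image in the original slide; the textbook form is

\hat{p}(x \mid C_i) = \frac{k_i}{N_i V^k(x)}, \qquad V^k(x) = r^d c_d,

where r is the distance from x to its kth nearest neighbor.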

16 Nonparametric Classification
k-NN estimator
From the class-conditional estimates and the priors, Bayes' rule gives the posterior (see the derivation below).
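The formulas on this slide are images; the textbook derivation is

\hat{p}(x \mid C_i) = \frac{k_i}{N_i V^k(x)}, \qquad \hat{p}(x) = \frac{k}{N V^k(x)}, \qquad \hat{P}(C_i) = \frac{N_i}{N},

and therefore

\hat{P}(C_i \mid x) = \frac{\hat{p}(x \mid C_i)\, \hat{P}(C_i)}{\hat{p}(x)} = \frac{k_i}{k}.

That is, the k-NN classifier assigns the input to the class having the most examples among its k nearest neighbors.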

17 Condensed Nearest Neighbor
Time/space complexity of k-NN is O(N).
Find a subset Z of X that is small and is accurate in classifying X (Hart, 1968).
Error function: the error on X when storing Z, plus a penalty on the cardinality of Z.
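The error function on this slide is an image; the textbook form is

E'(Z \mid X) = E(X \mid Z) + \lambda |Z|

where E(X \mid Z) is the error on X when the instances in Z are stored and 1-NN is used, |Z| is the cardinality of Z, and \lambda trades off accuracy against memory.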

18 Condensed Nearest Neighbor
Incremental algorithm: add an instance to Z only if it is needed (a sketch is given below).
A greedy method and a local search.
The results depend on the order of the training instances.
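The original slide shows the condensed 1-NN pseudocode (Figure 8.6 in the textbook) as an image. The following Python sketch illustrates the same greedy idea; it assumes X is an (N, d) NumPy array and y an array of class labels, and the function name and details are illustrative, not from the source.

import numpy as np

def condensed_nn(X, y, seed=0):
    """Greedy condensed 1-NN (Hart, 1968): keep only the instances that the
    currently stored subset Z misclassifies. The result depends on the order
    in which instances are visited, so the order is randomized here."""
    rng = np.random.default_rng(seed)
    stored = [0]                        # start by storing an arbitrary instance
    changed = True
    while changed:                      # repeat until Z no longer grows
        changed = False
        for t in rng.permutation(len(X)):
            Z = X[stored]
            nearest = int(np.argmin(np.linalg.norm(Z - X[t], axis=1)))
            if y[stored[nearest]] != y[t]:   # misclassified by 1-NN on Z: store it
                stored.append(t)
                changed = True
    return X[stored], y[stored]

Note that this greedy pass does not minimize E' directly; it only tries to keep Z consistent with X while adding as few instances as possible, which is why the result depends on the visiting order.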

19 Nonparametric Regression
Smoothing models: the nonparametric regression estimator is also called a smoother.
Running mean smoother (the regressogram vs. the naive estimator).
Kernel smoother.
Running line smoother.
Regressogram: Figure 8.7.
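The regressogram formula is an image in the original slide; the textbook form is

\hat{g}(x) = \frac{\sum_{t=1}^{N} b(x, x^t)\, r^t}{\sum_{t=1}^{N} b(x, x^t)}, \qquad b(x, x^t) = \begin{cases} 1 & \text{if } x^t \text{ is in the same bin as } x \\ 0 & \text{otherwise} \end{cases}

where the r^t are the observed outputs.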

20 Regressogram (Figure 8.7)


22 Running Mean Smoother
Running mean smoother (the analog of the naive estimator): Figure 8.8.
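The running mean smoother formula is an image in the original slide; the textbook form is

\hat{g}(x) = \frac{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right)}, \qquad w(u) = \begin{cases} 1 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}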

23 Running mean smoother (Figure 8.8)

24 Kernel Smoother
Kernel smoother: Figure 8.9, where K(·) is the Gaussian kernel.
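The kernel smoother formula is an image in the original slide; the textbook (Nadaraya-Watson) form is

\hat{g}(x) = \frac{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)}.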

25 Kernel smoother (Figure 8.9)

26 Running Line Smoother
Instead of taking an average and giving a constant fit at a point, we can take into account one more term in the Taylor expansion and calculate a linear fit.
F(x) = F(a) + F'(a)(x-a)/1! + F''(a)(x-a)^2/2! + ...
Running line smoother: Figure 8.10.
Use the data points in the neighborhood, as defined by h or k.
Fit a local regression line.

27 Running line smoother (Figure 8.10)

28 How to Choose k or h?
When k or h is small: single instances matter; bias is small, variance is large (undersmoothing); high complexity.
As k or h increases: we average over more instances; variance decreases but bias increases (oversmoothing); low complexity.
Cross-validation is used to fine-tune k or h.
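As a concrete recipe (standard practice, not spelled out on the slide), h can be chosen to minimize the leave-one-out cross-validation error

CV(h) = \frac{1}{N} \sum_{t=1}^{N} \left( r^t - \hat{g}^{(-t)}(x^t) \right)^2

where \hat{g}^{(-t)} is the smoother fit with the tth instance left out; the same idea applies to choosing k.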

29 Kernel estimate for various bin lengths for a two-class problem. Plotted are the conditional densities, p(x|C_i). It seems that the top one oversmooths and the bottom undersmooths, but whichever is the best will depend on where the validation data points are.

