1 An Introduction to Nonparametric Regression Ning Li March 15th, 2004 Biostatistics 277

2 Reference Applied Nonparametric Regression, Wolfgang Härdle, Cambridge University Press, Chapters 1–3.

3 Outline: Introduction; Motivation; Basic Idea of Smoothing; Smoothing Techniques (kernel smoothing, k-nearest neighbor estimates, spline smoothing); A Comparison of Kernel, k-NN and Spline Smoothers.

4 Introduction The aim of a regression analysis is to produce a reasonable estimate of the unknown response function m, where for n data points (X_i, Y_i) the relationship can be modeled as Y_i = m(X_i) + ε_i, i = 1, …, n, with zero-mean errors ε_i. Unlike the parametric approach, where the function m is fully described by a finite set of parameters, nonparametric modeling accommodates a very flexible form of the regression curve.

5 Motivation It provides a versatile method of exploring a general relationship between variables. It gives predictions of observations yet to be made without reference to a fixed parametric model. It provides a tool for finding spurious observations by studying the influence of isolated points. It constitutes a flexible method of substituting for missing values or interpolating between adjacent X-values.

6 Basic Idea of Smoothing A reasonable approximation to the regression curve m(x) is the mean of the response variables near a point x. This local averaging procedure can be defined as m̂(x) = n^{-1} Σ_{i=1}^{n} W_{ni}(x) Y_i, (2) where {W_{ni}(x)} is a sequence of weights that depends on x and on the design points. Every smoothing method to be described is of the form (2). The amount of averaging is controlled by a smoothing parameter, and the choice of smoothing parameter is related to the balance between bias and variance.
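To make (2) concrete, here is a minimal sketch of a generic linear smoother; it is not from the slides, and the function name and signature are my own. The choice of weight function is what distinguishes the kernel, k-NN and spline smoothers discussed next.

```python
import numpy as np

def linear_smoother(x_grid, X, Y, weight_fn):
    """Local averaging of form (2): m_hat(x) = (1/n) * sum_i W_ni(x) * Y_i.

    weight_fn(x, X) returns the n weights W_ni(x); plugging in kernel, k-NN,
    or spline weights yields the corresponding smoother.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    m_hat = np.empty(len(x_grid))
    for j, x in enumerate(x_grid):
        W = weight_fn(x, X)         # weights W_ni(x), i = 1, ..., n
        m_hat[j] = np.mean(W * Y)   # (1/n) * sum_i W_ni(x) * Y_i
    return m_hat
```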

7 Figure 1. Expenditure of potatoes as a function of net income. h = 0.1, 1.0, n = 7125, year = 1973.

8 Smoothing Techniques Kernel Smoothing Kernel smoothing describes the shape of the weight function by a density function K with a scale parameter (the bandwidth h) that adjusts the size and the form of the weights near x. The kernel K is a continuous, bounded and symmetric real function which integrates to 1. The weight sequence is defined by W_{hi}(x) = K_h(x − X_i) / f̂_h(x), where K_h(u) = h^{-1} K(u/h) and f̂_h(x) = n^{-1} Σ_{j=1}^{n} K_h(x − X_j) is the kernel estimate of the design density.

9 Kernel Smoothing The Nadaraya-Watson estimator is defined by m̂_h(x) = [n^{-1} Σ_{i=1}^{n} K_h(x − X_i) Y_i] / [n^{-1} Σ_{j=1}^{n} K_h(x − X_j)]. The mean squared error is d_M(x, h) = E[m̂_h(x) − m(x)]². As n → ∞, h → 0 and nh → ∞ we have, under certain regularity conditions, d_M(x, h) ≈ (nh)^{-1} c_K σ²(x) / f(x) + (h⁴/4) d_K² [m''(x) + 2 m'(x) f'(x) / f(x)]², where c_K = ∫ K²(u) du, d_K = ∫ u² K(u) du, f is the marginal density of X, and σ²(x) is the conditional variance of Y given X = x. The (squared) bias is increasing whereas the variance is decreasing in h.
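A minimal Nadaraya-Watson implementation, offered as an illustration rather than as part of the original slides (the function names, grid and bandwidth arguments are my own choices):

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) for |u| <= 1, else 0."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def nadaraya_watson(x_grid, X, Y, h, kernel=epanechnikov):
    """Nadaraya-Watson estimate: m_hat_h(x) = sum_i K_h(x - X_i) Y_i / sum_j K_h(x - X_j)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    m_hat = np.empty(len(x_grid))
    for j, x in enumerate(x_grid):
        w = kernel((x - X) / h) / h   # K_h(x - X_i) = h^{-1} K((x - X_i)/h)
        s = w.sum()
        m_hat[j] = np.dot(w, Y) / s if s > 0 else np.nan  # undefined where no data fall in the window
    return m_hat
```

Larger h averages over more observations, which, as stated above, lowers the variance at the price of a larger bias.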

10 Figure 2. The Epanechnikov kernel K(u) = 0.75 (1 − u²) I(|u| ≤ 1).
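For the Epanechnikov kernel, the constants c_K and d_K appearing in the mean squared error expansion above can be computed in closed form; the short calculation below is an addition of mine, not taken from the slide.

```latex
c_K = \int_{-1}^{1} \bigl[\tfrac{3}{4}(1-u^2)\bigr]^2 \, du
    = \tfrac{9}{16}\int_{-1}^{1} \bigl(1 - 2u^2 + u^4\bigr)\, du
    = \tfrac{9}{16}\cdot\tfrac{16}{15} = \tfrac{3}{5},
\qquad
d_K = \int_{-1}^{1} u^2 \cdot \tfrac{3}{4}(1-u^2)\, du
    = \tfrac{3}{4}\Bigl(\tfrac{2}{3} - \tfrac{2}{5}\Bigr) = \tfrac{1}{5}.
```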

11 Figure 3. The effective kernel weights for the food versus net income data set at x = 1 and x = 2.5, for h = 0.1 (label 1), h = 0.2 (label 2), and h = 0.3 (label 3), with the Epanechnikov kernel.


13 K-Nearest Neighbor Estimates In k-NN, the neighborhood is defined through those X-variables which are among the k nearest neighbors of x in Euclidean distance. The k-NN smoother is defined as m̂_k(x) = n^{-1} Σ_{i=1}^{n} W_{ki}(x) Y_i, where the weights {W_{ki}(x)}, i = 1, …, n, are defined through the set of indexes J_x = {i : X_i is one of the k nearest observations to x}, and W_{ki}(x) = n/k if i ∈ J_x, and 0 otherwise. (7)
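A direct sketch of the uniform-weight k-NN smoother, again my own illustration rather than code from the slides:

```python
import numpy as np

def knn_smoother(x_grid, X, Y, k):
    """k-NN smoother with uniform weights: m_hat_k(x) is the average of the Y_i
    whose X_i are among the k nearest neighbors of x (weights W_ki = n/k on J_x in (7))."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    m_hat = np.empty(len(x_grid))
    for j, x in enumerate(x_grid):
        J_x = np.argsort(np.abs(X - x))[:k]   # indexes of the k nearest observations to x
        m_hat[j] = Y[J_x].mean()
    return m_hat
```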

14 K-nearest Neighbor Estimates The smoothing parameter k regulates the degree of smoothness of the estimated curve. It plays a role similar to the bandwidth for kernel smoothers. The influence of varying k on qualitative features of the estimated curve is similar to that observed for kernel estimation with a uniform kernel. When k = n, the k-NN smoother is simply the average of all the response variables. When k = 1, the observations are reproduced at the X_i, and for an x between two adjacent predictor variables a step function is obtained, with a jump in the middle between the two observations.

15 K-nearest Neighbor Estimates Let k → ∞, k/n → 0, and n → ∞. The bias and variance of the k-NN estimate with weights as in (7) are given by E[m̂_k(x)] − m(x) ≈ [(m''f + 2m'f')(x) / (24 f³(x))] (k/n)² and Var[m̂_k(x)] ≈ σ²(x) / k. Note: the trade-off between bias² and variance is thus achieved in an asymptotic sense by setting k ~ n^{4/5}.
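The rate k ~ n^{4/5} follows from balancing the two terms; the one-line derivation below is added for completeness and is not on the original slide.

```latex
\mathrm{bias}^2 \asymp \Bigl(\tfrac{k}{n}\Bigr)^{4},
\qquad
\mathrm{variance} \asymp \tfrac{1}{k}
\;\Longrightarrow\;
\Bigl(\tfrac{k}{n}\Bigr)^{4} \asymp \tfrac{1}{k}
\;\Longrightarrow\;
k^{5} \asymp n^{4}
\;\Longrightarrow\;
k \asymp n^{4/5},
\quad \text{so that } \mathrm{MSE} \asymp n^{-4/5}.
```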

16 K-nearest Neighbor Estimates In addition to the “uniform” weights, the k-NN weights can more generally be thought of as being generated by a kernel function K: W_{Ri}(x) = K_R(x − X_i) / [n^{-1} Σ_{j=1}^{n} K_R(x − X_j)], where K_R(u) = R^{-1} K(u/R) and R is the distance between x and its k-th nearest neighbor.
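A sketch of the kernel-weighted k-NN smoother described above (my own illustration; note that with a compact-support kernel such as the Epanechnikov, the k-th neighbor itself receives weight zero):

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def knn_kernel_smoother(x_grid, X, Y, k, kernel=epanechnikov):
    """k-NN weights generated by a kernel: the local bandwidth at x is R,
    the distance from x to its k-th nearest neighbor."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    m_hat = np.empty(len(x_grid))
    for j, x in enumerate(x_grid):
        d = np.abs(X - x)
        R = np.sort(d)[k - 1]              # distance to the k-th nearest neighbor of x
        w = kernel(d / R) / R              # K_R(x - X_i) = R^{-1} K((x - X_i)/R)
        s = w.sum()
        m_hat[j] = np.dot(w, Y) / s if s > 0 else Y[np.argmin(d)]
    return m_hat
```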

17 Figure 4. The effective k-NN weights for the food versus net income data set at x = 1 and x = 2.5, for k = 100 (label 1), k = 200 (label 2), and k = 300 (label 3), with the Epanechnikov kernel.


19 K-nearest Neighbor Estimates Let k → ∞, k/n → 0, and n → ∞, and let c_K and d_K be defined as previously. Then, for the kernel-generated k-NN weights, E[m̂_R(x)] − m(x) ≈ d_K [(m''f + 2m'f')(x) / (8 f³(x))] (k/n)² and Var[m̂_R(x)] ≈ 2 c_K σ²(x) / k. Note: the trade-off between bias² and variance is thus achieved in an asymptotic sense by setting k ~ n^{4/5}, just as for the uniform k-NN weights.

20 Spline Smoothing Spline smoothing quantifies the competition between the aim to produce a good fit to the data and the aim to produce a curve without too much rapid local variation. The regression curve is obtained by minimizing the penalized sum of squares S_λ(m) = Σ_{i=1}^{n} (Y_i − m(X_i))² + λ ∫_a^b [m''(x)]² dx, (11) where m is a twice-differentiable function on [a, b] and λ represents the rate of exchange between residual error and roughness of the curve m.
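As a rough, self-contained stand-in for the cubic smoothing spline (not the method on the slide, and ignoring the unevenness of the X-spacing), one can minimize a discretized version of (11) in which the integral of [m'']² is replaced by a sum of squared second differences of the fitted values; in practice a smoothing-spline routine from a statistics library would be used instead.

```python
import numpy as np

def discrete_penalized_fit(X, Y, lam):
    """Minimize sum_i (Y_i - m_i)^2 + lam * sum(second differences of m)^2
    over fitted values m_i at the sorted X_i (a Whittaker-smoother style
    approximation to the penalized criterion (11))."""
    order = np.argsort(X)
    x, y = np.asarray(X, dtype=float)[order], np.asarray(Y, dtype=float)[order]
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)                # (n-2) x n second-difference operator
    m = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)  # closed-form ridge-type solution
    return x, m
```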

21 Figure 5. A spline smooth of the Motorcycle data set.

22 Spline Smoothing The spline smoother is linear in the Y observations: there exist weights W_{λi}(x) such that m̂_λ(x) = n^{-1} Σ_{i=1}^{n} W_{λi}(x) Y_i. Silverman showed in 1984 that for large n, small λ, and X_i not too close to the boundary, W_{λi}(x) ≈ f(X_i)^{-1} h(X_i)^{-1} K_s((x − X_i) / h(X_i)), where the local bandwidth h(X_i) satisfies h(X_i) = [λ / (n f(X_i))]^{1/4} and K_s is the asymptotic spline kernel K_s(u) = 0.5 exp(−|u|/√2) sin(|u|/√2 + π/4).
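For reference, Silverman's equivalent kernel and local bandwidth from the display above can be coded directly; this is only an illustration of the asymptotic approximation (function names are my own), not a spline-fitting routine.

```python
import numpy as np

def spline_equivalent_kernel(u):
    """Asymptotic spline kernel K_s(u) = 0.5 * exp(-|u|/sqrt(2)) * sin(|u|/sqrt(2) + pi/4)."""
    a = np.abs(u) / np.sqrt(2.0)
    return 0.5 * np.exp(-a) * np.sin(a + np.pi / 4.0)

def local_bandwidth(x, lam, n, f):
    """Local bandwidth h(x) = (lam / (n * f(x)))**(1/4); f is (an estimate of) the design density."""
    return (lam / (n * f(x))) ** 0.25
```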

23 Figure 6. The asymptotic spline kernel function K_s(u).

24 Spline Smoothing A variation on (11) is to solve the equivalent problem of minimizing Σ_{i=1}^{n} (Y_i − m(X_i))² under the constraint ∫_a^b [m''(x)]² dx ≤ Δ. (12) The parameters λ and Δ have similar meanings and are connected through the solutions of the two problems: λ plays the role of the Lagrange multiplier for the constraint in (12), and the penalized solution m̂_λ of (11) solves (12) with Δ = ∫_a^b [m̂_λ''(x)]² dx.

25 A comparison of kernel, k-NN and spline smoothers

Table 1. Bias and variance of the kernel and k-NN smoothers.
            kernel                                      k-NN (kernel weights)
bias        h² d_K (m''f + 2m'f')(x) / (2 f(x))         (k/n)² d_K (m''f + 2m'f')(x) / (8 f³(x))
variance    c_K σ²(x) / (n h f(x))                      2 c_K σ²(x) / k

26 Figure 7. A simulated data set. The raw data (n = 100) were constructed from a fixed regression curve m(x) plus random errors ε_i.
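The exact regression curve and error variance behind Figure 7 are not recoverable from the slide, so the sketch below simply generates an analogous data set with an assumed curve m(x) = sin(2πx) and assumed N(0, 0.1²) errors, on which the smoothers sketched earlier can be compared.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100
X = np.sort(rng.uniform(0.0, 1.0, size=n))
m_true = lambda x: np.sin(2.0 * np.pi * x)      # assumed curve, not the one from the slide
Y = m_true(X) + rng.normal(0.0, 0.1, size=n)    # assumed error standard deviation 0.1

x_grid = np.linspace(0.0, 1.0, 200)
# e.g., compare the earlier sketches on this data:
#   nadaraya_watson(x_grid, X, Y, h=0.05)
#   knn_smoother(x_grid, X, Y, k=11)
#   discrete_penalized_fit(X, Y, lam=1.0)
```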

27 Figure 8. A kernel smooth of the simulated data set. The black line (label 1) denotes the underlying regression curve. The green line (label 2) is the Gaussian kernel smooth.

28 Figure 9. A k-NN kernel smooth of the simulated data set. The black line (label 1) denotes the underlying regression curve. The green line (label 2) is the k-NN smoother.

29 Figure 10. A spline smooth of the simulated data set. The black line (label 1) denotes the underlying regression curve. The green line (label 2) is the spline smoother.

30 Figure 11. Residual plots of the k-NN, kernel and spline smoothers for the simulated data set.