Basis Expansions and Generalized Additive Models: basis expansion, piecewise polynomials, splines, generalized additive models, MARS


Basis expansion  f(X) = E(Y |X) can often be nonlinear and non-additive in X  However, linear models are easy to fit and interpret  By augmenting the data, we may construct linear models to achieve non-linear regression/classification.

Basis expansion: some widely used transformations
- h_m(X) = X_m, m = 1, ..., p: the original linear model.
- h_m(X) = X_j^2, h_m(X) = X_j X_k, or higher-order polynomials: augment the inputs with polynomial terms. The number of basis functions grows exponentially in the degree, O(p^d) for a degree-d polynomial.
- h_m(X) = log(X_j), ...: other nonlinear transformations.
- h_m(X) = I(L_m ≤ X_k < U_m): indicators that break the range of X_k into non-overlapping regions, giving a piecewise-constant fit.
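A minimal sketch of the idea, using made-up data and a hand-built dictionary of transformations; the model stays linear in the basis, so ordinary least squares fits it.

```python
import numpy as np

# Made-up data: N observations of p = 2 inputs (purely illustrative).
rng = np.random.default_rng(0)
X = rng.uniform(0, 4, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# Augmented design matrix h(X): original inputs, squares, an interaction,
# a log transform, and an indicator (piecewise constant).
H = np.column_stack([
    np.ones(len(X)),                     # intercept
    X[:, 0], X[:, 1],                    # h_m(X) = X_m (original linear model)
    X[:, 0] ** 2, X[:, 1] ** 2,          # squared terms
    X[:, 0] * X[:, 1],                   # interaction
    np.log1p(X[:, 0]),                   # a nonlinear transformation
    (X[:, 1] >= 2).astype(float),        # I(2 <= X_2): piecewise constant
])

# The model is linear in the basis, so ordinary least squares fits it.
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
```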

Basis expansion
- More often, we use basis expansions as a device to achieve more flexible representations of f(X).
- Polynomials are global: tweaking the functional form to suit one region can make the function oscillate wildly in remote regions.
[Figure: red curve, degree-6 polynomial; blue curve, degree-7 polynomial.]

Basis expansion
- Piecewise polynomials and splines allow for local polynomial representations.
- Problem: the number of basis functions can grow too large to fit with limited data.
- Solution 1, restriction methods: limit the class of functions in advance. Example: additive models.

Basis expansion  Selection methods Allow large numbers of basis functions, adaptively scan the dictionary and include only those basis functions h m () that contribute significantly to the fit of the model. Example: multivariate adaptive regression splines (MARS)  Regularization methods where we use the entire dictionary but restrict the coefficients. Example: Ridge regression Lasso (both regularization and selection)

Piecewise Polynomials  Assume X is one-dimensional.  Divide the domain of X into contiguous intervals, and represent f(X) by a separate polynomial in each interval.  Simplest – piecewise constant

Piecewise Polynomials  piecewise linear Three additional basis functions are needed:

Piecewise Polynomials  piecewise linear requiring continuity

Piecewise Polynomials. [Figure, lower-right panel: cubic spline.]

Spline  An order-M spline with knots ξ j, j = 1,...,K is a piecewise-polynomial of order M, and has continuous derivatives up to order M − 2. Cubic spline is order 4; piecewise-constant function an order-1 spline  Basis functions:  In practice the most widely used orders are M = 1, 2 and 4.

Natural Cubic Splines  polynomials fit to data tends to be erratic near the boundaries, and extrapolation can be dangerous.  With splines, the polynomials fit beyond the boundary knots behave even more wildly than global polynomials in that region.  A natural cubic spline adds additional constraints - the function is linear beyond the boundary knots.

Natural Cubic Splines
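A sketch of one standard construction of the natural cubic spline basis (K knots give K basis functions, via N_1 = 1, N_2 = X, N_{k+2} = d_k − d_{K−1}); the knot values are illustrative.

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """Natural cubic spline basis: K knots give K basis functions, and the
    fitted function is linear beyond the boundary knots."""
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d(k):
        # d_k(X) = [ (X - xi_k)_+^3 - (X - xi_K)_+^3 ] / (xi_K - xi_k)
        return (np.maximum(x - knots[k], 0.0) ** 3
                - np.maximum(x - knots[-1], 0.0) ** 3) / (knots[-1] - knots[k])

    cols = [np.ones_like(x), x]                        # N_1(X) = 1, N_2(X) = X
    cols += [d(k) - d(K - 2) for k in range(K - 2)]    # N_{k+2} = d_k - d_{K-1}
    return np.column_stack(cols)

x = np.linspace(0, 10, 300)
N = natural_cubic_basis(x, knots=[1, 3, 5, 7, 9])      # 5 knots -> 5 basis functions
```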

FIGURE 5.4. Fitted natural-spline functions for each of the terms in the final model selected by the stepwise procedure. Included are pointwise standard-error bands. South African Heart Disease data.

Smoothing Splines  Avoids the knot selection problem completely.  Uses a maximal set of knots.  The complexity of the fit is controlled by regularization.  Setup: among all functions f(x) with two continuous derivatives, find one that minimizes the penalized residual sum of squares  Lambda: smoothing parameter.  The second term penalizes curvature in the function

Smoothing Splines  The solution is a natural cubic spline with knots at the unique values of the x i, i = 1,...,N  the penalty term translates to a penalty on the spline coefficients  shrink toward the linear fit

Smoothing Splines

Effective degrees of freedom of a smoothing spline: df_λ = trace(S_λ), where S_λ = N(NᵀN + λ Ω_N)⁻¹ Nᵀ is the (linear) smoother matrix, so that f̂ = S_λ y.
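A one-function sketch, reusing the basis matrix N and penalty matrix Omega built in the smoothing-spline sketch above.

```python
import numpy as np

def effective_df(N, Omega, lam):
    """df_lambda = trace(S_lambda) for the linear smoother
    S_lambda = N (N^T N + lam * Omega)^(-1) N^T."""
    S = N @ np.linalg.solve(N.T @ N + lam * Omega, N.T)
    return np.trace(S)
```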

Smoothing Splines Bias-variance trade-off

Multidimensional Splines  Basis of functions h 1k (X 1 ), k = 1,...,M 1 for X 1  Basis of functions h 2k (X 2 ), k = 1,...,M 2 for X 2  The coefficients can be fit by least squares, as before.  But the dimension of the basis grows exponentially fast.

Multidimensional Splines

Generalized Additive Models  f i () are unspecified smooth functions  If model each function using an expansion of basis functions, the model could be fit by simple least squares.  g(μ) = μ identity link, used for linear and additive models for Gaussian response data.  g(μ) = logit(μ) as above, or g(μ) = probit(μ), for modeling binomial probabilities.  g(μ) = log(μ) for log-linear or log-additive models for Poisson count data.

Generalized Additive Models  The penalized least squares: where the λj ≥0 are tuning parameters  The minimizer of (9.7) is an additive cubic spline model  Each f j is a cubic spline in the component X j, with knots at each of the unique values of x ij, i = 1,...,N.  To make solution unique,

Generalized Additive Models  Equivalent to multiple regression for linear models: S j represents the spline. > Can use other univariate regression smoothers such as local polynomial regression and kernel methods as S j


MARS: Multivariate Adaptive Regression Splines  an adaptive procedure for regression, well suited for high-dimensional problems  MARS uses expansions in piecewise linear basis functions of the form “a reflected pair”

MARS: Multivariate Adaptive Regression Splines  The idea is to form reflected pairs for each input X j with knots at each observed value x ij of that input.  The collection of basis functions:  If all of the input values are distinct, there are 2Np basis functions altogether.  Model: where each h m (X) is a function in C, or a product of two or more such functions.

MARS: Multivariate Adaptive Regression Splines  Model building – forward stepwise: in each iteration, select a function from the set C or their products.  coefficients β m are estimated by standard linear regression.  Add terms in the form:

MARS: Multivariate Adaptive Regression Splines
[Figure: basis functions currently in the model on one side, candidate reflected pairs on the other.]
- At each stage we consider all products of a candidate pair with a basis function already in the model. The product that decreases the residual error the most is added to the current model.
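A simplified sketch of one forward step; the real algorithm also forbids an input from appearing more than once in a product and uses fast least-squares updates instead of refitting from scratch.

```python
import numpy as np

def forward_step(M_basis, X, y):
    """One (simplified) MARS forward step: multiply every basis function
    already in the model by every candidate reflected pair, and keep the
    product pair that most reduces the residual sum of squares."""
    n, p = X.shape
    best = None
    for m in range(M_basis.shape[1]):                     # h_l already in the model
        for j in range(p):
            for t in np.unique(X[:, j]):
                plus = M_basis[:, m] * np.maximum(X[:, j] - t, 0.0)
                minus = M_basis[:, m] * np.maximum(t - X[:, j], 0.0)
                H = np.column_stack([M_basis, plus, minus])
                beta, *_ = np.linalg.lstsq(H, y, rcond=None)
                rss = np.sum((y - H @ beta) ** 2)
                if best is None or rss < best[0]:
                    best = (rss, plus, minus)
    return np.column_stack([M_basis, best[1], best[2]])

# Start from the constant function h_0(X) = 1 and grow the model:
# basis = np.ones((len(y), 1))
# basis = forward_step(basis, X, y)
```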

MARS: Multivariate Adaptive Regression Splines

- At the end of this process we have a large model that typically overfits the data, so a backward deletion procedure is applied: one at a time, remove the term whose removal causes the smallest increase in residual squared error.
- This produces the best model of each size (number of terms) λ. Generalized cross-validation is used to compare the models and select the best λ.
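A sketch of the GCV criterion used for that comparison; M(λ) is the effective number of parameters, and constant factors (such as the 1/N, which some texts omit) do not change which model is selected.

```python
import numpy as np

def gcv(y, y_hat, m_lambda):
    """GCV(lambda) = (1/N) * RSS / (1 - M(lambda)/N)^2, where M(lambda) counts
    the effective number of parameters (MARS charges an extra cost per knot)."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return rss / n / (1.0 - m_lambda / n) ** 2
```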

MARS: Multivariate Adaptive Regression Splines