
LECTURE 17: BEYOND LINEARITY PT. 2 March 30, 2016 SDS 293 Machine Learning

Announcements: THIS WEEKEND, two Smith teams still have space for additional people; if you want to join a team, let me (or the team) know!

Outline
Moving beyond linearity:
- Polynomial regression
- Step functions
- Splines
- Local regression
- Generalized additive models (GAMs)
Lab

Recap: polynomial regression Big idea: extend the linear model by adding extra predictors that are powers of X
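In symbols, the degree-d polynomial model is
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d + \epsilon_i$$
A minimal R sketch on the Wage data these slides use (assuming the ISLR package is installed; the degree of 4 is just an illustration):

```r
library(ISLR)                                # provides the Wage data
fit <- lm(wage ~ poly(age, 4), data = Wage)  # degree-4 polynomial in age
summary(fit)                                 # coefficients on an orthogonalized polynomial basis
```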

Recap: step functions Big idea: break X into pieces, fit a separate model on each piece, and glue them together
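A matching sketch: cut() bins age into intervals, and least squares then fits a separate constant on each piece (again, the bin count of 4 is just an illustration):

```r
library(ISLR)
fit <- lm(wage ~ cut(age, 4), data = Wage)  # 4 age bins, one fitted level per bin
coef(fit)                                   # first bin is the intercept; the rest are offsets
```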

Discussion Question: what do these approaches have in common? Answer: they both apply some set of transformations to the predictors. These transformations are known more generally as basis functions $b_1(X), \ldots, b_K(X)$, which we plug into a linear model:
$$y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_K b_K(x_i) + \epsilon_i$$
- Polynomial regression: $b_j(x_i) = x_i^j$
- Step functions: $b_j(x_i) = I(c_j \le x_i < c_{j+1})$
Lots of other functions we could try as well!

Piecewise polynomials What if we combine polynomials and step functions? Ex: a cubic whose coefficients may take different values in different parts of the range of X. Points where the coefficients change = “knots”
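For instance, a piecewise cubic with a single knot at $c$:
$$y_i = \begin{cases} \beta_{01} + \beta_{11} x_i + \beta_{21} x_i^2 + \beta_{31} x_i^3 + \epsilon_i & \text{if } x_i < c \\ \beta_{02} + \beta_{12} x_i + \beta_{22} x_i^2 + \beta_{32} x_i^3 + \epsilon_i & \text{if } x_i \ge c \end{cases}$$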

Problem? Ex: Wage data subset: the fitted piecewise polynomial is discontinuous at the knots.

One way to fix it: require continuity. (The result still looks unnatural, though.)

Degrees of freedom vs. constraints In our piecewise cubic function with one knot, we had 8 degrees of freedom (two cubics, four coefficients each). We can add constraints to remove degrees of freedom:
1. Function must be continuous
2. Function must have a continuous 1st derivative (slope)
3. Function must have a continuous 2nd derivative (curvature)
Each constraint removes one degree of freedom, so the constrained fit has 8 − 3 = 5.
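The same bookkeeping works in general: a cubic spline with K knots uses
$$\underbrace{4(K+1)}_{\text{coefficients}} - \underbrace{3K}_{\text{constraints}} = K + 4 \ \text{degrees of freedom.}$$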

Better way: constrain the function and its derivatives; the result is nice and smooth. Such a function is called a “spline.”

Regression splines Question: how do we fit a piecewise degree-d polynomial while requiring that it (and possibly its first d−1 derivatives) be continuous? Answer: use the basis model we talked about previously.

Fitting regression splines: we just need to choose appropriate basis functions. Let's say we want a cubic spline* with K knots. One common approach is to start with a standard basis for a cubic polynomial ($x$, $x^2$, $x^3$) and then add one truncated power basis function per knot $\xi$:
$$h(x, \xi) = (x - \xi)_+^3 = \begin{cases} (x - \xi)^3 & \text{if } x > \xi \\ 0 & \text{otherwise} \end{cases}$$
*Cubic splines are popular because most human eyes cannot detect the discontinuity at the knots
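A minimal sketch in R. The lab uses bs() from the splines package, which builds a B-spline basis; it spans the same functions as the truncated power basis but is more numerically stable. The knot locations here are illustrative:

```r
library(ISLR)
library(splines)
# cubic spline in age with 3 interior knots; bs() returns the basis matrix
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
summary(fit)
```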

Ex: Wage data, 3 knots

Ex: Wage data, 3 knots, cubic spline. Note the high variance in the outer range, beyond the boundary knots.

Ex: Wage data, 3 knots, natural spline. A natural spline adds a “boundary constraint”: the function must be linear in the outer range, which tames the variance there.
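In R, ns() from the splines package fits a natural spline the same way bs() fits an ordinary cubic spline:

```r
# natural cubic spline: constrained to be linear beyond the boundary knots
fit_ns <- lm(wage ~ ns(age, knots = c(25, 40, 60)), data = Wage)
```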

Regression splines Question: how do we figure out how many knots to use, and where to put them? Answer: the methods we used to determine the best number of predictors (e.g. cross-validation) can help us figure out how many knots to use. For placement, we have several options (see the sketch below):
- Place them uniformly across the domain
- Put more knots in places where the data varies a lot
- Place them at percentiles of interest (e.g. 25th, 50th, and 75th)
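A sketch of the percentile option: passing bs() a df argument instead of explicit knots makes R place the knots at uniform quantiles of the data.

```r
# a cubic basis with df = 6 implies 6 - 3 = 3 interior knots,
# placed at the 25th, 50th, and 75th percentiles of age
fit <- lm(wage ~ bs(age, df = 6), data = Wage)
attr(bs(Wage$age, df = 6), "knots")  # inspect where they landed
```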

Ex: Wage data, 3 knots at the 25th, 50th, & 75th percentiles

Comparison with polynomial regression Question: how would you expect this to compare to polynomial regression? Answer: regression splines often give better results than polynomial regression because they can add flexibility exactly where it is needed (by adding more knots) while keeping the polynomial degree fixed, instead of inflating the degree globally.

Ex: Wage data, polynomial vs. spline
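A sketch of the comparison in this figure, in the spirit of ISLR Fig. 7.7 (which contrasts a degree-15 polynomial with a natural cubic spline using the same degrees of freedom):

```r
fit_poly <- lm(wage ~ poly(age, 15), data = Wage)     # wiggly, unstable at the edges
fit_spln <- lm(wage ~ ns(age, df = 15), data = Wage)  # same df, far better behaved
```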

Discussion Regression splines: specify knots, find good basis functions, and use least squares to estimate coefficients. Goal: find a function g(x) that fits the data well, i.e. makes
$$\mathrm{RSS} = \sum_{i=1}^n (y_i - g(x_i))^2$$
small. Question: what's a trivial way to minimize RSS? Answer: interpolate over all the data (overfit to the max!)

Smoothing splines Goal: find a function g that makes RSS small but that is also smooth. Dust off your calculus* and we can find the g that minimizes
$$\sum_{i=1}^n (y_i - g(x_i))^2 + \lambda \int g''(t)^2 \, dt$$
where the first term says “make sure you fit the data” and the second says “make sure you're smooth.” Fun fact: this is minimized by a shrunken version of the natural cubic spline with knots at each training obs.
*The second derivative of a function is a measure of its roughness: it is large in absolute value if g(t) is very wiggly near t, and close to zero otherwise

Whoa… knots at every training point? Question: shouldn't this give us way too much flexibility? Answer: the key is in the shrinkage parameter λ, which controls our effective degrees of freedom: as λ → 0 we interpolate the data, and as λ → ∞ the fit shrinks toward a straight line.
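A minimal sketch using base R's smooth.spline(), which can choose λ (reported as effective degrees of freedom) by leave-one-out cross-validation:

```r
fit <- smooth.spline(Wage$age, Wage$wage, cv = TRUE)  # LOO-CV picks lambda
fit$df                                                # resulting effective df
```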

Ex: Wage data, smoothing splines w/ different λ

Recap So far: flexibly predicting Y on the basis of one predictor X, i.e. extensions of simple linear regression. Question: what seems like the next logical step?

Generalized additive models (GAMs) Big idea: extend these methods to multiple predictors and non-linear functions of those predictors, just as we did with linear models before.
Multiple linear regression: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i$
GAM: $y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_p(x_{ip}) + \epsilon_i$
where each $f_j$ can be a polynomial, step function, spline, …

Ex: Wage data, GAM with splines
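A sketch of the kind of fit shown on this slide, following the ISLR lab's use of the gam package (its s() function requests a smoothing spline for a term):

```r
library(ISLR)
library(gam)
# smoothing splines of year (4 df) and age (5 df), plus a qualitative education term
fit <- gam(wage ~ s(year, 4) + s(age, 5) + education, data = Wage)
plot(fit, se = TRUE)  # one fitted function per predictor, with error bands
```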

Pros and Cons of GAMs
Good stuff:
- Non-linear functions are potentially more accurate
- Adds local flexibility w/o incurring a global penalty
- Because the model is still additive, we can still see the effect of each $X_j$ on Y
Bad stuff:
- Just like in multiple regression, we have to add interaction terms manually*
*In the next chapter, we'll talk about fully general models that can (finally!) deal with this

Lab: Splines and GAMs
To do today's lab in R: the splines and gam packages
To do (the first half of) today's lab in python: patsy
Instructions and code:
Full version can be found beginning on p. 293 of ISLR