
1 LECTURE 15: PARTIAL LEAST SQUARES AND DEALING WITH HIGH DIMENSIONS March 23, 2016 SDS 293 Machine Learning

2 Announcements
- Reminder: A5 is now due Friday (feel free to turn it in early)
- A6 will still come out later today
- Lab-style office hours start this week: Thursday nights, this room, feel free to bring dinner
  - Usual time: 5pm-7pm
  - This week: 5:30pm-7:30pm (deconflict with SDS faculty meeting)

3 Outline
Model selection: alternatives to least squares
- Subset selection
  - Best subset
  - Stepwise selection (forward and backward)
  - Estimating error using cross-validation
- Shrinkage methods
  - Ridge regression and the Lasso
- Dimension reduction
  - Recap: PCA
  - Partial Least Squares (PLS)
- Labs for each part

4 Flashback Question: what is the big idea in Principal Components Analysis? Answer: instead of working in the original set of dimensions, we can reorient ourselves to a space that more effectively captures the “true shape” of the data
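
To make that recap concrete, here is a minimal sketch (not part of the original slides) of the reorientation idea using scikit-learn's PCA on simulated two-predictor data standing in for the advertising example; the variable names and numbers are illustrative only.

```python
# Minimal PCA sketch: simulated data standing in for the advertising example.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
pop = rng.normal(50, 20, size=100)            # e.g. population of a market
ad = 0.5 * pop + rng.normal(0, 5, size=100)   # correlated ad spend
X = np.column_stack([pop, ad])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)                      # scores along the new (rotated) axes
print(pca.explained_variance_ratio_)          # 1st component captures most of the variance
```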

5 Example: advertising Along which line does the data vary the most?

6 Example: advertising This one!

7 Example: advertising If we tilt our heads, we can imagine a new axis…

8 Example: advertising

9 Discussion Question 1: why is this helpful for dimension reduction? Answer: helps us eliminate redundant predictors (why?)

10 Discussion Question 2: why is this helpful for regression? Hint: what is regression really trying to do? Answer: PCA can help because we assume the directions in which the data varies most are also the directions that are most strongly associated with the response

11 Dark side* of PCR
PCR is an unsupervised method: the response is not actually used to determine the directions of the principal components
- Con: we aren't guaranteed that the directions that maximize variance actually tell us anything about the response
- Pro: this isn't always bad; there are lots of cases where we don't actually know the response (more in Ch. 10)

12 Partial least squares (PLS)
In cases where we do have a response, we probably want to make use of that information
PLS is a dimension reduction method that tries to do this:
- Like PCR: we first identify a new (smaller) set of features formed from linear combinations of the original ones, then fit a linear model
- Unlike PCR: we choose new features that not only approximate the old features well, but that are also related to the response
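
As a rough illustration of that difference (not from the course lab), the sketch below fits PCR (standardize, PCA, then least squares) and PLS with the same number of components on simulated data using scikit-learn; the data and component count are made up for illustration.

```python
# Illustrative PCR vs PLS comparison on simulated data, both with M components.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

M = 2
pcr = make_pipeline(StandardScaler(), PCA(n_components=M), LinearRegression())
pls = PLSRegression(n_components=M, scale=True)

pcr.fit(X, y)
pls.fit(X, y)
print("PCR R^2:", pcr.score(X, y))
print("PLS R^2:", pls.score(X, y))   # PLS components are chosen with y in mind
```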

13 Flashback: projection process
When we transformed our predictors, we formed new features Z_1, ..., Z_M as linear combinations of the originals: Z_m = sum_j phi_jm X_j
This is the same as multiplying the data matrix X by a projection matrix of loadings
In PCR, these loadings were only related to the predictors
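
A small sketch of that equivalence, assuming NumPy and made-up loadings: multiplying the data matrix by the loading matrix gives the same result as forming each new feature as a weighted sum of the original predictors.

```python
# Z = X @ Phi: each new feature Z[:, m] is the linear combination sum_j phi[j, m] * X[:, j].
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 3))      # n = 5 observations, p = 3 predictors
Phi = rng.normal(size=(3, 2))    # p x M matrix of loadings phi_jm (illustrative values)

Z = X @ Phi                      # projected data, n x M
Z_manual = sum(Phi[j, :] * X[:, [j]] for j in range(3))
print(np.allclose(Z, Z_manual))  # True: the two computations agree
```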

14 Discussion Question: what could we use to relate the projection to both the predictors AND the response? Answer: if we use the coefficients from simple linear regression to seed our principal component, our model will favor predictors strongly associated with the response

15 Mechanics of PLS
Start by standardizing the original predictors
For each of the (now standardized) predictors X_j:
- Compute the simple linear regression of Y onto X_j
- Set phi_j1 equal to the resulting coefficient beta_j
This results in a principal component that places the highest weight on the variables that are most strongly related to the response
Tip: remember the recursiveness Ben mentioned?
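
Here is a minimal NumPy sketch of that first step (an illustration, not the course's lab code): standardize the predictors, take each simple-regression coefficient of Y on X_j as phi_j1, and form the first component.

```python
# Sketch of the first PLS direction: phi_j1 is the simple-regression coefficient of Y on X_j.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] + rng.normal(size=100)      # only X_0 is truly related to y

Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each predictor
yc = y - y.mean()

phi1 = Xs.T @ yc / (Xs ** 2).sum(axis=0)    # simple least-squares slope for each X_j
Z1 = Xs @ phi1                              # first PLS component
print(phi1)                                 # largest weight falls on the predictor tied to y
```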

16 Reality check: high dimensional data Question: when the number of features p is as large as, or larger than, the number of observations n, why shouldn't we just use least squares? Answer: even if there is no real relationship between X and Y, least squares will produce coefficients that result in a perfect fit to the training data (why?) Least squares is too flexible
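
A quick simulated check of that claim (illustrative only): even when the predictors are pure noise and unrelated to the response, least squares achieves a training R^2 of essentially 1 once p >= n.

```python
# Pure-noise illustration: with p >= n, least squares fits the training data perfectly.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n, p = 20, 25
X = rng.normal(size=(n, p))     # predictors unrelated to the response
y = rng.normal(size=n)          # pure noise

fit = LinearRegression().fit(X, y)
print("training R^2:", fit.score(X, y))   # ~1.0 despite no real relationship
```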

17 Reducing flexibility
Many of the approaches we've talked about in this chapter amount to fitting a less flexible version of least squares
3 things to remember:
1. Regularization or shrinkage can be crucial in high-dimensional problems
2. Your predictive performance is only as good as your tuning parameters, so choose wisely!
3. Features that aren't truly related to the response might make your training error go down, but your test error will get worse
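
To connect points 1 and 2, here is a hedged sketch (not from the slides) of shrinkage in a p > n setting: ridge regression with a tuning parameter chosen by cross-validation, evaluated on held-out data.

```python
# Shrinkage in a p > n setting: ridge with a cross-validated tuning parameter.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n, p = 60, 100
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] + rng.normal(size=n)    # only one predictor truly matters

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X_tr, y_tr)
print("chosen alpha:", ridge.alpha_)     # tuning parameter picked by cross-validation
print("test R^2:", ridge.score(X_te, y_te))
```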

18 Real world example where p >> n Medicine: - Let’s say we have a sample of 200 individual patients - What if instead of predicting their blood pressure using age, gender and BMI, we’re using measurements of half a million single nucleotide polymorphisms (SNPs)? Image courtesy of the Broad Institute

19 Interpreting results in high dimensions Let’s say forward selection tells us that of those half million SNPs, we find a set of 17 that lead to a good predictive model of blood pressure on the training data Question: what can we say about these 17 SNPs? Answer: this is one of many possible sets of 17 SNPs that effectively predict blood pressure. We cannot infer that these 17 SNPs are responsible for blood pressure; multicollinearity makes that impossible

20 Lab: PCR and PLS
To do today's lab in R: pls
To do today's lab in python:
Instructions and code: http://www.science.smith.edu/~jcrouser/SDS293/labs/lab11/
Full version can be found beginning on p. 256 of ISLR

21 Coming up Leaving the world of linearity to try out messier methods: - Polynomial regression - Step functions - Splines - Local regression - Generalized additive models

