LECTURE 15: PARTIAL LEAST SQUARES AND DEALING WITH HIGH DIMENSIONS March 23, 2016 SDS 293 Machine Learning

Announcements
- Reminder: A5 is now due Friday (feel free to turn it in early)
- A6 will still come out later today
- Lab-style office hours start this week
  - Thursday nights, this room; feel free to bring dinner
  - Usual time: 5pm-7pm
  - This week: 5:30pm-7:30pm (to deconflict with the SDS faculty meeting)

Outline
Model selection: alternatives to least squares
- Subset selection
  - Best subset
  - Stepwise selection (forward and backward)
  - Estimating error using cross-validation
- Shrinkage methods
  - Ridge regression and the Lasso
- Dimension reduction
  - Recap: PCA
  - Partial Least Squares (PLS)
Labs for each part

Flashback Question: what is the big idea in Principal Components Analysis? Answer: instead of working in the original set of dimensions, we can reorient ourselves to a space that more effectively captures the “true shape” of the data
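As a quick illustration of that reorientation, here is a minimal R sketch using prcomp(); the data frame ads and its columns are made-up stand-ins for whatever advertising-style data you have loaded:

# Minimal PCA sketch. `ads` is a hypothetical data frame of numeric
# predictors (e.g. TV, radio, newspaper spending); swap in your own data.
ads <- data.frame(TV        = rnorm(100, mean = 150, sd = 50),
                  radio     = rnorm(100, mean = 25,  sd = 10),
                  newspaper = rnorm(100, mean = 30,  sd = 15))

pc <- prcomp(ads, scale. = TRUE)   # center and scale, then rotate
pc$rotation                        # loadings: the new axes (directions of greatest variance)
head(pc$x)                         # the same observations expressed in the new coordinates
summary(pc)                        # proportion of variance captured by each component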

Example: advertising Along which line does the data vary the most?

Example: advertising This one!

Example: advertising If we tilt our heads, we can imagine a new axis…

Discussion Question 1: why is this helpful for dimension reduction? Answer: helps us eliminate redundant predictors (why?)

Discussion Question 2: why is this helpful for regression? Hint: what is regression really trying to do? Answer: PCA can help because we assume the directions in which the data varies most are also the directions that are most strongly associated with the response

Dark side* of PCR
- PCR is an unsupervised method: the response is not actually used to determine the directions of the principal components
- Con: we aren't guaranteed that the directions that maximize variance actually tell us anything about the response
- Pro: this isn't always bad; there are lots of cases where we don't actually know the response (more in Ch. 10)

Partial least squares (PLS)
In cases where we do have a response, we probably want to make use of that information. PLS is a dimension reduction method that tries to do exactly that:
- Like PCR: we first identify a new (smaller) set of features formed from linear combinations of the original ones, then fit a linear model on those new features
- Unlike PCR: we choose the new features so that they not only approximate the old features well, but are also related to the response

Flashback: projection process
When we transformed our predictors, we said we were multiplying the data matrix by a projection matrix: Z = XΦ. Equivalently, each new feature is a linear combination of the original predictors, Z_m = φ_1m X_1 + φ_2m X_2 + ... + φ_pm X_p. In PCR, these φ values were determined using only the predictors.
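A quick numeric check of that equivalence (synthetic numbers, hypothetical variable names):

# Multiplying the data matrix by a column of loadings gives the same result
# as summing the weighted predictor columns one at a time.
set.seed(1)
X   <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)   # 5 observations, 3 predictors
phi <- c(0.5, -0.3, 0.8)                          # loadings for one new feature

z_matrix <- as.vector(X %*% phi)                  # matrix form: z = X phi
z_sum    <- phi[1] * X[, 1] + phi[2] * X[, 2] + phi[3] * X[, 3]
all.equal(z_matrix, z_sum)                        # TRUE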

Discussion Question: what could we use to relate the projection to both the predictors AND the response? Answer: if we use the coefficients from simple linear regression to seed our principal component, our model will favor predictors strongly associated with the response

Mechanics of PLS
- Start by standardizing the original predictors
- For each of the (now standardized) predictors X_j:
  - Compute the simple linear regression of Y onto X_j
  - Set φ_j1 equal to the resulting coefficient β_j
- This results in a principal component that places the highest weight on the variables that are most strongly related to the response (see the sketch below)
- Tip: remember the recursiveness Ben mentioned?
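Here is a minimal "by hand" sketch of that first PLS direction in R; X and y are made-up stand-ins for your predictor matrix and response (the actual lab uses the pls package instead):

# Sketch: compute the first PLS direction by hand on synthetic data.
set.seed(293)
X <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4)   # hypothetical predictors
y <- 3 * X[, 1] + rnorm(100)                        # hypothetical response

Xs <- scale(X)                     # standardize the predictors
yc <- y - mean(y)                  # center the response

# phi_j1 = slope from the simple linear regression of y onto X_j
phi1 <- apply(Xs, 2, function(xj) coef(lm(yc ~ xj))[2])

z1  <- Xs %*% phi1                 # first PLS component: weights favor predictors tied to y
fit <- lm(yc ~ z1)                 # regress the response on that single component
summary(fit)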

Reality check: high-dimensional data
Question: when the number of features p is as large as, or larger than, the number of observations n, why shouldn't we just use least squares?
Answer: even if there is no real relationship between X and Y, least squares will produce coefficients that result in a perfect fit to the training data (why?). Least squares is simply too flexible.
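A small simulation (entirely synthetic data) makes the point concrete:

# With more coefficients than observations, least squares "fits" perfectly
# even though the predictors have nothing to do with the response.
set.seed(293)
n <- 20; p <- 25
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)                        # response generated independently of X

fit <- lm(y ~ X)                     # rank-deficient fit; some coefficients come back NA
mean(residuals(fit)^2)               # training MSE is (numerically) zero
summary(fit)$r.squared               # R^2 = 1: a "perfect" fit that means nothing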

Reducing flexibility
Many of the approaches we've talked about in this chapter amount to fitting a less flexible version of least squares. Three things to remember (see the sketch below):
1. Regularization or shrinkage can be crucial in high-dimensional problems
2. Your predictive performance is only as good as your tuning parameters, so choose them wisely!
3. Features that aren't truly related to the response might make your training error go down, but your test error will get worse
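A hedged illustration of points 1 and 2 using the glmnet package (not part of today's lab; the data below are synthetic):

library(glmnet)

# Synthetic high-dimensional setting: 50 observations, 200 predictors,
# only the first 5 of which truly matter.
set.seed(293)
n <- 50; p <- 200
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))
y <- as.vector(X %*% beta + rnorm(n))

# Ridge regression (alpha = 0): the tuning parameter lambda is chosen by
# 10-fold cross-validation rather than by training error.
cv.fit <- cv.glmnet(X, y, alpha = 0)
cv.fit$lambda.min                    # lambda that minimizes estimated test error
plot(cv.fit)                         # CV error curve across the lambda path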

Real-world example where p >> n
Medicine:
- Let's say we have a sample of 200 individual patients
- What if, instead of predicting their blood pressure using age, gender, and BMI, we're using measurements of half a million single nucleotide polymorphisms (SNPs)?
Image courtesy of the Broad Institute

Interpreting results in high dimensions
Let's say that, out of those half million SNPs, forward selection finds a set of 17 that lead to a good predictive model of blood pressure on the training data.
Question: what can we say about these 17 SNPs?
Answer: this is just one of many possible sets of 17 SNPs that effectively predict blood pressure. We cannot infer that these 17 SNPs are responsible for blood pressure; multicollinearity makes that inference impossible.

Lab: PCR and PLS
- To do today's lab in R: pls
- To do today's lab in Python:
- Instructions and code:
- The full version can be found beginning on p. 256 of ISLR
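For reference, a minimal sketch of the R portion, following the PCR/PLS lab in ISLR Section 6.7 on the Hitters data (assuming the ISLR and pls packages are installed):

library(ISLR)    # Hitters data used in the book's lab
library(pls)     # provides pcr() and plsr()

Hitters <- na.omit(Hitters)
set.seed(1)

# Principal components regression: standardized predictors, 10-fold CV
pcr.fit <- pcr(Salary ~ ., data = Hitters, scale = TRUE, validation = "CV")
validationplot(pcr.fit, val.type = "MSEP")

# Partial least squares: same setup, but directions also use the response
pls.fit <- plsr(Salary ~ ., data = Hitters, scale = TRUE, validation = "CV")
validationplot(pls.fit, val.type = "MSEP")
summary(pls.fit)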

Coming up
Leaving the world of linearity to try out messier methods:
- Polynomial regression
- Step functions
- Splines
- Local regression
- Generalized additive models