Lasso regression. The Goals of Model Selection Model selection: Choosing the approximate best model by estimating the performance of various models Goals.


1 Lasso regression

2 The Goals of Model Selection: Model selection means choosing the approximately best model by estimating the performance of various candidate models. Goals of model selection: simple and interpretable models; accurate predictions. Source: https://www.youtube.com/watch?v=bWhJ9ixN8e0

3 Drawbacks of Full Model Estimation: When you have many predictor variables, full model estimation has the following drawbacks. It assigns nonzero estimates to all regression coefficients, which makes the model difficult to interpret. It also tends to produce overfitted models with poor predictive performance. Source: https://www.youtube.com/watch?v=bWhJ9ixN8e0

4 Variable Selection: There are two main approaches to variable selection: the all-possible-regressions approach and automatic methods. Source: http://www.stat.columbia.edu/~martin/W2024/R10.pdf
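To make the cost of the all-possible-regressions approach concrete, here is a minimal Python sketch that enumerates every non-empty subset of a set of predictors. The predictor names are hypothetical and the function `all_subsets` is introduced only for illustration; the point is that p predictors give 2^p - 1 candidate models, which grows quickly.

```python
from itertools import combinations


def all_subsets(predictors):
    """Return every non-empty subset of the predictor names."""
    subsets = []
    for size in range(1, len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


predictors = ["x1", "x2", "x3", "x4"]  # hypothetical predictor names
candidates = all_subsets(predictors)
print(len(candidates))  # 15 candidate models, i.e. 2**4 - 1
```

With the 10 predictors of the diabetes data used later in the deck, the same enumeration would yield 1023 candidate models, which is one motivation for automatic methods.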

5 Lasso (Least Absolute Shrinkage and Selection Operator): The lasso penalizes the absolute size of the regression coefficients. As a result, some of the parameter estimates may be exactly zero. This is convenient for automatic feature/variable selection and for dealing with highly correlated predictors. Source: http://stats.stackexchange.com/questions/17251/what-is-the-lasso-in-regression-analysis
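The penalty described on this slide can be written out as follows. This is the usual textbook formulation rather than a formula taken from the slides; the symbols n (observations), p (predictors), λ (penalty weight) and t (constraint bound) are notation assumed here.

```latex
% Penalized (Lagrangian) form
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0,\,\beta}\;
    \frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert ,
  \qquad \lambda \ge 0

% Equivalent constrained form
\min_{\beta_0,\,\beta}\;
    \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
  \quad\text{subject to}\quad
  \sum_{j=1}^{p} \lvert\beta_j\rvert \le t
```

A larger λ corresponds to a smaller bound t: both mean more shrinkage and more coefficients set exactly to zero.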

6 Lasso algorithm

7 Lasso algorithm (cont.) Source: Multiple Regression in R using OLS (2014).docx
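The algorithm figures for slides 6 and 7 are not part of this transcript. As a stand-in, the following Python sketch solves the lasso by cyclic coordinate descent, which is one standard approach and may not be the algorithm the slides showed. It assumes the objective (1/(2n)) Σ(y_i − x_i·β)² + λ Σ|β_j|, centred data (so no intercept), and plain Python lists; the function names are illustrative.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * sum (y_i - x_i . b)^2 + lam * sum |b_j|.

    X is a list of rows and y a list of responses. Both are assumed to be
    centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Mean squared value of each column, used to scale the update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the residual that ignores column j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Tiny made-up example with two orthogonal predictors.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [3.0, -3.0, 0.2, -0.2]
print(lasso_coordinate_descent(X, y, lam=0.25))  # first coefficient shrunk, second exactly 0.0
```

The soft-thresholding step is what produces exact zeros: any coefficient whose covariance with the partial residual is no larger than λ in absolute value is set to zero.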

8 Lasso vs OLS: The shrinkage can set some parameter estimates exactly to zero. When the constraint bound is very large, so that the constraint is not binding, the lasso estimates are the same as the OLS estimates. Source: Multiple Regression in R using OLS (2014).docx
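A special case makes the relationship to OLS easy to see. Assuming an orthonormal design (XᵀX = I) and the objective ½‖y − Xβ‖² + λ‖β‖₁, each lasso coefficient is the corresponding OLS coefficient soft-thresholded by λ. The sketch below uses made-up OLS estimates; `lasso_orthonormal` is an illustrative name, not a library function.

```python
def lasso_orthonormal(ols_coefs, lam):
    """Soft-threshold each OLS coefficient by lam (orthonormal design only)."""
    shrunk = []
    for b in ols_coefs:
        if b > lam:
            shrunk.append(b - lam)
        elif b < -lam:
            shrunk.append(b + lam)
        else:
            shrunk.append(0.0)
    return shrunk


ols = [3.0, -1.5, 0.4]  # made-up OLS estimates for illustration
print(lasso_orthonormal(ols, 0.0))  # [3.0, -1.5, 0.4]  no penalty: identical to OLS
print(lasso_orthonormal(ols, 0.5))  # [2.5, -1.0, 0.0]  smallest coefficient zeroed
print(lasso_orthonormal(ols, 2.0))  # [1.0, 0.0, 0.0]   only the largest survives
```

A penalty of λ = 0 plays the role of a very large constraint bound: nothing is shrunk and the OLS solution is returned unchanged.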

9 Cp: Mallows's Cp assesses the fit of a regression model that has been estimated using OLS. The Cp statistic is often used as a stopping rule for various forms of stepwise regression. Source: http://en.wikipedia.org/wiki/Mallows%27s_Cp
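One common form of the statistic is Cp = SSE_p / s² − n + 2p, where SSE_p is the error sum of squares of the candidate model, p is its number of parameters (counting the intercept), n is the number of observations, and s² is the residual mean square of the full model. The sketch below assumes that form; conventions for counting p vary between texts, and the function and argument names are illustrative.

```python
def mallows_cp(sse_subset, n_params_subset, sse_full, n_params_full, n_obs):
    """Mallows's Cp = SSE_p / s^2 - n + 2p.

    s^2 is estimated from the full model as SSE_full / (n - k), where k is
    the number of parameters in the full model (intercept included).
    """
    sigma2_hat = sse_full / (n_obs - n_params_full)
    return sse_subset / sigma2_hat - n_obs + 2 * n_params_subset


# Made-up numbers: 45 observations, full model with 5 parameters.
print(mallows_cp(80.0, 5, 80.0, 5, 45))   # 5.0, the full model always gives Cp = k
print(mallows_cp(100.0, 3, 80.0, 5, 45))  # 11.0 for a 3-parameter candidate
```

Candidate models with Cp close to their own p are usually considered adequate; values well above p suggest that important predictors have been left out.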

10 Advantages of Lasso: It has good computational properties. It performs parameter estimation and variable selection at the same time. It can allow an adaptive amount of shrinkage for each regression coefficient. Source: http://hansheng.gsm.pku.edu.cn/pdf/2007/lsa.pdf

11 How lasso works (using R): We generate some data randomly to see how the lasso performs variable selection.
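The R code from this slide is not in the transcript. The following Python sketch illustrates the same idea under stated assumptions: 100 observations, 8 standard-normal predictors of which only the first two have nonzero true coefficients, unit-variance noise, and a small coordinate-descent solver written here for the purpose. All of these choices are hypothetical and not taken from the slides.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * RSS + lam * sum |b_j|, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the partial residual (column j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_value = soft_threshold(rho, lam) / col_sq[j]
            delta = new_value - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_value
    return beta


random.seed(0)
n, p = 100, 8
true_beta = [3.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # only two real effects
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + random.gauss(0, 1) for row in X]

for lam in (0.01, 0.3, 1.0):
    beta = lasso_cd(X, y, lam)
    print(lam, [round(b, 2) for b in beta])
```

As the penalty grows, the coefficients of the noise predictors are expected to drop to exactly zero first, while the two true effects are shrunk but kept.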

12 Simulation: Bias-variance Tradeoff. We use three different models, courtesy of Dr. Peter Westfall.
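The three models themselves are not reproduced in this transcript. For reference, the tradeoff named in the slide title is usually stated through the following decomposition of expected prediction error at a point x_0. The notation is assumed here: the data follow y_0 = f(x_0) + ε with E[ε] = 0 and Var(ε) = σ², the noise ε is independent of the fitted model, and \hat{f} is the fitted model.

```latex
\mathbb{E}\Bigl[\bigl(y_0 - \hat{f}(x_0)\bigr)^{2}\Bigr]
  = \underbrace{\bigl(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\bigr)^{2}}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\bigl(\hat{f}(x_0)\bigr)}_{\text{variance}}
  + \underbrace{\sigma^{2}}_{\text{irreducible noise}}
```

Shrinkage methods such as the lasso accept some bias in exchange for a reduction in variance.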

13 Lasso in practice (data mining): We use the diabetes data to build a lasso model in three steps: partition the data; build the model on the training dataset; validate the model on the validation dataset.
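The three steps can be sketched in Python as follows. This is only an illustration: it uses synthetic one-predictor data instead of the diabetes data, a closed-form one-predictor lasso without intercept, and a 140/60 split, all of which are assumptions made here to keep the example self-contained.

```python
import random


def split_rows(rows, n_train, seed):
    """Shuffle a copy of rows and split it into (training, validation)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def fit_lasso_1d(train, lam):
    """One-predictor lasso for (1/(2n)) * RSS + lam * |b|, no intercept."""
    n = len(train)
    rho = sum(x * y for x, y in train) / n
    sxx = sum(x * x for x, _ in train) / n
    if rho > lam:
        return (rho - lam) / sxx
    if rho < -lam:
        return (rho + lam) / sxx
    return 0.0


def mse(data, beta):
    """Mean squared prediction error of y = beta * x on the given rows."""
    return sum((y - beta * x) ** 2 for x, y in data) / len(data)


# Synthetic stand-in data: y = 2x + noise (not the diabetes data).
rng = random.Random(1)
rows = []
for _ in range(200):
    x = rng.gauss(0, 1)
    rows.append((x, 2.0 * x + rng.gauss(0, 1)))

# Step 1: partition. Step 2: fit on training rows. Step 3: score on validation rows.
train, valid = split_rows(rows, n_train=140, seed=42)
for lam in (0.0, 0.1, 0.5, 1.0):
    beta = fit_lasso_1d(train, lam)
    print(lam, round(beta, 3), round(mse(valid, beta), 3))
```

The penalty value with the lowest validation error would be the one carried forward; the validation rows are never used for fitting.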

14 Data Description: The diabetes data frame has 442 rows and 3 columns. X: a matrix with 10 columns (age, sex, body mass index, average blood pressure, and six blood serum measurements). Y: a numeric vector giving a quantitative measure of disease progression one year after baseline. The third column, x2, is described in the lars documentation as x plus certain interactions. Source: http://cran.r-project.org/web/packages/lars/lars.pdf

15 Main Sample (source: LeastAngle_2002.pdf)

Pat  x1  x2  x3    x4   x5   x6     x7  x8  x9   x10  y
1    59  2   32.1  101  157  93.2   38  4   4.9  87   151
2    48  1   21.6  87   183  103.2  70  3   3.9  69   75
3    72  2   30.5  93   156  93.6   41  4   4.7  85   141
...
441  36  1   30.0  95   201  125.2  42  5   5.1  85   220
442  36  1   19.6  71   250  133.2  97  3   4.6  92   57

