Model Selection and Estimation in Regression with Grouped Variables

Remember…..
- Consider fitting a simple model with arbitrary explanatory variables X_1, X_2, X_3 and continuous Y.
- If we want to determine whether X_1, X_2, X_3 are predictive of Y, we need to take into account the groups of variables derived from X_1, X_2, X_3.
- 2nd example: ANOVA, where the dummy variables of a factor form the groups.
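To make the idea of "groups of derived variables" concrete, here is a small sketch (my own illustration, not from the slides; the data and column choices are made up): a continuous predictor contributes a group of polynomial terms, while a categorical predictor contributes a group of dummy variables.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10

# A continuous predictor gives rise to a group of derived polynomial terms.
x1 = rng.normal(size=n)
group_x1 = np.column_stack([x1, x1 ** 2, x1 ** 3])            # 3 derived variables

# A categorical predictor (an ANOVA factor) gives rise to a group of dummy variables.
x2 = rng.choice(["a", "b", "c"], size=n)
group_x2 = pd.get_dummies(pd.Series(x2)).iloc[:, :-1].to_numpy(dtype=float)  # 2 derived variables

# Deciding whether X1 or X2 is predictive of Y means selecting each *group*
# of columns together, not its individual columns.
print(group_x1.shape, group_x2.shape)
```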

Remember…..
Group LARS proceeds in two steps:
1. A solution path indexed by a tuning parameter λ is built. (The solution path is simply a "path" describing how the estimated coefficients move in space as a function of λ.)
2. The final model is selected on the solution path by some "minimal risk" criterion.

Notation
- Model form: Y = X_1 β_1 + X_2 β_2 + … + X_J β_J + ε
- Assume we have J factors/groups of variables.
- Y is (n x 1).
- ε ~ MVN(0, σ²I).
- p_j is the number of variables in group j.
- X_j is the (n x p_j) design matrix for group j.
- β_j is the coefficient vector for group j.
- Each X_j is centered/ortho-normalized and Y is centered.
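A small NumPy sketch of this setup (the group sizes, coefficients, and noise level below are arbitrary choices of mine, just to make the notation concrete):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_j = 100, [3, 3, 2]            # J = 3 groups with assumed sizes
sigma = 1.0

# Raw design matrices and true coefficient vectors for each group
X = [rng.normal(size=(n, p)) for p in p_j]
beta = [rng.normal(size=p) for p in p_j]

# Y = sum_j X_j beta_j + eps,  eps ~ MVN(0, sigma^2 I)
Y = sum(Xj @ bj for Xj, bj in zip(X, beta)) + sigma * rng.normal(size=n)

# Center Y; center and orthonormalize each X_j (so X_j' X_j = I), as the notation assumes.
Y = Y - Y.mean()
X_on = []
for Xj in X:
    Xc = Xj - Xj.mean(axis=0)
    Q, _ = np.linalg.qr(Xc)        # columns of Q are orthonormal and span the same space as Xc
    X_on.append(Q)
```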

Remember…..
Group LARS Solution Path Algorithm (refresher):
1. Compute the current "most correlated set" A by adding the factor that maximizes the "correlation" between the current residual and the factor (accounting for factor size).
2. Move the coefficient vector β in the direction of the projection of the current residual onto the factors in A.
3. Continue along this path until a new factor (outside A) has the same correlation as the factors in A; add that new factor to A.
4. Repeat steps 2-3 until no more factors can be added to A.
(Note: the solution path is piecewise linear, so it is computationally efficient!)
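Below is a minimal NumPy sketch of this path algorithm, assuming Y is centered and each X_j is centered and within-group orthonormalized as in the notation slide. The function and variable names are mine; this is an illustrative sketch of the steps above, not the authors' implementation.

```python
import numpy as np

def group_lars_path(X_groups, Y, tol=1e-10):
    """Sketch of the Group LARS solution-path algorithm.
    X_groups: list of (n x p_j) centered/orthonormalized design matrices.
    Y: centered response of length n.
    Returns the coefficient vector at each breakpoint of the piecewise-linear path."""
    J = len(X_groups)
    p = [Xj.shape[1] for Xj in X_groups]
    X = np.hstack(X_groups)
    idx = np.split(np.arange(X.shape[1]), np.cumsum(p)[:-1])   # column indices per group

    beta = np.zeros(X.shape[1])
    r = Y.copy()
    path = [beta.copy()]

    # "Correlation" of factor j with a residual, scaled by factor size.
    corr2 = lambda res, j: np.sum((X_groups[j].T @ res) ** 2) / p[j]

    # Step 1: start with the single most correlated factor.
    active = [max(range(J), key=lambda j: corr2(r, j))]

    while True:
        cols = np.concatenate([idx[j] for j in active])
        XA = X[:, cols]
        # Step 2: direction = least-squares fit of the residual on the active factors,
        # i.e. beta moves toward the projection of r onto span(X_A).
        gamma = np.linalg.lstsq(XA, r, rcond=None)[0]
        d = XA @ gamma

        # Step 3: smallest step alpha in (0, 1] at which an inactive factor's
        # correlation catches up with the common active-factor level, which
        # decays as (1 - alpha)^2 along this direction.
        c_act = corr2(r, active[0])
        alpha, new_j = 1.0, None
        for j in range(J):
            if j in active:
                continue
            u, v = X_groups[j].T @ r, X_groups[j].T @ d
            a = v @ v / p[j] - c_act
            b = -2.0 * (u @ v / p[j] - c_act)
            c = u @ u / p[j] - c_act
            roots = np.roots([a, b, c]) if abs(a) > tol else ([-c / b] if abs(b) > tol else [])
            for root in roots:
                root = complex(root)
                if abs(root.imag) < tol and tol < root.real < alpha:
                    alpha, new_j = root.real, j

        beta[cols] += alpha * gamma
        r = Y - X @ beta
        path.append(beta.copy())
        if new_j is None:          # step 4: nothing left to add; the path ends at the OLS fit
            return path
        active.append(new_j)
```

Because each segment of the path is linear in the step size, only the breakpoints need to be computed and stored, which is what makes the algorithm cheap.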

Cp Criterion (How to Select a Final Model)
- In Gaussian regression problems, an unbiased estimate of the "true risk" is
  Cp = ||Y − μ||² / σ² − n + 2·df,
  where μ denotes the fitted values and df = Σ_i cov(μ_i, Y_i) / σ².
- When the full design matrix X is orthonormal, it can be shown that an unbiased estimate of "df" is
  Σ_j I(||β_j|| > 0) + Σ_j (p_j − 1)·||β_j|| / ||β_j^LS||,
  where β_j is the Group LARS estimate for factor j and β_j^LS is its full least-squares estimate.
- Note the orthonormal Group LARS solution is
  β_j = (1 − λ·√p_j / ||β_j^LS||)_+ · β_j^LS, with β_j^LS = X_j' Y.

Degree-of-Freedom Calculation (Intuition)
- When the full design matrix X is orthonormal, an unbiased estimate of "df" is
  Σ_j I(||β_j|| > 0) + Σ_j (p_j − 1)·||β_j|| / ||β_j^LS||.
- Note the orthonormal Group LARS solution is
  β_j = (1 − λ·√p_j / ||β_j^LS||)_+ · β_j^LS.
- The general formula for "df" is
  df = Σ_i cov(μ_i, Y_i) / σ².
- Intuition: each selected factor contributes one degree of freedom for being in the model, plus a share of its remaining p_j − 1 degrees of freedom proportional to how little its coefficient vector is shrunk relative to the least-squares fit.
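A sketch of how these quantities could be computed for a candidate fit on the path (assuming the Cp and df formulas above; `beta_groups` and `beta_ls_groups` are hypothetical per-group coefficient vectors for the candidate fit and for the full least-squares fit):

```python
import numpy as np

def approx_df(beta_groups, beta_ls_groups):
    """Approximate df: 1 for each selected group, plus the remaining (p_j - 1)
    degrees of freedom scaled by the shrinkage fraction ||beta_j|| / ||beta_j^LS||."""
    df = 0.0
    for b, b_ls in zip(beta_groups, beta_ls_groups):
        nb, nls = np.linalg.norm(b), np.linalg.norm(b_ls)
        if nb > 0 and nls > 0:
            df += 1.0 + (len(b) - 1) * nb / nls
    return df

def cp_criterion(Y, fitted, df, sigma2):
    """Cp = ||Y - fitted||^2 / sigma^2 - n + 2 * df."""
    return float(np.sum((Y - fitted) ** 2) / sigma2 - len(Y) + 2.0 * df)
```

The final model is then the point on the solution path with the smallest Cp; σ² would typically be estimated from the full least-squares fit.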

Real Dataset Example
- Famous birthweight dataset from Hosmer/Lemeshow.
- Y = baby birthweight; 2 continuous predictors (age and weight of the mother) and 6 categorical predictors.
- For the continuous predictors, 3rd-order polynomials form the "factors".
- For the categorical predictors, dummy variables form the "factors", excluding the final category.
- 75%/25% train/test split.
- Methods compared: Group LARS, Backward Stepwise (LARS isn't possible here).
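A sketch of how the grouped design could be assembled, assuming the standard Hosmer/Lemeshow birthwt variable names (bwt, age, lwt, race, smoke, ptl, ht, ui, ftv) and an assumed CSV file; the exact coding used in the talk may differ:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("birthwt.csv")        # assumed file with the usual birthwt columns

groups = {}
for col in ["age", "lwt"]:             # continuous predictors -> 3rd-order polynomial factors
    x = df[col].to_numpy(dtype=float)
    groups[col] = np.column_stack([x, x ** 2, x ** 3])
for col in ["race", "smoke", "ptl", "ht", "ui", "ftv"]:    # categorical predictors -> dummy factors
    d = pd.get_dummies(df[col].astype("category"))
    groups[col] = d.iloc[:, :-1].to_numpy(dtype=float)     # drop the final category

Y = df["bwt"].to_numpy(dtype=float)

rng = np.random.default_rng(0)
is_train = rng.random(len(Y)) < 0.75                       # 75%/25% train/test split
```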

Real Dataset Example
[Solution-path Cp plot; the selected model attains the minimal Cp.]

Real Dataset Example
Factors selected:
- Group LARS: all factors except Number of Physician Visits during the First Trimester.
- Backward Stepwise: all factors except Number of Physician Visits during the First Trimester and Mother's Weight.

Real Dataset Example
Test Set Prediction MSE, Group LARS vs. Backward Stepwise
Overall Test Set MSE: 533035

Simulation Example #1
- 17 random variables Z_1, Z_2, …, Z_16, W were independently drawn from a Normal(0, 1).
- X_i = (Z_i + W) / √2
- Y = X_3³ + X_3² + X_3 + (1/3)·X_6³ − X_6² + (2/3)·X_6 + ε
- ε ~ N(0, 2²)
- Each simulated dataset has 100 observations; 200 simulations were run.
- Methods compared: Group LARS, LARS, Least Squares, Backward Stepwise.
- All 3rd-order main effects are considered.
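A sketch of this data-generating process (using the mean function as reconstructed above; the helper name is mine):

```python
import numpy as np

def simulate_example1(n=100, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    Z = rng.normal(size=(n, 16))
    W = rng.normal(size=(n, 1))
    X = (Z + W) / np.sqrt(2)                          # columns are X_1, ..., X_16
    x3, x6 = X[:, 2], X[:, 5]
    mu = x3**3 + x3**2 + x3 + (1/3) * x6**3 - x6**2 + (2/3) * x6
    Y = mu + rng.normal(scale=2.0, size=n)            # eps ~ N(0, 2^2)
    # Each X_i enters as a 3rd-order polynomial factor (X_i, X_i^2, X_i^3).
    groups = [np.column_stack([X[:, i], X[:, i]**2, X[:, i]**3]) for i in range(16)]
    return groups, Y
```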

Simulation Example #1 Group LARS LARSOLSStep wise Mean Test Set Prediction MSE Mean # of Factors Present

Simulation Example #2
- 20 random variables X_1, X_2, …, X_20 were generated as in Example #1.
- X_11, X_12, …, X_20 are trichotomized as 0, 1, or 2 according to whether they are smaller than the 33rd percentile of a Normal(0, 1), larger than the 66th percentile, or in between.
- Y = X_3³ + X_3² + X_3 + (1/3)·X_6³ − X_6² + (2/3)·X_6 + 2·I(X_11 = 0) + I(X_11 = 1) + ε
- ε ~ N(0, 2²)
- Each simulated dataset has 100 observations; 200 simulations were run.
- Methods compared: Group LARS, LARS, Least Squares, Backward Stepwise.
- All 3rd-order main effects and categorical factors are considered.
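The new ingredients relative to Example #1 are the trichotomization and the indicator group for X_11; a sketch follows (the coefficient 2 on I(X_11 = 0) is my reading of the garbled slide text, and the cut points are the N(0,1) 33rd/66th percentiles):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 100
Z, W = rng.normal(size=(n, 20)), rng.normal(size=(n, 1))
X = (Z + W) / np.sqrt(2)                              # X_1, ..., X_20 as in Example #1

# Trichotomize X_11, ..., X_20 at the N(0,1) 33rd/66th percentile cut points.
lo, hi = norm.ppf(0.33), norm.ppf(0.66)
Xcat = np.where(X[:, 10:] < lo, 0, np.where(X[:, 10:] > hi, 2, 1))

x3, x6 = X[:, 2], X[:, 5]
mu = (x3**3 + x3**2 + x3 + (1/3) * x6**3 - x6**2 + (2/3) * x6
      + 2.0 * (Xcat[:, 0] == 0) + 1.0 * (Xcat[:, 0] == 1))    # X_11 enters via its dummies
Y = mu + rng.normal(scale=2.0, size=n)
```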

Simulation Example #2 Group LARS LARSOLSStep wise Mean Test Set Prediction MSE Mean # of Factors Present

Conclusion
- Group LARS provides an improvement over traditional backward stepwise selection + OLS, but it still over-selects factors.
- In the simulations, stepwise selection tends to under-select factors relative to Group LARS, and it performs more poorly.
- Simulation #1 suggests LARS over-selects factors because it enters individual variables into the model rather than full factors.
- Group LARS is also computationally efficient, thanks to its piecewise-linear solution path algorithm.
- ||X_j' r||² / p_j is the formula for the "correlation" between a factor j and the current residual r. Because it aggregates over the whole group, a factor may be selected even if only a couple of its derived inputs are predictive and the rest are redundant.

EL FIN