Model Selection and Estimation in Regression with Grouped Variables.


1 Model Selection and Estimation in Regression with Grouped Variables

2 Remember…
Consider fitting a simple regression model with arbitrary explanatory variables X1, X2, X3 and a continuous response Y.
If we want to determine whether X1, X2, X3 are predictive of Y, we need to take into account the groups of variables derived from X1, X2, X3 (for example, the basis expansion of each Xi).
2nd example: ANOVA, where the dummy variables of a factor form the groups.

3 Remember…
Group LARS proceeds in two steps:
1) A solution path indexed by a tuning parameter λ is built. (The solution path is just a "path" describing how the estimated coefficients move through coefficient space as a function of λ.)
2) The final model is selected on the solution path by some "minimal risk" criterion.

4 Notation
Model form: Y = Σ_{j=1}^{J} X_j β_j + ε
We assume J factors/groups of variables.
Y is (n × 1).
ε ~ MVN(0, σ² I).
p_j is the number of variables in group j.
X_j is the (n × p_j) design matrix for group j.
β_j is the coefficient vector for group j.
Each X_j is centered and orthonormalized, and Y is centered.
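The preprocessing assumed on this slide (center Y; center and orthonormalize each group's columns) can be sketched in Python. This is a minimal sketch, not the authors' code: `preprocess` is a hypothetical helper, and it uses a reduced QR factorization so that each returned group satisfies X_j' X_j = I.

```python
import numpy as np

def preprocess(X_groups, y):
    """Center y, then center and orthonormalize each group's columns
    (so that X_j' X_j = I), as the notation slide assumes.
    Sketch via reduced QR; assumes each group has full column rank."""
    y_c = y - y.mean()
    out = []
    for X in X_groups:
        Xc = X - X.mean(axis=0)          # center each column
        Q, _ = np.linalg.qr(Xc)          # Q: orthonormal columns spanning col(Xc)
        out.append(Q)
    return out, y_c
```

Note that QR replaces each group's columns with an orthonormal basis for the same column space, which is all Group LARS needs, since selection happens at the group level.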

5 Remember…
Group LARS solution path algorithm (refresher):
1. Compute the current "most correlated set" A by adding the factor that maximizes the "correlation" between the current residual and the factor (accounting for factor size).
2. Move the coefficient vector β in the direction of the projection of the current residual onto the factors in A.
3. Continue along this direction until a factor outside A reaches the same correlation as the factors in A; add that new factor to A.
4. Repeat steps 2-3 until no more factors can be added to A.
(Note: the solution path is piecewise linear, so it is computationally efficient!)
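The steps above can be illustrated with a small-step (stagewise) approximation. This is not the exact piecewise-linear Group LARS path: instead of moving along the equiangular direction until a new group ties, this sketch repeatedly takes a small step toward the projection of the residual onto the currently most-correlated group. The helper name and step sizes are illustrative.

```python
import numpy as np

def group_stagewise(X_groups, y, n_steps=200, eps=0.05):
    """Stagewise approximation to the Group LARS path: at each step,
    pick the group maximizing the size-adjusted correlation
    ||X_j' r||^2 / p_j with the current residual r, then move that
    group's coefficients a fraction eps toward the projection of r
    onto the group (columns assumed orthonormal)."""
    betas = [np.zeros(X.shape[1]) for X in X_groups]
    r = y.astype(float).copy()
    for _ in range(n_steps):
        # group "correlation" with the residual, accounting for group size
        scores = [np.sum((X.T @ r) ** 2) / X.shape[1] for X in X_groups]
        j = int(np.argmax(scores))
        proj = X_groups[j].T @ r         # coordinates of the projection of r
        betas[j] += eps * proj
        r -= eps * (X_groups[j] @ proj)  # shrink the residual toward group j
    return betas, r
```

With signal concentrated in one group, the sketch pulls that group's coefficients in first, mimicking how the real algorithm builds the active set A.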

6 Cp Criterion (How to Select a Final Model)
In Gaussian regression problems, an unbiased estimate of the "true risk" is
Cp(μ̂) = ||Y − μ̂||² / σ² − n + 2·df,
where df denotes the degrees of freedom of the fit.
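Selecting the model that minimizes this criterion along the path is a one-line computation. A minimal sketch (`cp_criterion` is a hypothetical helper name):

```python
import numpy as np

def cp_criterion(y, y_hat, sigma2, df):
    """Mallows-type Cp used to pick a point on the solution path:
    Cp = ||y - y_hat||^2 / sigma^2 - n + 2*df,
    an unbiased risk estimate in Gaussian regression."""
    n = len(y)
    return float(np.sum((y - y_hat) ** 2) / sigma2 - n + 2.0 * df)
```

In practice one evaluates this at each breakpoint of the piecewise-linear path and keeps the fit with minimal Cp.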

7 Degrees-of-Freedom Calculation (Intuition)
When the full design matrix X is orthonormal, it can be shown that an unbiased estimate of df is
df̂ = Σ_j I(||β̂_j|| > 0) + Σ_j (||β̂_j|| / ||β̂_j^LS||)(p_j − 1),
where β̂_j^LS is the least-squares estimate for group j.
Note that the orthonormal Group LARS solution is
β̂_j = (1 − λ√p_j / ||X_j'Y||)₊ · X_j'Y.
The same expression for df̂ is then used as the general (approximate) formula for df.
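The df formula has a direct translation into code: one degree of freedom per active group, plus a shrinkage-weighted share of each group's remaining p_j − 1 dimensions. A sketch (`df_estimate` is an illustrative helper; it assumes the per-group least-squares estimates are nonzero):

```python
import numpy as np

def df_estimate(betas, betas_ls):
    """df = sum_j 1{||b_j|| > 0} + sum_j (||b_j|| / ||b_j_ls||) * (p_j - 1),
    where betas are the path estimates and betas_ls the per-group
    least-squares estimates."""
    df = 0.0
    for b, b_ls in zip(betas, betas_ls):
        nb = np.linalg.norm(b)
        if nb > 0:
            df += 1.0 + (nb / np.linalg.norm(b_ls)) * (len(b) - 1)
    return df
```

Intuitively, a group that is fully shrunk to zero contributes nothing, while an unshrunk group contributes its full p_j dimensions.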

8 Real Dataset Example
The famous birthweight dataset from Hosmer & Lemeshow.
Y = baby birthweight; 2 continuous predictors (age and weight of the mother), 6 categorical predictors.
For continuous predictors, 3rd-order polynomials are used as the "factors."
For categorical predictors, dummy variables are used, excluding the final level.
75%/25% train/test split.
Methods compared: Group LARS, backward stepwise. (LARS isn't applicable, since the variables are grouped.)
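Building the grouped design described here (polynomial factors for continuous predictors, dummy factors for categorical ones) can be sketched as follows. The helper names and the example data are illustrative, not from the actual birthweight dataset:

```python
import numpy as np

def poly_group(x, degree=3):
    """Expand a continuous predictor into a polynomial 'factor'
    (x, x^2, ..., x^degree), matching the slides' 3rd-order treatment."""
    x = np.asarray(x, float)
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def dummy_group(levels_vec):
    """Dummy-code a categorical predictor, dropping the last level
    so the columns form one group per factor."""
    levels = sorted(set(levels_vec))
    return np.column_stack([(np.asarray(levels_vec) == lv).astype(float)
                            for lv in levels[:-1]])

# hypothetical rows: mother's age plus a 3-level categorical factor
age = np.array([19.0, 33.0, 25.0, 28.0, 22.0])
groups = [poly_group(age), dummy_group(["no", "yes", "no", "former", "yes"])]
X = np.hstack(groups)   # full design; each entry of `groups` is one factor
```

Each entry of `groups` then enters or leaves the Group LARS model as a unit.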

9 Real Dataset Example
(Figure: Cp along the solution path, with the minimal-Cp model marked.)

10 Real Dataset Example
Factors selected:
Group LARS: all factors except number of physician visits during the first trimester.
Backward stepwise: all factors except number of physician visits during the first trimester and mother's weight.

11 Real Dataset Example
Test set prediction MSE:
  Group LARS:        463,047
  Backward stepwise: 506,706
  Overall test set MSE: 533,035

12 Simulation Example #1
17 random variables Z_1, Z_2, …, Z_16, W were independently drawn from N(0, 1).
X_i = (Z_i + W) / √2
Y = X_3³ + X_3² + X_3 + (1/3)·X_6³ − X_6² + (2/3)·X_6 + ε
ε ~ N(0, 2²)
Each simulated dataset has 100 observations; 200 simulations were run.
Methods compared: Group LARS, LARS, least squares, backward stepwise.
All 3rd-order main effects are considered.
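The data-generating process above is easy to reproduce. A sketch of one simulated dataset (the function name and seed are illustrative):

```python
import numpy as np

def simulate_example1(n=100, seed=0):
    """One dataset per Simulation Example #1: Z_1..Z_16, W iid N(0,1),
    X_i = (Z_i + W)/sqrt(2) (so predictors are pairwise correlated
    through W), and Y a cubic in X_3 and X_6 plus N(0, 2^2) noise."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, 16))
    W = rng.normal(size=(n, 1))
    X = (Z + W) / np.sqrt(2)
    x3, x6 = X[:, 2], X[:, 5]
    y = (x3 ** 3 + x3 ** 2 + x3
         + (1 / 3) * x6 ** 3 - x6 ** 2 + (2 / 3) * x6
         + rng.normal(scale=2.0, size=n))
    return X, y
```

Each X_i then gets expanded into a 3rd-order polynomial group, so the true model uses exactly two of the sixteen factors.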

13 Simulation Example #1
                               Group LARS   LARS    OLS    Stepwise
Mean test set prediction MSE       5.32     5.73   10.94     7.45
Mean # of factors present          7.425    9.435  16        6.565

14 Simulation Example #2
20 random variables X_1, X_2, …, X_20 were generated as in Example #1.
X_11, X_12, …, X_20 are trichotomized as 0, 1, or 2 according to whether they are smaller than the 33rd percentile of N(0, 1), larger than the 66th percentile, or in between.
Y = X_3³ + X_3² + X_3 + (1/3)·X_6³ − X_6² + (2/3)·X_6 + 2·I(X_11 = 0) + I(X_11 = 1) + ε
ε ~ N(0, 2²)
Each simulated dataset has 100 observations; 200 simulations were run.
Methods compared: Group LARS, LARS, least squares, backward stepwise.
All 3rd-order main effects and categorical factors are considered.
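The trichotomization step can be sketched as follows; the cutoffs ±0.4307 are approximate 33rd/66th percentiles of N(0, 1), and the helper name is illustrative:

```python
import numpy as np

def trichotomize(x, lo=-0.4307, hi=0.4307):
    """Cut a continuous variable into 0/1/2 as in Simulation Example #2:
    0 below the 33rd percentile of N(0,1), 1 above the 66th,
    2 in between (lo/hi are approximate normal quantiles)."""
    x = np.asarray(x, float)
    out = np.full(x.shape, 2)   # default: "in between"
    out[x < lo] = 0
    out[x > hi] = 1
    return out
```

Each trichotomized predictor then becomes a dummy-variable group, so the categorical factors compete with the polynomial factors on the same path.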

15 Simulation Example #2
                               Group LARS   LARS    OLS    Stepwise
Mean test set prediction MSE       5.43     5.98    9.88     7.55
Mean # of factors present          9.61     9.53   20        8.62

16 Conclusion
Group LARS provides an improvement over traditional backward stepwise selection + OLS, but still over-selects factors.
In the simulations, stepwise selection tends to under-select factors relative to Group LARS, and performs worse.
Simulation #1 suggests LARS over-selects factors because it enters individual variables into the model rather than full factors.
Group LARS is also computationally efficient due to its piecewise-linear solution path algorithm.
The "correlation" between a factor j and the current residual r is ||X_j' r||² / p_j. Because this averages over the whole group, a factor may be selected when only a couple of its derived inputs are predictive and the rest are redundant.

17 The End

