Model selection Best subsets regression
Statement of problem: A common problem is that there is a large set of candidate predictor variables. The goal is to choose a small subset from the larger set so that the resulting regression model is simple yet has good predictive ability.
Example: Cement data
Response y: heat evolved during hardening of cement, in calories per gram
Predictor x1: % of tricalcium aluminate
Predictor x2: % of tricalcium silicate
Predictor x3: % of tetracalcium alumino ferrite
Predictor x4: % of dicalcium silicate
Two basic methods of selecting predictors
Stepwise regression: enter and remove predictors, in a stepwise manner, until there is no justifiable reason to enter or remove more.
Best subsets regression: select the subset of predictors that does the best at meeting some well-defined objective criterion.
Why best subsets regression?
# of predictors (p-1)   # of regression models   Models
1                       2                        ( ), (x1)
2                       4                        ( ), (x1), (x2), (x1, x2)
3                       8                        ( ), (x1), (x2), (x3), (x1, x2), (x1, x3), (x2, x3), (x1, x2, x3)
4                       16                       1 with none, 4 with one, 6 with two, 4 with three, 1 with four
Why best subsets regression? If there are p-1 possible predictors, then there are $2^{p-1}$ possible regression models containing the predictors. For example, 10 predictors yield $2^{10} = 1024$ possible regression models. A best subsets algorithm determines the best subsets of each size, so that the choice of the final model can be made by the researcher.
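The count of candidate models can be checked by enumerating them directly. The following is a minimal sketch, not part of the original slides; the helper name `all_subsets` is hypothetical, and the predictor names simply follow the cement example.

```python
from itertools import combinations

def all_subsets(predictors):
    """Yield every subset of the candidate predictors, smallest first."""
    for size in range(len(predictors) + 1):
        for subset in combinations(predictors, size):
            yield subset

# Four candidate predictors, as in the cement example.
predictors = ["x1", "x2", "x3", "x4"]
models = list(all_subsets(predictors))

# Number of models of each size: none, one, two, three, four predictors.
counts_by_size = [
    sum(1 for m in models if len(m) == k)
    for k in range(len(predictors) + 1)
]

print(len(models))      # 16
print(counts_by_size)   # [1, 4, 6, 4, 1]
```

Because the count doubles with every added predictor, exhaustive enumeration is only practical for a modest number of candidates.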
What is used to judge “best”?
R-squared
Adjusted R-squared
MSE (or S = square root of MSE)
Mallows' $C_p$
R-squared Use the R-squared values to find the point where adding more predictors is not worthwhile because it leads to a very small increase in R-squared.
Adjusted R-squared or MSE Adjusted R-squared increases only if MSE decreases, so adjusted R-squared and MSE provide equivalent information. Find a few subsets for which MSE is smallest (or adjusted R-squared is largest) or so close to the smallest (largest) that adding more predictors is not worthwhile.
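The equivalence between adjusted R-squared and MSE can be seen numerically. The sketch below assumes the usual definition, adjusted R-squared = 1 - MSE / (SSTO / (n - 1)), where SSTO is the total sum of squares and MSE = SSE / (n - p). The function name and all SSE values are hypothetical and chosen only for illustration.

```python
import math

def adjusted_r2(sse, ssto, n, p):
    """Adjusted R-squared for a model with p parameters (intercept included)."""
    mse = sse / (n - p)
    return 1.0 - mse / (ssto / (n - 1))

# Made-up numbers: n observations, total sum of squares, and for each
# candidate model a pair (number of parameters p, error sum of squares SSE).
n, ssto = 20, 2700.0
candidates = {"A": (2, 900.0), "B": (3, 60.0), "C": (4, 50.0), "D": (5, 49.0)}

def mse_of(name):
    p, sse = candidates[name]
    return sse / (n - p)

def adj_of(name):
    p, sse = candidates[name]
    return adjusted_r2(sse, ssto, n, p)

# Ranking by smallest MSE and by largest adjusted R-squared gives the same order.
by_mse = sorted(candidates, key=mse_of)
by_adj = sorted(candidates, key=adj_of, reverse=True)
print(by_mse, by_adj)
```

In this made-up example, model D has a smaller SSE than model C but a larger MSE, so both criteria prefer C: the extra predictor does not reduce the error enough to pay for the lost degree of freedom.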
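Mallows' $C_p$ criterion
The goal is to minimize the total standardized mean square error of prediction:
$$\Gamma_p = \frac{1}{\sigma^2} \sum_{i=1}^{n} E\left[\left(\hat{y}_i - E(y_i)\right)^2\right]$$
which equals:
$$\Gamma_p = \frac{1}{\sigma^2} \left\{ \sum_{i=1}^{n} \left[E(\hat{y}_i) - E(y_i)\right]^2 + \sum_{i=1}^{n} \operatorname{Var}(\hat{y}_i) \right\}$$
which in English is:
$$\Gamma_p = \frac{\text{sum of squared biases} + \text{sum of prediction variances}}{\sigma^2}$$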
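Mallows' $C_p$ criterion
Mallows' $C_p$ statistic
$$C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{p-1})} - (n - 2p)$$
estimates the total standardized mean square error of prediction, where:
$SSE_p$ is the error sum of squares for the fitted (subset) regression model with p parameters.
$MSE(X_1, \ldots, X_{p-1})$ is the MSE of the model containing all p-1 candidate predictors (in this term, p-1 counts all candidates, as on the earlier slides). It is an unbiased estimator of $\sigma^2$.
p is the number of parameters in the (subset) model, including the intercept.
n is the number of observations.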
Facts about Mallows' $C_p$
Subset models with small $C_p$ values have a small total standardized MSE of prediction.
When the $C_p$ value is:
- near p, the bias is small (next to none);
- much greater than p, the bias is substantial;
- below p, the difference is due to sampling error; interpret it as no bias.
For the largest model, with all possible predictors, $C_p = p$ (always).
Using the $C_p$ criterion
So, identify subsets of predictors for which:
- the $C_p$ value is smallest, and
- the $C_p$ value is near p (if possible).
In general, though, don't always choose the largest model just because it yields $C_p = p$.
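The criterion can be sketched in code as follows. This is an illustrative pure-Python implementation, not the algorithm used by any particular package. It assumes the standard form of the statistic, SSE_p / MSE_full - (n - 2p), and fits each subset by solving the normal equations. The function names and the small data set are made up for the example.

```python
import math
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def sse(X, y, cols):
    """Error sum of squares for a least-squares fit with intercept plus cols."""
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def mallows_cp(X, y):
    """Return {subset of column indices: C_p} for every subset of predictors."""
    n, m = len(y), len(X[0])
    mse_full = sse(X, y, range(m)) / (n - (m + 1))  # MSE of the largest model
    result = {}
    for k in range(m + 1):
        for cols in combinations(range(m), k):
            p = k + 1  # parameters = predictors in the subset + intercept
            result[cols] = sse(X, y, cols) / mse_full - (n - 2 * p)
    return result

# Made-up data: 8 observations, 3 candidate predictors.
X = [[1.0, 5.0, 2.0], [2.0, 3.0, 1.0], [3.0, 8.0, 4.0], [4.0, 1.0, 3.0],
     [5.0, 7.0, 6.0], [6.0, 2.0, 2.0], [7.0, 9.0, 5.0], [8.0, 4.0, 7.0]]
y = [4.1, 5.2, 9.8, 8.9, 14.2, 12.1, 19.5, 17.8]

cp = mallows_cp(X, y)
for cols, value in sorted(cp.items(), key=lambda kv: kv[1]):
    print(cols, len(cols) + 1, round(value, 2))  # subset, p, C_p
```

For the largest model the statistic reduces to (n - p) - (n - 2p) = p by construction, which is why that model's value alone says nothing about whether it should be preferred.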
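Best Subsets Regression: y versus x1, x2, x3, x4
Response is y
[Minitab output table. Columns: Vars, R-Sq, R-Sq(adj), C-p, S, followed by one column per predictor (x1, x2, x3, x4) with an X marking the predictors in each model. Numeric values are missing from the transcript.]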
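Stepwise Regression: y versus x1, x2, x3, x4
Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15
Response is y on 4 predictors, with N = 13
[Minitab output table. Rows: Step, Constant, then a coefficient, T-Value and P-Value for each predictor in the model, followed by S, R-Sq, R-Sq(adj) and C-p. Numeric values are missing from the transcript.]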
Example: Modeling PIQ
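Best Subsets Regression: PIQ versus MRI, Height, Weight
Response is PIQ
[Minitab output table. Columns: Vars, R-Sq, R-Sq(adj), C-p, S, followed by one column per predictor (MRI, Height, Weight) with an X marking the predictors in each model. Numeric values are missing from the transcript.]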
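Stepwise Regression: PIQ versus MRI, Height, Weight
Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15
Response is PIQ on 3 predictors, with N = 38
[Minitab output table for steps 1 and 2. Rows: Constant; MRI with its T-Value and P-Value; Height with its T-Value and P-Value; then S, R-Sq, R-Sq(adj) and C-p. Numeric values are missing from the transcript.]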
Example: Modeling BP
Best Subsets Regression: BP versus Age, Weight,...
Response is BP
[Minitab output table. Columns: Vars, R-Sq, R-Sq(adj), C-p, S, followed by one column per predictor (Age, Weight, BSA, Duration, Pulse, Stress) with an X marking the predictors in each model. Numeric values are missing from the transcript.]
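Stepwise Regression: BP versus Age, Weight, BSA, Duration, Pulse, Stress
Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15
Response is BP on 6 predictors, with N = 20
[Minitab output table. Rows: Step; Constant; Weight, Age and BSA, each with its T-Value and P-Value; then S, R-Sq, R-Sq(adj) and C-p. The only numeric values that survive in the transcript are the BSA coefficient (4.6) and its T-Value (3.04).]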
Best subsets regression
Stat >> Regression >> Best subsets …
Specify response and all possible predictors.
If desired, specify predictors that must be included in every model. (Researcher’s knowledge!)
Select OK.
Results appear in session window.
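The option of forcing certain predictors into every model simply shrinks the search to subsets of the remaining candidates. The sketch below illustrates that idea in Python; it is not Minitab's implementation, the helper name `subsets_with_forced` is hypothetical, and forcing Weight is an arbitrary choice for illustration.

```python
from itertools import combinations

def subsets_with_forced(candidates, forced):
    """Yield every candidate model that contains all of the forced predictors."""
    free = [c for c in candidates if c not in forced]
    for size in range(len(free) + 1):
        for extra in combinations(free, size):
            yield tuple(forced) + extra

# Predictor names from the BP example; Weight is forced in for illustration only.
candidates = ["Age", "Weight", "BSA", "Duration", "Pulse", "Stress"]
forced_models = list(subsets_with_forced(candidates, ["Weight"]))

print(len(forced_models))  # 32 models instead of 64
```

Each forced predictor halves the number of models that must be compared.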