Presentation on theme: "All Possible Regressions and Statistics for Comparing Models"— Presentation transcript:
1. All Possible Regressions and Statistics for Comparing Models
Animal Science 500, Lecture No. 12, October 12, 2010
2. Example analysis: The RSQUARE Procedure
RECALL: The RSQUARE procedure selects optimal subsets of independent variables in a multiple regression analysis.
3. Example analysis
PROC RSQUARE options;
  MODEL dependents = independents / options;
(Options can appear in either the PROC RSQUARE statement or any MODEL statement.)
SELECT=n specifies the maximum number of subset models reported for each subset size
INCLUDE=i requests that the first i variables after the equal sign be included in every regression
SIGMA=n specifies the true standard deviation of the error term
ADJRSQ computes R2 adjusted for degrees of freedom
CP computes Mallows' Cp statistic
4. Example analysis
PROC RSQUARE options;
  MODEL dependents = independents / options;
(Options can appear in either the PROC RSQUARE statement or any MODEL statement.)

PROC RSQUARE DATA=name OUTEST=est ADJRSQ MSE CP SELECT=n;
  MODEL dependent = variable-list;
RUN;
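To make the idea behind an all-possible-regressions procedure concrete, here is a rough Python sketch: fit every non-empty subset of predictors by ordinary least squares and rank the subsets by R-square. This is illustrative only; PROC RSQUARE is the SAS tool, and the function and variable names below are my own, not SAS syntax.

```python
# Illustrative sketch of all-possible-regressions (not SAS; names are my own).
from itertools import combinations
import numpy as np

def r_square(X, y):
    """R-square of an OLS fit with an intercept."""
    Xi = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

def all_subsets(X, y, names):
    """(subset, R-square) for every non-empty predictor subset, best first."""
    out = []
    for k in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), k):
            out.append((tuple(names[c] for c in cols),
                        r_square(X[:, list(cols)], y)))
    return sorted(out, key=lambda t: -t[1])
```

With k candidate predictors this fits 2^k - 1 models, which is why procedures like RSQUARE offer SELECT= to limit how many subsets are reported.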
6. PROC STEPWISE
The STEPWISE procedure provides five methods for stepwise regression.
General form:
PROC STEPWISE;
  MODEL dependents = independents / options;
RUN;
QUIT;
** Assumes that you have at least one dependent variable and two or more independent variables. If only one independent variable exists, then you are just doing a simple linear regression of y on x.
7. Types of Regression
Uses of PROC REG for standard problems:

PROC REG;   /* simple linear regression */
  MODEL y = x;

PROC REG;   /* weighted linear regression */
  MODEL y = x;
  WEIGHT w;

PROC REG;   /* multiple regression */
  MODEL y = x1 x2 x3;
8. PROC REG
General form:
PROC REG;
  MODEL dependents = independents / options;
Options available include:
NOINT – regression with no intercept
FORWARD
A forward selection analysis starts with no predictors in the model.
Each candidate predictor chosen by the user is evaluated to see how much the R2 would increase if it were added to the model.
The predictor that increases the R2 the most is added, provided it meets the statistical condition for entry.
In SAS the statistical condition is the significance level for the increase in R2 produced by adding the predictor.
If no predictor meets the condition, the analysis stops.
If a predictor is added, the next step re-evaluates all of the available predictors that have not yet entered the model.
If any satisfy the statistical condition for entry, the one increasing the R2 the most is added.
This process continues until no remaining predictor can enter.
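The forward-selection loop above can be sketched in a few lines of Python. SAS decides entry with a significance level (SLENTRY); in this simplified, illustrative sketch a fixed minimum R-square gain `min_gain` stands in as the entry criterion, and all names are my own, not SAS syntax.

```python
# Illustrative sketch of forward selection (entry criterion simplified to a
# minimum R-square gain; SAS actually uses a significance level, SLENTRY).
import numpy as np

def r_square(X, y):
    Xi = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

def forward_select(X, y, min_gain=0.01):
    chosen, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # evaluate every predictor not yet in the model
        r2, j = max((r_square(X[:, chosen + [j]], y), j) for j in remaining)
        if r2 - best_r2 < min_gain:   # no candidate meets the entry criterion
            break
        chosen.append(j)
        remaining.remove(j)
        best_r2 = r2
    return chosen, best_r2
```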
9. PROC REG
General form:
PROC REG;
  MODEL dependents = independents / options;
Options available include:
BACKWARD
A backward elimination analysis starts with all of the predictors in the model.
At each step the predictors currently in the model are evaluated, and any that meet the criterion for removal are eliminated.
STEPWISE
Stepwise selection begins like forward selection. However, at each step the variables already in the model are first evaluated for removal; among those meeting the removal criterion, the one whose removal lowers the R2 the least is dropped.
How can a variable enter the model and then leave later? If two correlated predictors both enter, one may later be removed because, given the other, dropping it changes the R2 very little, if at all.
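The backward-elimination loop can be sketched the same way: start with every predictor in the model and repeatedly drop the one whose removal lowers R-square the least. SAS decides removal with a significance level (SLSTAY); here an illustrative tolerance `max_loss` on the R-square drop stands in, and all names are my own, not SAS syntax.

```python
# Illustrative sketch of backward elimination (removal criterion simplified
# to a maximum tolerated R-square loss; SAS actually uses SLSTAY).
import numpy as np

def r_square(X, y):
    Xi = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

def backward_eliminate(X, y, max_loss=0.01):
    chosen = list(range(X.shape[1]))
    best_r2 = r_square(X[:, chosen], y)
    while len(chosen) > 1:
        # find the predictor whose removal hurts R-square the least
        r2, j = max((r_square(X[:, [c for c in chosen if c != j]], y), j)
                    for j in chosen)
        if best_r2 - r2 > max_loss:   # every removal costs too much: stop
            break
        chosen.remove(j)
        best_r2 = r2
    return chosen, best_r2
```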
10. PROC REG
General form:
PROC REG;
  MODEL dependents = independents / options;
Options available include:
MAXR
The maximum-R2 option does not settle on a single model. Instead, it tries to find the "best" one-variable model, the "best" two-variable model, and so forth.
MAXR starts by finding the single-variable model producing the greatest R2.
It then adds the variable that increases the R2 the most, and considers switching each variable in the model with each variable not in the model to see whether a switch would raise the R2.
It continues this process until adding another variable is no better than the previous model (e.g., adding a 4th variable does not appreciably improve the R2 over the 3-variable model).
The difference between the STEPWISE and MAXR options is that in MAXR all switches are evaluated before any switch is made. Under STEPWISE, the "worst" variable may be removed without considering what adding the "best" remaining variable might accomplish.
11. PROC REG
General form:
PROC REG;
  MODEL dependents = independents / options;
Options available include:
MINR
The MINR option closely resembles the MAXR method. However, the switch chosen under MINR is the one that produces the smallest increase in R2, in a sense approaching the "best" model from the opposite direction compared to MAXR.
12. PROC REG
General form:
PROC REG;
  MODEL dependents = independents / options;
Options available include:
SLE=value
Sets the criterion for entry into the model: the significance level that a predictor's contribution (the change, or Δ, in R2) must meet to enter.
SLS=value
Sets the criterion for staying in the model: the significance level that a predictor's contribution must meet to remain in the model.
13. PROC REG
The default significance levels differ by selection method unless changed by the user. The defaults are:
BACKWARD: SLSTAY = 0.10
FORWARD: SLENTRY = 0.50
STEPWISE: SLENTRY = SLSTAY = 0.15
The user can override these with the SLENTRY or SLSTAY options, for example / SLSTAY=0.05.
14. Significance Tests for the Regression Coefficients
The significance of the parameter estimates is found using the F or t test (shown in a couple of slides).
R2 (R-square) is the proportion of variation in the dependent variable (Y) that can be explained by the predictors (X variables) in the regression model.
Adjusted R2: Predictors can always be added to the model that continue to improve the apparent ability of the predictors to explain the dependent variable, but some of that improvement in R-square is due simply to chance variation. The adjusted R-square attempts to yield a more honest estimate:
Adjusted R2 = 1 − (1 − R2)(n − 1) / (n − p − 1)
where
R2 = the unadjusted R2,
n = the number of observations, and
p = the number of predictors.
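The adjusted R-square formula above can be written as a small helper (the function name is mine, for illustration):

```python
# Adjusted R-square exactly as defined on the slide.
def adjusted_r2(r2, n, p):
    """1 - (1 - R^2) * (n - 1) / (n - p - 1); n observations, p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

For example, with R2 = 0.8, n = 102, and p = 1, the adjusted value is 1 − 0.2 · 101/100 = 0.798; holding R2 fixed while adding predictors only increases the penalty, which is what discourages padding the model with variables that explain little.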
15. Significance Tests for the Regression Coefficients
The Mallows' Cp statistic:
Cp = SSE / σ2 + 2p − n
where
SSE = the error sum of squares for the candidate model,
σ2 = the estimate of the pure error variance, from the SIGMA= option or from fitting the full model,
p = the number of parameters, including the intercept, and
n = the number of observations.
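The Cp formula translates directly to code (the function name is mine, for illustration):

```python
# Mallows' Cp exactly as defined on the slide.
def mallows_cp(sse, sigma2, p, n):
    """Cp = SSE / sigma^2 + 2p - n; p counts parameters incl. the intercept."""
    return sse / sigma2 + 2 * p - n
```

For a model with negligible bias, SSE is expected to be about σ2(n − p), so Cp comes out near p; this is why, when comparing subsets, models with Cp close to (and not much larger than) p are the attractive candidates.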
16. F and t Tests for the Significance of the Overall Model