MULTIPLE REGRESSION
OVERVIEW
- What Makes it Multiple?
- Additional Assumptions
- Methods of Entering Variables
- Adjusted R²
- Using z-Scores
WHAT MAKES IT MULTIPLE?
- Predict Y from a combination of two or more predictor (X) variables.
- With more predictors, the regression model may account for more variance in Y.
- Look for predictor variables with low intercorrelations, so that each one contributes unique information.
Multiple Regression Equation
- As in simple regression, use a linear equation to predict Y scores: Ŷ = b₀ + b₁X₁ + b₂X₂ + … + bₖXₖ
- Use the least squares solution: the weights that minimize the sum of squared prediction errors.
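A minimal sketch of the least squares solution, assuming Python with NumPy (the slides name no software). `fit_multiple_regression` is a hypothetical helper, and the data are made up so that Y is an exact linear function of two predictors; the fitted weights should therefore recover the coefficients used to build Y.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Least-squares fit of Y-hat = b0 + b1*X1 + ... + bk*Xk.

    X: (n, k) predictor scores; y: (n,) criterion scores.
    Returns (weights with the intercept b0 first, R-squared).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])      # column of 1s for b0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)   # minimizes the sum of squared errors
    residuals = y - design @ coef
    r2 = 1 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Made-up data: Y is built as exactly 1 + 2*X1 - 0.5*X2
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y = 1 + 2 * X[:, 0] - 0.5 * X[:, 1]
coef, r2 = fit_multiple_regression(X, y)
print(np.round(coef, 3), round(r2, 3))
```

With real data the fit will not be perfect, and R² reports the proportion of variance in Y accounted for by the weighted combination of predictors.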
Assumptions for Regression
- Quantitative (or dichotomous) data
- Independent observations
- Predictions are made for the same population that was sampled
- Linear relationship between the predictors and Y
Assumptions for Regression
- Homoscedasticity (equal error variance across predicted values)
- Independent errors
- Normally distributed errors
ADDITIONAL ASSUMPTIONS
- Large ratio of sample size to number of predictor variables
  - Rule of thumb: at least 15 subjects per predictor variable
- Predictor variables are not strongly intercorrelated (no multicollinearity)
  - Examine the variance inflation factor (VIF) for each predictor; it should be close to 1
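The VIF for predictor j is 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing predictor j on all the other predictors, so it equals 1 only when that predictor shares no variance with the rest. A sketch assuming Python with NumPy; `vif` is a hypothetical helper and the variables `a`, `b`, `c` are simulated, with `c` built as a near-copy of `a`.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    predictor j on all of the other predictors."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    result = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2_j = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        result.append(1 / (1 - r2_j))
    return np.array(result)

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)                 # unrelated to a
c = a + 0.1 * rng.normal(size=500)       # nearly a copy of a
low = vif(np.column_stack([a, b]))       # both VIFs near 1
high = vif(np.column_stack([a, b, c]))   # VIFs for a and c become very large
print(np.round(low, 2), np.round(high, 2))
```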
Multicollinearity
- When predictor variables are highly intercorrelated, they carry largely redundant information: adding one to another improves prediction little, and the individual regression weights become unstable (large standard errors).
- Be cautious about determining which predictor variable is predicting best when there is high collinearity among the predictors.
METHODS OF ENTERING VARIABLES
- Simultaneous
- Hierarchical/Block Entry
- Stepwise
  - Forward
  - Backward
  - Stepwise
Simultaneous Multiple Regression
- All predictor variables are entered into the regression at the same time.
- Allows you to determine the portion of variance explained uniquely by each predictor with the others statistically controlled (the squared part, or semipartial, correlation).
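One way to get each predictor's unique portion of variance is to compare R² with all predictors entered against R² with that predictor left out; the difference is the squared part correlation. A sketch assuming Python with NumPy; `r_squared` and `squared_part_correlations` are hypothetical helpers, and the data are simulated.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from a least-squares fit of y on the columns of X (plus an intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def squared_part_correlations(X, y):
    """Unique variance for each predictor: R^2 with all predictors minus
    R^2 with that predictor left out (the squared part correlation)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    full = r_squared(X, y)
    return np.array([full - r_squared(np.delete(X, j, axis=1), y)
                     for j in range(X.shape[1])])

# Made-up data: three predictors, the third unrelated to y
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, 0.3, 0.0]) + rng.normal(size=200)
print(np.round(squared_part_correlations(X, y), 3))
```

The same value can be obtained by correlating Y with the part of predictor j that is not predictable from the other predictors and squaring that correlation.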
Hierarchical Multiple Regression
- Enter variables in a particular order based on theory or prior research.
- Can be done with blocks of variables; the change in R² at each step shows what each block adds.
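A sketch of block entry, assuming Python with NumPy: blocks are entered in a fixed order, and each step reports R², the R² change, and the F for that change, F = (ΔR² / df_change) / ((1 − R²) / (N − k − 1)). The helper names and the "background"/"focal" blocks are hypothetical. A p-value for each F change could be obtained from an F distribution with (df_change, N − k − 1) degrees of freedom, for example with `scipy.stats.f.sf` if SciPy is available.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from a least-squares fit of y on the columns of X (plus an intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def hierarchical_regression(blocks, y):
    """Enter blocks of predictors in a fixed order chosen in advance.

    blocks: list of (n, k_b) arrays, in the order they are to be entered.
    Returns one (R^2, R^2 change, F change) tuple per step.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.empty((n, 0))
    prev_r2, steps = 0.0, []
    for block in blocks:
        block = np.asarray(block, dtype=float).reshape(n, -1)
        X = np.column_stack([X, block])
        r2 = r_squared(X, y)
        df_change = block.shape[1]
        df_resid = n - X.shape[1] - 1
        f_change = ((r2 - prev_r2) / df_change) / ((1 - r2) / df_resid)
        steps.append((r2, r2 - prev_r2, f_change))
        prev_r2 = r2
    return steps

# Made-up data: block 1 = two background predictors, block 2 = one predictor of interest
rng = np.random.default_rng(2)
background = rng.normal(size=(150, 2))
focal = rng.normal(size=150)
y = background @ np.array([0.4, 0.2]) + 0.5 * focal + rng.normal(size=150)
for r2, change, f in hierarchical_regression([background, focal], y):
    print(f"R2 = {r2:.3f}, R2 change = {change:.3f}, F change = {f:.2f}")
```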
Stepwise Multiple Regression
- Enter or remove predictor variables one at a time, based on whether they explain a significant portion of variance in the criterion
  - Forward
  - Backward
  - Stepwise
Forward Stepwise
- Begin with no predictor variables.
- Add predictors one at a time, choosing the one that produces the largest increase in R².
- Stop when no remaining predictor would significantly increase R².
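The forward procedure can be sketched as follows, assuming Python with NumPy. "Significant" is approximated here by an F-to-enter cutoff of 4.0, an assumption chosen for illustration; statistics packages usually let you set an entry criterion as a p-value instead. The helper names are hypothetical and the data are simulated.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from a least-squares fit of y on the columns of X (plus an intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def forward_selection(X, y, f_to_enter=4.0):
    """Forward entry: start with no predictors and repeatedly add the one
    that raises R^2 most, stopping when its F-to-enter drops below the cutoff."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    selected, current_r2 = [], 0.0
    while len(selected) < k:
        candidates = [j for j in range(k) if j not in selected]
        trial_r2 = {j: r_squared(X[:, selected + [j]], y) for j in candidates}
        best = max(trial_r2, key=trial_r2.get)
        new_r2 = trial_r2[best]
        df_resid = n - len(selected) - 2        # n - (predictors after adding) - 1
        f_enter = (new_r2 - current_r2) / ((1 - new_r2) / df_resid)
        if f_enter < f_to_enter:
            break                               # no significant increase in R^2
        selected.append(best)
        current_r2 = new_r2
    return selected, current_r2

# Made-up data: columns 2 and 3 are pure noise
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = 2 * X[:, 0] + 1 * X[:, 1] + rng.normal(size=300)
print(forward_selection(X, y))
```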
Backward Stepwise
- Begin with all predictor variables.
- Remove predictors one at a time, choosing the one whose removal produces the smallest decrease in R².
- Stop when removing any further predictor would significantly decrease R².
- May uncover suppressor variables.
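A matching sketch of backward removal, assuming Python with NumPy. The F-to-remove cutoff of 4.0 is an illustrative assumption standing in for "significantly decrease R²"; the helper names are hypothetical and the data are simulated.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from a least-squares fit of y on the columns of X (plus an intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def backward_elimination(X, y, f_to_remove=4.0):
    """Backward removal: start with every predictor and repeatedly drop the one
    whose removal lowers R^2 least, stopping when that drop would be significant."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    selected = list(range(k))
    current_r2 = r_squared(X[:, selected], y)
    while selected:
        df_resid = n - len(selected) - 1
        reduced_r2 = {j: r_squared(X[:, [i for i in selected if i != j]], y) for j in selected}
        weakest = max(reduced_r2, key=reduced_r2.get)   # smallest decrease in R^2
        f_remove = (current_r2 - reduced_r2[weakest]) / ((1 - current_r2) / df_resid)
        if f_remove >= f_to_remove:
            break                                       # removal would significantly lower R^2
        selected.remove(weakest)
        current_r2 = reduced_r2[weakest]
    return selected, current_r2

# Made-up data: columns 2 and 3 are pure noise
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 4))
y = 2 * X[:, 0] + 1 * X[:, 1] + rng.normal(size=300)
print(backward_elimination(X, y))
```

Because every predictor starts in the equation, a suppressor's contribution is judged while the predictor it "cleans up" is also present, which is why this direction may reveal suppressor effects.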
Suppressor Variable
- A predictor variable that, when entered into the equation, increases the amount of variance explained by another predictor variable, typically by removing (suppressing) variance in that predictor that is unrelated to the criterion.
- In backward regression, removing the suppressor would likely result in a significant decrease in R², so it is left in the equation.
Suppressor Variable Example
- Y = Job Performance Rating
- X1 = College GPA
- X2 = Writing Test Score
Suppressor Variable Example
- Let’s say Writing Score is not correlated with Job Performance, because the job doesn’t require much writing.
- Let’s say GPA is only a weak predictor of Job Performance, even though it seems like it should be a good predictor.
Suppressor Variable Example
- Let’s say GPA is “contaminated” by differences in writing ability: really good writers can earn higher grades without having mastered more of the material.
- So, if Writing Score is in the equation, that contamination is removed, and we get a clearer picture of the GPA–Job Performance relationship.
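The job-performance example can be simulated to show the suppressor pattern, assuming Python with NumPy. The generating model is entirely made up for illustration: a hypothetical unobserved `ability` drives performance, and GPA is ability plus writing skill. Writing is essentially uncorrelated with performance, yet adding it roughly doubles R² and it receives a negative weight, because it subtracts the writing-related part of GPA.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from a least-squares fit of y on the columns of X (plus an intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(42)
n = 5000
ability = rng.normal(size=n)          # hypothetical job-relevant ability (not observed)
writing = rng.normal(size=n)          # writing skill, unrelated to job performance
gpa = ability + writing               # GPA "contaminated" by writing skill
performance = ability + rng.normal(size=n)

r_writing = np.corrcoef(performance, writing)[0, 1]   # near 0
r2_gpa = r_squared(gpa[:, None], performance)         # population value .25
both = np.column_stack([gpa, writing])
r2_both = r_squared(both, performance)                # population value .50
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), both]), performance, rcond=None)
print(round(r_writing, 3), round(r2_gpa, 3), round(r2_both, 3), np.round(coef[1:], 2))
```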
Stepwise
- Begin with no predictor variables.
- Add predictors one at a time, choosing the one that produces the largest increase in R².
- At each step, remove any variable that no longer explains a significant portion of variance.
- Stop when no remaining predictor would significantly increase R².
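The stepwise procedure is forward entry plus a removal check after each step. A sketch assuming Python with NumPy; the cutoffs (F-to-enter 4.0, F-to-remove 3.9) are illustrative assumptions. Keeping the removal cutoff slightly below the entry cutoff means a just-entered predictor is never dropped on the same step, because its F-to-remove equals the F-to-enter it just passed. Helper names are hypothetical and the data are simulated.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from a least-squares fit of y on the columns of X (plus an intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def stepwise_selection(X, y, f_to_enter=4.0, f_to_remove=3.9, max_steps=50):
    """Forward entry with a removal check after every step."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    selected = []
    for _ in range(max_steps):                   # guard against entry/removal cycles
        current_r2 = r_squared(X[:, selected], y)
        candidates = [j for j in range(k) if j not in selected]
        if not candidates:
            break
        # Entry: the candidate giving the largest R^2
        trial_r2 = {j: r_squared(X[:, selected + [j]], y) for j in candidates}
        best = max(trial_r2, key=trial_r2.get)
        new_r2 = trial_r2[best]
        f_enter = (new_r2 - current_r2) / ((1 - new_r2) / (n - len(selected) - 2))
        if f_enter < f_to_enter:
            break                                # nothing left that adds significantly
        selected.append(best)
        # Removal: drop the weakest predictor if it no longer contributes significantly
        df_resid = n - len(selected) - 1
        reduced_r2 = {j: r_squared(X[:, [i for i in selected if i != j]], y) for j in selected}
        weakest = max(reduced_r2, key=reduced_r2.get)
        f_remove = (new_r2 - reduced_r2[weakest]) / ((1 - new_r2) / df_resid)
        if f_remove < f_to_remove:
            selected.remove(weakest)
    return selected

# Made-up data: columns 2 and 3 are pure noise
rng = np.random.default_rng(6)
X = rng.normal(size=(300, 4))
y = 2 * X[:, 0] + 1 * X[:, 1] + rng.normal(size=300)
print(stepwise_selection(X, y))
```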
Choosing a Stepwise Method
- Forward
  - Easier to conceptualize
  - Provides an efficient model for predicting Y
- Backward
  - Can uncover suppressor effects
- Stepwise
  - Can uncover suppressor effects
  - Tends to be unstable with smaller N’s
ADJUSTED R²
- R² may overestimate the true (population) amount of variance explained, especially when there are few subjects per predictor.
- Adjusted R² compensates by shrinking R² according to the number of subjects relative to the number of predictor variables.
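A standard form of the adjustment, for N subjects and k predictors, is adjusted R² = 1 − (1 − R²)(N − 1) / (N − k − 1). The sketch below assumes Python; `adjusted_r2` is a hypothetical helper, and the numbers are made up to compare the same R² = .25 with 4 predictors in a small sample (about 5 subjects per predictor) and a large one.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n subjects."""
    if n - k - 1 <= 0:
        raise ValueError("need more subjects than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.25, 21, 4))    # 0.0625: heavy shrinkage with few subjects
print(adjusted_r2(0.25, 201, 4))   # about 0.235: little shrinkage with many subjects
```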
BETA WEIGHTS
- The regression weights can be standardized into beta weights: the weights obtained when all variables are converted to z-scores.
- Beta weights do not depend on the scales of the variables.
- A beta weight indicates the change in Y, in SD units, for each 1-SD change in the predictor, holding the other predictors constant.
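Two equivalent routes to beta weights, sketched in Python with NumPy: fit the regression on z-scores, or rescale each raw weight as bⱼ × SD(Xⱼ) / SD(Y). `beta_weights` is a hypothetical helper, and the two predictors are simulated on deliberately different scales.

```python
import numpy as np

def beta_weights(X, y):
    """Standardized regression weights: regress z-scored y on z-scored predictors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    # z-scores have mean 0, so no intercept column is needed
    betas, *_ = np.linalg.lstsq(zx, zy, rcond=None)
    return betas

rng = np.random.default_rng(5)
X = np.column_stack([rng.normal(50, 10, 100), rng.normal(3, 0.5, 100)])  # very different scales
y = 0.2 * X[:, 0] + 4 * X[:, 1] + rng.normal(size=100)

# Equivalent route: raw weight b_j times SD(X_j) / SD(Y)
design = np.column_stack([np.ones(100), X])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
from_raw = b[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
print(np.round(beta_weights(X, y), 3), np.round(from_raw, 3))
```

Rescaling a predictor (for example, reporting it in different units) changes its raw weight but leaves its beta weight unchanged.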
Example of Reporting Results of Multiple Regression
We performed a simultaneous multiple regression with vocabulary score, abstraction score, and age as predictors and preference for intense music as the dependent variable. The equation accounted for a significant portion of variance, F(3, 66) = 4.47, p = .006. As shown in Table 1, the only significant predictor was abstraction score.
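The overall F and R² are linked by F = (R² / k) / ((1 − R²) / (N − k − 1)), so the reported F(3, 66) = 4.47 implies N = 70 and an R² of about .17. The sketch below, assuming Python, shows that back-calculation; the helper names are hypothetical, and the R² it produces is derived from the reported F rather than taken from the original example.

```python
def f_from_r2(r2, k, n):
    """Overall F for a regression with k predictors and n subjects."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def r2_from_f(f, df_model, df_resid):
    """Invert the formula above: R^2 = F*k / (F*k + df_resid)."""
    return f * df_model / (f * df_model + df_resid)

# The reported result F(3, 66) = 4.47
k, df_resid = 3, 66
n = df_resid + k + 1                    # 70 subjects
r2 = r2_from_f(4.47, k, df_resid)       # about .17
print(n, round(r2, 3))
```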
Take-Home Points
- Multiple regression is a useful, flexible method.
- Find the right procedure for your purpose.