Multiple Regression — presentation transcript

1 Multiple Regression

2 Multiple regression
Previously we discussed the one-predictor scenario
Multiple regression is the case of having two or more variables predicting some outcome
The basic idea is the same as in simple regression, but much more will need to be considered in interpretation

3 The best fitting plane
Previously we attempted to find the best-fitting line for our 2-dimensional scatterplot of values
With the addition of another predictor, our cloud of values becomes 3-dimensional, and we are now looking for what amounts to the best-fitting plane
With 3 or more predictors we get into hyperspace and are dealing with a regression surface
Regression equation*: Ŷ = b0 + b1X1 + b2X2 + … + bkXk
*As one can see, we still have the general linear model

4 Linear combination
The notion of a linear combination* is important for you to understand, for MR and for multivariate techniques in general
What an MR analysis does is create a linear combination (a weighted sum) of the predictors
The weights help us assess the nature of the predictor–DV relationships, taking the other variables in the model into account
We then look at how well the linear combination matches up with the DV
One way to think about it: we extract the relevant information from the predictors to help us understand the DV (see the sketch below)
*If the concept of a linear combination remains fuzzy to you, you will not be able to understand multiple regression, contrast analyses in ANOVA, or any other multivariate technique. You want to make sure to get it down now.
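The fitted values from a regression are exactly this weighted sum. A minimal R sketch, where the data frame dat and the variables y, x1, x2 are hypothetical:

fit <- lm(y ~ x1 + x2, data = dat)             # two-predictor model
w   <- coef(fit)                               # intercept and weights
yhat <- w[1] + w[2]*dat$x1 + w[3]*dat$x2       # the weighted sum, by hand
all.equal(unname(yhat), unname(predict(fit)))  # TRUE: predict() does the same thing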

5 MR Example 1 2 Stage of 3 Condom X’ Use  4 New Linear Combination
(X1) Pros of Condom Use 1 (X2) Cons of Condom Use 2 Stage of Condom Use (X3) Self-Efficacy of Condom Use 3 X’  4 (X4) Psychosexual Functioning New Linear Combination

6 Considerations in multiple regression
Assumptions
Overall fit
Parameter estimates and variable importance
Variable entry
Relationships among the predictors
Prediction

7 Assumptions: Normality
The assumptions for simple regression continue to hold: normality, homoscedasticity, independence
Multivariate normality can be at least partially checked through examination of the individual variables for normality, linearity, and homoscedasticity
Most statistical packages* provide tests for multivariate normality
*Except for SPSS, as near as I can tell. A simple example in R (note the multivariate Shapiro test lives in the mvnormtest package, and it expects the data matrix with variables in rows, not a correlation matrix): library(mvnormtest); mshapiro.test(t(as.matrix(my.data)))

8 Assumptions: Model misspecification
In addition, we must worry about model misspecification: omitting relevant variables, including irrelevant ones, or specifying incorrect paths
Not much can be done about omitting relevant variables, but it can produce biased and less valid results
However, we also can't just throw in every variable we can think of
Overfitting
Violation of Ockham's razor
Including irrelevant variables contributes to the standard error of estimate (and thus to the SEs of our coefficients), which will affect the statistical tests on the individual variables

9 Example data
Current salary predicted by educational level, time since hire, and previous experience (N = 474)*
As with any analysis, initial data analysis should be extensive prior to examination of the inferential results
*This is from the employee data in the SPSS folder of any install.
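A sketch of how this model might be fit in R; the file name and variable names (salary, educ, jobtime, prevexp) are assumptions based on the standard SPSS employee data file:

library(haven)                                   # read_sav() reads .sav files
emp <- read_sav("Employee data.sav")
fit <- lm(salary ~ educ + jobtime + prevexp, data = emp)
summary(fit)                                     # coefficients, R^2, overall F test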

10 Initial examination of data
We can use the descriptives to get a general feel for what's going on with the variables in question
Here we can also see that months since hire and previous experience are not well correlated with our dependent variable of current salary. Ack!
We'd also want to look at the scatterplots to further aid our assessment of the predictor–DV relationships

11 Starting point: Statistical significance of the model
The ANOVA summary table* tells us whether our model is statistically adequate
R² different from zero
The regression equation is a better predictor than the mean
As with simple regression, the analysis involves the ratio of variance predicted to residual variance
As we can see, it reflects the relationship of the predictors to the DV (R²), the number of predictors in the model, and the sample size
*If by now you do not know where each of the values in an ANOVA table comes from, you need to do more studying. The entire table can be filled out from as few as 3 (appropriate) entries, along with the calculation of R².
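To make that last point concrete, the overall F can be written entirely in terms of R², the number of predictors k, and the sample size N. A sketch with illustrative (made-up) values:

R2 <- 0.50; k <- 3; N <- 474                     # illustrative values only
F_stat <- (R2 / k) / ((1 - R2) / (N - k - 1))    # ratio of predicted to residual variance
pf(F_stat, df1 = k, df2 = N - k - 1, lower.tail = FALSE)   # p value for the model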

12 Multiple correlation coefficient
The multiple correlation coefficient is the correlation between the DV and the linear combination of predictors that minimizes the sum of the squared residuals
More simply, it is the correlation between the observed values and the values predicted by our model
Its squared value (R²) is the amount of variance in the dependent variable accounted for by the independent variables

13 R²
Here it appears we have an OK model for predicting current salary
R² is an upwardly biased statistic. Unless there is little difference between the adjusted and regular values*, you should report the adjusted value
*If it changes the decimal place to which you are reporting, report the adjusted value. Typically only two decimal places are provided for a measure of effect size.
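The adjustment shrinks R² as a function of the number of predictors relative to the sample size; a sketch with the same illustrative values as before:

R2 <- 0.50; k <- 3; N <- 474                     # illustrative values only
1 - (1 - R2) * (N - 1) / (N - k - 1)             # adjusted R^2: penalizes extra predictors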

14 Variable importance: Statistical significance
After noting that our model is viable, we can begin interpreting the predictors' relative contributions
To begin with, we can examine the output to determine which variables contribute to the model in a statistically significant fashion
The standard error is a measure of the variability that would be found among the different slopes estimated from other samples drawn from the same population
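Each coefficient's t statistic is simply its estimate divided by its standard error; in R, continuing the earlier hypothetical fit:

coef(summary(fit))   # columns: Estimate, Std. Error, t value (= Estimate/SE), Pr(>|t|)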

15 Variable importance: Statistical significance
We can see from the output that only previous experience and education level are statistically significant predictors

16 Variable importance: Weights
Statistical significance, as usual, is only a starting point for our assessment of the results, and in general it is a pretty poor way to determine the value of one variable relative to another
What we'd really want is a measure of the unique contribution of a predictor to the model
Unfortunately the regression coefficient, though useful for understanding that particular variable's relationship to the DV, is not useful for comparison with other predictors that are on a different scale
Here we have a situation where we could almost have put them all on the same scale, but education level is in years while the others are in months

17 Variable importance: Standardized coefficients
Standardized regression coefficients get around that problem
Now we can see how much the DV will change, in standard deviation units, with a one standard deviation change in a predictor (all others held constant)
Here we can see that education level seems to have much more influence on the DV than the other two predictors
For example, another 3 years of education amounts to a >$11,000 bump in salary
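One way to obtain standardized coefficients in R is simply to z-score everything first; a sketch continuing the hypothetical employee-data fit:

zfit <- lm(scale(salary) ~ scale(educ) + scale(jobtime) + scale(prevexp), data = emp)
coef(zfit)           # SD change in salary per 1-SD change in each predictor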

18 Variable importance
However, we still have other output to help us understand variable contribution
Partial correlation is the correlation between a predictor and the DV after the contributions of the other predictors have been taken out of both the predictor and the DV
When squared, it is a measure of how much of the variance left over can be accounted for uniquely by the variable
Semipartial correlation is the unique contribution of a predictor after the contributions of the other predictors have been taken out of the predictor in question only
When squared, it is a measure of the part of the total variance that can be accounted for uniquely by a specific variable

19 Variable importance: Partial correlation
Figure 1 Figure 2 A+B+C+D represents all the variability in the DV to be explained (fig. 1 and 2) A+B+C = R2 for the model The squared partial correlation is the amount a variable explains relative to the amount in the DV that is left to explain after the contributions of the other predictors have been removed from both the predictor and criterion It is A/(A+D) (fig. 3) For Predictor 2 it would be B/(B+D) (fig. 4) DV Predictor 1 Predictor 2 Figure 3 Figure 4
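Numerically, the squared partial for a predictor can be computed from the R² of the full model and the R² of the model without it; a sketch using the hypothetical employee-data variables:

full <- summary(lm(salary ~ educ + jobtime + prevexp, data = emp))$r.squared
red  <- summary(lm(salary ~ jobtime + prevexp,        data = emp))$r.squared
(full - red) / (1 - red)   # squared partial for educ: A/(A+D)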

20 Variable importance: Semipartial correlation
The squared semipartial correlation is perhaps the more useful measure of contribution
It refers to the unique contribution of the predictor to the model, i.e. the relationship between the DV and the predictor after the contributions of the other predictors have been removed from the predictor in question
For Predictor 1 it is A/(A+B+C+D); for Predictor 2, B/(A+B+C+D)
Interpretation (of the squared value): out of all the variance to be accounted for, how much does this variable explain that no other predictor does? Or: how much would R² drop if the variable were removed?
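That last phrasing translates directly into code: the squared semipartial is just the drop in R² when the variable is removed. Again with the hypothetical employee-data variables:

full <- summary(lm(salary ~ educ + jobtime + prevexp, data = emp))$r.squared
red  <- summary(lm(salary ~ jobtime + prevexp,        data = emp))$r.squared
full - red                 # squared semipartial for educ: A/(A+B+C+D)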

21 Variable importance
Note that exactly how the partial and semipartial correlations are figured will depend on the type of multiple regression employed. The previous examples concerned a standard multiple regression situation.
As a preview, for sequential (i.e. hierarchical) regression, with Predictor 1 entering first, the squared partial correlations would be
Predictor 1 = (A+C)/(A+C+D)
Predictor 2 = B/(B+D)

22 Variable importance
For the squared semipartial correlation:
Predictor 1 = (A+C)/(A+B+C+D)
Predictor 2: same as before
The result for the addition of the second variable is the same as it would be in standard MR
Thus if the goal is to see the unique contribution of a single variable after all others have been controlled for, you can obtain the same information from squared semipartials as you could from doing a sequential regression
In general terms, it is the unique contribution of the variable at the point it enters the equation (sequential or stepwise)

23 Variable importance: Example data
The semipartial correlation is labeled as the 'part' correlation in SPSS*
Here we can see that education level is really doing all the work in this model (obviously data from some alternate universe)
*This seems to be an SPSS quirk; the vast majority of references call it the semipartial correlation.

24 Another example
Mental health symptoms predicted by number of doctor visits, physical health symptoms, and number of stressful life events

25 Another example
Here we see that physical health symptoms and stressful life events both contribute to the model in a statistically significant sense
Examining the unique contributions, physical health symptoms seem more 'important', but unless you run a bootstrapping procedure to provide a statistical test of their difference, you have zero evidence to suggest that one is contributing significantly more than the other
In other words, the difference may just be due to sampling variability, and unless you test it you cannot say that e.g. an estimate of .223 is statistically different from the other
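A rough sketch of such a bootstrap test in R, comparing the two squared semipartials via R² drops; the data frame mydata and the variable names (mental, visits, phys, stress) are hypothetical:

library(boot)
sr2_diff <- function(data, idx) {
  d    <- data[idx, ]                                               # resampled rows
  full <- summary(lm(mental ~ visits + phys + stress, data = d))$r.squared
  no_p <- summary(lm(mental ~ visits + stress,        data = d))$r.squared
  no_s <- summary(lm(mental ~ visits + phys,          data = d))$r.squared
  (full - no_p) - (full - no_s)   # difference of the two squared semipartials
}
res <- boot(mydata, sr2_diff, R = 2000)
boot.ci(res, type = "perc")       # does the percentile CI cover zero?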

26 Variable Importance: Comparison
Comparison of the standardized coefficients, partial, and semipartial correlation coefficients
All of them are 'partial' statistics in the sense that they give an account of the relationship between a predictor and the DV after removing the effects of the others in some fashion

27 Another Approach to Variable Importance
The methods just covered give us a glimpse of variable importance, but interestingly none of them provides a unique-contribution statistic that is a true decomposition of R², i.e. one whose values for all predictors sum to the overall R²
There is one that provides the average R² increase over the possible orders in which a predictor can enter the model
3-predictor example: the orderings A B C; B A C; C A B; etc.
One way to think about it, using what you've just learned, is as the squared semipartial correlation for a variable when it enters first, second, third, etc.
This statistic is an average of those semipartial contributions, and note that the average is taken over all possible permutations
E.g. the R² contribution for B being first in the model includes the orderings B A C and B C A, both of which of course give the same value
The following example comes from class survey data

28 Example: the War item's contribution at each position
Outcome: score on a Social Conservatism/Liberalism scale
As Predictor 1: there are 2 models in which war would enter first (war, math, bush; war, bush, math), both with the same R² change
As Predictor 2: R² change = .639 and .087; there are two models in which war would enter second (math, war, bush; bush, war, math)
As Predictor 3: R² change = .098; there are 2 models in which war would enter last, both with the same value

29 Interpretation
The average of these* is the average contribution to R² for a particular variable over all possible orderings
In this case, for war it is ~.36; i.e. on average it increases the variance accounted for by 36%
Furthermore, if we add up the average R² contributions for all three predictors, they sum to .65, the R² for the model
*Again, the average includes all values, i.e. the six from the previous slide: ( )/6 = .363 for the predictor War.

30 Example
The output of interest is the LMG statistic, which is what we were just talking about. LMG stands for Lindeman, Merenda and Gold, the authors who introduced it in the early '80s
'Last' is simply the squared semipartial correlation
'First' is just the square of the simple bivariate correlation between predictor and DV
'Beta squared' is the square of the beta coefficient with all variables in the model
'Pratt' is the product of the standardized coefficient and the simple bivariate correlation. It too will add up to the model R², but it is not recommended, one reason being that it can actually be negative

library(relaimpo)
RegModel.1 <- lm(SOCIAL ~ BUSH + MTHABLTY + WAR, data = Dataset)
calc.relimp(RegModel.1, type = c("lmg", "last", "first", "betasq", "pratt"))

*Note the relaimpo package is equipped to provide bootstrapped estimates
[Output table: lmg, last, first, betasq, and pratt values for BUSH, MATH, WAR]
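Picking up the footnote's point, a sketch of the bootstrapped version using the same model object; relaimpo's boot.relimp() and booteval.relimp() handle the resampling:

boot_res <- boot.relimp(RegModel.1, b = 1000, type = "lmg")   # 1000 bootstrap samples
booteval.relimp(boot_res)   # bootstrap confidence intervals for the lmg shares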

31 Different Methods
Note that one's assessment of relative importance may depend on the method
Much of the time the methods will largely agree, but they may not, so use multiple estimates to help you decide
Typically one might go with the LMG, as it is both intuitive and a decomposition of R²
[Output table: lmg, last, first, betasq, and pratt values for BUSH, MATH, WAR]

32 Relative Importance Summary
There are multiple ways to estimate a variable's contribution to the model, and some may be better than others
A general approach:
Check the simple bivariate relationships. If you don't see worthwhile correlations with the DV there, you shouldn't expect much from your results regarding the model*
Check for outliers and compare with robust measures as well. You may detect that some variables are so highly correlated that one is redundant
Statistical significance is not a useful means of assessing relative importance, nor, typically, is the raw coefficient
Standardized coefficients and partial correlations are a first step. Compare the standardized coefficients to the simple correlations as a check on possible suppression
Of the typical output, the semipartial correlation is probably the more intuitive assessment
The LMG is also intuitive, and is a natural decomposition of R², unlike the others
*You can't create something out of nothing. For some reason I've found many who tried, and who were surprised to learn that nothing came of a regression technique even though their simple correlations showed clearly there wasn't anything to begin with.

33 Relative Importance Summary
One thing to keep in mind is that a determination of variable importance, while possible for a single sample, should not be overgeneralized
Variable orderings will likely change upon repeated sampling
E.g. while one might think that war and bush are better predictors than math (it certainly makes theoretical sense), saying that either is better than the other would be quite a stretch with just one sample
What you see in your sample is specific to it, and it would be wise not to make any bold claims without validation

34 Regression Summary
At this point there are a few key ideas regarding the regression analysis to note:
Goal: prediction or explanation?
What's being done? Least squares; the general linear model; a linear combination
Model fit: statistical significance; R² as effect size; interpreting the ANOVA table; error in prediction
If the goal is prediction: the utility of the coefficients for future data and prediction
If the goal is explanation: variable importance

