1 Psych 5510/6510
Chapter Eight: Multiple Regression: Models with Multiple Continuous Predictors
Part 3: Testing the Addition of Several Parameters at a Time
Spring, 2009
2 Testing the Addition of a Set of Predictors
So far we have looked at:
1. Testing the overall model.
2. Testing the addition of one more parameter to the model.
Now we will look at adding more than one parameter at a time (i.e., adding a ‘set of predictors’).
3 Example
Let’s say we are interested in modeling voting behavior ($Y$). We have several psychological measures (e.g., personality and attitude; $X_1$ and $X_2$) and several socio-demographic measures (e.g., age, income, and education; $X_3$, $X_4$, and $X_5$) that we could use in the model. We start with the set of two psychological measures, find that they are worthwhile, and then test whether it is worthwhile to add the set of three socio-demographic variables.
4 Example (cont.)
MODEL C: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
MODEL A: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i}$
$H_0$: $\beta_3 = \beta_4 = \beta_5 = 0$
$H_A$: At least one of those three betas doesn’t equal 0.
5 Testing $H_0$
Most computer programs, including SPSS, can’t test this comparison directly in their standard regression output, because there Model C is always the one-parameter model $\hat{Y}_i = \beta_0$.
6 Solution
First, have SPSS do a linear regression analysis on your Model C: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
SPSS will analyze this as if it were Model A in the following setup:
MODEL C: $\hat{Y}_i = \beta_0$
MODEL A: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
You do this just to get the SSE for the three-parameter model (which SPSS will report as ‘SS Residual’).
7 Solution
Second, have SPSS do a linear regression analysis on your six-parameter Model A. SPSS will analyze this as if it were Model A in the following setup:
MODEL C: $\hat{Y}_i = \beta_0$
MODEL A: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \beta_5 X_{5i}$
You do this just to get the SSE for the six-parameter model, which again SPSS will report as ‘SS Residual’.
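Outside SPSS, the first two steps amount to fitting two least-squares models and keeping each one’s residual sum of squares. The Python sketch below uses only the standard library. `solve` and `fit_sse` are hypothetical helper names. It solves the normal equations, which is fine for small illustrations but less numerically stable than the routines real statistics packages use. The data are made up and have fewer predictors than the voting example.

```python
def solve(a, v):
    """Solve the linear system a @ b = v by Gaussian elimination with partial pivoting."""
    size = len(v)
    m = [row[:] + [v[i]] for i, row in enumerate(a)]  # augmented matrix [a | v]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    b = [0.0] * size
    for i in range(size - 1, -1, -1):  # back-substitution
        b[i] = (m[i][size] - sum(m[i][j] * b[j] for j in range(i + 1, size))) / m[i][i]
    return b


def fit_sse(y, predictors):
    """Fit y = b0 + b1*X1 + ... by least squares; return SSE (SPSS's 'SS Residual')."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(rows, y))


# Hypothetical data (not from the slides): an outcome and two predictors.
y = [3.0, 4.0, 8.0, 9.0, 12.0, 14.0]
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

sse_c = fit_sse(y, [x1])      # step 1: the compact model (intercept + X1)
sse_a = fit_sse(y, [x1, x2])  # step 2: the augmented model (intercept + X1 + X2)
print(f"SSE(C) = {sse_c:.3f}, SSE(A) = {sse_a:.3f}")
```

Passing an empty predictor list fits the intercept-only model, whose SSE is the total sum of squares around the mean of Y.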
8 Solution
Third, these two analyses give you the SSEs you want:
SSE(C) is the SS Residual from the SPSS analysis of the three-parameter model.
SSE(A) is the SS Residual from the SPSS analysis of the six-parameter model.
Now you can compute $SSR = SSE(C) - SSE(A)$ and $PRE = SSR/SSE(C)$, and then test for significance using any one of the three methods we have covered (test the PRE for significance, transform the PRE into an F value, or transform the MSs into an F value).
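Once the two SSEs are in hand, the third step is plain arithmetic. Here is a minimal Python sketch. It assumes the usual model-comparison formulas: $SSR = SSE(C) - SSE(A)$, $PRE = SSR/SSE(C)$, $F^* = \frac{PRE/(PA-PC)}{(1-PRE)/(N-PA)}$, and $F = MSR/MSE$ with $MSR = SSR/(PA-PC)$ and $MSE = SSE(A)/(N-PA)$. `compare_models` is a hypothetical name. The p-value lookup is left out because Python’s standard library has no F distribution.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    ssr: float         # SSE(C) - SSE(A)
    pre: float         # proportional reduction in error
    f_from_pre: float  # F* computed from PRE
    f_from_ms: float   # F computed from the mean squares
    df_num: int        # PA - PC
    df_den: int        # N - PA


def compare_models(sse_c: float, sse_a: float, pa: int, pc: int, n: int) -> Comparison:
    """Compare a compact Model C with an augmented Model A (hypothetical helper)."""
    if pa <= pc:
        raise ValueError("Model A must have more parameters than Model C")
    ssr = sse_c - sse_a
    pre = ssr / sse_c
    df_num, df_den = pa - pc, n - pa
    f_from_pre = (pre / df_num) / ((1 - pre) / df_den)
    f_from_ms = (ssr / df_num) / (sse_a / df_den)
    return Comparison(ssr, pre, f_from_pre, f_from_ms, df_num, df_den)


# Usage with the SSEs from the GPA example later in these slides.
result = compare_models(sse_c=189.11, sse_a=172.99, pa=4, pc=2, n=414)
print(f"SSR={result.ssr:.2f}  PRE={result.pre:.3f}  "
      f"F({result.df_num},{result.df_den})={result.f_from_ms:.2f}")
# prints: SSR=16.12  PRE=0.085  F(2,410)=19.10
```

The two F values agree because $1 - PRE = SSE(A)/SSE(C)$. The PRE route and the mean-square route are therefore the same test written two ways.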
9 Example
We want to predict college GPA, so we begin with the following:
MODEL C: $\hat{Y}_i = \beta_0$
MODEL A: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i}$
where $Y$ is college GPA and $X_1$ is percentile rank in high school.
10 Example (cont.)
Let’s say we find that Model A is worthwhile: HS rank significantly improves the model compared to just using the mean of Y. Now we want to know whether it is worthwhile to include the two SAT measures, verbal ($X_2$) and math ($X_3$). If we have one, we also have the other, so we might want to include both. If the two are highly redundant, however, then we don’t want to add both, as our PRE per parameter added would drop compared to adding just one. On the other hand, they shouldn’t be too redundant, or the SAT wouldn’t bother to report both.
11 Example (cont.)
What we want to test:
MODEL C: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i}$
MODEL A: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}$
To get the SSE(C) you want, use SPSS to regress Y on $X_1$. This is your Model C (above), but when SPSS does the regression it treats it as Model A in the following setup:
MODEL C: $\hat{Y}_i = \beta_0$
MODEL A: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i}$
The SSE from this analysis (SPSS’s ‘SS Residual’) is the SSE(C) for your analysis. See next slide.
12 ANOVA from regressing Y (GPA) on High School Rank
SS Residual is from the model: $\hat{Y}_i = \beta_0 + \beta_1 X_{\text{Rank},i}$
13 Example (cont.)
What we want to test:
MODEL C: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i}$
MODEL A: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}$
To get the SSE(A) you want, set up an analysis where Model A is compared to the simple model:
MODEL C: $\hat{Y}_i = \beta_0$
MODEL A: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}$
The SSE from this analysis (SPSS’s ‘SS Residual’) is your SSE(A). See next slide.
14 ANOVA from regressing Y (GPA) on High School Rank, SAT Verbal, and SAT Math
SS Residual is from the model: $\hat{Y}_i = \beta_0 + \beta_1 X_{\text{Rank},i} + \beta_2 X_{\text{SAT\_V},i} + \beta_3 X_{\text{SAT\_M},i}$
15 Example (cont.)
MODEL C: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i}$
MODEL A: $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}$
SSE(C) = 189.11, SSE(A) = 172.99, SSR = 189.11 - 172.99 = 16.12
PRE = 16.12/189.11 = .085
PA = 4, PC = 2, N = 414, PRE critical = .012
Or, using the PRE tool, p < .001.
Reject $H_0$: adding the two SAT scores is worthwhile.
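Here is the arithmetic behind this slide, written out. SSE(C) is taken as 189.11, the denominator of the PRE on this slide. The F value does not appear on the slide. It is computed here from the same numbers through the mean squares, as a cross-check on the PRE tool’s p < .001.

$$
\begin{aligned}
SSR &= SSE(C) - SSE(A) = 189.11 - 172.99 = 16.12 \\
PRE &= \frac{SSR}{SSE(C)} = \frac{16.12}{189.11} \approx .085 \\
MSR &= \frac{SSR}{PA - PC} = \frac{16.12}{2} = 8.06 \\
MSE &= \frac{SSE(A)}{N - PA} = \frac{172.99}{410} \approx 0.422 \\
F &= \frac{MSR}{MSE} \approx \frac{8.06}{0.422} \approx 19.1, \qquad df = (2,\ 410)
\end{aligned}
$$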
16 Cautions About Using Multiple Regression
1. Causal Conclusions. Be very cautious about drawing causal conclusions when you don’t manipulate the independent variable(s). That the X variables can be used to predict the Y variable does not prove that X causes Y.
17 2. Judging the Relative Importance of Predictor Variables
It is tempting to judge the relative importance of the various predictor variables by comparing the values of their betas. For example, suppose the regression formula contains $\ldots + 0.25X_1 + \beta_2 X_2 + \ldots$, where $\beta_2$ is much larger than 0.25. It is then tempting to conclude that $X_2$ must play a more important role in the model of Y than does $X_1$. This is incorrect, because the value of a beta depends on the scale of its predictor variable. For example, changing $X_1$ from feet to inches multiplies every $X_1$ score by 12, and so divides its beta by 12.
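Here is a small illustration of the scale problem, with one predictor for simplicity. In a multiple regression, rescaling one predictor by a constant divides that predictor’s coefficient by the same constant and leaves the other coefficients unchanged. The heights and outcome values below are hypothetical. `slope` is an illustrative helper, not a library function.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: heights recorded in feet, and some outcome Y.
height_ft = [5.0, 5.5, 6.0, 6.5]
y = [120.0, 140.0, 150.0, 175.0]
height_in = [12 * h for h in height_ft]  # the same heights in inches

b_ft = slope(height_ft, y)
b_in = slope(height_in, y)
print(f"slope per foot: {b_ft:.3f}, slope per inch: {b_in:.3f}")
# prints: slope per foot: 35.000, slope per inch: 2.917
```

The relationship between height and Y is identical in both fits. Only the unit changed, yet the beta shrank by a factor of 12.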
18 One way to remove the effect of the scale of X on the value of its beta is to use the standardized betas (the betas you get when you convert all of your variables to standard scores). Even so, standardized betas may not allow you to determine the relative importance of the predictor variables:
a) If the researcher decides to use a different range of cases, the standardized regression coefficients can change quite a bit even though the relationship between X and Y remains the same, so in that regard they are not ‘standard’.
b) The sizes of the standardized regression coefficients are still affected by redundancy among the predictors.
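The sketch below again uses one predictor and hypothetical data. It shows that the standardized slope ($b \cdot s_X / s_Y$) does not change when X is converted from one unit to another. It also illustrates point (a): the standardized slope does change when the range of X is restricted, even though the raw slope stays exactly 1. Point (b), redundancy, needs several predictors and is not shown. `slope` and `standardized_slope` are illustrative names.

```python
from statistics import mean, stdev


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized_slope(x, y):
    """Slope in standard-score units: b * sd(x) / sd(y)."""
    return slope(x, y) * stdev(x) / stdev(y)


# Hypothetical data: each x value appears twice, once with y = x + 1 and once
# with y = x - 1, so the raw slope is exactly 1 in the full sample and in any
# subset that keeps both members of each pair.
x = [v for v in range(1, 9) for _ in (1, -1)]
y = [v + e for v in range(1, 9) for e in (1, -1)]

# Restricted range: keep only the cases with 4 <= x <= 5.
x_sub = [a for a in x if 4 <= a <= 5]
y_sub = [b for a, b in zip(x, y) if 4 <= a <= 5]

x_rescaled = [12 * a for a in x]  # same X expressed in a different unit

print(f"full range:       b = {slope(x, y):.3f}, standardized = {standardized_slope(x, y):.3f}")
print(f"restricted range: b = {slope(x_sub, y_sub):.3f}, "
      f"standardized = {standardized_slope(x_sub, y_sub):.3f}")
print(f"X rescaled by 12: standardized = {standardized_slope(x_rescaled, y):.3f}")
```

Here the full-range standardized slope is about 0.92 and the restricted-range one about 0.45, while the raw slope is 1 in both.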
19 3. Automatic Model Building
Example: stepwise regression. The predictor variable with the highest resulting F* is added (as long as it meets some minimum threshold). On each subsequent step, the remaining variable with the highest F* is added (if it meets the threshold), or a variable already in the model whose F* has dropped below the threshold (due to redundancy) is removed. This continues until all the variables in the model have F*s above the threshold and all the variables not in the model have F*s below the threshold.
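The stepwise procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SPSS’s implementation:

* The entry and removal thresholds (`f_enter=4.0`, `f_remove=3.9`) are hypothetical values.
* The removal threshold sits slightly below the entry threshold. This is a common safeguard against a variable being added and removed over and over; the slide describes a single threshold.
* Each variable’s F* is the one-parameter comparison of the model with that variable against the same model without it.
* The least-squares fits use standard-library code only.

All function names are hypothetical.

```python
def solve(a, v):
    """Solve a @ b = v by Gaussian elimination with partial pivoting."""
    size = len(v)
    m = [row[:] + [v[i]] for i, row in enumerate(a)]  # augmented matrix [a | v]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    b = [0.0] * size
    for i in range(size - 1, -1, -1):  # back-substitution
        b[i] = (m[i][size] - sum(m[i][j] * b[j] for j in range(i + 1, size))) / m[i][i]
    return b


def fit_sse(y, predictors):
    """SSE of the least-squares model: intercept plus the given predictor columns."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(rows, y))


def f_star(y, base, extra):
    """F* for adding the single column `extra` to the model built on `base` columns."""
    pa = len(base) + 2  # intercept + base predictors + the extra one
    sse_c = fit_sse(y, base)
    sse_a = fit_sse(y, base + [extra])
    return (sse_c - sse_a) / (sse_a / (len(y) - pa))


def stepwise(y, candidates, f_enter=4.0, f_remove=3.9, max_steps=50):
    """Return the names of the predictors chosen by a simple stepwise search."""
    chosen = []
    for _ in range(max_steps):
        changed = False
        remaining = [k for k in candidates if k not in chosen]
        if remaining:
            # Entry step: the remaining variable with the highest F*, if it clears f_enter.
            base = [candidates[k] for k in chosen]
            scores = {k: f_star(y, base, candidates[k]) for k in remaining}
            best = max(scores, key=scores.get)
            if scores[best] >= f_enter:
                chosen.append(best)
                changed = True
        if chosen:
            # Removal step: F* of each variable in the model, given all the others.
            scores = {
                k: f_star(y, [candidates[o] for o in chosen if o != k], candidates[k])
                for k in chosen
            }
            worst = min(scores, key=scores.get)
            if scores[worst] < f_remove:
                chosen.remove(worst)
                changed = True
        if not changed:
            break
    return chosen


# Hypothetical data: y is driven mainly by x1.
x1 = [float(v) for v in range(1, 11)]
x2 = [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0, 10.0, 6.0]
y = [3 * v + (1 if i % 2 == 0 else -1) for i, v in enumerate(x1)]
print(stepwise(y, {"x1": x1, "x2": x2}))
```

Nothing in the loop asks whether a variable makes theoretical sense. It only compares F* values against fixed cutoffs.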
20 The authors argue against automatic model building:
1. A search unfocused by theory is likely to be a fishing expedition that finds spurious relationships and creates models that don’t replicate with new data.
2. The interpretation of the coefficients, and the meaning of the question being asked, depend on the other variables in the model; it’s unwise to let an automatic procedure determine which questions we do and do not ask of our data.
3. Better models and better understanding of data result from focused data analysis guided by substantive theory.