


1 Psych 5510/6510 Chapter Eight--Multiple Regression: Models with Multiple Continuous Predictors. Part 2: Testing the Addition of One Parameter at a Time. Spring, 2009

2 Overall Test
In Part 1 we looked at the overall test of the parameters in Model A:
Model C: Ŷi = β0 (where β0 = μY), PC = 1
Model A: Ŷi = β0 + β1Xi1 + β2Xi2 + ... + βp-1Xi,p-1, PA = p
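As an aside (not from the slides), here is a minimal Python sketch of this Model C versus Model A comparison, using numpy/scipy and the standard PRE and F* formulas; the arrays and the commented usage lines are hypothetical placeholders.

import numpy as np
from scipy import stats

def sse(y, X):
    """Sum of squared errors from an OLS fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))

def model_comparison(y, Xc, Xa):
    """PRE and F* for moving from Model C (design matrix Xc) to Model A (Xa).
    Both design matrices are assumed to already include a column of ones."""
    n, pc = Xc.shape
    pa = Xa.shape[1]
    sse_c, sse_a = sse(y, Xc), sse(y, Xa)
    pre = (sse_c - sse_a) / sse_c
    f_star = (pre / (pa - pc)) / ((1 - pre) / (n - pa))
    p = stats.f.sf(f_star, pa - pc, n - pa)
    return pre, f_star, p

# hypothetical usage: y, x1, x2 are 1-D numpy arrays of the same length
# Xc = np.ones((len(y), 1))                          # Model C: intercept only
# Xa = np.column_stack([np.ones(len(y)), x1, x2])    # Model A: intercept + X1 + X2
# pre, f_star, p = model_comparison(y, Xc, Xa)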

3 Disadvantages
The disadvantages of this overall test are:
1. If some of the parameters in Model A are worthwhile and some are not, the PRE per parameter added may not be very impressive, with the weaker parameters washing out the effects of the stronger.
2. As with the overall F test in ANOVA, the alternative hypothesis is vague: at least one of β1 through βp-1 does not equal 0. If Model A is worthwhile overall, we don't know which of its individual parameters contributed to that worthwhileness.

4 One-Parameter Test
It is usually more interesting to test adding one parameter at a time (PA − PC = 1) to our model.
Model C: Ŷi = β0 + β1Xi1 + β2Xi2 + ... + βp-1Xi,p-1
Model A: Ŷi = β0 + β1Xi1 + β2Xi2 + ... + βp-1Xi,p-1 + βpXip
H0: βp = 0
HA: βp ≠ 0

5 Model C: Ŷi = β0 + β1Xi1 + β2Xi2 + ... + βp-1Xi,p-1
Model A: Ŷi = β0 + β1Xi1 + β2Xi2 + ... + βp-1Xi,p-1 + βpXip
The values of β1 through βp-1 will probably change when βpXip is added to the model (as we will see, this is due to redundancy among the predictor variables). Remember our subscripting notation (useful when the situation is not clear from the context): β4.123 is the value of β4 when predictors X1, X2, and X3 are also included in the model.

6 Redundancy
When we use more than one predictor variable in our model, an important issue arises: to what degree are the predictor variables redundant (i.e., to what degree do they share information)? For example, using both a child's age and a child's height to predict the child's weight is somewhat redundant, since height and age are themselves related. Please review the Venn diagrams on redundancy from Part 1.

7 Redundancy
Thus two or more predictor variables are redundant to the degree to which they are correlated. Let's say we are going to add another predictor variable Xp to the model below:
Model C: Ŷi = β0 + β1Xi1 + β2Xi2 + β3Xi3
and we want to know how redundant Xp may be with the X variables that are already in the model. Well, we know how to determine that...

8 Measuring the Redundancy of Xp with X1, X2, and X3
Yes indeed, we know how to measure the relationship between Xp and X1, X2, and X3: regress Xp on those variables. The R² (i.e. PRE) of moving from Model C (Xp predicted by its mean alone) to Model A (Xp predicted from X1, X2, and X3) is the measure of the redundancy between Xp and X1, X2, and X3.

9 Redundancy
We can measure the redundancy between the variable we are going to add (Xp) and the variables already in the model (X1 through Xp-1) by seeing how well those already-included variables can predict the value of Xp. To do this we regress Xp on variables X1 through Xp-1 and look at the resulting PRE. The PRE for regressing Xp on variables X1 through Xp-1 is symbolized as R²p, which is a shorter version of the full symbol, R²p.123…(p-1).

10 Tolerance
Conversely, tolerance (equal to 1 − R²p) is a measure of how unique a variable is compared to the other predictor variables already in the model. If tolerance is low, the variable is redundant and can add little to the model; if tolerance is high, the variable is not very redundant and thus has the ability to add significantly to the model (provided it is correlated with Y, of course). (For a pictorial representation of these ideas see the handout on 'Tolerance'.)
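A minimal Python sketch of this redundancy/tolerance check (numpy assumed; not part of the slides): regress the candidate predictor on the predictors already in the model and take the resulting R².

import numpy as np

def tolerance(x_p, X_others):
    """Regress the candidate predictor x_p on the predictors already in the
    model (the columns of X_others) and return (R^2_p, tolerance = 1 - R^2_p)."""
    X = np.column_stack([np.ones(len(x_p)), X_others])   # add an intercept
    b, *_ = np.linalg.lstsq(X, x_p, rcond=None)
    resid = x_p - X @ b
    r2_p = 1 - (resid @ resid) / np.sum((x_p - x_p.mean()) ** 2)
    return r2_p, 1 - r2_p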

11 Confidence Intervals

12 Low Tolerance
The formula for the confidence interval of β includes tolerance in its denominator (look back at that formula). If tolerance is low, the confidence interval of β is large (and thus rejecting β = 0 becomes unlikely). If tolerance is very low, the confidence interval for β becomes huge, meaning that we become increasingly unable to determine the true value of β, and the accuracy of some computations begins to drop. Because of this, when tolerance is below .01 (or .001) some statistical programs issue a warning message.

13 Variance Inflation Factor
Because a low tolerance makes the confidence interval wider, some programs report the variance inflation factor (VIF), which is the reciprocal of the tolerance.
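The slide's confidence-interval formula is not reproduced in this transcript, but in its standard form tolerance sits in the denominator of the coefficient's standard error. The Python sketch below (names and inputs hypothetical, not from the slides) shows the idea:

import numpy as np
from scipy import stats

def coef_ci(b_j, mse, x_j, tol_j, n, pa, level=0.95):
    """Confidence interval for the partial regression coefficient b_j.
    se(b_j) = sqrt(mse / (SS_Xj * tolerance_j)), so a low tolerance inflates
    the standard error and widens the interval.
    mse = SSE(A) / (n - PA); x_j is the predictor's column of raw scores."""
    ss_xj = np.sum((x_j - x_j.mean()) ** 2)
    se = np.sqrt(mse / (ss_xj * tol_j))
    t_crit = stats.t.ppf(1 - (1 - level) / 2, n - pa)
    return b_j - t_crit * se, b_j + t_crit * se

# the variance inflation factor is simply the reciprocal of tolerance:
# vif_j = 1 / tol_j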

14 Back to the One-Parameter Test
We are looking at the PRE of adding one new predictor variable to our model:
Model C: Ŷi = β0 + β1Xi1 + β2Xi2 + ... + βp-1Xi,p-1
Model A: Ŷi = β0 + β1Xi1 + β2Xi2 + ... + βp-1Xi,p-1 + βpXip
H0: η² = 0, HA: η² > 0, or equivalently,
H0: βp = 0, HA: βp ≠ 0

15 Statistical Significance
SPSS makes this easy: simply regress Y on the variables of Model A. For each β in the model, SPSS provides its confidence interval and the values of 't' and 'p' for the test of whether that β = 0. Not only do we get the information needed to decide whether it is worthwhile to add Xp to a model that contains the other variables, we get the same information about adding each variable last to a model that contains the other variables...

16 Significance (cont.)
...for each variable, SPSS gives us the PRE for adding that variable to a model containing all of the other variables, and tells us whether or not the β that goes with that variable differs from zero. So in addition to testing βp, similar information is provided for β1 (see below) and all the other β's:
Model C: Ŷi = β0 + β2Xi2 + ... + βp-1Xi,p-1 + βpXip
Model A: Ŷi = β0 + β2Xi2 + ... + βp-1Xi,p-1 + βpXip + β1Xi1
H0: β1 = 0, HA: β1 ≠ 0
And so on for each β.
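The SPSS output itself cannot be reproduced in this transcript. As an illustration only, the same per-coefficient output (b, t, p, and confidence interval, each testing that variable added last) can be obtained in Python with the statsmodels package; the data frame and column names below are placeholders for the course data:

import statsmodels.api as sm

def fit_model_a(df, response, predictors):
    """Fit Model A once; for every predictor the fit reports the test of adding
    that variable last: its b, t, p, and confidence interval.
    df is a pandas DataFrame holding the response and predictor columns."""
    X = sm.add_constant(df[predictors])
    return sm.OLS(df[response], X).fit()

# hypothetical usage with the course example's variable names:
# results = fit_model_a(df, "GPA", ["HS_Rank", "SAT_V", "SAT_M"])
# print(results.summary())       # b, t, and p for each coefficient
# print(results.conf_int())      # confidence interval for each coefficient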

17 Coefficient of Partial Determination
The PRE from adding a new predictor variable to a model that already contains predictor variables is called the 'coefficient of partial determination'. It is symbolized as r²Yp.123…(p-1) (the PRE of adding variable Xp to the model of Y when variables X1 through Xp-1 are already included). See the handout on 'Partial Correlations'.

18 Partial Correlation Coefficient
The square root of the coefficient of partial determination is called the 'partial correlation coefficient'. It is symbolized as rYp.123…(p-1). It represents the correlation between Y and Xp when the influences of the other predictor variables have been removed from both Y and Xp.

19 More Descriptions of the 'Partial Correlation Coefficient'
It is the correlation between Y and Xp when the other predictor variables are 'held constant'. It is the correlation between Y and Xp for people who have identical scores on the other predictor variables.

20 Part Correlations
Another correlation sometimes examined (but not in our approach) is called the 'part' or 'semipartial' correlation. In this correlation the influence of the other predictor variables (X1 through Xp-1) is removed only from Xp, rather than from both Xp and Y (see the handout on 'Partial Correlations').

21 Partial This and Partial That
We have three 'partial' terms:
Partial regression coefficient: the value of β (or equivalently its estimate, b) that goes with a particular predictor variable.
Partial correlation coefficient: the correlation between Y and a particular predictor variable after the influence of the other predictor variables has been removed from both Y and that variable.
Coefficient of partial determination: the PRE of adding a particular predictor variable to a model that already contains the other predictor variables. It is the (partial correlation coefficient)².
Now let's see how the terms connect.

22 Back to Our Example
Dependent variable: GPA
Predictor variables: 1. HS_Rank  2. SAT_V  3. SAT_M
Let's look at the various 'partial' values that go with the predictor variable SAT_M.

23 The 'Partial' Plot
1. Use the other predictor variables (HS_Rank and SAT_V) to predict Yi.
2. Compute the error of those predictions (i.e. create a variable consisting of Yi − Ŷi). This is a variable of residuals (showing how much the actual Y scores vary from what HS_Rank and SAT_V can predict). Name this variable Y residuals.

24 The 'Partial' Plot (cont.)
3. Use the other predictor variables (HS_Rank and SAT_V) to predict SAT_M.
4. Compute the error of those predictions (i.e. create a variable consisting of each actual SAT_M score minus its predicted SAT_M score). This is a variable of residuals (showing how much the actual SAT_M scores vary from what HS_Rank and SAT_V can predict). Name this variable SAT_M residuals.

25 The 'Partial' Plot (cont.)
5. Now graph the scatter plot of the Y residuals and the SAT_M residuals. This is the relationship between Y and SAT_M after the influence of the other predictor variables has been removed (from both of them). This is equivalent to the relationship between Y and SAT_M when the values of the other variables are held constant.

26 The 'Partial' Plot (cont.)

27 The 'Partial' Plot (cont.)
The partial regression coefficient is the slope of that regression line. The partial correlation coefficient is the correlation shown in the plot (the correlation between the Y residuals and the SAT_M residuals). The coefficient of partial determination is the r² of that correlation (how much we gain by using the regression line rather than the mean of the Y residuals to predict the Y residuals). Note that the mean of the Y residuals = 0.
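A sketch of steps 1-5 in Python (numpy and matplotlib assumed; not from the slides). The array names gpa, hs_rank, sat_v, and sat_m in the usage comment are placeholders, since the raw course data is not reproduced in this transcript:

import numpy as np
import matplotlib.pyplot as plt

def residualize(v, X_others):
    """Residuals of v after regressing it on the columns of X_others (plus an intercept)."""
    X = np.column_stack([np.ones(len(v)), X_others])
    b, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ b

def partial_plot(y, x_p, X_others):
    """Steps 1-5: remove the other predictors from both y and x_p, plot the two
    sets of residuals, and return (partial regression coefficient,
    partial correlation coefficient, coefficient of partial determination)."""
    y_res = residualize(y, X_others)      # steps 1-2
    xp_res = residualize(x_p, X_others)   # steps 3-4
    plt.scatter(xp_res, y_res)            # step 5: the partial plot
    slope = np.polyfit(xp_res, y_res, 1)[0]
    r = np.corrcoef(xp_res, y_res)[0, 1]
    return slope, r, r ** 2

# hypothetical usage for the SAT_M example (arrays not given in this transcript):
# b3, partial_r, partial_r2 = partial_plot(gpa, sat_m, np.column_stack([hs_rank, sat_v]))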

28 Back to Our Example (again)
Dependent variable: GPA
Predictor variables: 1. HS_Rank  2. SAT_V  3. SAT_M
See the SPSS printout.

29 Test of Worthwhileness of the Overall Model
Y = GPA
Model C: Ŷi = β0 (where β0 is μY)
Model A: Ŷi = β0 + β1(HSRanki) + β2(SAT_Vi) + β3(SAT_Mi)
PRE = .220, F* = 38.544, p < .001, est. η² = .214

30 A Look at Each Predictor Variable
We will now examine each predictor variable individually, looking at the analysis of adding each variable last to a model that already contains the other predictor variables.

31 HSRank: Analysis
Model C: Ŷi = β0 + β2(SAT_Vi) + β3(SAT_Mi)
Model A: Ŷi = β0 + β2(SAT_Vi) + β3(SAT_Mi) + β1(HSRanki)
From SPSS: Ŷi = -1.739 + .027(HSRanki) + .011(SAT_Vi) + .022(SAT_Mi)
1. b1 = .027; test of whether β1 ≠ 0: t = 8.3, p < .001.
2. Partial correlation between HSRank and GPA (i.e. the correlation between those two variables when the other predictor variables are held constant, that is, with the influences of the other predictor variables removed from both HSRank and GPA): 0.38.
3. PRE of adding HSRank to the model (i.e. moving from Model C to Model A): 0.38² = 0.14 (p < .001, same as in part 1 above). The extra parameter of Model A is worthwhile.
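Aside (not on the slides): when a single parameter is added, F* = t², so the PRE in part 3 can be recovered from the t in part 1 as t²/(t² + n − PA). A quick Python check, where n ≈ 414 is inferred from the overall test on slide 29 rather than stated on the slides:

t, n, pa = 8.3, 414, 4          # n is inferred, not given on the slides
pre_hsrank = t ** 2 / (t ** 2 + n - pa)
print(round(pre_hsrank, 2))     # about 0.14, matching 0.38 squared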

32 HSRank: Residual Plot
The relationship between HSRank and GPA with the other variables held constant. The slope of the regression line is .027 (i.e. b1), the correlation between HSRank and GPA in this plot is .38 (i.e. the partial correlation), and the PRE of using HSRank to predict GPA is .38² = 0.14.

33 SAT_V
Model C: Ŷi = β0 + β1(HSRanki) + β3(SAT_Mi)
Model A: Ŷi = β0 + β1(HSRanki) + β2(SAT_Vi) + β3(SAT_Mi)
As before: Ŷi = -1.739 + .027(HSRanki) + .011(SAT_Vi) + .022(SAT_Mi)
1. b2 = .011; test of whether β2 ≠ 0: t = 2.5, p = .011.
2. Partial correlation between SAT_V and GPA (i.e. with the influences of the other predictor variables removed from both SAT_V and GPA): 0.126.
3. PRE of adding SAT_V to the model (i.e. moving from Model C to Model A): .126² = 0.016 (p = .011, same as in part 1 above). The extra parameter of Model A is worthwhile.

34 SAT_V: Residual Plot
The relationship between SAT_V and GPA with the other variables held constant. The slope of the regression line is .011 (i.e. b2), the correlation between SAT_V and GPA in this plot is .126 (i.e. the partial correlation), and the PRE of using SAT_V to predict GPA is .126² = 0.016.

35 SAT_M
Model C: Ŷi = β0 + β1(HSRanki) + β2(SAT_Vi)
Model A: Ŷi = β0 + β1(HSRanki) + β2(SAT_Vi) + β3(SAT_Mi)
As before: Ŷi = -1.739 + .027(HSRanki) + .011(SAT_Vi) + .022(SAT_Mi)
1. b3 = .022; test of whether β3 ≠ 0: t = 4.5, p < .001.
2. Partial correlation between SAT_M and GPA (i.e. with the influences of the other predictor variables removed from both SAT_M and GPA): 0.216.
3. PRE of adding SAT_M to the model (i.e. moving from Model C to Model A): .216² = 0.047 (p < .001, same as in part 1 above). The extra parameter of Model A is worthwhile.

36 SAT_M: Residual Plot
The relationship between SAT_M and GPA with the other variables held constant. The slope of the regression line is .022 (i.e. b3), the correlation between SAT_M and GPA in this plot is .216 (i.e. the partial correlation), and the PRE of using SAT_M to predict GPA is .216² = 0.047.

37 Tolerances
HSRank = .995, SAT_V = .893, SAT_M = .890
These tolerance values show that the predictor variables were not very redundant, leaving each with the opportunity to add significantly to the model if its correlation with Y is high.

38 Another Example
We are interested in the relationship between unemployment (UN) and industrial production (IP). We expect a negative correlation between the two (the higher the industrial production that year, the lower the unemployment, and vice versa).

39 Data
Year   UN (millions)   IP    Year Code
1950   3.1             113   1
1951   1.9             123   2
1952   1.7             127   3
1953   1.6             138   4
1954   3.2             130   5
1955   2.7             146   6
1956   2.6             151   7
1957   2.9             152   8
1958   4.7             141   9
1959   3.8             159   10

40 Models
Y = UN, X = IP
Model C: Ŷi = β0 = 2.82
Model A: Ŷi = β0 + β1Xi = -.035 + .021(Xi)
PRE = .098, p = .379
Not only do we fail to reject H0, but the slope was unexpectedly a positive value!
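These numbers can be reproduced from the slide 39 data; a small Python/numpy sketch (not part of the original slides):

import numpy as np

# the data from slide 39
un = np.array([3.1, 1.9, 1.7, 1.6, 3.2, 2.7, 2.6, 2.9, 4.7, 3.8])    # unemployment (millions)
ip = np.array([113, 123, 127, 138, 130, 146, 151, 152, 141, 159.0])  # industrial production

b1, b0 = np.polyfit(ip, un, 1)            # Model A slope and intercept
pre = np.corrcoef(ip, un)[0, 1] ** 2      # PRE for using IP alone
print(round(b0, 3), round(b1, 3), round(pre, 3))   # about -0.035, 0.021, 0.098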

41 Scatter Plot of UN and IP

42 Bringing 'Year' into the Model
Let's take a look at the relationship between unemployment and year (using the year codes 1 through 10).
Y = UN, X = Year
Model C: Ŷi = β0 = 2.82
Model A: Ŷi = β0 + β1Xi = 1.67 + .21(Xi)
PRE = .428, p = .04
We reject H0; it is worthwhile to add year to the model (compared to using just the mean).

43 Scatter Plot of UN and Year

44 Year and IP
Let's see year's ability to predict industrial production (IP).
Y = IP, X = Year
Model C: Ŷi = β0 = 138
Model A: Ŷi = β0 + β1Xi = 114 + 4.36(Xi)
PRE = .821, p < .001
We reject H0; year is also good for predicting industrial production.

45 Year as a Suppressor Variable
Perhaps the variable 'year' is having a large effect on both unemployment (UN) and industrial production (IP), and is thus masking the relationship between UN and IP. If this is true, then year would be called a suppressor variable.

46 Residuals
Let's take a look at the relationship between unemployment and industrial production when the effects of year are removed from both variables.

47 Residuals from Using Year to Predict UN and IP
Year Code   UN residuals   IP residuals
1            1.22           -5.36
2            -.19             .27
3            -.60            -.09
4            -.91            6.56
5             .48           -5.82
6            -.22            5.82
7            -.53            6.45
8            -.44            3.09
9            1.15          -12.27
10            .04            1.36

48 Scatterplot of the Residuals

49 UN and IP Residuals
Partial correlation coefficient: -0.88
PRE = .77, p = .002
So there is a negative correlation between UN and IP once the effect of year has been taken out of both UN and IP.
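The full residual analysis can also be reproduced from the slide 39 data; a Python/numpy sketch (not part of the original slides) that fits the two year regressions from slides 42 and 44 along the way:

import numpy as np

un = np.array([3.1, 1.9, 1.7, 1.6, 3.2, 2.7, 2.6, 2.9, 4.7, 3.8])
ip = np.array([113, 123, 127, 138, 130, 146, 151, 152, 141, 159.0])
year = np.arange(1, 11)                     # year codes 1 through 10

def residualize(v, x):
    """Residuals of v after a simple regression of v on x."""
    slope, intercept = np.polyfit(x, v, 1)
    return v - (intercept + slope * x)

un_res = residualize(un, year)              # the 'UN residuals' column of slide 47
ip_res = residualize(ip, year)              # the 'IP residuals' column of slide 47

partial_r = np.corrcoef(un_res, ip_res)[0, 1]
print(round(partial_r, 2), round(partial_r ** 2, 2))
# roughly -0.87 and 0.76; the slide reports -0.88 and .77 (rounding differences aside)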

50 SPSS Output
We don't actually need to compute the residuals from using year to predict unemployment, and from using year to predict industrial production, in order to find the relationship between unemployment and industrial production after the effect of year on both variables has been accounted for in the model. SPSS gives us all of that in its computation of the partial regression coefficient and the partial correlation coefficient. See the handout from the course web site.

51 What We Are Doing
Model C: Ŷi = β0 + β1Yeari
Model A: Ŷi = β0 + β1Yeari + β2IPi
There are a couple of ways of thinking about what we are doing in this example:
1. We are testing whether adding IP is worthwhile to a model that already contains year.
2. We are examining the relationship between IP and UN when the effect of year is held constant.

52 Review of Terms
Model C: Ŷi = β0 + β1Yeari
Model A: Ŷi = β0 + β1Yeari + β2IPi
The β's are partial regression coefficients; their values will be influenced both by the relationship between the predictor variable and Y and by the other predictor variables in the model. The correlation between a predictor variable and Y when the effects of the other predictor variables on both are controlled (held constant) is called the partial correlation coefficient. Squaring the partial correlation coefficient gives you the coefficient of partial determination, which is the PRE of adding that predictor variable to a model that already contains the other variables.

