Published by Chase Roche. Modified over 3 years ago.

1
Example 12.2 Multicollinearity

2
The Problem
- We want to explain a person's height by means of foot length.
- The response variable is Height, and the explanatory variables are Right and Left, the lengths of the right foot and the left foot, respectively.
- What can occur when we regress Height on both Right and Left?

3
Multicollinearity
- The relationship between an explanatory variable X and the response variable Y is not always accurately reflected in the coefficient of X; the coefficient depends on which other Xs are included in or excluded from the equation.
- This is especially true when there is a linear relationship between two or more explanatory variables, in which case we have multicollinearity.
- By definition, multicollinearity is the presence of a fairly strong linear relationship between two or more explanatory variables, and it can make estimation difficult.

4
Solution
- Admittedly, there is no need to include both Right and Left in an equation for Height (either one would do), but we include both to make a point.
- It is likely that there is a large correlation between height and foot size, so we would expect this regression equation to do a good job.
- The R² value will probably be large. But what about the coefficients of Right and Left? Here there is a problem.

5
Solution -- continued
- The coefficient of Right indicates the right foot's effect on Height in addition to the effect of the left foot. This additional effect is probably minimal. That is, after the effect of Left on Height has already been taken into account, the extra information provided by Right is probably minimal. But it goes the other way also. The extra information provided by Left, in addition to that provided by Right, is probably minimal.

6
HEIGHT.XLS
- To show what can happen numerically, we generated a hypothetical data set of heights and left and right foot lengths in this file.
- We did this so that, except for random error, height is approximately 32 plus 3.2 times foot length (all expressed in inches).
- As shown in the table to the right, the correlations between Height and either Right or Left in our data set are quite large, and the correlation between Right and Left is very close to 1.
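
A data set of this kind can be simulated to see the same correlation pattern. The following is a minimal sketch, not the contents of HEIGHT.XLS: the sample size, the foot-length distribution, the left/right difference and the error standard deviation are all assumptions, and `make_height_data` is a hypothetical helper name.

```python
import numpy as np

def make_height_data(n=105, seed=0):
    """Hypothetical stand-in for HEIGHT.XLS (all values in inches)."""
    rng = np.random.default_rng(seed)
    left = rng.normal(10.5, 1.3, n)          # assumed foot-length distribution
    right = left + rng.normal(0.0, 0.1, n)   # right foot nearly equals left foot
    foot = (left + right) / 2
    height = 32 + 3.2 * foot + rng.normal(0.0, 2.0, n)  # assumed error SD of 2
    return height, right, left

height, right, left = make_height_data()

# Rows and columns are ordered Height, Right, Left.
corr = np.corrcoef([height, right, left])
print(np.round(corr, 3))
```

With these assumed settings, the Right/Left entry should come out very close to 1 and both Height entries should be large, which is the situation described above.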

7
Solution -- continued
- The regression output when both Right and Left are entered in the equation for Height appears in this table.

8
Solution -- continued
- This output tells a somewhat confusing story.
- The multiple R and the corresponding R² are about what we would expect, given the correlations between Height and either Right or Left.
- In particular, the multiple R is close to the correlation between Height and either Right or Left. Also, the s_e value is quite good. It implies that predictions of height from this regression equation will typically be off by only about 2 inches.
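
The summary measures discussed here can be computed directly from a least squares fit. This is a minimal sketch on simulated data standing in for HEIGHT.XLS; the sample size, foot-length distribution and error standard deviation are assumptions, so the numbers will not match the original output.

```python
import numpy as np

# Simulated stand-in for HEIGHT.XLS (assumed parameters, inches).
rng = np.random.default_rng(0)
n = 105
left = rng.normal(10.5, 1.3, n)
right = left + rng.normal(0.0, 0.1, n)
height = 32 + 3.2 * (left + right) / 2 + rng.normal(0.0, 2.0, n)

# Regress Height on an intercept, Right and Left.
X = np.column_stack([np.ones(n), right, left])
beta, *_ = np.linalg.lstsq(X, height, rcond=None)
resid = height - X @ beta

sse = resid @ resid                           # sum of squared residuals
sst = np.sum((height - height.mean()) ** 2)   # total sum of squares
r2 = 1 - sse / sst
multiple_r = np.sqrt(r2)
s_e = np.sqrt(sse / (n - X.shape[1]))         # standard error of estimate

r_right = np.corrcoef(height, right)[0, 1]
r_left = np.corrcoef(height, left)[0, 1]
print(f"multiple R = {multiple_r:.3f}, R^2 = {r2:.3f}, s_e = {s_e:.3f}")
print(f"corr(Height, Right) = {r_right:.3f}, corr(Height, Left) = {r_left:.3f}")
```

The multiple R can never be smaller than either simple correlation, but when Right and Left are nearly identical it is only slightly larger.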

9
Solution -- continued
- However, the coefficients of Right and Left are not at all what we might expect, given that we generated heights as approximately 32 plus 3.2 times foot length.
- In fact, the coefficient of Left has the wrong sign - it is negative!
- Besides this wrong sign, the tip-off that there is a problem is that the t-value of Left is quite small and the corresponding p-value is quite large.
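
The coefficients, their standard errors and the t-values can be reproduced in outline as follows. This is a sketch on simulated data with assumed parameters, not the HEIGHT.XLS output, so the particular signs and values depend on the random seed. As a rough guide, with about 100 observations an absolute t-value below roughly 2 is not significant at the 5% level.

```python
import numpy as np

# Simulated stand-in for HEIGHT.XLS (assumed parameters, inches).
rng = np.random.default_rng(0)
n = 105
left = rng.normal(10.5, 1.3, n)
right = left + rng.normal(0.0, 0.1, n)
height = 32 + 3.2 * (left + right) / 2 + rng.normal(0.0, 2.0, n)

X = np.column_stack([np.ones(n), right, left])
beta, *_ = np.linalg.lstsq(X, height, rcond=None)
resid = height - X @ beta
s_e = np.sqrt(resid @ resid / (n - X.shape[1]))

# Standard errors: s_e times the square roots of the diagonal of (X'X)^-1.
se = s_e * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))
t = beta / se

for name, b, s, tv in zip(("Constant", "Right", "Left"), beta, se, t):
    print(f"{name:8s} coef = {b:8.3f}  std err = {s:6.3f}  t = {tv:6.2f}")
```

Because Right and Left carry almost the same information, the standard errors of their coefficients are large relative to the true effect of 3.2, which is what makes small t-values and even wrong signs possible.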

10
Solution -- continued
- Judging by this, we might conclude that Height and Left are either not related or are related negatively. But we know from the table of correlations that both of these are false.
- In contrast, the coefficient of Right has the correct sign, and its t-value and associated p-value do imply statistical significance, at least at the 5% level.
- However, this happened mostly by chance; slight changes in the data could change the results completely.

11
Solution -- continued
- The problem is that although both Right and Left are clearly related to Height, it is impossible for the least squares method to distinguish their separate effects.
- Note that the regression equation does estimate the combined effect fairly well: the sum of the coefficients is 3.178, which is close to the coefficient of 3.2 we used to generate the data.
- Therefore, the estimated equation will work well for predicting heights. It just does not have reliable estimates of the individual coefficients of Right and Left.
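
One way to see that the individual coefficients are unreliable while their sum is not is to refit the equation on many fresh samples from the same process. The sketch below does this under assumed settings; the data-generating parameters, the number of replications and the helper name `fit_both` are our assumptions, not part of the original example.

```python
import numpy as np

def fit_both(seed, n=105):
    """Simulate one hypothetical data set and regress Height on Right and Left."""
    rng = np.random.default_rng(seed)
    left = rng.normal(10.5, 1.3, n)
    right = left + rng.normal(0.0, 0.1, n)
    height = 32 + 3.2 * (left + right) / 2 + rng.normal(0.0, 2.0, n)
    X = np.column_stack([np.ones(n), right, left])
    beta, *_ = np.linalg.lstsq(X, height, rcond=None)
    return beta  # [constant, coefficient of Right, coefficient of Left]

coefs = np.array([fit_both(seed) for seed in range(500)])
b_right, b_left = coefs[:, 1], coefs[:, 2]
total = b_right + b_left

print(f"SD of Right coefficient: {b_right.std():.3f}")
print(f"SD of Left coefficient:  {b_left.std():.3f}")
print(f"SD of their sum:         {total.std():.3f}")
print(f"Mean of their sum:       {total.mean():.3f}")
print(f"Share of samples with a negative Left coefficient: {np.mean(b_left < 0):.2f}")
```

Under these assumptions the separate coefficients swing widely from sample to sample, and a negative sign on one of them is not unusual, while the sum stays close to 3.2.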

12
Solution -- continued
- To see what happens when either Right or Left is excluded from the regression equation, we show the results of simple regression.
- When Right is the only variable in the equation, it becomes Predicted Height = 31.546 + 3.195 Right
- The R² and s_e values are 81.6% and 2.005, and the t-value and p-value for the coefficient of Right are now 21.34 and 0.000 - very significant.

13
Solution -- continued
- Similarly, when Left is the only variable in the equation, it becomes Predicted Height = 31.526 + 3.197 Left
- The R² and s_e values are 81.1% and 2.033, and the t-value and p-value for the coefficient of Left are 20.99 and 0.0000 - again very significant.
- Clearly, both of these equations tell almost identical stories, and they are much easier to interpret than the equation with both Right and Left included.
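
The two simple regressions can be sketched in the same way. The data below are simulated under assumed parameters rather than read from HEIGHT.XLS, and `simple_fit` is a hypothetical helper, so the output will resemble but not reproduce the figures quoted above.

```python
import numpy as np

# Simulated stand-in for HEIGHT.XLS (assumed parameters, inches).
rng = np.random.default_rng(0)
n = 105
left = rng.normal(10.5, 1.3, n)
right = left + rng.normal(0.0, 0.1, n)
height = 32 + 3.2 * (left + right) / 2 + rng.normal(0.0, 2.0, n)

def simple_fit(y, x):
    """Regress y on a single x; return intercept, slope, R^2, s_e and slope t-value."""
    X = np.column_stack([np.ones(len(x)), x])
    (intercept, slope), *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - (intercept + slope * x)
    sse = resid @ resid
    r2 = 1 - sse / np.sum((y - y.mean()) ** 2)
    s_e = np.sqrt(sse / (len(y) - 2))
    se_slope = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))
    return intercept, slope, r2, s_e, slope / se_slope

results = {name: simple_fit(height, x) for name, x in (("Right", right), ("Left", left))}
for name, (a, b, r2, s_e, t) in results.items():
    print(f"Predicted Height = {a:.3f} + {b:.3f} {name}   "
          f"R^2 = {r2:.1%}, s_e = {s_e:.3f}, t = {t:.2f}")
```

With only one foot length in the equation, the slope has a small standard error, so its t-value is large and its estimate lands near 3.2.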
