
**Multicollinearity**

**Overview**

This set of slides: What is multicollinearity? What are its effects on regression analyses?

Next set of slides: How to detect multicollinearity? How to reduce some types of multicollinearity?

**Multicollinearity**

Multicollinearity exists when two or more of the predictors in a regression model are moderately or highly correlated. Regression analyses most often take place on data obtained from observational studies, and in observational studies multicollinearity happens more often than not.

**Types of multicollinearity**

- Structural multicollinearity: a mathematical artifact caused by creating new predictors from other predictors, such as creating the predictor x² from the predictor x.
- Sample-based multicollinearity: the result of a poorly designed experiment, reliance on purely observational data, or the inability to manipulate the system on which the data are collected.
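Structural multicollinearity is easy to demonstrate. A minimal sketch with made-up data (not from these slides): for a predictor that takes only positive values, x and x² are almost perfectly correlated, and centering x before squaring removes most of that correlation.

```python
# Hypothetical illustration of structural multicollinearity between x and x^2.
import statistics

x = list(range(1, 11))            # a made-up positive predictor
x2 = [v * v for v in x]           # the derived predictor x^2

def pearson(a, b):
    # Pearson correlation using population standard deviations.
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / (statistics.pstdev(a) * statistics.pstdev(b) * len(a))

print(round(pearson(x, x2), 3))   # 0.975: x and x^2 are highly correlated

# Centering x before squaring is a standard remedy:
xc = [v - statistics.mean(x) for v in x]
xc2 = [v * v for v in xc]
print(round(pearson(xc, xc2), 3)) # 0.0 here: centering removes the correlation (x is symmetric)
```

Centering works because it makes the linear and quadratic terms (nearly) orthogonal; for a symmetric x like this one the correlation drops to exactly zero.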

**Example**

n = 20 hypertensive individuals, p - 1 = 6 predictor variables.

**Example**

Blood pressure (BP) is the response; the predictors are Age, Weight, BSA, Duration, Pulse, and Stress. (Data table omitted.)

**What is the effect on regression analyses if the predictors are perfectly uncorrelated?**

**Example**

Pearson correlation of x1 and x2 = 0.000. (Data table of x1, x2, y omitted.)


**Regress y on x1**

The regression equation is y = 48.8 - 0.63 x1. (Coefficient table and analysis of variance omitted.)

**Regress y on x2**

The regression equation is y = 55.1 - 1.38 x2. (Coefficient table and analysis of variance omitted.)

**Regress y on x1 and x2**

(Regression equation, coefficient table, analysis of variance, and sequential sums of squares for x1 and x2 omitted.)

**Regress y on x2 and x1**

(Regression equation, coefficient table, analysis of variance, and sequential sums of squares for x2 and x1 omitted.)

**Summary of results**

| Model | b1 | se(b1) | b2 | se(b2) | Seq SS |
| --- | --- | --- | --- | --- | --- |
| x1 only | -0.625 | 1.273 | --- | --- | SSR(X1) = 3.13 |
| x2 only | --- | --- | -1.375 | 1.170 | SSR(X2) = 15.13 |
| x1, x2 (in order) | -0.625 | 1.251 | -1.375 | … | SSR(X2\|X1) = 15.13 |
| x2, x1 (in order) | -0.625 | … | -1.375 | … | SSR(X1\|X2) = 3.13 |

**If predictors are perfectly uncorrelated, then…**

You get the same slope estimates regardless of the first-order regression model used. That is, the effect on the response ascribed to a predictor doesn’t depend on the other predictors in the model.

**If predictors are perfectly uncorrelated, then…**

The sum of squares SSR(X1) is the same as the sequential sum of squares SSR(X1|X2). The sum of squares SSR(X2) is the same as the sequential sum of squares SSR(X2|X1). That is, the marginal contribution of one predictor variable in reducing the error sum of squares doesn’t depend on the other predictors in the model.
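The claims on the last two slides can be checked numerically. A hedged, minimal sketch with a made-up balanced design (not the slides' data): because the centered predictors are exactly orthogonal, the two-predictor normal equations decouple, so the joint slopes equal the simple-regression slopes and the sequential sums of squares equal the one-predictor sums of squares.

```python
# Made-up balanced design (not the slides' data): with exactly uncorrelated
# predictors, simple and multiple regression give identical slopes, and
# SSR(X1) = SSR(X1|X2).
def mean(v): return sum(v) / len(v)
def center(v):
    m = mean(v)
    return [u - m for u in v]
def dot(a, b): return sum(u * v for u, v in zip(a, b))

# 2x2 factorial design, replicated: x1 and x2 are orthogonal by construction.
x1 = [-1, -1, 1, 1, -1, -1, 1, 1]
x2 = [-1, 1, -1, 1, -1, 1, -1, 1]
y  = [52, 49, 47, 46, 51, 50, 48, 44]

x1c, x2c, yc = center(x1), center(x2), center(y)
s11, s22, s12 = dot(x1c, x1c), dot(x2c, x2c), dot(x1c, x2c)
s1y, s2y = dot(x1c, yc), dot(x2c, yc)
assert s12 == 0                              # perfectly uncorrelated

# Simple-regression slopes
b1_alone, b2_alone = s1y / s11, s2y / s22

# Multiple-regression slopes from the two-predictor normal equations
det = s11 * s22 - s12 ** 2
b1_joint = (s22 * s1y - s12 * s2y) / det
b2_joint = (s11 * s2y - s12 * s1y) / det
print(b1_alone == b1_joint, b2_alone == b2_joint)   # True True

# Sequential sums of squares: SSR(X1) equals SSR(X1|X2) when s12 = 0
ssr_x1   = b1_alone * s1y                    # x1 alone
ssr_x2   = b2_alone * s2y                    # x2 alone
ssr_full = b1_joint * s1y + b2_joint * s2y   # x1 and x2 together
print(ssr_x1 == ssr_full - ssr_x2)           # True
```

The decoupling is the whole story: with s12 = 0 the off-diagonal terms of the normal equations vanish, so each slope is estimated as if the other predictor were not there.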

**Do we see similar results for “real data” with nearly uncorrelated predictors?**

**Example**

Correlation matrix of BP and the predictors (most entries omitted): r(Age, BP) = 0.659, r(Weight, BP) = 0.950, r(Weight, Age) = 0.407.


**Regress BP on x1 = Stress**

R-Sq = 2.7%, R-Sq(adj) = 0.0%. (Regression equation, coefficient table, and analysis of variance omitted.)

**Regress BP on x2 = BSA**

The regression equation is BP = 45.2 + 34.4 BSA. R-Sq = 75.0%, R-Sq(adj) = 73.6%. (Coefficient table and analysis of variance omitted.)

**Regress BP on x1 = Stress and x2 = BSA**

(Regression equation, coefficient table, analysis of variance, and sequential sums of squares for Stress and BSA omitted.)

**Regress BP on x2 = BSA and x1 = Stress**

(Regression equation, coefficient table, analysis of variance, and sequential sums of squares for BSA and Stress omitted.)

**Summary of results**

| Model | b1 | se(b1) | b2 | se(b2) | Seq SS |
| --- | --- | --- | --- | --- | --- |
| x1 only | 0.0240 | 0.0340 | --- | --- | SSR(X1) = 15.04 |
| x2 only | --- | --- | 34.443 | 4.690 | SSR(X2) = 419.86 |
| x1, x2 (in order) | 0.0217 | 0.0170 | 34.334 | 4.611 | SSR(X2\|X1) = 417.07 |
| x2, x1 (in order) | 0.0217 | 0.0170 | 34.334 | 4.611 | SSR(X1\|X2) = 12.26 |

**If predictors are nearly uncorrelated, then…**

You get similar slope estimates regardless of the first-order regression model used. The sum of squares SSR(X1) is similar to the sequential sum of squares SSR(X1|X2). The sum of squares SSR(X2) is similar to the sequential sum of squares SSR(X2|X1).

**What happens if the predictor variables are highly correlated?**

**Example**

Correlation matrix of BP and the predictors, repeated (most entries omitted): r(Age, BP) = 0.659, r(Weight, BP) = 0.950, r(Weight, Age) = 0.407.


**Regress BP on x1 = Weight**

R-Sq = 90.3%, R-Sq(adj) = 89.7%. (Regression equation, coefficient table, and analysis of variance omitted.)

**Regress BP on x2 = BSA**

The regression equation is BP = 45.2 + 34.4 BSA. R-Sq = 75.0%, R-Sq(adj) = 73.6%. (Coefficient table and analysis of variance omitted.)

**Regress BP on x1 = Weight and x2 = BSA**

(Regression equation, coefficient table, analysis of variance, and sequential sums of squares for Weight and BSA omitted.)

**Regress BP on x2 = BSA and x1 = Weight**

(Regression equation, coefficient table, analysis of variance, and sequential sums of squares for BSA and Weight omitted.)

**Effect #1 of multicollinearity**

When predictor variables are correlated, the regression coefficient of any one variable depends on which other predictor variables are included in the model.

| Variables in model | b1 | b2 |
| --- | --- | --- |
| x1 | 1.20 | --- |
| x2 | --- | 34.4 |
| x1, x2 | 1.04 | 5.83 |
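Effect #1 can be reproduced in a few lines. This is a hedged sketch with made-up numbers (not the slides' BP data): when x2 is strongly correlated with x1, the slope on x1 shifts noticeably once x2 enters the model.

```python
# Hedged sketch with made-up, strongly correlated predictors (not the
# slides' BP data): b1 changes when the correlated x2 joins the model.
def mean(v): return sum(v) / len(v)
def center(v):
    m = mean(v)
    return [u - m for u in v]
def dot(a, b): return sum(u * v for u, v in zip(a, b))

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.2, 1.9, 3.2, 4.1, 4.8, 6.3, 6.9, 8.1]   # tracks x1 closely
y  = [3, 5, 8, 9, 12, 13, 16, 17]

x1c, x2c, yc = center(x1), center(x2), center(y)
s11, s22, s12 = dot(x1c, x1c), dot(x2c, x2c), dot(x1c, x2c)
s1y, s2y = dot(x1c, yc), dot(x2c, yc)

b1_alone = s1y / s11                        # x1 as the only predictor

det = s11 * s22 - s12 ** 2                  # two-predictor normal equations
b1_joint = (s22 * s1y - s12 * s2y) / det    # x1 alongside x2

print(round(b1_alone, 3), round(b1_joint, 3))   # the two estimates differ
```

With the orthogonal design earlier in these notes the two estimates coincide; here the correlation between x1 and x2 pushes them apart.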

**Even correlated predictors not in the model can have an impact!**

Example: a regression of territory sales on territory population, per capita income, and other predictors. Against expectation, the coefficient of territory population turned out to be negative. The competitor's market penetration, which was strongly positively correlated with territory population, was not included in the model, yet the competitor kept sales down in territories with large populations.

**Effect #2 of multicollinearity**

When predictor variables are correlated, the precision of the estimated regression coefficients decreases as more predictor variables are added to the model.

| Variables in model | se(b1) | se(b2) |
| --- | --- | --- |
| x1 | 0.093 | --- |
| x2 | --- | 4.69 |
| x1, x2 | 0.193 | 6.06 |
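This loss of precision has a standard quantification, the variance inflation factor (VIF): for a two-predictor model with predictor correlation r, the variance of each slope estimate is 1/(1 - r²) times what it would be if the predictors were uncorrelated. A generic illustration, not tied to the slides' data:

```python
# Variance inflation factor for a two-predictor model: with predictor
# correlation r, var(bk) is inflated by 1 / (1 - r^2) relative to the
# uncorrelated case (generic illustration, not the slides' data).
def vif(r):
    return 1.0 / (1.0 - r * r)

for r in (0.0, 0.5, 0.9, 0.99):
    print(f"r = {r:4.2f}   VIF = {vif(r):6.2f}   se multiplier = {vif(r) ** 0.5:5.2f}")
```

Note how the standard-error multiplier (the square root of the VIF) stays modest until the correlation gets strong, then grows quickly; this matches the qualitative pattern in the table above.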

**Why not effects #1 and #2?**

(Slide content omitted.)

**Why effects #1 and #2?**

(Slide content omitted.)

**Effect #3 of multicollinearity**

When predictor variables are correlated, the marginal contribution of any one predictor variable in reducing the error sum of squares varies depending on which other variables are already in the model. Here SSR(X1|X2) = 88.43 differs from SSR(X1), and SSR(X2|X1) = 2.81 differs from SSR(X2).
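The same effect shows up in a hedged sketch with made-up, nearly collinear predictors (not the slides' BP data): x1 alone reduces the error sum of squares a great deal, but its marginal contribution collapses once the correlated x2 is already in the model.

```python
# Made-up, nearly collinear predictors (not the slides' BP data): the
# sequential sum of squares SSR(X1|X2) is far smaller than SSR(X1).
def mean(v): return sum(v) / len(v)
def center(v):
    m = mean(v)
    return [u - m for u in v]
def dot(a, b): return sum(u * v for u, v in zip(a, b))

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.2, 1.9, 3.2, 4.1, 4.8, 6.3, 6.9, 8.1]   # tracks x1 closely
y  = [3, 5, 8, 9, 12, 13, 16, 17]

x1c, x2c, yc = center(x1), center(x2), center(y)
s11, s22, s12 = dot(x1c, x1c), dot(x2c, x2c), dot(x1c, x2c)
s1y, s2y = dot(x1c, yc), dot(x2c, yc)

ssr_x1 = s1y ** 2 / s11                     # SSR(X1): x1 alone
ssr_x2 = s2y ** 2 / s22                     # SSR(X2): x2 alone

det = s11 * s22 - s12 ** 2                  # two-predictor normal equations
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
ssr_full = b1 * s1y + b2 * s2y              # SSR(X1, X2)

ssr_x1_given_x2 = ssr_full - ssr_x2         # SSR(X1|X2)
print(round(ssr_x1, 2), round(ssr_x1_given_x2, 2))  # large vs. tiny
```

Because x2 carries nearly the same information as x1, almost all of x1's explanatory power has already been used up by the time it enters second.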

**What is the effect on estimating the mean response or predicting a new response?**

**Effect #4 of multicollinearity on estimating the mean or predicting Y**

| Model | CI for mean response | PI for new response |
| --- | --- | --- |
| Weight | (111.85, 113.54) | (108.94, 116.44) |
| BSA | (112.76, 115.38) | (108.06, 120.08) |
| BSA, Weight | (111.93, 113.83) | (109.08, …) |

High multicollinearity among the predictor variables does not prevent good, precise predictions of the response (within the scope of the model).

**What is the effect on tests of individual slopes?**

Regress BP on BSA: the regression equation is BP = 45.2 + 34.4 BSA, with R-Sq = 75.0%, R-Sq(adj) = 73.6%. (Coefficient table and analysis of variance omitted.)

**What is the effect on tests of individual slopes?**

Regress BP on Weight: R-Sq = 90.3%, R-Sq(adj) = 89.7%. (Regression equation, coefficient table, and analysis of variance omitted.)

**What is the effect on tests of individual slopes?**

Regress BP on Weight and BSA. (Regression equation, coefficient table, analysis of variance, and sequential sums of squares omitted.)

**Effect #5 of multicollinearity on slope tests**

When predictor variables are correlated, hypothesis tests for βk = 0 may yield different conclusions depending on which predictor variables are in the model.

| Variables in model | b2 | se(b2) | t | P-value |
| --- | --- | --- | --- | --- |
| x2 | 34.4 | 4.7 | 7.34 | 0.000 |
| x1, x2 | 5.83 | 6.1 | 0.96 | 0.350 |
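The t-statistics in the table are just the coefficient divided by its standard error; recomputing them from the rounded table values reproduces the slides' conclusion that b2 is significant alone but not alongside the correlated x1.

```python
# t-statistic for H0: beta_k = 0 is coefficient / standard error; the
# inputs below are the rounded values from the table above.
def t_stat(b, se):
    return b / se

print(round(t_stat(34.4, 4.7), 2))    # 7.32 (table shows 7.34, from unrounded values)
print(round(t_stat(5.83, 6.06), 2))   # 0.96: far from significant
```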

**The major impacts on model use**

In the presence of multicollinearity, it is still okay to use an estimated regression model to predict y or to estimate μY within the scope of the model. However, we can no longer interpret a slope coefficient as the change in the mean response for each additional unit increase in xk with all the other predictors held constant.
