Anaregweek11 Regression diagnostics. Regression Diagnostics Partial regression plots Studentized deleted residuals Hat matrix diagonals Dffits, Cook’s.


1 Anaregweek11 Regression diagnostics

2 Regression Diagnostics
–Partial regression plots
–Studentized deleted residuals
–Hat matrix diagonals
–DFFITS, Cook’s D, DFBETAS
–Variance inflation factor
–Tolerance

3 NKNW Example
NKNW p. 389, Section 11.1
Y is the amount of life insurance
X_1 is average annual income
X_2 is a risk aversion score
n = 18 managers

4
Manager i   Income X_i1   Risk X_i2   Life Insurance Y_i
    1          66.290          7             240
    2          40.964          5              73
    3          72.996         10             311
    4          45.010          6              91
    5          57.204          4             162
    6          26.852          5              11
    7          38.122          4              54
    8          35.840          6              53
    9          75.796          9             326
   10          37.408          5              55
   11          54.376          2             130
   12          46.186          7             112
   13          46.130          4              91
   14          30.366          3              14
   15          39.060          5              63
   16          79.380          1             316
   17          52.766          8             154
   18          55.916          6             164

5 Partial regression plots
–Also called added variable plots or adjusted variable plots
–One plot for each explanatory variable X_k

6 Partial regression plots (2)
Consider X_1
–Use the other X’s to predict Y
–Use the other X’s to predict X_1
–Plot the residuals from the first regression against the residuals from the second regression

7 Partial regression plots (3)
These plots can detect
–Nonlinear relationships
–Heterogeneous variances
–Outliers

8 Output
Source     DF   F Value   Pr > F
Model       2    542.33   <.0001
Error      15
C Total    17
Root MSE 12.66267   R-Square 0.9864

9 Output (2)
Variable    Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept        -205.72               11          -18.06    <.0001
income              6.288               0.20         30.80    <.0001
risk                4.738               1.3           3.44    0.0037

10 Plot the residuals vs each independent variable
From the regression of Y on X_1 and X_2, we plot the residuals against each independent variable.
The plot of the residuals against X_1 indicates a curvilinear effect.
We therefore check further by looking at the partial regression plots.

11 Plot the residuals vs Risk

12 Plot the residuals vs income

13 The partial regression plots
To generate the partial regression plot for X_1:
–Regress Y and X_1 each on X_2
–Get the residuals from each regression, namely e(Y|X_2) and e(X_1|X_2)
–Plot e(Y|X_2) against e(X_1|X_2)
Do the same for X_2: regress Y and X_2 each on X_1, then plot e(Y|X_1) against e(X_2|X_1).
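The recipe on this slide can be sketched in plain Python, assuming the slide 4 data in the row order shown there. The helper `simple_residuals` is a hypothetical name introduced for this illustration. Because the example has only two explanatory variables, "the other X's" is a single variable, so an ordinary simple regression is enough. The slope of the line through the plotted points should agree with the income coefficient reported on the output slide.

```python
# Data from slide 4 (row order as on the slide).
income = [66.290, 40.964, 72.996, 45.010, 57.204, 26.852, 38.122, 35.840, 75.796,
          37.408, 54.376, 46.186, 46.130, 30.366, 39.060, 79.380, 52.766, 55.916]
risk = [7, 5, 10, 6, 4, 5, 4, 6, 9, 5, 2, 7, 4, 3, 5, 1, 8, 6]
insurance = [240, 73, 311, 91, 162, 11, 54, 53, 326,
             55, 130, 112, 91, 14, 63, 316, 154, 164]


def simple_residuals(y, x):
    """Residuals from the least-squares line of y on x (intercept included)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [(yi - y_bar) - slope * (xi - x_bar) for xi, yi in zip(x, y)]


e_y_given_x2 = simple_residuals(insurance, risk)   # e(Y|X_2)
e_x1_given_x2 = simple_residuals(income, risk)     # e(X_1|X_2)

# Points of the partial regression plot for X_1: (e(X_1|X_2), e(Y|X_2)).
plot_points = list(zip(e_x1_given_x2, e_y_given_x2))

# Least-squares slope through these points. Both residual vectors have
# mean zero, so the fitted line passes through the origin.
partial_slope = (sum(u * v for u, v in plot_points)
                 / sum(u * u for u in e_x1_given_x2))
print(round(partial_slope, 3))
```

Swapping the roles of `income` and `risk` gives the points for the X_2 plot.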

14 The partial regression plots (2)

15 The partial regression plots (3)

16 Residuals
There are several versions
–Residuals: e_i = Y_i – Ŷ_i
–Semistudentized residuals: e_i* = e_i / √MSE
–Deleted residuals: d_i = e_i / (1 – h_ii), where h_ii is the leverage
–Studentized deleted residuals: d_i* = d_i / s(d_i), where s²(d_i) = MSE_(i) / (1 – h_ii), or equivalently d_i* = e_i √[(n – p – 1) / (SSE (1 – h_ii) – e_i²)]

17 Residuals (2)
We use the notation (i) to indicate that case i has been deleted from the computations
–X_(i) is the X matrix with case i deleted
–MSE_(i) is the MSE with case i deleted
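A minimal sketch of the residual versions, assuming the slide 4 data and a small hand-rolled least-squares routine. The helpers `invert`, `fit` and `predict` are hypothetical names written for this illustration, not part of any library. The last few lines refit the model with one case actually deleted, to check that the shortcut formulas based on e_i and h_ii reproduce the "(i)" quantities without refitting.

```python
income = [66.290, 40.964, 72.996, 45.010, 57.204, 26.852, 38.122, 35.840, 75.796,
          37.408, 54.376, 46.186, 46.130, 30.366, 39.060, 79.380, 52.766, 55.916]
risk = [7, 5, 10, 6, 4, 5, 4, 6, 9, 5, 2, 7, 4, 3, 5, 1, 8, 6]
insurance = [240, 73, 311, 91, 162, 11, 54, 53, 326,
             55, 130, 112, 91, 14, 63, 316, 154, 164]


def invert(m):
    """Gauss-Jordan inverse of a small square matrix (partial pivoting)."""
    k = len(m)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def fit(rows, y):
    """Least-squares coefficients b and (X'X)^-1 via the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[a] * r[c] for r in rows) for c in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    inv = invert(xtx)
    return [sum(inv[a][c] * xty[c] for c in range(p)) for a in range(p)], inv


def predict(b, row):
    return sum(bk * xk for bk, xk in zip(b, row))


rows = [[1.0, x1, x2] for x1, x2 in zip(income, risk)]
n, p = len(rows), len(rows[0])
b, inv = fit(rows, insurance)

e = [yi - predict(b, r) for r, yi in zip(rows, insurance)]       # residuals
sse = sum(ei ** 2 for ei in e)
mse = sse / (n - p)
# Leverage h_ii = x_i' (X'X)^-1 x_i
h = [sum(r[a] * inv[a][c] * r[c] for a in range(p) for c in range(p)) for r in rows]

semistudentized = [ei / mse ** 0.5 for ei in e]
deleted = [ei / (1 - hi) for ei, hi in zip(e, h)]
studentized_deleted = [ei * ((n - p - 1) / (sse * (1 - hi) - ei ** 2)) ** 0.5
                       for ei, hi in zip(e, h)]

# Cross-check for one case: delete it, refit, and predict it from the rest.
i = 0
rows_del, y_del = rows[:i] + rows[i + 1:], insurance[:i] + insurance[i + 1:]
b_del, _ = fit(rows_del, y_del)
d_refit = insurance[i] - predict(b_del, rows[i])
sse_del = sum((yj - predict(b_del, r)) ** 2 for r, yj in zip(rows_del, y_del))
mse_del = sse_del / (n - 1 - p)                                   # MSE_(i)
t_refit = d_refit / (mse_del / (1 - h[i])) ** 0.5
```

Here `d_refit` and `t_refit` should match `deleted[0]` and `studentized_deleted[0]` up to rounding, which is why the n separate refits are never needed in practice.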

18 Residuals (3)
When we examine the residuals we are looking for
–Outliers
–Nonnormal error distributions
–Influential observations

19 Hat matrix diagonals
h_ii is a measure of how much Y_i contributes to its own prediction Ŷ_i
Ŷ_1 = h_11 Y_1 + h_12 Y_2 + h_13 Y_3 + …
h_ii is sometimes called the leverage of the i-th observation

20 Hat matrix diagonals (2)
0 ≤ h_ii ≤ 1
Σ h_ii = p, the number of regression parameters (including the intercept)
We would like h_ii to be small
The average value is p/n
Values far above this average point to cases that should be examined carefully
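The properties listed here can be checked numerically. The sketch below assumes the slide 4 data and computes h_ii = x_i' (X'X)^-1 x_i with a hypothetical helper `invert` written for this illustration. The cutoff 2p/n (twice the average leverage) is a commonly quoted rule of thumb and is an assumption here, since the slide only says "far from this average". Note that the row order of the slide 4 table appears to differ from the observation order used in the printed hat-diagonal output, so matching values can sit at different positions.

```python
income = [66.290, 40.964, 72.996, 45.010, 57.204, 26.852, 38.122, 35.840, 75.796,
          37.408, 54.376, 46.186, 46.130, 30.366, 39.060, 79.380, 52.766, 55.916]
risk = [7, 5, 10, 6, 4, 5, 4, 6, 9, 5, 2, 7, 4, 3, 5, 1, 8, 6]


def invert(m):
    """Gauss-Jordan inverse of a small square matrix (partial pivoting)."""
    k = len(m)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


# Design matrix rows: intercept, income, risk. The response Y is not needed,
# because leverage depends only on the X values.
rows = [[1.0, x1, x2] for x1, x2 in zip(income, risk)]
n, p = len(rows), len(rows[0])

xtx = [[sum(r[a] * r[c] for r in rows) for c in range(p)] for a in range(p)]
inv = invert(xtx)

# h_ii = x_i' (X'X)^-1 x_i for each case i
h = [sum(r[a] * inv[a][c] * r[c] for a in range(p) for c in range(p)) for r in rows]

average = p / n                      # equals sum(h) / n
threshold = 2 * p / n                # assumed rule-of-thumb cutoff
flagged = [i + 1 for i, hi in enumerate(h) if hi > threshold]

print("sum of h_ii:", round(sum(h), 6))
print("cases above 2p/n (slide 4 row numbers):", flagged)
```

Leverage is a property of the X configuration alone: a case can have high leverage and still fit well, so flagged cases call for a closer look rather than automatic removal.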

21 Hat diagonals
Obs   Hat Diag H
 1      0.0693
 2      0.1006
 3      0.1890
 4      0.1316
 5      0.0756

22 DFFITS
–A measure of the influence of case i on Ŷ_i
–It is a standardized version of the difference between Ŷ_i computed with and without case i
–It is closely related to h_ii
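For reference, the standard textbook form of this measure can be written as follows. The symbol \hat{Y}_{i(i)} (the prediction for case i from the fit with case i deleted) is notation introduced here, following the "(i)" convention of the Residuals slides; d_i^{*} is the studentized deleted residual.

```latex
\[
(\mathrm{DFFITS})_i
  = \frac{\hat{Y}_i - \hat{Y}_{i(i)}}{\sqrt{MSE_{(i)}\, h_{ii}}}
  = d_i^{*} \left( \frac{h_{ii}}{1 - h_{ii}} \right)^{1/2}
\]
```

The second form shows the link to h_ii: DFFITS is the studentized deleted residual scaled up or down by a leverage factor. A commonly quoted guideline flags |DFFITS| above 1 for small to medium data sets and above 2√(p/n) for large ones.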

23 Cook’s Distance
–A measure of the influence of case i on all of the Ŷ_i’s
–It is a standardized version of the sum of squares of the differences between the predicted values computed with and without case i
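The definition can be illustrated by computing Cook's distance two ways on the slide 4 data: directly, by deleting each case and refitting, and through the usual shortcut D_i = e_i² h_ii / (p MSE (1 – h_ii)²). This is a sketch under those assumptions; `invert`, `fit`, `predict` and `cooks_by_deletion` are hypothetical helper names written for this illustration.

```python
income = [66.290, 40.964, 72.996, 45.010, 57.204, 26.852, 38.122, 35.840, 75.796,
          37.408, 54.376, 46.186, 46.130, 30.366, 39.060, 79.380, 52.766, 55.916]
risk = [7, 5, 10, 6, 4, 5, 4, 6, 9, 5, 2, 7, 4, 3, 5, 1, 8, 6]
insurance = [240, 73, 311, 91, 162, 11, 54, 53, 326,
             55, 130, 112, 91, 14, 63, 316, 154, 164]


def invert(m):
    """Gauss-Jordan inverse of a small square matrix (partial pivoting)."""
    k = len(m)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def fit(rows, y):
    """Least-squares coefficients b and (X'X)^-1 via the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[a] * r[c] for r in rows) for c in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    inv = invert(xtx)
    return [sum(inv[a][c] * xty[c] for c in range(p)) for a in range(p)], inv


def predict(b, row):
    return sum(bk * xk for bk, xk in zip(b, row))


rows = [[1.0, x1, x2] for x1, x2 in zip(income, risk)]
n, p = len(rows), len(rows[0])
b, inv = fit(rows, insurance)
e = [yi - predict(b, r) for r, yi in zip(rows, insurance)]
mse = sum(ei ** 2 for ei in e) / (n - p)
h = [sum(r[a] * inv[a][c] * r[c] for a in range(p) for c in range(p)) for r in rows]

# Shortcut: needs only the residual and leverage of each case.
cooks_shortcut = [ei ** 2 * hi / (p * mse * (1 - hi) ** 2) for ei, hi in zip(e, h)]


def cooks_by_deletion(i):
    """Sum of squared changes in ALL n fitted values when case i is deleted."""
    b_del, _ = fit(rows[:i] + rows[i + 1:], insurance[:i] + insurance[i + 1:])
    return sum((predict(b, r) - predict(b_del, r)) ** 2 for r in rows) / (p * mse)


cooks_refit = [cooks_by_deletion(i) for i in range(n)]
most_influential = max(range(n), key=lambda i: cooks_shortcut[i]) + 1
print("largest Cook's D at slide 4 row", most_influential)
```

The two lists should agree up to rounding, which shows that Cook's distance combines the size of the residual with the leverage of the case.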

24 DFBETAS
–A measure of the influence of case i on each of the regression coefficients
–It is a standardized version of the difference between the regression coefficient computed with and without case i
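Written out in the standard textbook form, with b_{k(i)} denoting the k-th coefficient estimated with case i deleted and c_{kk} the k-th diagonal element of (X'X)^{-1} (both symbols are notation introduced here, following the "(i)" convention of the Residuals slides):

```latex
\[
(\mathrm{DFBETAS})_{k(i)}
  = \frac{b_k - b_{k(i)}}{\sqrt{MSE_{(i)}\, c_{kk}}},
  \qquad k = 0, 1, \dots, p - 1
\]
```

There is one value per case and per coefficient, so a data set with n cases and p parameters yields an n × p table. A commonly quoted guideline flags absolute values above 1 for small to medium data sets and above 2/√n for large ones.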

25 Variance Inflation Factor
–The VIF measures how much the variance of an estimated regression coefficient is inflated by correlation among the explanatory variables
–We calculate it for each explanatory variable
–One suggested rule is that a value of 10 or more indicates excessive multicollinearity

26 Tolerance
TOL = 1 – R²_k, where R²_k is the squared multiple correlation obtained in a regression where all other explanatory variables are used to predict X_k
TOL = 1/VIF
Described in the comment on p. 411
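As a small worked check, tolerance and VIF can be computed for the slide 4 data. With only two explanatory variables, the regression of X_k on "all other explanatory variables" is a simple regression, so R²_k is just the squared correlation between income and risk. That shortcut is specific to this example, and `r_squared_simple` is a hypothetical helper name introduced for this illustration.

```python
income = [66.290, 40.964, 72.996, 45.010, 57.204, 26.852, 38.122, 35.840, 75.796,
          37.408, 54.376, 46.186, 46.130, 30.366, 39.060, 79.380, 52.766, 55.916]
risk = [7, 5, 10, 6, 4, 5, 4, 6, 9, 5, 2, 7, 4, 3, 5, 1, 8, 6]


def r_squared_simple(y, x):
    """R-squared from the simple regression of y on x (squared correlation)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# R^2_k: each explanatory variable predicted from the other one.
tolerance_income = 1 - r_squared_simple(income, risk)
tolerance_risk = 1 - r_squared_simple(risk, income)

vif_income = 1 / tolerance_income
vif_risk = 1 / tolerance_risk

print("tolerance:", round(tolerance_income, 5), round(tolerance_risk, 5))
print("VIF:", round(vif_income, 3), round(vif_risk, 3))
```

With two explanatory variables the two tolerances are necessarily equal, which is consistent with the identical values reported for income and risk in the tolerance output.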

27 Output (Tolerance)
Variable    Tolerance
Intercept   .
income      0.93524
risk        0.93524

28 Last slide Read NKNW Chapter 11

