Stat 112 Notes 14 Assessing the assumptions of the multiple regression model and remedies when assumptions are not met (Chapter 6).


1 Stat 112 Notes 14 Assessing the assumptions of the multiple regression model and remedies when assumptions are not met (Chapter 6).

2 Assumptions of Multiple Linear Regression Model
1. Linearity: E(Y | X1, ..., XK) = β0 + β1X1 + ... + βKXK.
2. Constant variance: The standard deviation of Y for the subpopulation of units with explanatory variable values X1, ..., XK is the same for all subpopulations.
3. Normality: The distribution of Y for the subpopulation of units with explanatory variable values X1, ..., XK is normally distributed for all subpopulations.
4. Independence: The observations are independent. [For time series; we will cover this later.]
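In symbols, the model behind these assumptions can be written as follows (a standard statement of the multiple linear regression model; the slide's own notation is not fully recoverable from the transcript):

```latex
% Multiple linear regression model with K explanatory variables
\[
  Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} + \varepsilon_i ,
  \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0,\sigma^2).
\]
% Linearity:         E(Y \mid x_1,\dots,x_K) = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K
% Constant variance: SD(Y \mid x_1,\dots,x_K) = \sigma for every (x_1,\dots,x_K)
% Normality:         Y \mid x_1,\dots,x_K is normally distributed
% Independence:      the errors \varepsilon_i are independent across observations
```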

3 Assumptions for linear regression and their importance to inferences
Inference: Point prediction, point estimation. Assumptions that are important: Linearity, independence.
Inference: Confidence interval for slope, hypothesis test for slope, confidence interval for mean response. Assumptions that are important: Linearity, constant variance, independence, normality (only if n < 30).
Inference: Prediction interval. Assumptions that are important: Linearity, constant variance, independence, normality.

4 Fast Food Chain Data

5 Checking Linearity Plot residuals versus each of the explanatory variables. Each of these plots should look like random scatter, with no pattern in the mean of the residuals. If the residual plots show a problem, then we could try to transform the x-variable and/or the y-variable. Residual plot: Use Fit Y by X with Y being the residuals. Fit Line will draw a horizontal line.
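Outside JMP, the same check can be sketched in Python with statsmodels and matplotlib. The file name and the columns Sales, Age, and Income are assumptions standing in for the fast food chain data.

```python
# Sketch: residual plots versus each explanatory variable
# (assumed file and column names).
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

food = pd.read_csv("fast_food.csv")            # hypothetical file name
model = smf.ols("Sales ~ Age + Income", data=food).fit()
food["resid"] = model.resid

for x in ["Age", "Income"]:
    plt.figure()
    plt.scatter(food[x], food["resid"])
    plt.axhline(0, color="gray")               # residuals should scatter around 0
    plt.xlabel(x)
    plt.ylabel("Residual")
    plt.title("Residuals vs " + x)
plt.show()
```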

6 Residual Plots in JMP After Fit Model, click the red triangle next to Response, click Save Columns, and click Residuals. Use Fit Y by X with Y = Residuals and X the explanatory variable of interest. Fit Line will draw a horizontal line with intercept zero. It is a property of the residuals from multiple linear regression that a least squares regression of the residuals on an explanatory variable has slope zero and intercept zero.
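The zero-slope, zero-intercept property can also be checked numerically; a minimal sketch, again under the assumed fast food file and column names:

```python
# Sketch: the least squares fit of the residuals on an explanatory
# variable has slope 0 and intercept 0 (up to rounding error).
import pandas as pd
import statsmodels.formula.api as smf

food = pd.read_csv("fast_food.csv")            # hypothetical file name
food["resid"] = smf.ols("Sales ~ Age + Income", data=food).fit().resid

check = smf.ols("resid ~ Age", data=food).fit()
print(check.params)                            # both coefficients are essentially zero
```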

7 Residual by Predicted Plot Fit Model displays the Residual by Predicted Plot automatically in its output. The plot is a plot of the residuals versus the predicted Y's (Ŷ = b0 + b1X1 + ... + bKXK). We can think of the predicted Y's as summarizing all the information in the X's. As usual, we would like this plot to show random scatter. A pattern in the mean of the residuals as the predicted Y's increase indicates a problem with linearity: look at the residual plots versus each explanatory variable to isolate the problem and consider transformations. A pattern in the spread of the residuals indicates a problem with constant variance.
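A residual-by-predicted plot analogous to JMP's automatic output can be sketched the same way (same assumed data and column names):

```python
# Sketch: residuals versus predicted values.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

food = pd.read_csv("fast_food.csv")            # hypothetical file name
model = smf.ols("Sales ~ Age + Income", data=food).fit()

plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, color="gray")
plt.xlabel("Predicted Sales")
plt.ylabel("Residual")
plt.show()
```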

8 Corrections for Violations of the Linearity Assumption When the residual plot shows a pattern in the mean of the residuals for one of the explanatory variables Xj, we should consider:
- Transforming Xj.
- Adding polynomial terms in Xj (e.g., Xj and Xj^2).
- Transforming Y.
After making the transformation/adding polynomials, we need to refit the model and look at the new residual plot vs. X to see if linearity has been achieved.
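As one illustration of the second option, a sketch that adds quadratic terms in Age and Income and refits, mirroring the quadratic-polynomial model on the next slide (file and column names remain assumptions):

```python
# Sketch: adding quadratic terms and refitting.
import pandas as pd
import statsmodels.formula.api as smf

food = pd.read_csv("fast_food.csv")            # hypothetical file name
quad = smf.ols("Sales ~ Age + I(Age**2) + Income + I(Income**2)",
               data=food).fit()
print(quad.summary())
# Then re-examine the residual plots versus Age and Income for this new fit.
```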

9 Quadratic Polynomials for Age and Income

10 Linearity now appears to be satisfied.

11 Checking Constant Variance Assumption The residual plot versus each explanatory variable should exhibit constant variance. The residual plot versus the predicted values should also exhibit constant variance (this plot is often the most useful for detecting nonconstant variance).

12 Heteroscedasticity When the requirement of a constant variance is violated, we have a condition of heteroscedasticity. Diagnose heteroscedasticity by plotting the residuals against the predicted ŷ. [Figure: residuals plotted against ŷ; the spread of the residuals increases with ŷ.]

13 Reducing Nonconstant Variance/Nonnormality by Transformations A brief list of transformations:
- y' = y^(1/2) (for y > 0): use when the RMSE increases with ŷ.
- y' = log y (for y > 0): use when the RMSE increases with ŷ, or when the residual distribution is skewed to the right.
- y' = y^2: use when the RMSE decreases with ŷ, or when the residual distribution is skewed to the left.

14 Checking whether a transformation of Y works for remedying nonconstant variance
1. Create a new column with the transformation of the Y variable by right-clicking in the new column, clicking Formula, and putting in the appropriate formula for the transformation (Note: Log is contained in the class of transcendental functions).
2. Fit the regression of the transformation of Y on the X variables.
3. Check the residual by predicted plot to see if the spread of the residuals appears constant over the range of predicted values.
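A sketch of the same check with a log transformation of the response, again under the assumed fast food data and column names:

```python
# Sketch: refit with log(Sales) as the response and re-check the
# residual-by-predicted plot for constant spread.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

food = pd.read_csv("fast_food.csv")            # hypothetical file name
food["log_Sales"] = np.log(food["Sales"])      # requires Sales > 0

log_model = smf.ols("log_Sales ~ Age + Income", data=food).fit()

plt.scatter(log_model.fittedvalues, log_model.resid)
plt.axhline(0, color="gray")
plt.xlabel("Predicted log(Sales)")
plt.ylabel("Residual")
plt.show()
```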



17 Interpreting coefficients when response is logged, explanatory variables not logged
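The slide's worked example is not reproduced in the transcript; as a sketch, the standard interpretation is that each coefficient acts multiplicatively on the original response:

```latex
% With log Y as the response, holding the other explanatory variables fixed,
% a one-unit increase in X_j changes the typical value of Y by a factor of
% e^{\beta_j}, i.e. by about 100(e^{\beta_j} - 1) percent (roughly
% 100\beta_j percent when \beta_j is small).
\[
  E(\log Y \mid X) = \beta_0 + \beta_1 X_1 + \cdots + \beta_K X_K
  \;\Longrightarrow\;
  \frac{\text{typical } Y \text{ at } X_j + 1}{\text{typical } Y \text{ at } X_j}
  \approx e^{\beta_j}.
\]
```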

18 Checking Normality To check normality, we use a normal quantile plot. Normality is a reasonable assumption if all the residuals are within the dashed confidence bands. If a residual is outside the dashed confidence bands, it indicates a violation of normality.
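A sketch of a normal quantile plot of the residuals in Python (statsmodels' qqplot does not draw JMP's dashed confidence bands by default; data and column names are assumptions as before):

```python
# Sketch: normal quantile (Q-Q) plot of the residuals.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf

food = pd.read_csv("fast_food.csv")            # hypothetical file name
model = smf.ols("Sales ~ Age + Income", data=food).fit()

sm.qqplot(model.resid, line="45", fit=True)    # points should track the reference line
plt.show()
```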

19 Normality does not appear to hold. Some of the residuals fall outside the dotted line confidence bands.

20 Normality appears to hold. All residuals within dotted confidence bands.

21 Importance of Normality and Corrections for Normality For point estimation, confidence intervals and tests for coefficients, and confidence intervals for the mean response, normality of the residuals is only important for small samples because of the Central Limit Theorem. Guideline: We do not need to worry about normality if there are 30 observations plus 10 additional observations for each additional explanatory variable in the multiple regression beyond the first one. For prediction intervals, normality is critical for all sample sizes. Corrections for nonnormality: transformations of the y variable.

22 Reducing Nonconstant Variance/Nonnormality by Transformations A brief list of transformations:
- y' = y^(1/2) (for y > 0): use when the spread of the residuals increases with ŷ.
- y' = log y (for y > 0): use when the spread of the residuals increases with ŷ, or when the distribution of the residuals is skewed to the right.
- y' = y^2: use when the spread of the residuals decreases with ŷ, or when the error distribution is skewed to the left.

23 Order of Correction of Violations of Assumptions in Multiple Regression First, focus on correcting a violation of the linearity assumption. Then, focus on correcting violations of constant variance after the linearity assumption is satisfied. If constant variance is achieved, make sure that linearity still holds approximately. Then, focus on correcting violations of normality. If normality is achieved, make sure that linearity and constant variance still approximately hold.

