Lecture 24 Multiple Regression (Sections 19.4-19.5)

2 19.4 Regression Diagnostics - II
The conditions required for the model assessment to apply must be checked.
– Is the error variable normally distributed? Draw a histogram of the residuals.
– Is the regression function correctly specified as a linear function of $x_1, \dots, x_k$? Plot the residuals versus the x’s and versus $\hat{y}$.
– Is the error variance constant? Plot the residuals versus $\hat{y}$.
– Are the errors independent? Plot the residuals versus the time periods.
– Can we identify outliers and influential observations?
– Is multicollinearity a problem?
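
Every check above works from the residuals $e_i = y_i - \hat{y}_i$ and the fitted values $\hat{y}_i$. The lecture obtains these in JMP; as a hedged alternative, here is a minimal NumPy sketch (the function name `fit_ols` and the data are hypothetical) that computes both arrays so they can be handed to any plotting tool.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns (coefficients, fitted values y_hat, residuals e = y - y_hat).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # a single predictor given as a 1-D array
    design = np.column_stack([np.ones(len(y)), X])  # intercept column first
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    residuals = y - fitted
    return beta, fitted, residuals


# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
beta, y_hat, e = fit_ols(x, y)

# Each diagnostic on the slide is a picture of these arrays:
#   normality          -> histogram of e
#   linearity          -> e against each x column, and e against y_hat
#   constant variance  -> e against y_hat
#   independence       -> e against time order (row order, if rows are in time order)
print("residuals:", np.round(e, 3))
```

Plotting is deliberately left out; the point is only which arrays each plot uses.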

3 Influential Observation Influential observation: An observation is influential if removing it would markedly change the results of the analysis. To be influential, a point must either be an outlier in terms of the relationship between its y and x’s, or have unusually distant x’s (high leverage) and not fall exactly on the relationship between y and the x’s that the rest of the data follows.
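
"Unusually distant x’s" is usually quantified by leverage, the diagonal entries $h_{ii}$ of the hat matrix $H = D(D^\top D)^{-1}D^\top$, where $D$ is the design matrix with an intercept column. The slide does not give this formula; the sketch below is an assumption-labeled illustration with a hypothetical `leverage` function and made-up x values.

```python
import numpy as np


def leverage(X):
    """Diagonal of the hat matrix H = D (D'D)^{-1} D', D = [1, X].

    h_ii grows as observation i's x-values move away from the bulk of the data.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    D = np.column_stack([np.ones(X.shape[0]), X])
    # Solve a linear system instead of forming an explicit inverse.
    H = D @ np.linalg.solve(D.T @ D, D.T)
    return np.diag(H).copy()


# Hypothetical x values; the last one is far from the others.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
h = leverage(x)
print(np.round(h, 3))
```

The leverages always sum to the number of estimated coefficients (here 2: intercept and slope), so a single point with a large share of that total stands out.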

4 Simple Linear Regression Example Data in salary.jmp. Y=Weekly Salary, X=Years of Experience.

5 Identification of Influential Observations Cook’s distance is a measure of the influence of a point – the effect that omitting the observation has on the estimated regression coefficients. Use Save Columns, Cook’s D Influence to obtain Cook’s Distance.

6 Cook’s Distance Rule of thumb: an observation with Cook’s distance $D_i > 1$ has high influence. You may also be concerned about any observation that has $D_i < 1$ but a much bigger $D_i$ than any other observation.
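
JMP reports Cook’s distance directly. To show what that number combines, here is a sketch using the standard closed form $D_i = \frac{e_i^2}{p\,s^2}\cdot\frac{h_{ii}}{(1-h_{ii})^2}$, with $p = k + 1$ coefficients and $s^2 = \text{SSE}/(n-p)$. The function name `cooks_distance` and the data are hypothetical; the cutoff of 1 is the slide’s rule of thumb.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance D_i for each observation of an OLS fit with intercept.

    D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2, where e_i is the residual,
    h_ii the leverage, p the number of coefficients, s^2 = SSE / (n - p).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    D = np.column_stack([np.ones(len(y)), X])
    n, p = D.shape
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    e = y - D @ beta
    h = np.diag(D @ np.linalg.solve(D.T @ D, D.T))
    s2 = (e @ e) / (n - p)
    return (e**2 / (p * s2)) * h / (1 - h) ** 2


# Hypothetical data: y = 2x exactly, except the last point,
# which has a distant x and lies far off the line.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
y = 2 * x
y[-1] = 0.0

d = cooks_distance(x, y)
flagged = np.flatnonzero(d > 1)  # rule of thumb: D_i > 1
print(np.round(d, 3), flagged)
```

Note that the flagged point needs both ingredients: a large residual and high leverage.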

7 Strategy for dealing with influential observations/outliers
Do the conclusions change when the observation is deleted?
– If no: proceed with the observation included. Study the observation to see if anything can be learned.
– If yes: is there reason to believe the case belongs to a population other than the one under investigation?
   – If yes: omit the case and proceed.
   – If no: does the case have unusually “distant” independent variables?
      – If yes: omit the case and proceed. Report conclusions for the reduced range of explanatory variables.
      – If no: not much can be said. More data are needed to resolve the questions.
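
The first question in the strategy, whether the conclusions change when the observation is deleted, can be checked by refitting without it. This is a minimal sketch under the assumption that "conclusions" means the estimated coefficients; the helper names and the data are hypothetical.

```python
import numpy as np


def coefficients(X, y):
    """OLS coefficients, intercept first."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    D = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(D, np.asarray(y, dtype=float), rcond=None)
    return beta


def compare_without(X, y, i):
    """Coefficients from the full data and from the data with row i deleted."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.arange(len(y)) != i
    return coefficients(X, y), coefficients(X[keep], y[keep])


# Hypothetical data: y = 2x except for one distant, off-line point.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
y = 2 * x
y[-1] = 0.0

full, reduced = compare_without(x, y, 9)
print("with obs:   ", np.round(full, 3))
print("without obs:", np.round(reduced, 3))
```

Here the slope changes sign when the point is removed, so the answer to the first question is "yes" and the rest of the decision tree applies.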

8 Multicollinearity
Multicollinearity: a condition in which the independent variables are highly correlated.
Exact collinearity: Y = weight, $X_1$ = height in inches, $X_2$ = height in feet. Because $X_1 = 12X_2$, many different sets of coefficients provide the same predictions.
Multicollinearity causes two kinds of difficulties:
– The t statistics appear to be too small.
– The β coefficients cannot be interpreted as “slopes”.
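
The height example can be checked numerically. In this sketch (heights and coefficients are hypothetical), the design matrix with both height columns has rank 2 rather than 3, and two quite different coefficient vectors produce identical predictions because $12\beta_1 + \beta_2$ is the same for both.

```python
import numpy as np

height_in = np.array([60.0, 64.0, 66.0, 70.0, 72.0])  # hypothetical heights
height_ft = height_in / 12.0
design = np.column_stack([np.ones(5), height_in, height_ft])

# One column is an exact multiple of another, so only 2 columns are independent.
rank = np.linalg.matrix_rank(design)

# (b0, b1, b2): both vectors have 12*b1 + b2 = 24, so predictions coincide.
beta_a = np.array([-100.0, 2.0, 0.0])
beta_b = np.array([-100.0, 0.0, 24.0])
pred_a = design @ beta_a
pred_b = design @ beta_b
print(rank, np.allclose(pred_a, pred_b))
```

Since the data cannot distinguish between such coefficient vectors, no individual β can be read as a "slope" holding the other variable fixed.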

9 Multicollinearity Diagnostics
Diagnostics:
– High correlation between independent variables
– Counterintuitive signs on regression coefficients
– Low values for t statistics despite a significant overall fit, as measured by the F statistic
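
The first diagnostic can be automated by scanning the correlation matrix of the independent variables. The sketch below uses `numpy.corrcoef`; the 0.8 threshold, the function name, and the data are assumptions for illustration, not values from the lecture.

```python
import numpy as np


def high_correlation_pairs(X, threshold=0.8):
    """Return (i, j, r) for predictor columns whose |correlation| exceeds threshold.

    The default threshold is an illustrative assumption, not a standard.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    k = R.shape[0]
    pairs = []
    for i in range(k):
        for j in range(i + 1, k):
            if abs(R[i, j]) > threshold:
                pairs.append((i, j, float(R[i, j])))
    return pairs


# Hypothetical predictors: x2 is nearly a multiple of x1, x3 is unrelated.
x1 = np.arange(1.0, 11.0)
x2 = 2 * x1 + np.array([0.1, -0.1] * 5)
x3 = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
X = np.column_stack([x1, x2, x3])

pairs = high_correlation_pairs(X)
print(pairs)
```

Pairwise correlations can miss collinearity involving three or more variables, so this is only a first screen.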

10 Diagnostics: Multicollinearity
Example 19.2: Predicting house price (Xm19-02)
– A real estate agent believes that a house’s selling price can be predicted using the house size, number of bedrooms, and lot size.
– A random sample of 100 houses was drawn and the data recorded.
– Analyze the relationship among the four variables.

11 Diagnostics: Multicollinearity
The proposed model is
$\text{PRICE} = \beta_0 + \beta_1\,\text{BEDROOMS} + \beta_2\,\text{H-SIZE} + \beta_3\,\text{LOTSIZE} + \varepsilon$
The model is valid (the overall F test is significant), but no variable is significantly related to the selling price?!

12 Diagnostics: Multicollinearity
Multicollinearity is found to be a problem. Multicollinearity causes two kinds of difficulties:
– The t statistics appear to be too small.
– The β coefficients cannot be interpreted as “slopes”.

13 Remedying Violations of the Required Conditions Nonnormality or heteroscedasticity can be remedied using transformations of the y variable. Such transformations can also improve the linear relationship between the dependent variable and the independent variables. Many statistical software packages make these transformations easy to apply.

14 Reducing Nonnormality by Transformations
A brief list of transformations:
» $y' = \log y$ (for $y > 0$): use when $\sigma_\varepsilon$ increases with y, or when the error distribution is positively skewed.
» $y' = y^2$: use when $\sigma^2_\varepsilon$ is proportional to $E(y)$, or when the error distribution is negatively skewed.
» $y' = y^{1/2}$ (for $y > 0$): use when $\sigma^2_\varepsilon$ is proportional to $E(y)$.
» $y' = 1/y$: use when $\sigma^2_\varepsilon$ increases significantly when y increases beyond some critical value.
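
The four transformations in the list can be applied with a small helper before refitting the model. This is a hypothetical sketch: the function name and the `kind` labels are invented, the natural logarithm is assumed for "log", and the domain checks mirror the conditions stated in the list.

```python
import numpy as np


def transform_y(y, kind):
    """Apply one of the listed transformations to the dependent variable.

    kind: "log" (natural log assumed), "square", "sqrt", or "reciprocal".
    """
    y = np.asarray(y, dtype=float)
    if kind == "log":
        if np.any(y <= 0):
            raise ValueError("log y requires y > 0")
        return np.log(y)
    if kind == "square":
        return y**2
    if kind == "sqrt":
        if np.any(y <= 0):
            raise ValueError("y^(1/2) is listed for y > 0")
        return np.sqrt(y)
    if kind == "reciprocal":
        if np.any(y == 0):
            raise ValueError("1/y requires y != 0")
        return 1.0 / y
    raise ValueError(f"unknown transformation: {kind!r}")


print(transform_y([1.0, np.e], "log"))
print(transform_y([4.0, 9.0], "sqrt"))
```

After transforming, the model is refitted with $y'$ in place of y and the residual plots are checked again.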

15 Durbin-Watson Test: Are the Errors Autocorrelated? This test detects first-order autocorrelation between consecutive residuals in a time series. If autocorrelation exists, the error variables are not independent. The test statistic is
$d = \dfrac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$, where $e_i$ is the residual at time i.
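
The statistic d is simple to compute once the residuals are in time order. A minimal sketch (the function name is hypothetical; statsmodels also offers `statsmodels.stats.stattools.durbin_watson`):

```python
import numpy as np


def durbin_watson(residuals):
    """d = sum_{i=2}^n (e_i - e_{i-1})^2 / sum_{i=1}^n e_i^2, residuals in time order."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e**2))


print(durbin_watson([1, 1, 1, 1]))    # consecutive residuals identical -> 0.0
print(durbin_watson([1, -1, 1, -1]))  # sign flips every period -> 3.0
```

d always lies between 0 and 4: values near 0 point to positive autocorrelation, values near 4 to negative autocorrelation, and values near 2 to little first-order autocorrelation.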

16 Positive First Order Autocorrelation
[Figure: residuals plotted against time]
Positive first-order autocorrelation occurs when consecutive residuals tend to be similar. Then the value of d is small (less than 2).

17 Negative First Order Autocorrelation
[Figure: residuals plotted against time]
Negative first-order autocorrelation occurs when consecutive residuals tend to differ markedly. Then the value of d is large (greater than 2).

18 Durbin-Watson Test in JMP
$H_0$: no first-order autocorrelation.
$H_1$: first-order autocorrelation.
After fitting the model, use Row Diagnostics, Durbin-Watson Test in JMP. The reported autocorrelation is an estimate of the correlation between consecutive errors.
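
One common estimator of the first-order autocorrelation of the residuals is $r = \sum_{i=2}^{n} e_i e_{i-1} / \sum_{i=1}^{n} e_i^2$, and d is approximately $2(1 - r)$. Whether JMP uses exactly this estimator is an assumption here; the function name and residuals are hypothetical.

```python
import numpy as np


def lag1_autocorrelation(residuals):
    """r = sum_{i=2}^n e_i e_{i-1} / sum_{i=1}^n e_i^2 (one common estimator)."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e**2))


e = np.array([1.0, -1.0, 1.0, -1.0])  # hypothetical residuals in time order
r = lag1_autocorrelation(e)  # -0.75
approx_d = 2 * (1 - r)       # 3.5; the exact d for these residuals is 3.0
print(r, approx_d)
```

The approximation $d \approx 2(1 - r)$ explains the slides’ reading of d: r > 0 gives d below 2, r < 0 gives d above 2. With only four residuals the approximation is rough; it tightens as n grows.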

19 Testing the Existence of Autocorrelation, Example
Example 19.3 (Xm19-03)
– How does the weather affect the sales of lift tickets at a ski resort?
– Data on ticket sales for each of the past 20 years, along with the total snowfall and the average temperature during Christmas week in each year, were collected.
– The model hypothesized was $\text{TICKETS} = \beta_0 + \beta_1\,\text{SNOWFALL} + \beta_2\,\text{TEMPERATURE} + \varepsilon$
– Regression analysis yielded the following results:

