
Published by Charles McNulty. Modified over 4 years ago.

1
**Multiple Regression: Dummy Variables, Multicollinearity, Interaction Effects, Heteroscedasticity**

2
**Lecture Objectives**

You should be able to:

- Convert categorical variables into dummies.
- Identify and eliminate multicollinearity.
- Use interaction terms and interpret their coefficients.
- Identify heteroscedasticity.

3
**I. Using Categorical Data: Dummy Variables**

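Consider insurance company data on accidents and their relationship to the age of the driver and the color of the car driven. See spreadsheet for complete data.

| Obs | Accidents per 10000 Drivers | Age (X1) | Car Color (X2) |
|---|---|---|---|
| 1 | 89 | 17 | Red |
| 2 | 70 | | Black |
| 3 | 75 | | Blue |
| 4 | 85 | 18 | |
| 5 | 74 | | |
| 6 | 76 | | |
| 7 | 90 | 19 | |
| 8 | 78 | | |
| 9 | | | |
| 10 | 80 | 20 | |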

4
**Coding a Categorical Variable**

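[Table: Y = accidents per 10000 drivers, X1 = age, X2 = car color entered as a single numeric code, shown under an original coding and an alternate coding. In the first row (89 accidents, age 17) the color code is 1 under the original coding and 3 under the alternate coding.]

This is the incorrect way. Output from the two ways of coding gives inconsistent results.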
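One way to see why a single numeric color code is problematic is to write out the model it implies. The sketch below assumes the three colors are coded 1, 2 and 3; the symbols $b_0, b_1, b_2$ are generic coefficient names, not values from the lecture output.

```latex
\hat{y} = b_0 + b_1\,\text{Age} + b_2\,\text{Color}, \qquad \text{Color} \in \{1, 2, 3\}

\hat{y}_{\text{Color}=2} - \hat{y}_{\text{Color}=1}
  = \hat{y}_{\text{Color}=3} - \hat{y}_{\text{Color}=2}
  = b_2
```

The code forces the colors into an order with equal steps between them. Assigning the numbers to the colors differently imposes a different order, so the fitted line and the forecasts change even though the data have not.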

5
**Original Coding: Partial Output and Forecasts**

SUMMARY OUTPUT

Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations (27)

Coefficients: Intercept, Age, Color

RESIDUAL OUTPUT (Observation, Predicted Accidents, Residuals), values shown: 1: 0.6444; 4: 0.0389; 6: 1.0389; 7: 8.4333; 8: 1.4333; 10: 1.8278

6
**Modified Coding: Partial Output and Forecasts**

SUMMARY OUTPUT

Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations (27)

Coefficients: Intercept, Age, Color (6.6667)

RESIDUAL OUTPUT (Observation, Predicted Accidents, Residuals), values shown: 5: 0.7056; 7: 6.7667; 8: 8.1000; 10: 0.1611

7
**Coding with Dummies**

[Table: Y = accidents per 10000 drivers, X1 = age, with color entered as two dummy variables. Original dummies: D1 = Red, D2 = Black. Alternately coded dummies: D1 = Black, D2 = Blue.]

This is the correct way. Output from either way of coding gives the same forecasts.
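The conversion itself is mechanical, and a minimal sketch may help. The helper `make_dummies` and the short color list are illustrative names and values, not part of the lecture's spreadsheet; the baseline category is the one that gets no column of its own.

```python
def make_dummies(values, baseline):
    """Return dummy column names and one row of 0/1 indicators per value.

    The baseline category gets no column; it is represented by all zeros.
    """
    # Keep categories in order of first appearance, skipping the baseline.
    levels = [v for v in dict.fromkeys(values) if v != baseline]
    names = [f"D_{level}" for level in levels]
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return names, rows


# Illustrative input: three car colors, with Blue as the baseline.
colors = ["Red", "Black", "Blue", "Red", "Black", "Blue"]
names, rows = make_dummies(colors, baseline="Blue")

for color, row in zip(colors, rows):
    print(color, dict(zip(names, row)))
```

Three categories need only two dummies. Choosing a different baseline (for example Red) changes which columns appear, which corresponds to the "alternately coded" dummies.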

8
**Regression Statistics**

Original Dummy Coding

SUMMARY OUTPUT

Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations (27)

Coefficients: Intercept, Age, D1 Red, D2 Black

RESIDUAL OUTPUT (Observation, Predicted Accidents, Residuals), values shown: 7: 5.6556; 8: 6.9889

9
**Regression Statistics**

Modified Dummy Coding

SUMMARY OUTPUT

Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations (27)

Coefficients: Intercept, Age, D1 Black, D2 Blue

RESIDUAL OUTPUT (Observation, Predicted Accidents, Residuals), values shown: 7: 5.6556; 8: 6.9889
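The contrast between the two coding approaches can be checked numerically. The sketch below is an assumption-laden illustration: the nine data rows are made up for the example (they are not the lecture's 27-observation spreadsheet), and `ols` is a small hand-written least-squares solver rather than the spreadsheet's regression tool. It fits the model under two integer codings and under two dummy codings, then compares the forecasts.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def fitted(X, y):
    """Forecasts (fitted values) for every row of X."""
    b = ols(X, y)
    return [sum(bi * xi for bi, xi in zip(b, row)) for row in X]


# Hypothetical rows: (accidents per 10000 drivers, age, car color).
data = [(89, 17, "Red"), (70, 17, "Black"), (75, 17, "Blue"),
        (85, 18, "Red"), (74, 18, "Black"), (76, 18, "Blue"),
        (90, 19, "Red"), (78, 19, "Black"), (77, 19, "Blue")]
y = [float(d[0]) for d in data]


def design(encode):
    """Design matrix: intercept, age, then the encoded color column(s)."""
    return [[1.0, float(age)] + encode(color) for _, age, color in data]


# Two arbitrary integer codings of the same colors (the incorrect way).
int_a = {"Red": 1, "Black": 2, "Blue": 3}
int_b = {"Red": 3, "Black": 1, "Blue": 2}
fit_int_a = fitted(design(lambda c: [float(int_a[c])]), y)
fit_int_b = fitted(design(lambda c: [float(int_b[c])]), y)

# Two dummy codings with different baselines (the correct way).
fit_dum_a = fitted(design(lambda c: [float(c == "Red"), float(c == "Black")]), y)
fit_dum_b = fitted(design(lambda c: [float(c == "Black"), float(c == "Blue")]), y)

gap_int = max(abs(a - b) for a, b in zip(fit_int_a, fit_int_b))
gap_dum = max(abs(a - b) for a, b in zip(fit_dum_a, fit_dum_b))
print(f"integer codes: forecasts differ by up to {gap_int:.3f}")
print(f"dummy codes:   forecasts differ by up to {gap_dum:.6f}")
```

With dummies, the individual coefficients change when the baseline changes, but the forecasts do not, because both sets of columns describe the same three groups.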

10
**II. Multicollinearity**

We wish to forecast the height of a person based on the length of his/her feet. Consider data as shown:

| Height | Right | Left |
|---|---|---|
| 77.31 | 11.59 | 11.54 |
| 67.58 | 9.57 | 9.63 |
| 70.40 | 8.97 | 8.98 |
| 64.84 | 9.39 | 9.46 |
| 77.03 | 12.05 | 12.03 |
| 79.66 | 11.39 | 11.41 |
| 72.37 | 10.55 | 10.61 |
| 73.18 | 10.31 | 10.33 |
| 77.60 | 11.81 | |
| 71.40 | 9.92 | 9.88 |

11
**Regression with Right Foot**

SUMMARY OUTPUT

Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations (105)

Coefficients: Intercept, Right

As right foot length increases by an inch, height increases on average by 3.99 inches.

12
**Regression with Left Foot**

SUMMARY OUTPUT

Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations (105)

Coefficients: Intercept, Left

As left foot length increases by an inch, height increases on average by 3.99 inches.

13
**Regression Statistics**

Regression with Both

SUMMARY OUTPUT

Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations (105)

Coefficients: Intercept, Right, Left

As right foot length increases by an inch, height increases on average by 8.52 inches (holding left foot length constant!), while each additional inch of left foot length makes a person shorter by 4.55 inches!!
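A quick arithmetic check hints at what is going on. A person's two feet are nearly the same length, so as a rough assumption set both equal to a common length $F$ in the two-predictor equation ($b_0$ stands for the intercept, whose value is not shown here):

```latex
\hat{y} \approx b_0 + 8.52\,F - 4.55\,F = b_0 + (8.52 - 4.55)\,F = b_0 + 3.97\,F
```

The net slope of 3.97 is close to the 3.99 obtained with either foot alone. The combined effect of foot length is stable; only the way it is split between the two nearly identical predictors is erratic.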

14
**The Reason? Multicollinearity.**

| | Height (y) | Right (X1) | Left (X2) |
|---|---|---|---|
| Height (y) | 1.0000 | | |
| Right (X1) | 0.9031 | 1.0000 | |
| Left (X2) | 0.9003 | 0.9990 | 1.0000 |

While both feet (Xs) are correlated with height (y), they are also highly correlated with each other (0.999). In other words, the second foot adds no extra information to the prediction of y. One of the two Xs is sufficient.
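Checking the correlation between predictors is easy to script. The sketch below uses only the nine foot-length rows for which both values are listed in the data sample, so its result is illustrative and need not match the 0.999 computed from all 105 observations. The variance inflation factor (VIF) is not mentioned in the lecture; it is added here as one common diagnostic, and for two predictors it reduces to 1 / (1 - r^2).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(r):
    """VIF of either predictor when the model has exactly two predictors."""
    return 1.0 / (1.0 - r * r)


# The nine rows of the sample with both foot lengths listed.
right = [11.59, 9.57, 8.97, 9.39, 12.05, 11.39, 10.55, 10.31, 9.92]
left = [11.54, 9.63, 8.98, 9.46, 12.03, 11.41, 10.61, 10.33, 9.88]

r_sample = pearson(right, left)
print(f"sample correlation between feet: {r_sample:.4f}")

# Using the correlation reported for the full data set (0.999).
print(f"VIF at r = 0.999: {vif_two_predictors(0.999):.1f}")
```

A VIF far above the commonly quoted rule-of-thumb threshold of about 10 signals that the coefficient estimates are very unstable, which is consistent with the sign flip seen when both feet are included.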

15
**III. Interaction Effects**

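Scores on a test of reflexes. Do reflexes slow down with age? Are there gender differences? A portion of the data is shown here.

| Obs | Score (Y) | Age (X1) | Gender (X2) |
|---|---|---|---|
| 1 | 80 | 25 | |
| 2 | 82 | 28 | |
| 3 | 75 | 32 | |
| 4 | 70 | 33 | |
| 5 | 65 | 35 | |
| 6 | 60 | 43 | |
| 7 | 67 | 46 | |
| … | | | |
| 11 | 90 | 24 | |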

16
**Scatterplots with Age, Gender**

Does age seem related? How about Gender?

17
**Correlation, Regression**

SUMMARY OUTPUT

Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations (20)

Correlations (Score, Age, Gender): Gender 0.1406

Coefficients (Coefficient, Standard Error, t Stat, P-value): Intercept (P-value …E-14), Age (P-value …E-07), Gender

Age is related, gender is not.

18
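**Interaction Term**

| Obs | Score (Y) | Age (X1) | Gender (X2) | Age*Gender (X1*X2) |
|---|---|---|---|---|
| 1 | 80 | 25 | | |
| 2 | 82 | 28 | | |
| 3 | 75 | 32 | | |
| 4 | 70 | 33 | | |
| 5 | 65 | 35 | | |
| 6 | 60 | 43 | | |
| 7 | 67 | 46 | | |
| … | | | | |
| 11 | 90 | 24 | | |
| 12 | 87 | | | |

A 2-way interaction term is the product of the two variables.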
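Building the interaction column is a one-line product per row. In the sketch below the scores and ages are taken from the table, but the 0/1 gender codes are assumed purely for illustration, and `add_interaction` is a hypothetical helper name.

```python
def add_interaction(rows, a, b, name):
    """Return copies of the rows with an extra column equal to rows[a] * rows[b]."""
    return [dict(row, **{name: row[a] * row[b]}) for row in rows]


# Gender is assumed to be coded 0 for one group and 1 for the other.
rows = [
    {"score": 80, "age": 25, "gender": 0},
    {"score": 82, "age": 28, "gender": 0},
    {"score": 90, "age": 24, "gender": 1},
]

augmented = add_interaction(rows, "age", "gender", "age_x_gender")
for row in augmented:
    print(row)
```

With a 0/1 code, the product equals zero for one group and equals age for the other, which is what lets the regression estimate a separate age slope for each group.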

19
**Regression with Interaction**

SUMMARY OUTPUT

Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error, Observations (20)

Coefficients (Coefficient, Standard Error, t Stat, P-value): Intercept (P-value 1.49E-11), Age, Gender, Age*Gender

How do we interpret the coefficient for the interaction term?

20
**Meaning of Interaction**

X1 and X2 are said to interact with each other if the impact of X1 on y changes as the value of X2 changes. In this example, the impact of age (X1) on reflexes (y) is different for males and females (changing values of X2). Hence age and gender are said to interact. Explain how this is different from multicollinearity.
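The definition above can be made concrete by writing out the model with the product term. The coefficient symbols $b_0, \dots, b_3$ are generic, and the 0/1 coding of gender in the last line is an assumption for illustration.

```latex
\hat{y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2
        = (b_0 + b_2 X_2) + (b_1 + b_3 X_2)\, X_1

\text{slope of age} =
\begin{cases}
  b_1       & \text{if } X_2 = 0 \\
  b_1 + b_3 & \text{if } X_2 = 1
\end{cases}
```

The interaction coefficient $b_3$ is therefore the difference between the two groups' age slopes. Interaction concerns how the Xs jointly affect y, whereas multicollinearity concerns how strongly the Xs are correlated with each other.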

21
**IV. Heteroscedasticity**

Consider the water levels in Lake Lanier. There is a trend that can be used to forecast. However, the variability around the trendline is not consistent. The increase in variation makes the prediction margin of error unreliable.

22
**Example 2: Income and Spending**

As income grows, the ability to spend on luxury goods grows with it, and so does the variation in how much is actually spent. Once again, forecasts become less reliable due to changing variation (heteroscedasticity).
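An informal way to spot this pattern is to fit a trend line and compare the spread of the residuals at low and high values of X. The sketch below uses made-up income and spending figures, constructed so that the deviation from the trend grows in proportion to income; it is an eyeball-style check, not a formal statistical test.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Intercept and slope of the simple least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def residual_spread_by_half(x, y):
    """Std dev of residuals for the lower and upper halves of x."""
    intercept, slope = fit_line(x, y)
    resid = [b - (intercept + slope * a) for a, b in sorted(zip(x, y))]
    half = len(resid) // 2
    return pstdev(resid[:half]), pstdev(resid[half:])


# Hypothetical data: spending is 40% or 60% of income, alternating,
# so the gap around the 50% trend widens as income rises.
income = [float(v) for v in range(1, 21)]
spending = [v * (0.6 if v % 2 == 0 else 0.4) for v in income]

low_spread, high_spread = residual_spread_by_half(income, spending)
print(f"residual spread, lower incomes:  {low_spread:.2f}")
print(f"residual spread, higher incomes: {high_spread:.2f}")
```

A clearly larger spread in the upper half suggests heteroscedasticity; a residual plot against X shows the same thing as a fan shape.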

23
**Solution**

When heteroscedasticity is identified, data may need to be transformed (change to a log scale, for instance) to reduce its impact. The type of transformation needed depends on the data.
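A small sketch shows how a log scale can even out the spread. The data are hypothetical and deliberately built so that the variation is proportional to the level, which is the situation where a log transform works well; other data may call for a different transformation. Here both variables are logged, which is one of several possible choices.

```python
from math import log
from statistics import mean, pstdev


def spread_ratio(x, y):
    """Residual std dev in the upper half of x divided by that in the lower half."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in sorted(zip(x, y))]
    half = len(resid) // 2
    return pstdev(resid[half:]) / pstdev(resid[:half])


# Hypothetical data: spending is 40% or 60% of income, alternating.
income = [float(v) for v in range(1, 21)]
spending = [v * (0.6 if v % 2 == 0 else 0.4) for v in income]

raw_ratio = spread_ratio(income, spending)
log_ratio = spread_ratio([log(v) for v in income], [log(v) for v in spending])

print(f"spread ratio on the original scale: {raw_ratio:.2f}")
print(f"spread ratio on the log scale:      {log_ratio:.2f}")
```

A ratio near 1 means the residual spread is roughly the same at low and high values. Forecasts made on the log scale need to be converted back with the exponential function before they are reported in the original units.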
